Data entry
Self-coding or pre-coded questionnaires are preferable. Data is input as text, multiple choice, numeric,
date and time, and yes/no responses. In double entry, two data entry clerks independently enter the same data and the computer
flags the items on which they differ. Data in the computer can also be checked manually against the original questionnaire.
Interactive data entry enables detection and correction of logical and entry errors immediately.
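The double entry check described above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `compare_entries`, the record layout (a dictionary keyed by record number), and the sample values are all assumptions made for the example.

```python
def compare_entries(first_pass, second_pass):
    """Return (record_id, field, value_1, value_2) for every item that differs."""
    discrepancies = []
    # Walk over every record seen by either clerk.
    for record_id in sorted(set(first_pass) | set(second_pass)):
        rec1 = first_pass.get(record_id, {})
        rec2 = second_pass.get(record_id, {})
        # Compare every field present in either version of the record.
        for field in sorted(set(rec1) | set(rec2)):
            v1, v2 = rec1.get(field), rec2.get(field)
            if v1 != v2:
                discrepancies.append((record_id, field, v1, v2))
    return discrepancies


# Hypothetical data: the second clerk transposed the digits of one age.
clerk_a = {1: {"age": 34, "sex": "F"}, 2: {"age": 51, "sex": "M"}}
clerk_b = {1: {"age": 34, "sex": "F"}, 2: {"age": 15, "sex": "M"}}

mismatches = compare_entries(clerk_a, clerk_b)
for record_id, field, v1, v2 in mismatches:
    print(f"record {record_id}, field {field}: {v1!r} vs {v2!r}")
```

Each flagged item would then be resolved by going back to the original questionnaire.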
Data editing
Data editing is the process of identifying and correcting data collection and data entry errors, such as invalid or
inconsistent values. The data is 'cleaned' using logical, statistical, range, and consistency checks. All values should be
recorded at the same level of precision (number of decimal places) to make computations consistent and to reduce rounding
errors. The kappa statistic is used to measure inter-rater agreement.
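As an illustration of the kappa statistic mentioned above, the sketch below computes Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance. The function name `cohen_kappa` and the ratings are assumptions made for the example; the notes do not say which variant of kappa is intended.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who classified the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings (1 = abnormal, 0 = normal) of ten items by two raters.
ratings_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
ratings_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

kappa = cohen_kappa(ratings_a, ratings_b)
print(round(kappa, 3))
```

Here the raters agree on 8 of 10 items (p_o = 0.8) and chance agreement is 0.5, so kappa is about 0.6.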
Data validation
Data is validated and its consistency is tested. The main data problems are missing data, coding and entry errors,
inconsistencies, irregular patterns, digit preference, outliers, rounding and significant figures, questions with multiple
valid responses, and record duplication.
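Several of the problems listed above (missing data, out-of-range values, record duplication, digit preference) can be screened for automatically. The sketch below is one possible approach under stated assumptions: the field names, the plausible ranges, and the helper names `validate` and `terminal_digit_counts` are hypothetical.

```python
from collections import Counter


def validate(records, ranges):
    """Return (record_id, field, problem) tuples for basic validation failures."""
    problems = []
    seen_ids = set()
    for rec in records:
        rid = rec.get("id")
        # Record duplication: the same identifier appears more than once.
        if rid in seen_ids:
            problems.append((rid, "id", "duplicate record"))
        seen_ids.add(rid)
        for field, (low, high) in ranges.items():
            value = rec.get(field)
            if value is None:
                problems.append((rid, field, "missing"))
            elif not low <= value <= high:
                problems.append((rid, field, "out of range"))
    return problems


def terminal_digit_counts(values):
    """Count last digits; a heavy excess of 0s or 5s suggests digit preference."""
    return Counter(int(v) % 10 for v in values if v is not None)


# Hypothetical records; sbp = systolic blood pressure in mmHg.
records = [
    {"id": 1, "age": 34, "sbp": 120},
    {"id": 2, "age": None, "sbp": 130},
    {"id": 3, "age": 210, "sbp": 140},
    {"id": 3, "age": 45, "sbp": 150},
]
ranges = {"age": (0, 120), "sbp": (60, 260)}  # assumed plausible limits

problems = validate(records, ranges)
digits = terminal_digit_counts(rec["sbp"] for rec in records)
for problem in problems:
    print(problem)
print(dict(digits))
```

Flagged values are candidates for review, not automatic errors; an outlier may be a genuine observation.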
Data transformation
Data transformation is the process of creating new derived variables preliminary to analysis. It includes mathematical
operations such as division, multiplication, addition, or subtraction, and mathematical transformations such as logarithmic,
trigonometric, power, and z-transformations.
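The transformations above can be sketched with a derived variable and two common transformations. Body mass index (weight divided by height squared) is used here purely as an assumed example of a derived variable; the function names and data are hypothetical.

```python
import math
import statistics


def body_mass_index(weight_kg, height_m):
    """Derived variable built by division and a power: kg / m^2."""
    return weight_kg / height_m ** 2


def log_transform(values):
    """Natural logarithm; only defined for strictly positive values."""
    return [math.log(v) for v in values]


def z_scores(values):
    """z-transformation: (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical measurements for three subjects.
weights_kg = [60.0, 72.0, 85.0]
heights_m = [1.60, 1.75, 1.80]

bmi = [body_mass_index(w, h) for w, h in zip(weights_kg, heights_m)]
log_bmi = log_transform(bmi)
z_bmi = z_scores(bmi)
print([round(b, 2) for b in bmi])
print([round(z, 2) for z in z_bmi])
```

After the z-transformation the values have mean 0 and standard deviation 1, which puts variables measured on different scales on a common footing.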
Preliminary data analysis
Data analysis consists of data summarization, estimation, and interpretation. Simple manual inspection of the data is needed
before statistical procedures are applied. Preliminary examination consists of looking at tables and graphics. Descriptive
statistics are used to detect errors, assess the normality of the data, and determine the size of cells. Missing values may
be imputed or incomplete observations may be eliminated.
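A preliminary look at one variable, together with the two ways of handling missing values mentioned above, might be sketched as follows. Mean imputation is used only because it is the simplest option to show; it is an assumption of this example, not a recommendation, and it understates variability. The function names and data are hypothetical.

```python
import statistics


def describe(values):
    """Descriptive statistics for one variable, ignoring missing (None) values."""
    observed = [v for v in values if v is not None]
    return {
        "n": len(observed),
        "missing": len(values) - len(observed),
        "mean": statistics.mean(observed),
        "median": statistics.median(observed),
        "sd": statistics.stdev(observed),
        "min": min(observed),
        "max": max(observed),
    }


def complete_cases(values):
    """Eliminate incomplete observations."""
    return [v for v in values if v is not None]


def impute_mean(values):
    """Replace each missing value with the mean of the observed values."""
    mean = statistics.mean(complete_cases(values))
    return [mean if v is None else v for v in values]


# Hypothetical ages with one missing value.
ages = [30, 40, None, 50]

summary = describe(ages)
imputed = impute_mean(ages)
print(summary)
print(imputed)
```

An implausible minimum or maximum in such a summary is often the first sign of an entry error.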
Tests for association and effect
Tests for association, effect, or trend involve construction and testing of hypotheses. The tests for
association are the t, chi-square, linear correlation, and logistic regression tests or coefficients. The common effect measures
are the odds ratio, risk ratio, and rate difference. Measures of trend can discover relationships that are not picked up by
association and effect measures. Probability, likelihood, and regression models are used in analysis.
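For a 2x2 table of exposure by outcome, the effect measures and the chi-square statistic named above can be computed directly. The sketch below assumes count data, so the difference it reports is a risk difference (the count-data analogue of the rate difference, which needs person-time denominators). The cell labels a, b, c, d, the function names, and the counts are assumptions made for the example.

```python
def effect_measures(a, b, c, d):
    """Effect measures for a 2x2 table.

    a = exposed with outcome,    b = exposed without outcome,
    c = unexposed with outcome,  d = unexposed without outcome.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "odds_ratio": (a * d) / (b * c),
        "risk_ratio": risk_exposed / risk_unexposed,
        "risk_difference": risk_exposed - risk_unexposed,
    }


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts: 100 exposed and 100 unexposed subjects.
a, b, c, d = 20, 80, 10, 90

measures = effect_measures(a, b, c, d)
chi2 = chi_square_2x2(a, b, c, d)
print(measures)
print(round(chi2, 3))
```

With these counts the odds ratio is 2.25 and the risk ratio is 2.0. The chi-square statistic is about 3.92, which exceeds 3.84, the 5% critical value for 1 degree of freedom. The sketch fails with a division by zero when any needed cell or margin is zero, and the chi-square approximation is unreliable for small samples.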
Analytic procedures and computer programs vary for continuous and discrete data, for person-time and
count data, for simple and stratified analysis, for univariate, bivariate and multivariate analysis, and for polychotomous
outcome variables. Procedures are different for large samples and small samples.