What do we mean when we say ‘better data’?

Statistics play an important role in improving human lives, not least by helping to identify potential policy needs and by measuring the impact of policy implementations. This was noted more than a month ago, on 20 October, when we celebrated World Statistics Day – which sought to promote the message that better data helps create better lives. But what do we mean by ‘better data’?

It doesn’t just mean ‘more data’ or ‘faster delivery’, though this is often what policy makers demand. From our perspective, as statisticians working for the Statistics Centre of Abu Dhabi, the statistical institute of the Emirate of Abu Dhabi, ‘better data’ means data that adequately measure what they purport to measure, that are free from processing error, and that are generalised from the sample to the population using appropriate estimation methods.

Various quality frameworks exist for statistics. Eurostat (the statistical agency of the European Commission) proposes five dimensions of statistical output quality: relevance, accuracy, timeliness and punctuality, accessibility and clarity, and comparability and coherence.1

Accuracy is a key quality dimension that is probably of most concern to users and it has generated an extensive field of research. A popular framework classifying threats to accuracy is Total Survey Error (TSE). In their 2010 journal paper, Total Survey Error: Past, Present and Future, Robert M. Groves and Lars Lyberg discuss a TSE framework which groups errors arising from various sources and processes into two broad categories.

1. Measurement and processing error
TSE has developed within the context of questionnaire-based data collection and much evidence has accumulated to show that different ways of collecting ostensibly the same data can lead to different responses. Differences include question wording, question order and the context of other questions asked earlier in the survey instrument, as well as many other design choices.

Consequently, when a person responds to a question, the response will be influenced partly by their true position in relation to the question and partly by factors relating to measurement conditions. This is not necessarily problematic in instances where the effects of measurement conditions are random (e.g. mood) as they are likely to cancel out across individuals. However, if the measurement conditions systematically shift responses away from respondents’ true positions, then this leads to bias in the estimation.2 One example would be a leading question designed to elicit a specific response – but not all threats are so easy to spot.
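The distinction between random and systematic measurement effects can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a description of any real survey: the true value, the noise level, the size of the systematic shift and the function names are all hypothetical. Each simulated survey averages responses made up of a true value, random noise and an optional constant shift, and the errors are then summarised using the mean square error decomposition mentioned in footnote 2.

```python
import random
import statistics

TRUE_MEAN = 5.0  # hypothetical true population value (an assumption)


def survey_estimate(n, noise_sd, shift, rng):
    """Mean of n responses, each = true value + random noise + systematic shift."""
    return statistics.fmean(
        TRUE_MEAN + rng.gauss(0.0, noise_sd) + shift for _ in range(n)
    )


def error_summary(estimates):
    """Return (bias, variance, mean square error) over repeated surveys."""
    errors = [e - TRUE_MEAN for e in estimates]
    bias = statistics.fmean(errors)
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean(err ** 2 for err in errors)
    return bias, variance, mse


rng = random.Random(2010)  # fixed seed so the sketch is reproducible

# 1,000 repeated surveys of 200 respondents each (illustrative sizes only).
random_only = [survey_estimate(200, 2.0, 0.0, rng) for _ in range(1000)]
with_shift = [survey_estimate(200, 2.0, 0.5, rng) for _ in range(1000)]

bias_r, var_r, mse_r = error_summary(random_only)
bias_s, var_s, mse_s = error_summary(with_shift)

print(f"random noise only: bias={bias_r:.3f} variance={var_r:.4f} mse={mse_r:.4f}")
print(f"systematic shift:  bias={bias_s:.3f} variance={var_s:.4f} mse={mse_s:.4f}")
```

In this sketch the random noise largely averages out, so the bias stays close to zero, whereas the constant shift passes straight through to the estimate. A larger sample would reduce the variance in both cases but would leave the systematic component untouched.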

These challenges of questionnaire design are well known and much work has gone into developing best practice. Consequently, it is usual for questions to be harmonised across different survey vehicles within a statistical organisation, and sometimes across different data collection agencies both nationally and internationally. Not only does this practice help to protect against measurement error, it promotes comparability by providing consistent definitions for key topic outcomes.

Statistical offices often aim to keep an unchanged design for ongoing surveys to help mitigate any effect of systematic error.3 However, methodological change is sometimes desirable, particularly when it is required to reflect real-world changes to the phenomena under study. For example, the so-called ‘basket of goods’ used in consumer price indices needs to be updated regularly to reflect new products and services.

Meanwhile, surveys that started as paper and pencil interviews (PAPI) have gradually evolved to become computer assisted personal interviews (CAPI), computer assisted telephone interviews (CATI) and computer assisted web interviews (CAWI). The use of computer-based instruments allows for better control over some potential error sources, such as question routing, validation checks and classification, which can be done automatically. However, CAPI and CATI approaches remain fairly common in social surveys and some statistical institutes still use PAPI for business establishment surveys whilst others use CAWI.

2. Sampling and representation error
Groves and Lyberg, in their 2010 paper, identify three components of sampling and representation error: (i) coverage error, where the sampling frame does not cover the target population; (ii) sampling error; and (iii) non-response error, where respondents and non-respondents systematically differ in their survey outcome scores.

With appropriate design, sampling error should be random, but coverage and non-response errors may lead to systematic errors that can be difficult to detect. Key population sub-groups may be missing entirely, or may be under-represented, because of under-coverage on the sampling frame or differential rates of non-response.

In such cases, bias is likely to enter the estimation procedure and the published results will not accurately reflect the underlying true population value. For these reasons, statisticians and survey organisations typically use appropriate sampling frames, random sampling techniques, and often adjust their estimation procedures to try to compensate for selection effects.4
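One simple form of the calibration described in footnote 4 is post-stratification, in which the weights within each group are rescaled so that they sum to a known population total. The sketch below is illustrative only: the groups, population totals, response counts and outcome values are invented, and the function names are hypothetical. It also assumes that respondents and non-respondents within a group do not differ in their outcomes, which is precisely the kind of assumption that calibration cannot verify.

```python
from collections import defaultdict


def poststratify(sample, population_totals):
    """Rescale design weights so each group's weights sum to its known total.

    sample: list of (group, design_weight, value) tuples.
    population_totals: dict mapping group -> known population count.
    """
    weight_sums = defaultdict(float)
    for group, weight, _ in sample:
        weight_sums[group] += weight
    return [
        weight * population_totals[group] / weight_sums[group]
        for group, weight, _ in sample
    ]


def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical population: 500 'young' and 500 'old' people, true mean = 15.
population_totals = {"young": 500, "old": 500}

# Differential non-response: only 20 young respondents against 80 old ones.
# Every respondent starts with the same design weight of 10 (1000 / 100).
sample = [("young", 10.0, 10.0)] * 20 + [("old", 10.0, 20.0)] * 80

values = [value for _, _, value in sample]
design_weights = [weight for _, weight, _ in sample]
calibrated_weights = poststratify(sample, population_totals)

unadjusted = weighted_mean(values, design_weights)
calibrated = weighted_mean(values, calibrated_weights)

print(f"unadjusted estimate: {unadjusted}")
print(f"calibrated estimate: {calibrated}")
```

Here the unadjusted estimate is pulled towards the over-represented group, while the calibrated estimate recovers the population mean because the outcome varies only between the calibration groups. If non-response were related to the outcome within a group, the same adjustment would leave bias in place.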

Balancing quality dimensions
We have emphasised accuracy in this discussion at the expense of other quality dimensions. This is primarily because a statistic high on other quality dimensions still loses credibility if it does not measure what it intends to measure.

That is not to say that accuracy is the only important quality dimension. Some users may, for example, prefer a less accurate statistic delivered earlier to one that is more accurate but delivered later. This balancing of quality dimensions can be called ‘fitness for use’, but users need to be aware of what they are sacrificing.

Therefore an important role for the statistician is not only to raise awareness of the ways in which better data lead to better lives, but also to help people appreciate what ‘better data’ truly means.

 


Footnotes

  1. Processing quality indicators are also proposed but space limitations prevent their discussion here. However, any quality assessment should take these into account.
  2. A popular mathematical model is the mean square error, which sums the squared bias and the variance of the estimator. We can reduce the variance through appropriate sample design and increasing the sample size, resources permitting.
  3. If systematic error is constant it cancels through subtraction from itself when taking the difference between two time points.
  4. Calibration is a technique that adjusts the survey population expansion weights in such a way that they sum to known population totals. This procedure reduces any imbalances in the survey proportions in the calibration groups relative to their population distribution. Typically, this reduces the variance of the estimator and improves consistency of estimation (for the calibration model variables) across different survey sources referring to the same time period. Calibration, however, cannot adjust for selection effects that occur independently from the variables included in the calibration model and is not a ‘magic wand’.