Measurement Issues: Observational Bias and the Streetlight Effect
An old story tells of a drunk who lost his keys in the street at night and searched for them only under a lamp, without success. Asked why he looked only under the lamp, he replied that this is where the light is.
This describes the streetlight effect: a form of observational bias (a flaw in measuring data that results in false conclusions, or in differing quality and accuracy of data between comparison groups) in which people look for whatever they are searching for only where it is easiest or most accessible. The old saw that you cannot improve or manage what you cannot measure is true, but what you decide to measure, and not to measure, matters a great deal for understanding a complex reality that may be static, situational or fluid.
Data scientists must always ask whether they are defining and measuring the right things to obtain understanding. Was only the easy and accessible measured because the important could not be measured? What was not measured? If something was not measured because of insufficient data, could the right data be obtained in the future?
Today we have more and more data with which to measure things, and organizations increasingly depend on data, both internal and external, to operate optimally, compete and win. Part of this shift is taking a process or activity that was previously invisible and turning it into data. That data can then be measured, tracked and monitored to optimize processes, make better decisions and innovate.
Good quality data, scientific methods and the right tools can help us find valuable, actionable insights. Collecting the right data from multiple sources, and forming data science teams to combine, slice and dice it into those insights, is key to developing durable competitive advantage.
But we have a problem. Raw data sets, large and small, are not objective: they are selected, collected, filtered, structured and analyzed by human design. What was measured, in what manner, with what devices and to what purpose? What was not measured and why? Was only low-hanging fruit measured because the important could not be measured? What was the quality of the data?
Humans then interpret meaning from data in different ways. Experts can be shown the same sets of data and reasonably come to different conclusions. Overt and hidden biases in selecting, collecting, structuring and analyzing data present serious risks. How we decide to slice and dice data, and which elements we emphasize or ignore, influences the types and quality of measurements.
The scientific method requires observations and measurements to formulate and test hypotheses. It consists of seven basic steps:
- Asking a question about a phenomenon
- Making observations of the phenomenon
- Hypothesizing an explanation for the phenomenon
- Predicting a logical consequence of the hypothesis
- Testing the hypothesis by an experiment
- Creating a conclusion from the experiment
- Replicating the experiment to verify results
Observations and measurements play a key role in the second and fifth steps. Yet human interpretations of observations and measurements are subjective and often flawed, especially given conscious or subconscious biases such as confirmation bias (the tendency to favor data that confirms beliefs or hypotheses). Different data scientists can observe the same facts and evidence yet arrive at very different interpretations and conclusions.
Moreover, different groups of data scientists each have their own conscious or subconscious biases and are susceptible to groupthink. As a result, it is prudent for data science teams to have both internal and external checks and balances to expose potential biases and better understand objective reality.
The danger for professional data science practitioners is providing clients and employers with flawed data science results that lead to bad business and policy decisions. Data scientists must create robust checks and balances to guard against observational and measurement flaws. This means using scientific methods to measure the important, not just the easy and accessible, and weighing all the evidence fairly.
The Data Science Code of Professional Conduct of the Data Science Association provides ethical guidelines to help the data science practitioner. Here are a few of the issues to be aware of:
Data selection bias: skewing the selection of data sources toward the most available, convenient and cost-effective, rather than the most valid and relevant for the specific study. Data scientists have budget, data source and time limits, and thus may introduce unconscious bias through the data sets they are able to select and those they exclude.
Cognitive bias: skewing decisions based on pre-existing cognitive and heuristic factors (e.g., intuition) rather than on data and evidence. Biases in judgment or decision-making can also result from motivation, such as when beliefs are distorted by wishful thinking. Some biases have a variety of cognitive ("cold") or motivational ("hot") explanations.
Confirmation bias: tendency to favor data that confirms beliefs or hypotheses.
Omitted-variable bias: appears in estimates of parameters in a regression analysis when the assumed specification is incorrect, in that it omits an independent variable that should be in the model.
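The effect can be illustrated with a small simulation. This is a minimal sketch on made-up data, not a method from the Code of Professional Conduct: the coefficients (2.0, 3.0, 0.8), the variable names and the helper functions are all assumptions chosen for illustration. The outcome y depends on both x and z, and z is correlated with x, so leaving z out of the regression pushes the estimated slope on x away from its true value of 2.0.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, seed=0):
    """Hypothetical data: y = 2*x + 3*z + noise, with x correlated with z."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.8 * zi + rng.gauss(0, 1) for zi in z]
    y = [2.0 * xi + 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return x, z, y


def slope_omitting_z(x, y):
    """Slope on x from regressing y on x alone (z is omitted)."""
    return cov(x, y) / cov(x, x)


def slope_including_z(x, z, y):
    """Slope on x from regressing y on x and z (2x2 normal equations)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz * sxz
    return (sxy * szz - szy * sxz) / det


if __name__ == "__main__":
    x, z, y = simulate()
    print("slope on x, z omitted: ", round(slope_omitting_z(x, y), 2))
    print("slope on x, z included:", round(slope_including_z(x, z, y), 2))
```

Under these assumptions the misspecified model attributes part of the effect of z to x, so its slope lands well above 2.0, while the model that includes z recovers a value close to 2.0.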
Sampling bias: systematic error due to a non-random sample of a population, in which some members of the population are less likely to be included than others, resulting in a biased sample. A common form is skewing the sampling of data sets toward the subgroups of the population most relevant to the initial scope of the data science project, which makes it unlikely that you will uncover meaningful correlations that apply to other segments.
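A short simulation makes the point concrete. This is only a sketch on an invented population: the two segments ("easy" and "hard" to reach), their sizes and their average values are assumptions for illustration. A convenience sample drawn only from the easy-to-reach segment, the data under the streetlight, is compared with a simple random sample of the whole population.

```python
import random


def build_population(seed=1):
    """Hypothetical population: 30% easy to reach, 70% hard to reach."""
    rng = random.Random(seed)
    easy = [("easy", rng.gauss(100, 15)) for _ in range(3000)]
    hard = [("hard", rng.gauss(40, 15)) for _ in range(7000)]
    return easy + hard


def mean_value(rows):
    return sum(value for _, value in rows) / len(rows)


def convenience_sample(population, k, rng):
    """Sample only from the segment that is easy to reach."""
    reachable = [row for row in population if row[0] == "easy"]
    return rng.sample(reachable, k)


def random_sample(population, k, rng):
    """Simple random sample: every member is equally likely to be drawn."""
    return rng.sample(population, k)


if __name__ == "__main__":
    population = build_population()
    rng = random.Random(2)
    print("population mean:   ", round(mean_value(population), 1))
    print("convenience sample:", round(mean_value(convenience_sample(population, 2000, rng)), 1))
    print("random sample:     ", round(mean_value(random_sample(population, 2000, rng)), 1))
```

Both samples have the same size, so the gap between them comes from who could be reached, not from how much data was collected.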
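Data dredging bias: using regression and similar techniques to search for correlations in small or selected samples, and reporting those that appear, even though they may not hold or be statistically significant in the wider population. When enough relationships are tested, some will look significant by chance alone.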
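The chance mechanism can be sketched in a few lines. Everything here is pure noise generated for illustration; the sample size (30 rows), the number of candidate features (200) and the helper names are assumptions. The threshold of 0.361 is the approximate two-sided 5% critical value of the Pearson correlation for 30 observations.

```python
import random

# Approximate two-sided 5% critical value of |r| for n = 30 (28 degrees of freedom).
CRITICAL_R = 0.361


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def dredge(n_rows=30, n_features=200, seed=0):
    """Correlate many unrelated random features with a random target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    correlations = []
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        correlations.append(pearson_r(feature, target))
    return correlations


if __name__ == "__main__":
    rs = dredge()
    hits = [r for r in rs if abs(r) > CRITICAL_R]
    print("features tested:", len(rs))
    print("'significant' at the 5% level:", len(hits))
    print("largest |r| found:", round(max(abs(r) for r in rs), 2))
```

No feature is related to the target, yet roughly one in twenty is expected to clear the 5% threshold. Reporting only those hits, without correcting for the number of tests or checking them on fresh data, is the bias described above.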
Modeling bias: skewing models by starting with a biased set of project assumptions that drive selection of the wrong variables, the wrong data, the wrong algorithms and the wrong metrics of fitness - including overfitting of models to past data without regard for predictive lift and failure to score and iterate models in a timely fashion with fresh observational data.
Reporting bias: skewing the availability of data, such that observations of a certain kind are more likely to be reported and consequently used in the future.
Observation selection bias: data is filtered not only by study design and measurement, but by the necessary precondition that there has to be someone doing a study. In situations where the existence of the observer or the study is correlated with the data, observation selection effects occur, and anthropic reasoning is required.
Data scientists must have clarity about what they are attempting to define, measure and understand. Without observing and measuring the right things to gain real understanding, flawed data science results may cause decision makers to make wrong or sub-optimal decisions.