Ten Simple Rules for Initial Data Analysis

DSA ADS Course - 2022

Discuss simple rules for data analysis, traps and techniques to mitigate errors. Discuss quality and veracity of data/evidence.

Discuss counterfactuals, causality and applied probability theory.

Discuss scenario planning for applied data science using near real-time data to adjust tactics to achieve specific goals.

Discuss strategy vs. tactics and methods to set appropriate goals. Introduction to strategy.

Understand the difference between research data science and applied data science.

February, 2022


Data is the new oil, and analyzing data has been described as the number one profession of the 21st century. But an appropriate analysis of data is also one of the most challenging tasks: a lot can go wrong at any step. Researchers should always remember, but often forget, that data are numbers with context. When the properties and context of the data are not appropriately taken into account, the preconditions for meaningful statistics are not met, and data can speak through lies and riddles, often leading to harm.

Initial data analysis (IDA) provides a framework for researchers to work with data responsibly. IDA has the following phases:

(1) metadata setup;

(2) data cleaning;

(3) data screening;

(4) initial data reporting;

(5) refining and updating the research analysis plan; and

(6) documenting and reporting IDA.

These phases are core activities for all researchers who analyze data for primary or secondary use, e.g., data analysis of designed experiments, observational studies, patient registries, biobanks, or biomedical databases. Indeed, IDA can be described as the first data analysis step: the step that checks whether the observed data correspond to expectations about these data. Typically, researchers do not perform IDA in a systematic way, if at all, or they mix IDA activities with subsequent data analysis tasks such as hypothesis generation or exploration, formal analysis, and interpretation of conclusions. As a consequence, researchers have many “degrees of freedom” in how they handle and analyze the data, or may miss the “gorilla in the room”, a conspicuous feature of the data that goes unnoticed.

However, disciplined and systematic IDA practice can provide researchers with the necessary context about data properties and structures to avoid pitfalls.
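
As a rough illustration of checking whether observed data correspond to expectations, the sketch below compares a few records against a small metadata table and reports what it finds. It is a minimal example under stated assumptions, not the procedure from the paper: the VariableSpec class, the screen function, the variable names (age, weight_kg), and the plausible ranges are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class VariableSpec:
    """Expected properties of one variable (hypothetical metadata format)."""
    name: str
    dtype: type
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    allow_missing: bool = False


def screen(records: List[Dict[str, Any]], specs: List[VariableSpec]) -> List[str]:
    """Compare each record with the metadata and return a list of findings.

    Records are never modified or dropped; discrepancies are only reported.
    """
    findings = []
    for row, record in enumerate(records):
        for spec in specs:
            value = record.get(spec.name)
            if value is None:
                if not spec.allow_missing:
                    findings.append(f"row {row}: {spec.name} is missing")
                continue
            if not isinstance(value, spec.dtype):
                findings.append(
                    f"row {row}: {spec.name} has type {type(value).__name__}, "
                    f"expected {spec.dtype.__name__}"
                )
                continue
            if spec.minimum is not None and value < spec.minimum:
                findings.append(
                    f"row {row}: {spec.name} = {value} is below expected minimum {spec.minimum}"
                )
            if spec.maximum is not None and value > spec.maximum:
                findings.append(
                    f"row {row}: {spec.name} = {value} is above expected maximum {spec.maximum}"
                )
    return findings


# Illustrative metadata and records (invented values, not real data).
specs = [
    VariableSpec("age", int, minimum=0, maximum=120),
    VariableSpec("weight_kg", float, minimum=20.0, maximum=300.0),
]
records = [
    {"age": 34, "weight_kg": 70.5},
    {"age": -2, "weight_kg": 81.0},    # implausible value
    {"age": 51, "weight_kg": None},    # missing value
    {"age": "47", "weight_kg": 64.2},  # wrong type (text instead of number)
]

findings = screen(records, specs)
for finding in findings:
    print(finding)
```

The function only reports discrepancies and leaves the records untouched. That choice keeps the check separate from any later decision about how to handle the flagged values, in line with the idea of not mixing IDA with subsequent analysis tasks.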


Resource Type: