Are You a Complete Data Scientist?

The Washington Post reported that scientists have found that skipping breakfast leads to weight loss after all, not to weight gain as previously believed. What led scientists astray before was their reliance on observational studies, a.k.a. quasi-experimental design. Only a randomized trial, the "gold standard", can establish causality.

Designing and conducting a randomized trial is extremely expensive, and many data scientists have never had the privilege of being involved in one. But by relying on already-collected data, which often happens to fall into the category of Big Data, data scientists can be led astray just as easily as the breakfast researchers were.
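To make the trap concrete, here is a toy simulation. It is a sketch with entirely made-up numbers, not a model of the actual breakfast studies: a hypothetical confounder, `health_conscious`, makes people both less likely to skip breakfast and more likely to lose weight, while the built-in (hypothetical) effect of skipping is a 0.5 kg loss. Comparing skippers to eaters in self-selected, "already-collected" data gets the sign wrong; assigning skipping by coin flip recovers the built-in effect.

```python
import random
import statistics

# All parameters below are hypothetical, chosen only for illustration.
TRUE_EFFECT = -0.5        # effect of skipping breakfast on weight change (kg)
CONFOUNDER_EFFECT = -2.0  # effect of being health-conscious on weight change (kg)


def outcome(skip: bool, health_conscious: bool, rng: random.Random) -> float:
    """Weight change (kg) in the toy model: treatment + confounder + noise."""
    return (TRUE_EFFECT * float(skip)
            + CONFOUNDER_EFFECT * float(health_conscious)
            + rng.gauss(0.0, 1.0))


def difference_in_means(rows):
    """Naive estimate: mean outcome of skippers minus mean outcome of eaters."""
    skipped = [y for skip, y in rows if skip]
    ate = [y for skip, y in rows if not skip]
    return statistics.fmean(skipped) - statistics.fmean(ate)


def observational(n: int, rng: random.Random) -> float:
    """People choose for themselves, so the confounder drives who skips."""
    rows = []
    for _ in range(n):
        h = rng.random() < 0.5
        # Health-conscious people rarely skip breakfast in this toy world.
        skip = rng.random() < (0.2 if h else 0.8)
        rows.append((skip, outcome(skip, h, rng)))
    return difference_in_means(rows)


def randomized(n: int, rng: random.Random) -> float:
    """A coin flip assigns skipping, independent of the confounder."""
    rows = []
    for _ in range(n):
        h = rng.random() < 0.5
        skip = rng.random() < 0.5
        rows.append((skip, outcome(skip, h, rng)))
    return difference_in_means(rows)


if __name__ == "__main__":
    rng = random.Random(42)
    print(f"observational estimate: {observational(100_000, rng):+.2f} kg")
    print(f"randomized estimate:    {randomized(100_000, rng):+.2f} kg")
```

Working through the model's expectations, the observational comparison should land near +0.7 kg (skipping appears to cause weight gain), while the randomized comparison should land near the built-in -0.5 kg. More rows would not fix the observational number; they would only make the wrong answer more precise.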

It's possible, though unlikely, that a new randomized trial will produce Big Data, especially in bioinformatics or with large physical sensor arrays collecting data at high time frequency. But that is generally not the case, and this loose correlation of Big Data with pre-collected data (and of small data with randomized trials) is what inspired my trite headline.

A smug data scientist may brag about working on Big Data, but the complete data scientist is conducting experiments (and probably generating small data).