Untestable & Unreasonable Assumptions in Models is Data Science Malpractice

On 6 Dec, 2014 By Michael.Walker 0 Comments

We all know models are always an illusion of reality - yet may or may not be useful. Data scientists should build both simple and complex models with reasonable and testable assumptions that capture what is important and subject both assumptions and results to rigorous empirical examination.

Data scientists should always ask: are there other more reasonable assumptions that explain observations?

I often see beautiful models - with wonderful logic, statistics and mathematical equations as supporting evidence of an incredible conclusion or predictive technique. Sometimes predictive results appear too good to be true yet purportedly the result of an unbiased model. When I ask about assumptions built into the model I usually get vague and shifting responses and effort to change the subject. When I insist on reviewing specific assumptions in detail the trouble begins.

I suggest models should be judged by reasonable, dubious or untestable assumptions - not only predictive results (even a broken clock is right twice a day). Simplifying assumptions usually makes them unrealistic and disconnected from the real world.

I often get the feeling a model intentionally searched for certain assumptions to create a specific result. Yet searching for assumptions that produces a desired result is not acceptable data science practice. Bad assumptions have consequences: freedom to select any assumptions allows the creation of a model to support any result.

Models can be useful to help understand complex phenomena and increase prediction accuracy within certain margins of error. Yet predictive models usually only work under certain limited circumstances for a limited time (until they do not work anymore). One should always be skeptical of the usefulness of predictive models in high causal density environments (e.g., human behavior, climate, finance...etc.).

Data scientists should use models properly: to gain understanding of complex phenomena when no real alternatives are available. All models need to be subjected to rigorous empirical tests to avoid creating an illusion of reality that leads to data science malpractice and bad consequences.

You are here

Untestable & Unreasonable Assumptions in Models is Data Science Malpractice