Bad Data: Don't be a Gullible Fool

Professional data scientists rank the quality and veracity of data as leading concerns. Recently we have seen a significant rise in the amount of untruthful data and false data creation. During COVID-19, we have often seen both untruthful and truthful data taken out of context, creating a misleading interpretation. Data scientists sometimes call this torturing the data to fit a narrative or theory.

One major issue with data science results is the truthfulness of data, also known as "data veracity". In the past few years we have seen a rapid rise in the amount of false data creation and misleading data presentation. Data veracity refers to how truthful and accurate data is; a veracity problem means the data is false or inaccurate. The falsehood may be intentional, negligent or simply a mistake.

Data veracity may be distinguished from data quality, which is usually defined as the reliability and application efficiency of data and is sometimes used to describe incomplete, uncertain or imprecise data.

The truthfulness or accuracy of data takes precedence over data quality issues: if data is objectively false, then data science results are meaningless and unreliable. Such results may create an illusion of reality, causing bad or sub-optimal decisions and sometimes fraud with civil or criminal liability.