Statistical Modelling in the Age of Data Science

DSA ADS Course, 2022

Causal Machine Learning, Double Machine Learning, Targeted Learning, Statistical Analysis, Data Interpretation, Causal Analysis, Causality, Data Fusion, Missing Data, Counterfactuals

Discuss recent advances in machine learning and causal inference, and the separation between the data-fitting and data-interpretation components of statistical modelling.

Discuss causal machine learning, double machine learning and targeted learning.

Statistical Modelling in the Age of Data Science - July, 2021

Abstract

Twenty years after Leo Breiman’s wake-up call on the use of data models, I reconsider his concerns, which were heavily influenced by problems in prediction and classification, in light of the much vaster class of problems of estimating effects and (conditional) associations. Viewed from this perspective, one realises that the statistical community’s commitment to the use of data models continues to be dominant and problematic, but that algorithmic modelling (machine learning) does not readily provide a satisfactory alternative, by virtue of being almost exclusively focused on prediction and classification. The only successful way forward is to bridge the two cultures. It requires machine learning skills from the algorithmic modelling culture in order to reduce model misspecification bias and to enable pre-specification of the statistical analysis. It moreover requires data modelling skills in order to choose and construct interpretable effect and association measures that target the scientific question; in order to identify those measures from observed data under the considered sampling design by relying on minimal and well-understood assumptions; and finally, in order to reduce regularisation bias and quantify uncertainty in the obtained estimates by drawing on asymptotic theory.
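One well-known instance of this bridge is cross-fitted double machine learning: flexible learners estimate the nuisance parts of the model, and an orthogonal estimating equation, backed by asymptotic theory, removes their regularisation bias from the effect estimate and yields a standard error. The sketch below is an illustration under stated assumptions, not the method of the paper. It assumes a partially linear model Y = θA + g(X) + ε, uses a hand-written k-nearest-neighbour regressor as a stand-in for any machine learning algorithm, and the function names `knn_predict` and `dml_plm` are hypothetical.

```python
import numpy as np


def knn_predict(x_train, y_train, x_test, k=10):
    """Hypothetical k-nearest-neighbour regressor (stand-in for any ML learner)."""
    # Squared Euclidean distances between every test and training point.
    d = ((x_test[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training points for each test point.
    idx = np.argsort(d, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)


def dml_plm(y, a, x, n_folds=2, k=10, seed=0):
    """Cross-fitted DML estimate of theta in the assumed model Y = theta*A + g(X) + eps."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Random, balanced assignment of observations to folds.
    folds = rng.permutation(n) % n_folds
    y_res = np.empty(n)
    a_res = np.empty(n)
    for f in range(n_folds):
        test = folds == f
        train = ~test
        # Nuisance functions E[Y|X] and E[A|X] are fitted on the other folds only.
        y_res[test] = y[test] - knn_predict(x[train], y[train], x[test], k)
        a_res[test] = a[test] - knn_predict(x[train], a[train], x[test], k)
    # Solve the orthogonal score sum((y_res - theta * a_res) * a_res) = 0.
    theta = np.sum(a_res * y_res) / np.sum(a_res ** 2)
    # Sandwich standard error based on the same score.
    psi = (y_res - theta * a_res) * a_res
    se = np.sqrt(np.mean(psi ** 2)) / np.mean(a_res ** 2) / np.sqrt(n)
    return theta, se


if __name__ == "__main__":
    # Simulated data with true theta = 2 and nonlinear confounding through X.
    rng = np.random.default_rng(1)
    n = 1000
    x = rng.normal(size=(n, 2))
    a = np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2 + rng.normal(size=n)
    y = 2.0 * a + np.cos(x[:, 0]) + x[:, 1] ** 2 + rng.normal(size=n)
    theta, se = dml_plm(y, a, x)
    print(f"theta = {theta:.3f}, 95% CI = ({theta - 1.96 * se:.3f}, {theta + 1.96 * se:.3f})")
```

Cross-fitting keeps each observation out of the fits used to residualise it, which is what allows the simple sandwich standard error to be used despite the data-adaptive nuisance estimates.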
