Causal Inference and Data Fusion in Econometrics
Learning about cause and effect is arguably the main goal in applied econometrics. In practice, the validity of these causal inferences is contingent on a number of critical assumptions regarding the type of data that has been collected and the substantive knowledge that is available. For instance, unobserved confounding factors threaten the internal validity of estimates, data availability is often limited to non-random, selection-biased samples, causal effects need to be learned from surrogate experiments with imperfect compliance, and causal knowledge has to be extrapolated across structurally heterogeneous populations. A powerful causal inference framework is required to tackle these challenges, which plague most data analysis to varying degrees. Building on the structural approach to causality introduced by Haavelmo (1943) and the graph-theoretic framework proposed by Pearl (1995), the artificial intelligence (AI) literature has developed a wide array of techniques for causal learning that allow researchers to leverage information from various imperfect, heterogeneous, and biased data sources (Bareinboim and Pearl, 2016). In this paper, we discuss recent advances in this literature that have the potential to contribute to econometric methodology along three dimensions. First, they provide a unified and comprehensive framework for causal inference, in which the aforementioned problems can be addressed in full generality. Second, due to their origin in AI, they come with sound, efficient, and complete algorithmic criteria for automating the corresponding identification task. Third, because of the nonparametric description of structural models that graph-theoretic approaches build on, they combine the strengths of both structural econometrics and the potential outcomes framework, and thus offer an effective middle ground between these two strands of the literature.