Real-World Evidence, Causal Inference, and Machine Learning
The current focus on real-world evidence (RWE) comes at a time when at least two major trends are converging. The first is
the progress made in observational research design and methods over the past decade. The second is the development of
numerous large observational healthcare databases around the world, which is creating repositories of improved data assets to
support observational research.
Objective: This paper examines the implications of the improvements in observational methods and research design, as well
as the growing availability of real-world data, for the quality of RWE. These developments have been very positive. On the
other hand, unstructured data, such as medical notes, and the sparsity of data created by merging multiple data assets are not
easily handled by traditional health services research statistical methods. In response, machine learning methods are gaining
traction as potential tools for analyzing massive, complex datasets.
Conclusions: Machine learning methods have traditionally been used for classification and prediction, rather than causal
inference. The prediction capabilities of machine learning are valuable in their own right. The use of machine learning for
causal inference, however, is still evolving. Machine learning can be used for hypothesis generation, followed by the application of
traditional causal methods. Relatively recent developments, such as targeted maximum likelihood methods, go further by
integrating machine learning directly with causal inference.