Learning Optimal Dynamic Treatment Strategies from Temporal Monitoring Health Data


Contact: Dr. Li-wei Lehman

Project Synopsis

A fundamental challenge of treatment decision making in a medical setting is that treatment strategies are complex and dynamic. They involve sequential decisions at many timepoints, each based on evolving patient history. We refer to such treatment strategies as Dynamic Treatment Regimes (DTRs). Standard reinforcement learning (RL) approaches to estimating optimal DTRs from observational data, such as Q-learning or A-learning, face challenges: the learned DTRs have high variance, and the model estimates can be biased and uninterpretable. We are working along multiple threads, all aimed at understanding, mitigating, and solving these problems.

Building Interpretable Probabilistic Models for Informed Treatment Decision Making: In our AAAI 2022 paper “Knowledge Distillation via Constrained Variational Inference,” we leverage deep learning to build more powerful probabilistic models that can simultaneously identify interpretable latent structure in medical data and accurately predict patient outcomes. In our NeurIPS 2022 workshop paper “Treatment-RSPN: Recurrent Sum-Product Networks for Sequential Treatment Regimes,” we introduce a probabilistic deep generative approach, Treatment-RSPN, that leverages Recurrent Sum-Product Networks (RSPNs) for joint modelling of treatment decision-making and patient treatment response. As part of our framework, we develop a method for transforming conventional probabilistic graphical models (PGMs), such as dynamic Bayesian networks (DBNs), into RSPNs, allowing us to bootstrap our models with a structure informed by domain knowledge and the specific task.


Outcome Prediction under Dynamic and Time-Varying Treatment Regimes: We are developing sequential deep learning approaches to estimate expected patient trajectories and treatment outcomes under dynamic and time-varying treatment strategies. Please read our paper "G-Net: a Recurrent Network Approach to G-Computation for Counterfactual Prediction Under a Dynamic Treatment Regime" for an initial approach that we have developed in this framework. G-Net is based on G-computation, a causal inference method that can be used to estimate the average effect of a DTR on the population, or the conditional effect given observed patient history. We are extending this framework using probabilistic dynamic models to make personalized counterfactual predictions of patient outcomes.
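As a rough illustration of the G-computation idea described above, the following minimal Python sketch estimates an expected outcome under a dynamic regime by Monte Carlo simulation. It is not the G-Net implementation: the function names, the toy linear-Gaussian dynamics, and the threshold-based regime are all assumptions made for illustration. In G-Net the conditional models are recurrent networks fitted to patient data, whereas here a hand-written simulator stands in for a fitted model.

```python
import numpy as np


def simulate_next_covariate(l_prev, a_prev, rng):
    """Hypothetical stand-in for a fitted model of p(L_t | history).

    Toy dynamics (an assumption): the covariate decays over time,
    treatment lowers it, and the noise is Gaussian.
    """
    noise = rng.normal(0.0, 0.1, size=l_prev.shape)
    return 0.9 * l_prev - 0.5 * a_prev + noise


def treat_if_above(threshold):
    """Dynamic regime: treat (a=1) whenever the covariate exceeds threshold."""
    def regime(l_t):
        return (l_t > threshold).astype(float)
    return regime


def g_computation(l0, regime, horizon, n_samples=5000, seed=0):
    """Monte Carlo estimate of the expected final covariate under `regime`.

    Starting every simulated trajectory from the same value l0 gives an
    estimate conditional on that (toy) patient history.
    """
    rng = np.random.default_rng(seed)
    l_t = np.full(n_samples, l0, dtype=float)
    for _ in range(horizon):
        # Treatment is assigned by the regime of interest,
        # not by the treatment policy that generated the data.
        a_t = regime(l_t)
        l_t = simulate_next_covariate(l_t, a_t, rng)
    return l_t.mean()


never_treat = g_computation(2.0, treat_if_above(np.inf), horizon=5)
always_treat = g_computation(2.0, treat_if_above(-np.inf), horizon=5)
dynamic = g_computation(2.0, treat_if_above(1.0), horizon=5)
```

Comparing `never_treat`, `always_treat`, and `dynamic` gives the kind of "what-if" contrast between strategies that G-computation is meant to support. Averaging such estimates over a distribution of starting histories, rather than a single `l0`, would give a population-level estimate instead.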

Learning to Treat COVID-19 Patients from Observational ICU Data: We are developing an AI tool based on causal inference methods to facilitate mechanical ventilation decision making for COVID-19 patients with acute respiratory distress syndrome (ARDS) in the ICU. Our approach consists of a sequential modeling framework that would allow clinicians to explore various “what-if” scenarios to estimate both individual and population-level effects of alternative mechanical ventilation strategies for ARDS patients.

Dynamic Marginal Structural Models (DynMSMs) to Estimate Time-Varying Treatment Effects: In this line of work, we collaborate closely with clinicians to define clinical questions pertaining to simplified classes of treatment strategies. We do not attempt to estimate the overall optimal DTR across all possible functions of patient history. Instead, we estimate the optimal DTR within a restricted class (or, in one case, just estimate the effects of a few separate treatment strategies). In our paper "Fluid-limiting treatment strategies among sepsis patients in the ICU: a retrospective causal analysis," we use DynMSMs to estimate the effect of different fluid resuscitation strategies on sepsis patient outcomes.

Safe and Robust Machine Learning Models for Health: One important aspect of responsible AI is evaluating the safety and robustness of ML models used for clinical treatment decision making, to avoid unintended harm. We carried out such an evaluation in our AMIA 2020 Annual Symposium paper "Is Deep Reinforcement Learning Ready for Practical Applications in Healthcare? A Sensitivity Analysis of Duel-DDQN for Hemodynamic Management in Sepsis Patients." Our work systematically explored the sensitivity of a deep reinforcement learning (DRL) technique for sepsis treatment and uncovered several important areas of caution in adopting DRL in a healthcare setting.


·       Our G-Net paper was featured in MIT News! Read this MIT News article about G-Net, a new deep learning approach developed by our team to simulate treatment outcomes under time-varying and dynamic treatment strategies.

·       Congratulations to our team for winning the Distinguished Paper Award at the 2020 American Medical Informatics Association (AMIA) Annual Symposium! In our AMIA 2020 Annual Symposium paper "Is Deep Reinforcement Learning Ready for Practical Applications in Healthcare? A Sensitivity Analysis of Duel-DDQN for Hemodynamic Management in Sepsis Patients," we evaluated the safety and robustness of deep reinforcement learning (DRL) models for clinical treatment decision making. Our work systematically explored the sensitivity of a DRL technique for sepsis treatment and uncovered several important areas of caution in adopting DRL in a healthcare setting.


We are seeking talented students, interns, and collaborators to join our team. For more details, please contact Li-wei Lehman.


Our team consists of an interdisciplinary group of researchers from MIT and MIT-IBM Watson AI Lab, bringing together expertise from machine learning, causal inference, physiological modeling, and medical informatics to solve challenging problems in dynamic treatment regimes for clinical medicine.

MIT Principal Investigator (PI): Dr. Li-wei Lehman (Ph.D.)

MIT Co-PI: Professor Roger Mark (M.D., Ph.D.)

MIT Students: Adam Dejl, Amy Hu, Jenny Moralejo

MIT Alumni: Rui Li (MEng Student), Dr. Jun Li (Post-doc), Fengyi Andy Tang (Research Intern), Kechun Huang (Research Fellow), MingYu Lu (Research Fellow), Stephanie Hu (MEng), Yuria Utsumi (UROP), Michelle Yin (UROP), Nicholas Baginski (UROP), Ardavan Saeedi (Ph.D.)

IBM Team: Dr. Zach Shahn (Ph.D., PI), Dr. Daby Sow (Ph.D., PI), Dr. Prithwish Chakraborty (Ph.D.), Dr. Mohamed Ghalwash (Ph.D.), Piyush Madan (MSc)

Clinical Collaborators (BIDMC): Dr. Daniel Talmor, Dr. Nate Shapiro, Dr. Elias Kassis, Dr. Somnath Bose


This research is funded by MIT-IBM Watson AI Lab.


Representation Learning and Dynamics Modeling

·       Treatment-RSPN: Recurrent Sum-Product Networks for Sequential Treatment Regimes, Adam Dejl, Harsh Deep, Jonathan Fei, Ardavan Saeedi, Li-wei Lehman, ML4H Symposium and NeurIPS Time Series for Health Workshop, New Orleans, 2022. Poster.

·       Knowledge Distillation via Constrained Variational Inference, Ardavan Saeedi, Yuria Utsumi, Li Sun, Kayhan Batmanghelich, Li-wei H. Lehman, Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, February 2022.

·       Retaining Privileged Information for Multi-Task Learning, Fengyi Tang, Cao Xiao, Fei Wang, Jiayu Zhou, Li-wei H. Lehman, Proceedings of the 25th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), August 2019, Anchorage, Alaska USA. Research Track.

·       Robust Low-Rank Discovery of Data-Driven Partial Differential Equations, Jun Li, Gan Sun, Guoshuai Zhao, Li-wei Lehman, The Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020, New York, USA.

·       A Physiological Time Series Dynamics-Based Approach to Patient Monitoring and Outcome Prediction, Li-wei H. Lehman, Ryan P. Adams, Louis Mayaud, George B. Moody, Atul Malhotra, Roger G. Mark, Shamim Nemati, IEEE Journal of Biomedical and Health Informatics, 19(3):1068-1076, May 2015. doi:10.1109/JBHI.2014.2330827. Preprint.

·       Bayesian nonparametric learning of switching dynamics in cohort physiological time series: application in critical care patient monitoring, Li-wei H. Lehman, Matthew J. Johnson, Shamim Nemati, Ryan P. Adams, Roger G. Mark, Chapter in Advanced State Space Methods for Neural and Clinical Data, Cambridge University Press, 2015. Publisher's Version.

·       A Model-Based Machine Learning Approach to Probing Autonomic Regulation from Nonstationary Vital-Sign Time Series, Li-wei H. Lehman, Roger G. Mark, Shamim Nemati, IEEE Journal of Biomedical and Health Informatics, Vol. 22, No. 1, January 2018. doi:10.1109/JBHI.2016.2636808. Preprint.

Dynamic Treatment Regimes and Causal Inference

·       G-Net: a Recurrent Network Approach to G-Computation for Counterfactual Prediction Under a Dynamic Treatment Regime, Rui Li, Stephanie Hu, Mingyu Lu, Yuria Utsumi, Prithwish Chakraborty, Daby M. Sow, Piyush Madan, Mohamed Ghalwash, Zach Shahn, Li-wei H. Lehman, Proceedings of Machine Learning for Health, PMLR 158:280-297, 2021. An earlier arXiv version of this work is available at arXiv:2003.10551.

·       Fluid-limiting treatment strategies among sepsis patients in the ICU: a retrospective causal analysis, Zach Shahn, Nathan I. Shapiro, Patrick D. Tyler, Daniel Talmor and Li-wei H. Lehman, Critical Care, 24:62, 2020.

·       Titration of Ventilator Settings to Target Driving Pressure and Mechanical Power, Elias Baedorf Kassis, Stephanie Hu, MingYu Lu, Alistair Johnson, Somnath Bose, Maximilian S Schaefer, Daniel Talmor, Li-wei H Lehman, Zach Shahn, Respiratory Care, July 2022.

·       Delaying initiation of diuretics in critically ill patients with recent vasopressor use and high positive fluid balance, Zach Shahn, Li-wei H Lehman, Roger G. Mark, Daniel Talmor, Somnath Bose, British Journal of Anaesthesia (BJA), Vol. 127, Issue 4, 2021, pages 569-576.

·       Should Diuretic Initiation be Delayed in ICU Patients with Recent Vasopressor Use? A Causal Analysis, Somnath Bose, Li-wei H. Lehman, Kechun Huang, Daniel Talmor, Zach Shahn, Critical Care Medicine, Volume 48, Issue 1, p 733, January 2020.

·       Efficient estimation of optimal regimes under a no direct effect assumption, Lin Liu, Zach Shahn, James M. Robins, Andrea Rotnitzky, Journal of the American Statistical Association (JASA), 2020. arXiv version: https://arxiv.org/abs/1908.10448.

·       Estimating Optimal Dynamic Treatment Regimes Under Resource Constraints Using Dynamic Marginal Structural Models, Ellen Caniglia, Eleanor Murray, Miguel Hernan, and Zach Shahn, Statistics in Medicine, in revision. https://arxiv.org/abs/1903.06488

·       Latent Class Mixture Models of Treatment Effect Heterogeneity, Zach Shahn and David Madigan, Bayesian Analysis, 12(3), pp. 831-854, 2017.

Safe and Robust Reinforcement Learning for Health

·       Is Deep Reinforcement Learning Ready for Practical Applications in Healthcare? A Sensitivity Analysis of Duel-DDQN for Sepsis Treatment, MingYu Lu, Zachary Shahn, Daby Sow, Finale Doshi-Velez, Li-wei H. Lehman, arXiv:2005.04301, AMIA Annual Symposium 2020.

·       Evaluating Reinforcement Learning Algorithms in Observational Health Setting, Omer Gottesman, Fredrik Johansson, Joshua Meier, Jack Dent, Donghun Lee, Srivatsan Srinivasan, Linying Zhang, Yi Ding, David Wihl, Xuefeng Peng, Jiayu Yao, Isaac Lage, Christopher Mosch, Li-wei H. Lehman, Matthieu Komorowski, Aldo Faisal, Leo Anthony Celi, David Sontag, Finale Doshi-Velez, https://arxiv.org/abs/1805.12298v1, 2018.

·       Improving Sepsis Treatment Strategies by Combining Deep and Kernel-Based Reinforcement Learning, Xuefeng Peng, Yi Ding, David Wihl, Omer Gottesman, Matthieu Komorowski, Li-wei H. Lehman, Andrew Ross, Aldo Faisal, Finale Doshi-Velez, Proceedings of the AMIA Annual Symposium, 2018.