My research is currently supported by the Alana Down Syndrome Center at MIT, which supports our aim of using contextualized machine learning to understand complex diseases such as Down syndrome.
My current research foci include:
- Context-Adaptive Systems (Meta- and Contextualized Learning): Can we build AI agents that adapt to different contexts?
Selected Publications
NOTMAD: Estimating Bayesian Networks with Sample-Specific Structures and Parameters
Benjamin Lengerich, Caleb Ellington, Bryon Aragam, Eric P. Xing, Manolis Kellis
Context-specific Bayesian networks (i.e. directed acyclic graphs, DAGs) identify context-dependent relationships between variables, but the non-convexity induced by the acyclicity requirement makes it difficult to share information between context-specific estimators (e.g. with graph generator functions). For this reason, existing methods for inferring context-specific Bayesian networks have favored breaking datasets into subsamples, limiting statistical power and resolution, and preventing the use of multidimensional and latent contexts. To overcome this challenge, we propose NOTEARS-optimized Mixtures of Archetypal DAGs (NOTMAD). NOTMAD models context-specific Bayesian networks as the output of a function which learns to mix archetypal networks according to sample context. The archetypal networks are estimated jointly with the context-specific networks and do not require any prior knowledge. We encode the acyclicity constraint as a smooth regularization loss which is back-propagated to the mixing function; in this way, NOTMAD shares information between context-specific acyclic graphs, enabling the estimation of Bayesian network structures and parameters at even single-sample resolution. We demonstrate the utility of NOTMAD and sample-specific network inference through analysis and experiments, including patient-specific gene expression networks which correspond to morphological variation in cancer.
arXiv 2021
Code: Contextualized.ML Python package 
Talk: ContextualizedML for Disease Subtyping (presented at BIRS). Video/slides available.
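To make the mixing idea in the abstract above concrete, here is a minimal PyTorch sketch (not the Contextualized.ML implementation; layer sizes and penalty weights are illustrative): a small network maps each sample's context to softmax weights over learned archetype adjacency matrices, and a NOTEARS-style acyclicity penalty is back-propagated through that mixture.

```python
# Minimal sketch of the NOTMAD idea, assuming a linear SEM X ~ X @ W(c).
# Not the official Contextualized.ML code; hyperparameters are illustrative.
import torch
import torch.nn as nn

class NOTMADSketch(nn.Module):
    def __init__(self, context_dim, num_features, num_archetypes=4):
        super().__init__()
        # K archetypal weighted adjacency matrices, learned jointly.
        self.archetypes = nn.Parameter(
            0.01 * torch.randn(num_archetypes, num_features, num_features))
        # Context -> softmax mixing weights over the archetypes.
        self.mixer = nn.Sequential(
            nn.Linear(context_dim, 16), nn.ReLU(),
            nn.Linear(16, num_archetypes), nn.Softmax(dim=-1))
        self.d = num_features

    def forward(self, c):
        weights = self.mixer(c)                                       # (batch, K)
        return torch.einsum("bk,kij->bij", weights, self.archetypes)  # sample-specific W(c)

    def acyclicity(self, W):
        # NOTEARS penalty h(W) = tr(exp(W*W)) - d; zero iff W encodes a DAG.
        return torch.linalg.matrix_exp(W * W).diagonal(dim1=-2, dim2=-1).sum(-1) - self.d

def notmad_loss(model, X, C, lam_dag=1.0, lam_l1=1e-3):
    W = model(C)                                   # one graph per sample
    X_hat = torch.einsum("bi,bij->bj", X, W)       # linear-SEM reconstruction
    recon = ((X - X_hat) ** 2).mean()
    return recon + lam_dag * model.acyclicity(W).mean() + lam_l1 * W.abs().mean()
```

In practice the DAG penalty is enforced more carefully (e.g. annealed or via an augmented Lagrangian), but the key point survives in the sketch: because the penalty is smooth, gradients flow back to both the archetypes and the mixing function, so information is shared across all samples.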
- Prior Knowledge as Context: Connecting Statistical Inference to Foundation Models
Selected Publications
LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs
Benjamin J. Lengerich, Sebastian Bordt, Harsha Nori, Mark E. Nunnally, Yin Aphinyanaphongs, Manolis Kellis, Rich Caruana
We show that large language models (LLMs) are remarkably good at working with interpretable models that decompose complex outcomes into univariate graph-represented components. By adopting a hierarchical approach to reasoning, LLMs can provide comprehensive model-level summaries without ever requiring the entire model to fit in context. This approach enables LLMs to apply their extensive background knowledge to automate common tasks in data science such as detecting anomalies that contradict prior knowledge, describing potential reasons for the anomalies, and suggesting repairs that would remove the anomalies. We use multiple examples in healthcare to demonstrate the utility of these new capabilities of LLMs, with particular emphasis on Generalized Additive Models (GAMs). Finally, we present the package TalkToEBM as an open-source LLM-GAM interface.
@article{lengerich2022llms,
author = {Lengerich, Benjamin J. and Bordt, Sebastian and Nori, Harsha and Nunnally, Mark E. and Aphinyanaphongs, Yin and Kellis, Manolis and Caruana, Rich},
title = {LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs},
journal = {arXiv preprint},
year = {2023},
}
arXiv 2023
Code: TalkToEBM Python package 
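As a rough illustration of the hierarchical approach described above (this is not the TalkToEBM API; `ask_llm` is a hypothetical chat-completion callable, and the interpret attribute names reflect my reading of recent releases of the package), each univariate EBM component is serialized into a short text table and reviewed one at a time, so the full model never has to fit in the LLM's context:

```python
# Illustrative sketch, not the TalkToEBM package: serialize one EBM component
# at a time and ask an LLM to critique it.  `ask_llm` is a hypothetical
# stand-in for any chat-completion client.
from interpret.glassbox import ExplainableBoostingClassifier

def component_to_text(ebm, term_index, max_rows=20):
    """Render one learned shape function as a compact (bin -> score) table."""
    data = ebm.explain_global().data(term_index)
    rows = [f"{name}: {score:+.3f}"
            for name, score in list(zip(data["names"], data["scores"]))[:max_rows]]
    return (f"Component '{ebm.term_names_[term_index]}' "
            f"(additive contribution to log-odds):\n" + "\n".join(rows))

def review_model(ebm, ask_llm):
    """Critique each component separately; the replies can then be rolled up
    into a model-level summary."""
    replies = []
    for i in range(len(ebm.term_names_)):
        prompt = ("You are reviewing a glass-box clinical risk model.\n"
                  + component_to_text(ebm, i)
                  + "\nDoes this shape contradict prior medical knowledge? "
                    "If so, explain why and suggest a repair.")
        replies.append(ask_llm(prompt))
    return replies

# Usage (hypothetical): ebm = ExplainableBoostingClassifier().fit(X, y)
#                       notes = review_model(ebm, ask_llm=my_chat_client)
```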
- Interpretable Representations of Complex and Nonlinear Systems: How can we build models that summarize complicated patterns in interpretable ways?
Selected Publications
Neural Additive Models: Interpretable Machine Learning with Neural Nets.
Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Geoffrey Hinton, Rich Caruana
Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks. However, their accuracy comes at the cost of intelligibility: it is usually unclear how they make their decisions. This hinders their applicability to high stakes decision-making domains such as healthcare. We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models. NAMs learn a linear combination of neural networks that each attend to a single input feature. These networks are trained jointly and can learn arbitrarily complex relationships between their input feature and the output. Our experiments on regression and classification datasets show that NAMs are more accurate than widely used intelligible models such as logistic regression and shallow decision trees. They perform similarly to existing state-of-the-art generalized additive models in accuracy, but are more flexible because they are based on neural nets instead of boosted trees. To demonstrate this, we show how NAMs can be used for multitask learning on synthetic data and on the COMPAS recidivism data due to their composability, and demonstrate that the differentiability of NAMs allows them to train more complex interpretable models for COVID-19. Source code is available at neural-additive-models.github.io.
@InProceedings{agarwal2022neural,
title={Neural Additive Models: Interpretable Machine Learning with Neural Nets},
author={Agarwal, Rishabh and Melnick, Levi and Frosst, Nicholas and Zhang, Xuezhou and Lengerich, Ben and Hinton, Geoffrey and Caruana, Rich},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2021},
url_Paper = {https://arxiv.org/pdf/2004.13912.pdf},
url_Website = {https://neural-additive-models.github.io/},
url_Slides = {https://neural-additive-models.github.io/assets/nam_slides.pdf},
url_Talk = {https://neurips.cc/virtual/2021/poster/28229},
url_ShortSummary = {https://youtu.be/vqKJbJ5c1NI},
url_TweetPrint = {https://twitter.com/nickfrosst/status/1255889440083447810?s=20},
url_Code = {https://github.com/google-research/google-research/tree/master/neural_additive_models},
abstract={Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks. However, their accuracy comes at the cost of intelligibility: it is usually unclear how they make their decisions. This hinders their applicability to high stakes decision-making domains such as healthcare. We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models. NAMs learn a linear combination of neural networks that each attend to a single input feature. These networks are trained jointly and can learn arbitrarily complex relationships between their input feature and the output. Our experiments on regression and classification datasets show that NAMs are more accurate than widely used intelligible models such as logistic regression and shallow decision trees. They perform similarly to existing state-of-the-art generalized additive models in accuracy, but are more flexible because they are based on neural nets instead of boosted trees. To demonstrate this, we show how NAMs can be used for multitask learning on synthetic data and on the COMPAS recidivism data due to their composability, and demonstrate that the differentiability of NAMs allows them to train more complex interpretable models for COVID-19. Source code is available at neural-additive-models.github.io.}
}
NeurIPS 2021
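A minimal PyTorch sketch of the NAM structure described above (the official code linked in the abstract is more complete; the paper's ExU hidden units, feature dropout, and regularizers are omitted here): one small MLP per input feature, with the per-feature outputs summed so each shape function remains individually plottable.

```python
# Simplified NAM sketch: one sub-network per feature, outputs summed.
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """Learns the shape function for a single input feature."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, x):                       # x: (batch, 1)
        return self.net(x)

class NAM(nn.Module):
    def __init__(self, num_features, hidden=64):
        super().__init__()
        self.feature_nets = nn.ModuleList(
            [FeatureNet(hidden) for _ in range(num_features)])
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):                       # x: (batch, num_features)
        # Each term net(x_i) is an interpretable, plottable shape function.
        terms = [net(x[:, i:i + 1]) for i, net in enumerate(self.feature_nets)]
        return torch.cat(terms, dim=1).sum(dim=1, keepdim=True) + self.bias
```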
Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models
Benjamin J. Lengerich, Sarah Tan, Chun-Hao Chang, Giles Hooker, Rich Caruana
Recent methods for training generalized additive models (GAMs) with pairwise interactions achieve state-of-the-art accuracy on a variety of datasets. Adding interactions to GAMs, however, introduces an identifiability problem: effects can be freely moved between main effects and interaction effects without changing the model predictions. In some cases, this can lead to contradictory interpretations of the same underlying function. This is a critical problem because a central motivation of GAMs is model interpretability. In this paper, we use the Functional ANOVA decomposition to uniquely define interaction effects and thus produce identifiable additive models with purified interactions. To compute this decomposition, we present a fast, exact, mass-moving algorithm that transforms any piecewise-constant function (such as a tree-based model) into a purified, canonical representation. We apply this algorithm to several datasets and show large disparity, including contradictions, between the apparent and the purified effects.
@InProceedings{pmlr-v108-lengerich20a,
title = {Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models},
author = {Lengerich, Benjamin and Tan, Sarah and Chang, Chun-Hao and Hooker, Giles and Caruana, Rich},
pages = {2402--2412},
year = {2020},
editor = {Silvia Chiappa and Roberto Calandra},
volume = {108},
series = {Proceedings of Machine Learning Research},
address = {Online},
month = {26--28 Aug},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v108/lengerich20a/lengerich20a.pdf},
url = {http://proceedings.mlr.press/v108/lengerich20a.html},
abstract = {Models which estimate main effects of individual variables alongside interaction effects have an identifiability challenge: effects can be freely moved between main effects and interaction effects without changing the model prediction. This is a critical problem for interpretability because it permits “contradictory" models to represent the same function. To solve this problem, we propose pure interaction effects: variance in the outcome which cannot be represented by any subset of features. This definition has an equivalence with the Functional ANOVA decomposition. To compute this decomposition, we present a fast, exact algorithm that transforms any piecewise-constant function (such as a tree-based model) into a purified, canonical representation. We apply this algorithm to Generalized Additive Models with interactions trained on several datasets and show large disparity, including contradictions, between the apparent and the purified effects. These results underscore the need to specify data distributions and ensure identifiability before interpreting model parameters.}
}
AISTATS 2020
Code: Interpret.ML Python package.
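A hedged NumPy sketch of the mass-moving idea for a single pairwise interaction (the algorithm in the paper and in Interpret.ML also moves main effects into the intercept and handles many interactions jointly): density-weighted row and column means are repeatedly moved out of the interaction matrix into the main effects, so predictions are unchanged while the leftover interaction satisfies the functional-ANOVA zero-conditional-mean condition.

```python
# Sketch of pairwise-interaction purification under a joint bin density w.
# Predictions are preserved: original T == purified T + f1 + f2 at every step.
import numpy as np

def purify_pairwise(T, w, tol=1e-10, max_iter=1000):
    """T: (n1, n2) interaction scores; w: (n1, n2) joint densities over bins."""
    T = T.astype(float).copy()
    f1 = np.zeros(T.shape[0])               # mass moved into feature-1 main effect
    f2 = np.zeros(T.shape[1])               # mass moved into feature-2 main effect
    w1 = np.maximum(w.sum(axis=1), 1e-12)   # marginal density of feature-1 bins
    w2 = np.maximum(w.sum(axis=0), 1e-12)   # marginal density of feature-2 bins
    for _ in range(max_iter):
        row_means = (T * w).sum(axis=1) / w1    # conditional means given feature 1
        T -= row_means[:, None]
        f1 += row_means
        col_means = (T * w).sum(axis=0) / w2    # conditional means given feature 2
        T -= col_means[None, :]
        f2 += col_means
        if max(np.abs(row_means).max(), np.abs(col_means).max()) < tol:
            break
    return T, f1, f2
```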
- Clinical Tools for Personalized Medicine: How can we deliver optimal care for every individual patient?
Selected Publications
Death by Round Numbers: Glass-Box Machine Learning Uncovers Biases in Medical Practice
Benjamin J. Lengerich, Rich Caruana, Mark E. Nunnally, Manolis Kellis
Real-world evidence is confounded by treatments, so data-driven systems can learn to recapitulate biases that influenced treatment decisions. This confounding presents a challenge: uninterpretable black-box systems can put patients at risk by confusing treatment benefits with intrinsic risk, but also an opportunity: interpretable “glass-box” models can improve medical practice by highlighting unexpected patterns which suggest biases in medical practice. We propose a glass-box model that enables clinical experts to find unexpected changes in patient mortality risk. By applying this model to four datasets, we identify two characteristic types of biases: (1) discontinuities where sharp treatment thresholds produce step-function changes in risk near clinically-important round-number cutoffs, and (2) counter-causal paradoxes where aggressive treatment produces non-monotone risk curves that contradict underlying causal risk by lowering the risk of treated patients below that of healthier, but untreated, patients. While these effects are learned by all accurate models, they are only revealed by interpretable models. We show that because these effects are the result of clinical practice rather than statistical aberration, they are pervasive even in large, canonical datasets. Finally, we apply this method to uncover opportunities for improvements in clinical practice, including 8000 excess deaths per year in the US, where paradoxically, patients with moderately-elevated serum creatinine have higher mortality risk than patients with severely-elevated serum creatinine.
@article{lengerich2022death,
title={Death by Round Numbers: Glass-Box Machine Learning Uncovers Biases in Medical Practice},
author={Lengerich, Benjamin J and Caruana, Rich and Nunnally, Mark E and Kellis, Manolis},
journal={medRxiv},
pages={2022--04},
year={2022},
publisher={Cold Spring Harbor Laboratory Press}
}
medRxiv 2022
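A hedged sketch of the audit workflow implied by the abstract above: fit a glass-box EBM to observational mortality data, then scan each learned risk curve for sharp jumps between adjacent bins, which near round-number lab cutoffs are more likely to reflect treatment policy than biology. The interpret explanation format and the continuous-feature assumption are my reading of recent releases; dataset columns and thresholds are illustrative, not from the paper.

```python
# Illustrative audit sketch: flag large step changes in an EBM shape function.
# Assumes a continuous feature whose explanation 'names' are numeric bin edges.
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier

def find_discontinuities(ebm, term_index, min_jump=0.25):
    """Return (bin edge, log-odds jump) pairs where the risk curve steps sharply."""
    data = ebm.explain_global().data(term_index)
    edges = np.asarray(data["names"], dtype=float)
    scores = np.asarray(data["scores"], dtype=float)
    jumps = np.diff(scores)
    return [(edges[i + 1], jumps[i]) for i in np.where(np.abs(jumps) > min_jump)[0]]

# Usage (hypothetical column names):
# ebm = ExplainableBoostingClassifier().fit(X, y)   # X: labs/vitals, y: mortality
# print(find_discontinuities(ebm, term_index=list(X.columns).index("creatinine")))
```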
Automated Interpretable Discovery of Heterogeneous Treatment Effectiveness: A COVID-19 Case Study
Benjamin J. Lengerich, Mark E. Nunnally, Yin Aphinyanaphongs, Caleb Ellington, Rich Caruana
Testing multiple treatments for heterogeneous (varying) effectiveness with respect to many underlying risk factors requires many pairwise tests; we would like to instead automatically discover and visualize patient archetypes and predictors of treatment effectiveness using multitask machine learning. In this paper, we present a method to estimate these heterogeneous treatment effects with an interpretable hierarchical framework that uses additive models to visualize expected treatment benefits as a function of patient factors (identifying personalized treatment benefits) and concurrent treatments (identifying combinatorial treatment benefits). This method achieves state-of-the-art predictive power for COVID-19 in-hospital mortality and interpretable identification of heterogeneous treatment benefits. We first validate this method on the large public MIMIC-IV dataset of ICU patients to test recovery of heterogeneous treatment effects. Next we apply this method to a proprietary dataset of over 3000 patients hospitalized for COVID-19, and find evidence of heterogeneous treatment effectiveness predicted largely by indicators of inflammation and thrombosis risk: patients with few indicators of thrombosis risk benefit most from treatments against inflammation, while patients with few indicators of inflammation risk benefit most from treatments against thrombosis. This approach provides an automated methodology to discover heterogeneous and individualized effectiveness of treatments.
@article{lengerich2022automated,
title={Automated Interpretable Discovery of Heterogeneous Treatment Effectiveness: A COVID-19 Case Study},
author={Lengerich, Benjamin J. and Nunnally, Mark E. and Aphinyanaphongs, Yin and Ellington, Caleb and Caruana, Rich},
journal={Journal of Biomedical Informatics},
year={2022},
url_Preprint={https://www.medrxiv.org/content/10.1101/2021.10.30.21265430v1},
url_Paper = {https://www.sciencedirect.com/science/article/abs/pii/S1532046422001022},
abstract={Testing multiple treatments for heterogeneous (varying) effectiveness with respect to many underlying risk factors requires many pairwise tests; we would like to instead automatically discover and visualize patient archetypes and predictors of treatment effectiveness using multitask machine learning. In this paper, we present a method to estimate these heterogeneous treatment effects with an interpretable hierarchical framework that uses additive models to visualize expected treatment benefits as a function of patient factors (identifying personalized treatment benefits) and concurrent treatments (identifying combinatorial treatment benefits). This method achieves state-of-the-art predictive power for COVID-19 in-hospital mortality and interpretable identification of heterogeneous treatment benefits. We first validate this method on the large public MIMIC-IV dataset of ICU patients to test recovery of heterogeneous treatment effects. Next we apply this method to a proprietary dataset of over 3000 patients hospitalized for COVID-19, and find evidence of heterogeneous treatment effectiveness predicted largely by indicators of inflammation and thrombosis risk: patients with few indicators of thrombosis risk benefit most from treatments against inflammation, while patients with few indicators of inflammation risk benefit most from treatments against thrombosis. This approach provides an automated methodology to discover heterogeneous and individualized effectiveness of treatments.}
}
JBI 2022
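Not the paper's hierarchical framework, but a simplified sketch in the same spirit: fit separate glass-box additive models to treated and untreated patients and read the difference of their predictions (and, per feature, their shape functions) as an estimate of how treatment benefit varies with patient factors. This T-learner-style shortcut ignores confounding by treatment assignment, which matters in observational data.

```python
# Simplified sketch: benefit(x) = risk_if_untreated(x) - risk_if_treated(x),
# estimated with two additive glass-box models.  Variable names are illustrative.
from interpret.glassbox import ExplainableBoostingClassifier

def fit_arm_models(X, y, treated):
    """X: patient factors; y: in-hospital mortality; treated: boolean mask."""
    risk_treated = ExplainableBoostingClassifier().fit(X[treated], y[treated])
    risk_control = ExplainableBoostingClassifier().fit(X[~treated], y[~treated])
    return risk_treated, risk_control

def estimated_benefit(risk_treated, risk_control, X_new):
    """Predicted reduction in mortality risk if the patient were treated."""
    return (risk_control.predict_proba(X_new)[:, 1]
            - risk_treated.predict_proba(X_new)[:, 1])
```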
More information about these directions and publications is available here.