I research machine learning methods and build artificial intelligence systems to automate the process of
scientific discovery. I am particularly interested in how models can automatically identify and adapt
to changing contexts. This research focus requires advances in interpretable machine learning, multi-task
learning, and task representation learning, and finds natural applications in precision medicine and the computational
genomics of complex diseases such as Alzheimer's disease, Down syndrome, and cancer.
Context-Specific Inferences: What happens if we build models that can adapt to different contexts?
Code: Contextualized.ML Python package
NOTMAD: Estimating Bayesian Networks with Sample-Specific Structures and Parameters
Benjamin Lengerich, Caleb Ellington, Bryon Aragam, Eric P. Xing, Manolis Kellis
Context-specific Bayesian networks (i.e. directed acyclic graphs, DAGs) identify context-dependent relationships between variables, but the non-convexity induced by the acyclicity requirement makes it difficult to share information between context-specific estimators (e.g. with graph generator functions). For this reason, existing methods for inferring context-specific Bayesian networks have favored breaking datasets into subsamples, limiting statistical power and resolution, and preventing the use of multidimensional and latent contexts. To overcome this challenge, we propose NOTEARS-optimized Mixtures of Archetypal DAGs (NOTMAD). NOTMAD models context-specific Bayesian networks as the output of a function which learns to mix archetypal networks according to sample context. The archetypal networks are estimated jointly with the context-specific networks and do not require any prior knowledge. We encode the acyclicity constraint as a smooth regularization loss which is back-propagated to the mixing function; in this way, NOTMAD shares information between context-specific acyclic graphs, enabling the estimation of Bayesian network structures and parameters at even single-sample resolution. We demonstrate the utility of NOTMAD and sample-specific network inference through analysis and experiments, including patient-specific gene expression networks which correspond to morphological variation in cancer.
arXiv 2021
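The core trick NOTMAD inherits from NOTEARS is encoding acyclicity as a smooth penalty that can be back-propagated through a mixing function. A minimal numpy sketch of that penalty and of mixing archetypal graphs by context weights is below; the function names are illustrative, not the released code.

```python
import numpy as np

def matrix_exp(M, terms=30):
    """Truncated power series for the matrix exponential (small matrices)."""
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        E = E + term
    return E

def notears_penalty(W):
    """NOTEARS acyclicity penalty h(W) = tr(exp(W * W)) - d, where * is
    elementwise. h(W) = 0 exactly when W is the weighted adjacency matrix
    of a DAG, and h is differentiable in W."""
    d = W.shape[0]
    return np.trace(matrix_exp(W * W)) - d

def mix_archetypes(weights, archetypes):
    """Context-specific network as a combination of K archetypal graphs.
    archetypes: (K, d, d); weights: (K,), produced by a context encoder."""
    return np.einsum('k,kij->ij', weights, archetypes)
```

Because `notears_penalty` is smooth, its gradient flows back through `mix_archetypes` to the context encoder, which is how information is shared across context-specific DAGs.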
Discriminative Subtyping of Lung Cancers from Histopathology Images via Contextual Deep Learning
Benjamin J. Lengerich*, Maruan Al-Shedivat*, Amir Alavi, Jennifer Williams, Sami Labbaki, Eric P. Xing
When designing individualized treatment protocols for cancer patients, clinicians must synthesize the information from multiple data modalities into a single parsimonious description of the patient's personal disease. However, such a description of a patient is never observed. In this work, we propose to model these patient descriptions as latent \emph{discriminative subtypes}---sample representations which can be learned from one data modality and used to contextualize predictions based on another data modality. We apply contextual deep learning to learn these sample-specific discriminative subtypes from lung cancer histopathology imagery. Based on these subtypes, we produce sample-specific transcriptomic models which accurately classify samples as adenocarcinoma, squamous cell carcinoma, or healthy tissue (F1 score of 0.97, outperforming previous state-of-the-art multimodal approaches). Combining these data modalities in a single pipeline not only improves the predictive accuracy, but also gives biological interpretations of the discriminative subtypes and ties the phenotypic patterns present in histopathology images to biological processes.
medRxiv 2020
Interpretable AI: How can we build models that summarize patterns in ways that help us understand the underlying phenomena?
Neural Additive Models: Interpretable Machine Learning with Neural Nets
Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Geoffrey Hinton, Rich Caruana
Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks. However, their accuracy comes at the cost of intelligibility: it is usually unclear how they make their decisions. This hinders their applicability to high stakes decision-making domains such as healthcare. We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models. NAMs learn a linear combination of neural networks that each attend to a single input feature. These networks are trained jointly and can learn arbitrarily complex relationships between their input feature and the output. Our experiments on regression and classification datasets show that NAMs are more accurate than widely used intelligible models such as logistic regression and shallow decision trees. They perform similarly to existing state-of-the-art generalized additive models in accuracy, but are more flexible because they are based on neural nets instead of boosted trees. To demonstrate this, we show how NAMs can be used for multitask learning on synthetic data and on the COMPAS recidivism data due to their composability, and demonstrate that the differentiability of NAMs allows them to train more complex interpretable models for COVID-19. Source code is available at neural-additive-models.github.io.
@InProceedings{agarwal2022neural,
title={Neural Additive Models: Interpretable Machine Learning with Neural Nets},
author={Agarwal, Rishabh and Melnick, Levi and Frosst, Nicholas and Zhang, Xuezhou and Lengerich, Ben and Hinton, Geoffrey and Caruana, Rich},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2021},
url_Paper = {https://arxiv.org/pdf/2004.13912.pdf},
url_Website = {https://neural-additive-models.github.io/},
url_Slides = {https://neural-additive-models.github.io/assets/nam_slides.pdf},
url_Talk = {https://neurips.cc/virtual/2021/poster/28229},
url_ShortSummary = {https://youtu.be/vqKJbJ5c1NI},
url_TweetPrint = {https://twitter.com/nickfrosst/status/1255889440083447810?s=20},
url_Code = {https://github.com/google-research/google-research/tree/master/neural_additive_models},
abstract={Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks. However, their accuracy comes at the cost of intelligibility: it is usually unclear how they make their decisions. This hinders their applicability to high stakes decision-making domains such as healthcare. We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models. NAMs learn a linear combination of neural networks that each attend to a single input feature. These networks are trained jointly and can learn arbitrarily complex relationships between their input feature and the output. Our experiments on regression and classification datasets show that NAMs are more accurate than widely used intelligible models such as logistic regression and shallow decision trees. They perform similarly to existing state-of-the-art generalized additive models in accuracy, but are more flexible because they are based on neural nets instead of boosted trees. To demonstrate this, we show how NAMs can be used for multitask learning on synthetic data and on the COMPAS recidivism data due to their composability, and demonstrate that the differentiability of NAMs allows them to train more complex interpretable models for COVID-19. Source code is available at neural-additive-models.github.io.}
}
NeurIPS 2021
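The architecture is simple to state: one small network per input feature, summed with a bias. A forward-pass-only numpy sketch (untrained, for illustration; not the released implementation at neural-additive-models.github.io):

```python
import numpy as np

rng = np.random.default_rng(0)

class FeatureNet:
    """One small MLP mapping a single scalar feature to its shape function."""
    def __init__(self, hidden=16):
        self.W1 = rng.normal(size=(1, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(size=hidden)

    def __call__(self, x):  # x: (n,)
        h = np.maximum(x[:, None] @ self.W1 + self.b1, 0.0)  # ReLU hidden layer
        return h @ self.w2  # (n,) contribution of this feature

class NAM:
    """f(x) = bias + sum_j f_j(x_j). Because the model is additive, each
    learned f_j can be plotted against x_j to read off what it does."""
    def __init__(self, n_features):
        self.nets = [FeatureNet() for _ in range(n_features)]
        self.bias = 0.0

    def predict(self, X):  # X: (n, d)
        return self.bias + sum(net(X[:, j]) for j, net in enumerate(self.nets))
```

In practice the feature networks are trained jointly by gradient descent on the usual regression or classification loss; the intelligibility comes entirely from the additive structure, not from any constraint on the individual networks.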
Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models
Benjamin J. Lengerich, Sarah Tan, Chun-Hao Chang, Giles Hooker, Rich Caruana
Recent methods for training generalized additive models (GAMs) with pairwise interactions achieve state-of-the-art accuracy on a variety of datasets. Adding interactions to GAMs, however, introduces an identifiability problem: effects can be freely moved between main effects and interaction effects without changing the model predictions. In some cases, this can lead to contradictory interpretations of the same underlying function. This is a critical problem because a central motivation of GAMs is model interpretability. In this paper, we use the Functional ANOVA decomposition to uniquely define interaction effects and thus produce identifiable additive models with purified interactions. To compute this decomposition, we present a fast, exact, mass-moving algorithm that transforms any piecewise-constant function (such as a tree-based model) into a purified, canonical representation. We apply this algorithm to several datasets and show large disparity, including contradictions, between the apparent and the purified effects.
@InProceedings{pmlr-v108-lengerich20a,
title = {Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models},
author = {Lengerich, Benjamin and Tan, Sarah and Chang, Chun-Hao and Hooker, Giles and Caruana, Rich},
pages = {2402--2412},
year = {2020},
editor = {Silvia Chiappa and Roberto Calandra},
volume = {108},
series = {Proceedings of Machine Learning Research},
address = {Online},
month = {26--28 Aug},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v108/lengerich20a/lengerich20a.pdf},
url = {http://proceedings.mlr.press/v108/lengerich20a.html},
abstract = {Models which estimate main effects of individual variables alongside interaction effects have an identifiability challenge: effects can be freely moved between main effects and interaction effects without changing the model prediction. This is a critical problem for interpretability because it permits “contradictory" models to represent the same function. To solve this problem, we propose pure interaction effects: variance in the outcome which cannot be represented by any subset of features. This definition has an equivalence with the Functional ANOVA decomposition. To compute this decomposition, we present a fast, exact algorithm that transforms any piecewise-constant function (such as a tree-based model) into a purified, canonical representation. We apply this algorithm to Generalized Additive Models with interactions trained on several datasets and show large disparity, including contradictions, between the apparent and the purified effects. These results underscore the need to specify data distributions and ensure identifiability before interpreting model parameters.}
}
AISTATS 2020
Code: Interpret.ML Python package
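For the simplest case — a pairwise interaction represented on a grid with uniform bin weights — the mass-moving idea reduces to centering: any row or column mean in the interaction is mass that belongs in a main effect or the intercept. A minimal sketch (illustrative; the paper's algorithm handles arbitrary bin densities and tree-based models):

```python
import numpy as np

def purify_pairwise(F):
    """Purify a piecewise-constant interaction F[i, j] under uniform bin
    weights: move row/column means into main effects so the remaining
    interaction has zero mean along every row and column (the functional
    ANOVA condition). Returns intercept, two main effects, and the
    purified interaction, with F == f0 + f_i + f_j + F_pure."""
    f_i = F.mean(axis=1)          # mass moved to the main effect of x1
    F = F - f_i[:, None]
    f_j = F.mean(axis=0)          # mass moved to the main effect of x2
    F = F - f_j[None, :]
    f0 = f_i.mean()               # residual mass moved to the intercept
    f_i = f_i - f0
    return f0, f_i, f_j, F
```

The decomposition leaves predictions unchanged — only the attribution of effects moves — which is exactly why unpurified models can tell contradictory stories about the same function.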
Personalized and Precision Medicine: How can we deliver optimal care for every individual patient?
Automated Interpretable Discovery of Heterogeneous Treatment Effectiveness: A COVID-19 Case Study
Benjamin J. Lengerich, Mark E. Nunnally, Yin Aphinyanaphongs, Caleb Ellington, Rich Caruana
Testing multiple treatments for heterogeneous (varying) effectiveness with respect to many underlying risk factors requires many pairwise tests; we would like to instead automatically discover and visualize patient archetypes and predictors of treatment effectiveness using multitask machine learning. In this paper, we present a method to estimate these heterogeneous treatment effects with an interpretable hierarchical framework that uses additive models to visualize expected treatment benefits as a function of patient factors (identifying personalized treatment benefits) and concurrent treatments (identifying combinatorial treatment benefits). This method achieves state-of-the-art predictive power for COVID-19 in-hospital mortality and interpretable identification of heterogeneous treatment benefits. We first validate this method on the large public MIMIC-IV dataset of ICU patients to test recovery of heterogeneous treatment effects. Next we apply this method to a proprietary dataset of over 3000 patients hospitalized for COVID-19, and find evidence of heterogeneous treatment effectiveness predicted largely by indicators of inflammation and thrombosis risk: patients with few indicators of thrombosis risk benefit most from treatments against inflammation, while patients with few indicators of inflammation risk benefit most from treatments against thrombosis. This approach provides an automated methodology to discover heterogeneous and individualized effectiveness of treatments.
@article{lengerich2022automated,
title={Automated Interpretable Discovery of Heterogeneous Treatment Effectiveness: A Covid-19 Case Study},
author={Lengerich, Benjamin J. and Nunnally, Mark and Aphinyanaphongs, Yin and Caruana, Rich},
journal={Journal of Biomedical Informatics},
year={2022},
url_Preprint={https://www.medrxiv.org/content/10.1101/2021.10.30.21265430v1},
url_Paper = {https://www.sciencedirect.com/science/article/abs/pii/S1532046422001022},
abstract={Testing multiple treatments for heterogeneous (varying) effectiveness with respect to many underlying risk factors requires many pairwise tests; we would like to instead automatically discover and visualize patient archetypes and predictors of treatment effectiveness using multitask machine learning. In this paper, we present a method to estimate these heterogeneous treatment effects with an interpretable hierarchical framework that uses additive models to visualize expected treatment benefits as a function of patient factors (identifying personalized treatment benefits) and concurrent treatments (identifying combinatorial treatment benefits). This method achieves state-of-the-art predictive power for COVID-19 in-hospital mortality and interpretable identification of heterogeneous treatment benefits. We first validate this method on the large public MIMIC-IV dataset of ICU patients to test recovery of heterogeneous treatment effects. Next we apply this method to a proprietary dataset of over 3000 patients hospitalized for COVID-19, and find evidence of heterogeneous treatment effectiveness predicted largely by indicators of inflammation and thrombosis risk: patients with few indicators of thrombosis risk benefit most from treatments against inflammation, while patients with few indicators of inflammation risk benefit most from treatments against thrombosis. This approach provides an automated methodology to discover heterogeneous and individualized effectiveness of treatments.}
}
JBI 2022
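The hierarchical structure can be written in one line: predicted risk is a baseline additive model of patient factors plus, for each treatment actually given, a treatment-benefit term that is itself a plottable function of those factors. A hypothetical sketch of that decomposition (names and numbers are illustrative, not fitted values from the paper):

```python
def predicted_risk(x, treatments, baseline, benefits):
    """risk(x, t) = baseline(x) + sum_k t_k * benefit_k(x).
    Each benefit_k is an additive shape function of patient factors, so
    the expected benefit of treatment k can be plotted directly against
    the factor that drives it (personalized benefit), and benefit terms
    can also condition on other treatments (combinatorial benefit)."""
    r = baseline(x)
    for t_k, benefit_k in zip(treatments, benefits):
        r = r + t_k * benefit_k(x)
    return r

# Hypothetical shape functions for a single patient factor x:
baseline = lambda x: 0.1 * x    # risk rises with, say, an inflammation marker
benefit = lambda x: -0.05 * x   # treatment helps more at higher marker values
```

Comparing `predicted_risk(x, [1.0], ...)` against `predicted_risk(x, [0.0], ...)` gives the model's expected benefit of treating this patient, which is the quantity the framework visualizes.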
Computational Genomics of Complex Diseases: What are the biological causes, implications, and therapeutic targets of complex diseases?