ORC 65th Anniversary

The ORC is very proud of our students and their research and we invite you to participate in the
two poster sessions being held during the 65th Anniversary celebration .

The two poster sessions will be held on:
Monday, November 26th from 1:15pm – 2:00pm (PDF of 11/26 Poster Session)
Tuesday, November 27th from 1:45pm – 2:30pm (PDF of 11/27 Poster Session)

Author	Title and Abstract
Lennart Baardman Advisor(s): Georgia Perakis	Learning Optimal Online Advertising Portfolios with Periodic Budgets The tremendous growth in online personalized data and ad channels enables publishers to auction off targeted advertising slots to advertisers. Everyday, advertisers bid on a portfolio of targets to maximize advertising revenue within their budget constraint. As the advertiser needs to learn the value of targets, we use an exploration-exploitation framework to model this online advertising portfolio optimization problem. We solve this problem using an optimistic-robust learning (ORL) algorithm based on UCB algorithms and robust optimization. Theoretically, we prove its expected regret is bounded, and computationally, we show its powerful performance on real-world data.
Hari Bandi Advisor(s): Dimitris Bertsimas, Rahul Mazumder	Statistical Hypothesis Testing via Robust Optimization In this work, we develop a novel framework for statistical hypothesis testing when given access to i.i.d samples that are valid under the corresponding hypotheses. In classical hypothesis testing, the uncertainty in the sample is modeled using probability distributions and then the rejection region for the null hypothesis or the p-value are computed based on the distribution of the chosen test statistic. In contrast, here we model the uncertainty in the sample using Wasserstein distance bounded uncertainty sets and design tests that maximally separate the two hypotheses using an affine combination of statistics by solving a sample robust optimization problem. Similar to classical hypothesis testing, our framework can compute the p-value for significance testing and confidence intervals for any parameters of interest. Furthermore, we also develop a novel bootstrap distributionally robust test that minimizes the maximum cumulative risk on all distributions that are close to the bootstrap distribution under the Wasserstein distance metric.
Max Biggs Advisor(s): Georgia Perakis	Optimizing over Objective Functions Determined from Random Forests We examine a data-driven optimization approach, where an uncertain objective function is estimated using a random forest. We explore how to make optimal decisions, as evaluated by the random forest, while decisions can be constrained by an arbitrary polyhedral set to model business constraints. We model this optimization problem as a Mixed Integer Linear Program (MILP) and show this formulation can be solved to optimality efficiently using Benders cuts. For large problem instances, we consider a random forest approximation that consists of only a subset of trees and establish that this gives rise to near-optimal solutions by proving analytical guarantees. This is joint work with Rim Hariss and Prof. Georgia Perakis.
Zachary Blanks Advisor(s): Rahul Mazumder	Label Reduction: A Novel Technique for Hierarchical Classification Hierarchical classification is strategy for multi-class classification which involves exploiting either a known or unknown label hierarchy to improve classifier performance as a consequence of label similarity. We focus on the case of an unknown label hierarchy and propose to find it by solving a mixed-integer linear optimization problem. Using the structure of the formulation we employ a heuristic local search which allows us to scale to problem sizes of interest. In case studies from benchmark datasets, we demonstrate that our approach yields a statistically significant improvement in classifier performance relative to the traditional multi-class approach.
Louis Chen Advisor(s): David Simchi-Levi	Distributionally Robust Max Flows Given an arbitrary flow network with stochastic arc capacities, we study how to find the worst-case coupling of the arc capacities that yields the smallest expected max-flow value. We approach this problem via a primal-dual pair of linear programs that together yield both the smallest expected max-flow and an optimal coupling.
Christopher Coey Advisor(s): Juan Pablo Vielma	Practical Conic Optimization in Julia and Mixed-integer Extensions Conic optimization research and solver implementations have almost exclusively focused on the symmetric cones (linear, second-order, and positive semidefinite cones), which are not sufficient for expressing many convex optimization problems of practical interest. We implement a new algorithm for conic optimization with generic convex cones for which a primal barrier function is known (for example, the sum-of-squares cones useful for polynomial and moment optimization). We also implement an outer-approximation-based mixed-integer conic optimization solver that combines the power of our conic solver with commercial branch-and-cut MILP solvers in order to provably converge to a global solution.
Tamar Cohen Advisor(s): Georgia Perakis	Detecting Customer Trends for Optimal Promotion Targeting Detecting trends in fashion can help retailers determine effective personalized promotion plans more easily. Social media data is valuable in understanding these trends, but it is often unavailable. We introduce a personalized demand model that captures customer trends from transaction data instead. We construct an efficient algorithm that is based on instrumental variables. This allows us to draw causal conclusions on the effects that targeted promotions have. The promotion targeting problem is NP-hard to solve, so we develop an efficient and provably-good greedy approach. Finally, we estimate our customer trend model and test the promotion targeting algorithm.
Ryan Cory-Wright Advisor(s): Dimitris Bertsimas	A Scalable Algorithm for Sparse and Robust Portfolios A central problem in financial economics concerns investing capital to maximize a portfolio's expected return while minimizing its variance, subject to an upper bound on the number of positions, minimum investment and linear inequality constraints, at scale. Existing approaches to this problem do not provide provably optimal portfolios for real-world problem sizes with more than 300 securities. In this work, we present a cutting-plane method which solves problems with $1000$s of securities to provable optimality, by exploiting a dual representation of the continuous Markowitz problem to obtain a closed form representation of the problem's subgradients. We improve the performance of the cutting-plane method in three different ways. First, we implement a local-search heuristic which applies our subgradient representation to obtain high-quality warm-starts. Second, we embed the local-search heuristic within the cutting-plane method to accelerate recovery of an optimal solution. Third, we exploit a correspondence between the convexified sparse Markowitz problem and a rotated second-order cone problem to obtain a tight lower bound which is sometimes exact. Finally, we establish that the cutting-plane method is 3-4 orders of magnitude more efficient than existing methods, and construct provably optimal sparsity-constrained frontiers for the S&P 500, Russell 1000, and Wilshire 5000.
Mathieu Dahan Advisor(s): Saurabh Amin	Defending Critical Infrastructure Networks against Correlated Failures We focus on the design of monitoring and inspection strategies to improve the resilience of critical infrastructure networks to correlated failure events. For strategic planning, we consider the problem of optimal allocation of monitoring resources for ensuring a target detection performance against worst-case attacks. This problem involves solving a large strategic game between the network operator and attacker. Our solution approach is based on combinatorial optimization and game-theoretic arguments, and provides good optimality guarantees. For operational response, we develop a modeling approach that utilizes failure predictions for designing optimal network inspection strategies. This problem is posed as a stochastic orienteering problem with probing constraints. We exploit the problem structure to develop integer programming formulations that can be solved for large-scale instances. Finally, we discuss how our integrated design can be used to defend real-world natural gas pipeline networks against security attacks and earthquakes.
Arthur Delarue and Sebastien Martin Advisor(s): Dimitris Bertsimas Advisor(s): Dimitris Bertsimas, Patrick Jaillet	From School Buses to Bell Times: Driving Policy with Optimization Spreading start times allows school districts to reduce transportation costs by reusing buses between schools. However, assigning each school a time involves both estimating the impact on transportation costs and reconciling additional competing objectives. These challenges force many school districts to make myopic decisions, leading to an expensive and inequitable status quo. For instance, most American teenagers start school before 8, despite evidence of significant associated health issues. We propose the first algorithm to jointly solve the school bus routing and bell time assignment problems. Our application in Boston led to $5 million in yearly savings (maintaining service quality despite a 50-bus fleet reduction) and to the unanimous approval of the first school start time reform in thirty years.
Julia Gaudio Advisor(s): David Gamarnik, Patrick Jaillet	Attracting Random Walks We introduce the Interacting Random Walks model, which is an attractive interacting particle system. In the model, particles move between adjacent vertices of a graph G, with transition probabilities that depend positively on particle counts at neighboring vertices. From an applied standpoint, the model captures systems in which quantity is attractive. We mention some possible applied areas, but the focus is on developing the properties of the associated Markov chain. The phase transition of the stationary distribution is of particular interest. When G is the complete graph, the model is a projection of the Potts model, whose phase transition is known. We show the existence of phase transition on all graphs. Joint work with David Gamarnik, Patrick Jaillet, Yury Polyanskiy, Yuval Peres, Eyal Lubetzky, and Reza Gheissari.
Emma Gibson Advisor(s): Jonas Jonasson	Optimal Routing and Scheduling in Sample Transportation for Diagnostic Networks Due to limited resources, diagnostic and disease monitoring services in sub-Saharan Africa are delivered through a network of clinics and laboratories. An ongoing challenge is to develop cost-effective sample transportation systems to ensure short turnaround times of results. Using data from Riders for Health in Malawi, we develop an algorithm for the daily scheduling and route optimization of couriers in a diagnostic network and evaluate its impact on turnaround times and distance driven.
Rim Hariss Advisor(s): Georgia Perakis, Y. Karen Zheng	Pricing for Heterogeneous Products: Analytics for Ticket Reselling In collaboration with a major secondary ticket exchange, we study a trading strategy for buying and reselling tickets for events. We look at classifying whether each individual ticket will sell at a given price, in the presence of confounding factors. We construct an optimization based ticket trading strategy that is the reseller is piloting as well as a novel method for heterogeneous treatment effect estimation for classification. On synthetic datasets this approach beats state-of-the-art machine learning approaches for estimating treatment effects. We show using our approach to trade on NBA ticket listings yields a seller's revenue up to $9.6 million (27% return) for a single season. This is joint work with Michael Alley, Max Biggs, Charles Herrmann, Michael Li and Prof. Georgia Perakis.
Hussein Hazimeh Advisor(s): Rahul Mazumder	Fast Algorithms for Best Subset Selection We consider the L0-regularized least squares problem, which is generally perceived as the "gold standard" under many sparse learning regimes. In spite of worst-case computational intractability results, recent work has shown that advances in mixed integer optimization can be used to obtain near-optimal solutions to this problem for instances where the number of features is in the order of thousands. While these methods lead to estimators with excellent statistical properties, often there is a price to pay in terms of a steep increase in computation times, especially when compared to the highly efficient popular algorithms for sparse learning (e.g., based on L1-regularization). In this work, we propose a hierarchy of necessary optimality conditions for the problem, which are more restrictive than the conditions commonly considered in the literature. We develop very fast algorithms, based on coordinate descent and local combinatorial search, that are guaranteed to satisfy the proposed optimality conditions. We show empirically that our framework can handle problem instances with p ~ 10^6 and works well in terms of both optimization and statistical properties, compared to state-of-the-art sparse learning algorithms. Our new sparse learning toolkit L0Learn reaches up to a three-fold speedup compared to popular toolkits such as glmnet and ncvreg.
Michael Hu Advisor(s): Retsef Levi	Evaluating the Importance of Non-facetime Work in Patient Panel Complexity Primary care physicians (PCPs) have traditionally been paid according to activities-based compensation models. Health systems are now moving to patient panel-based payments. This requires determining the workload associated with different patient panels. Until now, panels have been evaluated using risk scores that measure clinical complexity. However, risk scores do not capture non-facetime work, such as visit documentation & staff communication, which consumes substantial PCP time. We demonstrate that risk scores do not accurately measure the workload of patient panels. We then create prediction models that reliably estimate the workload required to manage different patients.
Haihao (Sean) Lu Advisor(s): Robert Freund	Scalable Linear Programming via First Order Methods Traditional methods for solving linear programming use either simplex method or interior point method. When facing the huge-scale problems arising in big data era, both methods do not work well due to the fact that they require solving a linear equation in each iteration, which is hard to be implemented in the distributed setting. We designed and implemented a first-order method approach for solving huge-scale linear programming. The numerical experiments show the power of the new approach for huge-scale problems traditional solvers can not solve. This is a joint work with Miles Lubin and David Applegate from Google.
Christopher McCord Advisor(s): Dimitris Bertsimas	Optimization over Continuous and Multi-dimensional Decisions with Observational Data We consider the optimization of an uncertain objective over continuous and multi-dimensional decision spaces in problems in which we are only provided with observational data. We propose a novel algorithmic framework that is tractable, asymptotically consistent, and superior to comparable methods on example problems. Our approach leverages predictive machine learning methods and incorporates information on the uncertainty of the predicted outcomes for the purpose of prescribing decisions. We demonstrate the efficacy of our method on examples involving both synthetic and real data sets.
Emily Meigs Advisor(s): Asuman Ozdaglar	Information and Learning in Traffic Routing We give a broad overview of two related projects: 1) How a simple distributed learning algorithm by individual users can converge to the Wardrop equilibrium almost surely. 2) A model of how routing applications (e.g. Waze, Google Maps) should optimally provide information to users to encourage necessary experimentation, while achieving a lower cost (than cost under equilibrium) and maintaining incentive compatibility.
Konstantina Mellou Advisor(s): Patrick Jaillet	Dynamic Redistribution of Bike Sharing Systems Spatial and temporal imbalances of the user demand of bike sharing systems often lead to empty and full bike stations. Since reliable service is essential for the viability of these systems, operators use a fleet of vehicles to redistribute bikes across the network. We propose a mixed integer programming formulation for the dynamic version of the problem, and, combined with heuristics and decomposition techniques, we are able to solve large real-world instances. Since accurate estimation of the user demand is essential to plan for efficient rebalancing, we also develop approaches to consider lost and shifted demand, which very often are not taken into account.
Agni Orfanoudaki Advisor(s): Dimitris Bertsimas	Personalized Treatment for Coronary Artery Disease Patients: A Machine Learning Approach Current clinical practice guidelines for managing coronary artery disease (CAD) account for general cardiovascular risk factors but do not present a framework that considers personalized, individual patient-specic factors. We developed a data-driven algorithm for personalized CAD management that significantly improves health outcomes relative to the standard of care. We used electronic medical records for 21,617 patients who received treatment for CAD at Boston Medical Center (BMC). Leveraging the patient's medical history and the clinical examination results, we developed predictive and prescriptive algorithms based on machine learning methods that provide personalized treatment recommendations to maximize the time from diagnosis to potential adverse event (myocardial infarction or stroke). We improve the expected time to adverse event after diagnosis by 15.3%, increasing from 73.4 to 84.6 months for the overall population. Our prediction model detects with 81.5% accuracy whether the patient will experience an adverse event due to CAD within a 10-year time frame. The algorithm performs particularly well for female (17.5% improvement) and Hispanic (24.8% improvement) subpopulations. We notice a positive shift in the percentage of elderly and young patients assigned to coronary artery bypass graft (CABG) surgery and in the proportion of middle-aged patients assigned to percutaneous coronary intervention (PCI). A personalized approach with modern machine learning techniques yielded substantial improvements in the management of CAD relative to the standard of care, providing physicians with an interpretable, accurate, readily implementable and effective tool.
Elisabeth Paulson Advisor(s): Retsef Levi, Georgia Perakis	Fresh food: Reducing waste and increasing consumption Elisabeth's work with Retsef and Georgia centers around access to, and consumption of, fresh food. The work thus far focuses on two interconnected problems: (i) Interventions to increase consumption of fruits and vegetables among low income households, and (ii) Reducing farm-level food waste in order to both improve farmers' economic viability and simultaneously increase the supply of fruits and vegetables to people in need. I n the first stream of work, Elisabeth has done empirical research to better understand the drivers of consumer decision-making with respect to food purchasing, and developed a mathematical model to determine a central planner's optimal budget allocation between various interventions, with the goal of maximizing consumers' consumption of fresh food. The second stream of work includes empirical estimation of farm-level food waste, understanding the behavioral and economic drivers of farm-level food waste, and developing interventions to reduce waste and increase throughput of fruits and vegetables, particularly to low-income areas, in a way that is financially viable to the farmer. This on-going work is in collaboration with Deishin Lee and the Boston Area Gleaners (a non-profit whose mission is to rescue surplus crop from farms and distribute it to people in need).
Jean Pauphilet Advisor(s): Dimitris Bertsimas	Sparse regression: Scalable algorithms and empirical performance In this work, we review state-of-the-art methods for feature selection in statistics with an application-oriented eye. We compare the empirical behaviors of five methods, the cardinality-constrained formulation, its Boolean relaxation, Lasso regularization and two methods with non-convex penalties, in various regimes of noise and correlation. In terms of accuracy – the number of correct features selected – we demonstrate that non-convex formulations should be preferred over Lasso, for they do not require mutual incoherence condition to hold and are generally more accurate. In terms of false detection – the number of incorrect features selected – cardinality-constrained formulations have a clear edge over both Lasso and non-convex regularizations. In addition, apart from the discrete optimization approach which requires a substantial, yet often affordable, computational time, all methods terminate in times comparable with the GLMNet package for Lasso. We released code for methods that were not publicly implemented. Jointly considered, accuracy, false detection and computational time provide a comprehensive assessment of each feature selection method and shed light on alternatives to the Lasso-regularization which are not as popular in practice yet.
Nicholas Renegar Advisor(s): Retsef Levi	Analytic and Network Approaches to Food Safety Regulation Despite extensive regulatory efforts, food safety has remained a significant problem both in the US and abroad. This presentation includes an overview of two pieces of work, developing models and tools to aid in food safety regulatory efforts. The first piece is aimed identifying risky US importers of shrimp, through the understanding that various networks (shipping, product, website) give significant indicators of risky behavior. The second piece of work is aimed at informing domestic food safety regulatory policy in China. Through analytics, applied machine learning, and automated scraping algorithms, we create a comprehensive database of food safety tests from largely independent regulatory agencies, and are able to understand the data in the context of the supply chain. This comprehensive dataset yields several clear recommendations for locationally-targeted food safety testing in China.
Somya Singhvi Advisor(s): Retsef Levi, Y. Karen Zheng	Data-Driven market design to improve efficiency in India’s agricultural markets This research effort is focused on agriculture supply chains and markets in India. The goal of this project is to develop, test and evaluate a range of interventions and decision support tools to improve the efficiency and safety of agriculture supply chains, as well as increasing the welfare of farmers and consumers. The MIT team collaborates with partner organizations on the ground in India to utilize available field data, by employing broad range of expertise, such as predictive analytics and optimization, supply chain management, market design and economics. The team is 1) conducting data-driven analysis to improve the efficiency of Unified Market Platform (UMP) in Karnataka; 2) analyzing the impact of different government interventions (inventory release policies, consumer subsidy schemes etc.) on prices of essential commodities; 3) developing a recommendation tool that identifies optimal agricultural target markets for farmers; 4) developing analytical tools to mitigate risks related to economically motivated adulteration in food supply chains.
Matthew Sobiesk Advisor(s): Dimitris Bertsimas, Rahul Mazumder	Optimal Classification Trees with Hyperplanes are as Powerful as Classification Neural Networks We investigate the modeling power of neural networks in comparison with optimal classification trees with hyperplanes (OCT-Hs). We prove that a variety of neural networks can be transformed to decision trees with hyperplanes, showing that OCT-Hs are at least as powerful as neural networks. Conversely, we show that a given classification OCT-H can be transformed to a classification neural network with the perception activation function, showing that these neural networks are at least as powerful as OCT-Hs, that is, OCT-Hs and these neural networks are equivalent in terms of modeling power.
Bradley Sturt Advisor(s): Dimitris Bertsimas	A Tractable, Data-Driven Approach for Multi-Stage Optimization Stochastic optimization can effectively describe many decision problems where uncertainty is revealed over multiple stages. A central question is how to estimate the joint distribution of the uncertain parameters, in order to find policies with good out-of-sample average performance. In addition, any such approach must be solvable in practical computation times using off-the-shelf software. In this work, we develop a novel data-driven framework for multi-stage decision making, in which we choose the policies that perform best when the historical data realizations are slightly perturbed. First, we establish asymptotic optimality guarantees for near-optimal policies, which hold when the uncertain parameters are (arbitrarily) correlated between stages. Second, we develop an approximation algorithm for finding near-optimal policies in mixed-integer settings based on piecewise linear decision rules. By exploiting duality, these policies are found by solving a compact optimization problem which scales very efficiently with the size of the dataset. We demonstrate the practical impact of the proposed approach via computational examples on an inventory control problem with short lifecycle products.
Yuchen Wang Advisor(s): Dimitris Bertsimas	Optimal Nonlinear Trees for Regression MARS is a greedy method for constructing a decision tree with nonlinear prediction functions in the leaves. We show that we can formulate the process of building such trees as a global optimization problem, which gives rise to our new method, Optimal Nonlinear Trees. We show in a collection of real-world datasets that our Optimal Nonlinear Trees improve substantially over MARS. Optimal Nonlinear Trees also perform better than tree-based boosting methods like xgboost on small size datasets and have comparable performance on larger datasets.
Holly Wiberg Advisor(s): Dimitris Bertsimas	Interpretable Clustering: An Optimal Trees Approach State-of-the-art clustering algorithms use heuristics to partition the feature vectors while providing little insight regarding the interpretability of each cluster. We present a new Optimal Clustering Trees algorithm that leverages mixed-integer optimization (MIO) techniques to generate interpretable and globally optimal clustering tree models. We demonstrate that our method achieves superior performance on both synthetic and real-world datasets and scales even when the number of observations is high. Joint work with Agni Orfanoudaki.
Julia Yan Advisor(s): Dimitris Bertsimas	Data-Driven Transit Network Design at Scale Transit authorities across the country have been re-designing their bus networks in an attempt to boost ridership. However, their analysis appears to rely on ad-hoc methods, for example, considering each bus line in isolation and using manual incremental adjustments with backtracking. We provide a holistic approach to optimally design a transit network using column-and-constraint generation. Our approach is tractable, scaling to hundreds of stops, and we demonstrate its usefulness on a case study with real data from Boston.
Ilias Zadik Advisor(s): David Gamarnik	High Dimensional Linear Regression using Lattice Basis Reduction In this talk, we focus on the high dimensional linear regression problem where the goal is to efficiently recover the unknown vector of coefficients from noisy linear observations of the features of the model. Unlike most of the literature on this model we make no sparsity assumption on the coefficients of the model. Instead we adopt a regularization based on assuming that the underlying coefficients have rational entries with the same denominator Q. We call this the Q-rationality assumption. We propose a new polynomial-time algorithm for this task which is based on the seminal Lenstra-Lenstra-Lovasz (LLL) lattice basis reduction algorithm. We establish that under the Q-rationality assumption, our algorithm recovers exactly the vector of coefficients for a large class of distributions for the feature matrix and non-zero noise level. We prove that it is successful under small noise, even when the learner has access to only one observation. Furthermore, we prove that in many parameter regimes our algorithm tolerates a nearly optimal information-theoretic level of the noise. Joint work with David Gamarnik.
Renbo Zhao Advisor(s): Robert Freund	Primal-Dual Algorithms for Doubly Stochastic Three-Composite Bilinear Saddle Point Problems We develop primal-dual first-order algorithms for a class of doubly stochastic three-composite bilinear saddle point problems. We consider both convex-concave and strongly-convex-concave settings. In the former setting, the (first-order) oracle complexity of our method not only improves upon the existing methods, but also matches a lower bound derived for all first-order methods that solve this problem. In the latter setting, by leveraging the primal strong convexity, we design a restart scheme that significantly improves the oracle complexity as compared to the convex-concave case. In addition, we investigate the problem under primal relative smoothness and primal relative strong convexity, and develop and analyze algorithms for these settings accordingly.

Author

Title and Abstract

Lennart Baardman

Advisor(s): Georgia Perakis

Learning Optimal Online Advertising Portfolios with Periodic Budgets

The tremendous growth in online personalized data and ad channels enables publishers to auction off targeted advertising slots to advertisers. Everyday, advertisers bid on a portfolio of targets to maximize advertising revenue within their budget constraint. As the advertiser needs to learn the value of targets, we use an exploration-exploitation framework to model this online advertising portfolio optimization problem. We solve this problem using an optimistic-robust learning (ORL) algorithm based on UCB algorithms and robust optimization. Theoretically, we prove its expected regret is bounded, and computationally, we show its powerful performance on real-world data.

Hari Bandi

Advisor(s): Dimitris Bertsimas, Rahul Mazumder

Statistical Hypothesis Testing via Robust Optimization

In this work, we develop a novel framework for statistical hypothesis testing when given access to i.i.d samples that are valid under the corresponding hypotheses. In classical hypothesis testing, the uncertainty in the sample is modeled using probability distributions and then the rejection region for the null hypothesis or the p-value are computed based on the distribution of the chosen test statistic. In contrast, here we model the uncertainty in the sample using Wasserstein distance bounded uncertainty sets and design tests that maximally separate the two hypotheses using an affine combination of statistics by solving a sample robust optimization problem. Similar to classical hypothesis testing, our framework can compute the p-value for significance testing and confidence intervals for any parameters of interest. Furthermore, we also develop a novel bootstrap distributionally robust test that minimizes the maximum cumulative risk on all distributions that are close to the bootstrap distribution under the Wasserstein distance metric.

Max Biggs

Advisor(s): Georgia Perakis

Optimizing over Objective Functions Determined from Random Forests

We examine a data-driven optimization approach, where an uncertain objective function is estimated using a random forest. We explore how to make optimal decisions, as evaluated by the random forest, while decisions can be constrained by an arbitrary polyhedral set to model business constraints. We model this optimization problem as a Mixed Integer Linear Program (MILP) and show this formulation can be solved to optimality efficiently using Benders cuts. For large problem instances, we consider a random forest approximation that consists of only a subset of trees and establish that this gives rise to near-optimal solutions by proving analytical guarantees.

This is joint work with Rim Hariss and Prof. Georgia Perakis.

Zachary Blanks

Advisor(s): Rahul Mazumder

Label Reduction: A Novel Technique for Hierarchical Classification

Hierarchical classification is strategy for multi-class classification which involves exploiting either a known or unknown label hierarchy to improve classifier performance as a consequence of label similarity. We focus on the case of an unknown label hierarchy and propose to find it by solving a mixed-integer linear optimization problem. Using the structure of the formulation we employ a heuristic local search which allows us to scale to problem sizes of interest. In case studies from benchmark datasets, we demonstrate that our approach yields a statistically significant improvement in classifier performance relative to the traditional multi-class approach.

Louis Chen

Advisor(s): David Simchi-Levi

Distributionally Robust Max Flows

Given an arbitrary flow network with stochastic arc capacities, we study how to find the worst-case coupling of the arc capacities that yields the smallest expected max-flow value. We approach this problem via a primal-dual pair of linear programs that together yield both the smallest expected max-flow and an optimal coupling.

Christopher Coey

Advisor(s): Juan Pablo Vielma

Practical Conic Optimization in Julia and Mixed-integer Extensions

Conic optimization research and solver implementations have almost exclusively focused on the symmetric cones (linear, second-order, and positive semidefinite cones), which are not sufficient for expressing many convex optimization problems of practical interest. We implement a new algorithm for conic optimization with generic convex cones for which a primal barrier function is known (for example, the sum-of-squares cones useful for polynomial and moment optimization). We also implement an outer-approximation-based mixed-integer conic optimization solver that combines the power of our conic solver with commercial branch-and-cut MILP solvers in order to provably converge to a global solution.

Tamar Cohen

Advisor(s): Georgia Perakis

Detecting Customer Trends for Optimal Promotion Targeting

Detecting trends in fashion can help retailers determine effective personalized promotion plans more easily. Social media data is valuable in understanding these trends, but it is often unavailable. We introduce a personalized demand model that captures customer trends from transaction data instead. We construct an efficient algorithm that is based on instrumental variables. This allows us to draw causal conclusions on the effects that targeted promotions have. The promotion targeting problem is NP-hard to solve, so we develop an efficient and provably-good greedy approach. Finally, we estimate our customer trend model and test the promotion targeting algorithm.

Ryan Cory-Wright

Advisor(s): Dimitris Bertsimas

A Scalable Algorithm for Sparse and Robust Portfolios

A central problem in financial economics concerns investing capital to maximize a portfolio's expected return while minimizing its variance, subject to an upper bound on the number of positions, minimum investment and linear inequality constraints, at scale. Existing approaches to this problem do not provide provably optimal portfolios for real-world problem sizes with more than 300 securities. In this work, we present a cutting-plane method which solves problems with $1000$s of securities to provable optimality, by exploiting a dual representation of the continuous Markowitz problem to obtain a closed form representation of the problem's subgradients. We improve the performance of the cutting-plane method in three different ways. First, we implement a local-search heuristic which applies our subgradient representation to obtain high-quality warm-starts. Second, we embed the local-search heuristic within the cutting-plane method to accelerate recovery of an optimal solution. Third, we exploit a correspondence between the convexified sparse Markowitz problem and a rotated second-order cone problem to obtain a tight lower bound which is sometimes exact. Finally, we establish that the cutting-plane method is 3-4 orders of magnitude more efficient than existing methods, and construct provably optimal sparsity-constrained frontiers for the S&P 500, Russell 1000, and Wilshire 5000.

Mathieu Dahan

Advisor(s): Saurabh Amin

Defending Critical Infrastructure Networks against Correlated Failures

We focus on the design of monitoring and inspection strategies to improve the resilience of critical infrastructure networks to correlated failure events. For strategic planning, we consider the problem of optimal allocation of monitoring resources for ensuring a target detection performance against worst-case attacks. This problem involves solving a large strategic game between the network operator and attacker. Our solution approach is based on combinatorial optimization and game-theoretic arguments, and provides good optimality guarantees. For operational response, we develop a modeling approach that utilizes failure predictions for designing optimal network inspection strategies. This problem is posed as a stochastic orienteering problem with probing constraints. We exploit the problem structure to develop integer programming formulations that can be solved for large-scale instances. Finally, we discuss how our integrated design can be used to defend real-world natural gas pipeline networks against security attacks and earthquakes.

Arthur Delarue and Sebastien Martin

Advisor(s): Dimitris Bertsimas
Advisor(s): Dimitris Bertsimas, Patrick Jaillet

From School Buses to Bell Times: Driving Policy with Optimization

Spreading start times allows school districts to reduce transportation costs by reusing buses between schools. However, assigning each school a time involves both estimating the impact on transportation costs and reconciling additional competing objectives. These challenges force many school districts to make myopic decisions, leading to an expensive and inequitable status quo. For instance, most American teenagers start school before 8, despite evidence of significant associated health issues. We propose the first algorithm to jointly solve the school bus routing and bell time assignment problems. Our application in Boston led to $5 million in yearly savings (maintaining service quality despite a 50-bus fleet reduction) and to the unanimous approval of the first school start time reform in thirty years.

Julia Gaudio

Advisor(s): David Gamarnik, Patrick Jaillet

Attracting Random Walks

We introduce the Interacting Random Walks model, which is an attractive interacting particle system. In the model, particles move between adjacent vertices of a graph G, with transition probabilities that depend positively on particle counts at neighboring vertices. From an applied standpoint, the model captures systems in which quantity is attractive. We mention some possible applied areas, but the focus is on developing the properties of the associated Markov chain. The phase transition of the stationary distribution is of particular interest. When G is the complete graph, the model is a projection of the Potts model, whose phase transition is known. We show the existence of phase transition on all graphs.

Joint work with David Gamarnik, Patrick Jaillet, Yury Polyanskiy, Yuval Peres, Eyal Lubetzky, and Reza Gheissari.

Emma Gibson

Advisor(s): Jonas Jonasson

Optimal Routing and Scheduling in Sample Transportation for Diagnostic Networks

Due to limited resources, diagnostic and disease monitoring services in sub-Saharan Africa are delivered through a network of clinics and laboratories. An ongoing challenge is to develop cost-effective sample transportation systems to ensure short turnaround times of results. Using data from Riders for Health in Malawi, we develop an algorithm for the daily scheduling and route optimization of couriers in a diagnostic network and evaluate its impact on turnaround times and distance driven.

Rim Hariss

Advisor(s): Georgia Perakis, Y. Karen Zheng

Pricing for Heterogeneous Products: Analytics for Ticket Reselling

In collaboration with a major secondary ticket exchange, we study a trading strategy for buying and reselling tickets for events. We look at classifying whether each individual ticket will sell at a given price, in the presence of confounding factors. We construct an optimization based ticket trading strategy that is the reseller is piloting as well as a novel method for heterogeneous treatment effect estimation for classification. On synthetic datasets this approach beats state-of-the-art machine learning approaches for estimating treatment effects. We show using our approach to trade on NBA ticket listings yields a seller's revenue up to $9.6 million (27% return) for a single season.

This is joint work with Michael Alley, Max Biggs, Charles Herrmann, Michael Li and Prof. Georgia Perakis.

Hussein Hazimeh

Advisor(s): Rahul Mazumder

Fast Algorithms for Best Subset Selection

We consider the L0-regularized least squares problem, which is generally perceived as the "gold standard" under many sparse learning regimes. In spite of worst-case computational intractability results, recent work has shown that advances in mixed integer optimization can be used to obtain near-optimal solutions to this problem for instances where the number of features is in the order of thousands. While these methods lead to estimators with excellent statistical properties, often there is a price to pay in terms of a steep increase in computation times, especially when compared to the highly efficient popular algorithms for sparse learning (e.g., based on L1-regularization). In this work, we propose a hierarchy of necessary optimality conditions for the problem, which are more restrictive than the conditions commonly considered in the literature. We develop very fast algorithms, based on coordinate descent and local combinatorial search, that are guaranteed to satisfy the proposed optimality conditions. We show empirically that our framework can handle problem instances with p ~ 10^6 and works well in terms of both optimization and statistical properties, compared to state-of-the-art sparse learning algorithms. Our new sparse learning toolkit L0Learn reaches up to a three-fold speedup compared to popular toolkits such as glmnet and ncvreg.

Michael Hu

Advisor(s): Retsef Levi

Evaluating the Importance of Non-facetime Work in Patient Panel Complexity

Primary care physicians (PCPs) have traditionally been paid according to activities-based compensation models. Health systems are now moving to patient panel-based payments. This requires determining the workload associated with different patient panels.

Until now, panels have been evaluated using risk scores that measure clinical complexity. However, risk scores do not capture non-facetime work, such as visit documentation & staff communication, which consumes substantial PCP time.

We demonstrate that risk scores do not accurately measure the workload of patient panels. We then create prediction models that reliably estimate the workload required to manage different patients.

Haihao (Sean) Lu

Advisor(s): Robert Freund

Scalable Linear Programming via First Order Methods

Traditional methods for solving linear programming use either simplex method or interior point method. When facing the huge-scale problems arising in big data era, both methods do not work well due to the fact that they require solving a linear equation in each iteration, which is hard to be implemented in the distributed setting. We designed and implemented a first-order method approach for solving huge-scale linear programming. The numerical experiments show the power of the new approach for huge-scale problems traditional solvers can not solve. This is a joint work with Miles Lubin and David Applegate from Google.

Christopher McCord

Advisor(s): Dimitris Bertsimas

Optimization over Continuous and Multi-dimensional Decisions with Observational Data

We consider the optimization of an uncertain objective over continuous and multi-dimensional decision spaces in problems in which we are only provided with observational data. We propose a novel algorithmic framework that is tractable, asymptotically consistent, and superior to comparable methods on example problems. Our approach leverages predictive machine learning methods and incorporates information on the uncertainty of the predicted outcomes for the purpose of prescribing decisions. We demonstrate the efficacy of our method on examples involving both synthetic and real data sets.

Emily Meigs

Advisor(s): Asuman Ozdaglar

Information and Learning in Traffic Routing

We give a broad overview of two related projects: 1) How a simple distributed learning algorithm by individual users can converge to the Wardrop equilibrium almost surely. 2) A model of how routing applications (e.g. Waze, Google Maps) should optimally provide information to users to encourage necessary experimentation, while achieving a lower cost (than cost under equilibrium) and maintaining incentive compatibility.

Konstantina Mellou

Advisor(s): Patrick Jaillet

Dynamic Redistribution of Bike Sharing Systems

Spatial and temporal imbalances of the user demand of bike sharing systems often lead to empty and full bike stations. Since reliable service is essential for the viability of these systems, operators use a fleet of vehicles to redistribute bikes across the network. We propose a mixed integer programming formulation for the dynamic version of the problem, and, combined with heuristics and decomposition techniques, we are able to solve large real-world instances. Since accurate estimation of the user demand is essential to plan for efficient rebalancing, we also develop approaches to consider lost and shifted demand, which very often are not taken into account.

Agni Orfanoudaki

Advisor(s): Dimitris Bertsimas

Personalized Treatment for Coronary Artery Disease Patients: A Machine Learning Approach

Current clinical practice guidelines for managing coronary artery disease (CAD) account for general cardiovascular risk factors but do not present a framework that considers personalized, individual patient-specic factors. We developed a data-driven algorithm for personalized CAD management that significantly improves health outcomes relative to the standard of care. We used electronic medical records for 21,617 patients who received treatment for CAD at Boston Medical Center (BMC). Leveraging the patient's medical history and the clinical examination results, we developed predictive and prescriptive algorithms based on machine learning methods that provide personalized treatment recommendations to maximize the time from diagnosis to potential adverse event (myocardial infarction or stroke). We improve the expected time to adverse event after diagnosis by 15.3%, increasing from 73.4 to 84.6 months for the overall population. Our prediction model detects with 81.5% accuracy whether the patient will experience an adverse event due to CAD within a 10-year time frame. The algorithm performs particularly well for female (17.5% improvement) and Hispanic (24.8% improvement) subpopulations. We notice a positive shift in the percentage of elderly and young patients assigned to coronary artery bypass graft (CABG) surgery and in the proportion of middle-aged patients assigned to percutaneous coronary intervention (PCI). A personalized approach with modern machine learning techniques yielded substantial improvements in the management of CAD relative to the standard of care, providing physicians with an interpretable, accurate, readily implementable and effective tool.

Elisabeth Paulson

Advisor(s): Retsef Levi, Georgia Perakis

Fresh food: Reducing waste and increasing consumption

Elisabeth's work with Retsef and Georgia centers around access to, and consumption of, fresh food. The work thus far focuses on two interconnected problems: (i) Interventions to increase consumption of fruits and vegetables among low income households, and (ii) Reducing farm-level food waste in order to both improve farmers' economic viability and simultaneously increase the supply of fruits and vegetables to people in need. I

n the first stream of work, Elisabeth has done empirical research to better understand the drivers of consumer decision-making with respect to food purchasing, and developed a mathematical model to determine a central planner's optimal budget allocation between various interventions, with the goal of maximizing consumers' consumption of fresh food. The second stream of work includes empirical estimation of farm-level food waste, understanding the behavioral and economic drivers of farm-level food waste, and developing interventions to reduce waste and increase throughput of fruits and vegetables, particularly to low-income areas, in a way that is financially viable to the farmer. This on-going work is in collaboration with Deishin Lee and the Boston Area Gleaners (a non-profit whose mission is to rescue surplus crop from farms and distribute it to people in need).

Jean Pauphilet

Advisor(s): Dimitris Bertsimas

Sparse regression: Scalable algorithms and empirical performance

In this work, we review state-of-the-art methods for feature selection in statistics with an application-oriented eye. We compare the empirical behaviors of five methods, the cardinality-constrained formulation, its Boolean relaxation, Lasso regularization and two methods with non-convex penalties, in various regimes of noise and correlation. In terms of accuracy – the number of correct features selected – we demonstrate that non-convex formulations should be preferred over Lasso, for they do not require mutual incoherence condition to hold and are generally more accurate. In terms of false detection – the number of incorrect features selected – cardinality-constrained formulations have a clear edge over both Lasso and non-convex regularizations. In addition, apart from the discrete optimization approach which requires a substantial, yet often affordable, computational time, all methods terminate in times comparable with the GLMNet package for Lasso. We released code for methods that were not publicly implemented. Jointly considered, accuracy, false detection and computational time provide a comprehensive assessment of each feature selection method and shed light on alternatives to the Lasso-regularization which are not as popular in practice yet.

Nicholas Renegar

Advisor(s): Retsef Levi

Analytic and Network Approaches to Food Safety Regulation

Despite extensive regulatory efforts, food safety has remained a significant problem both in the US and abroad. This presentation includes an overview of two pieces of work, developing models and tools to aid in food safety regulatory efforts. The first piece is aimed identifying risky US importers of shrimp, through the understanding that various networks (shipping, product, website) give significant indicators of risky behavior. The second piece of work is aimed at informing domestic food safety regulatory policy in China. Through analytics, applied machine learning, and automated scraping algorithms, we create a comprehensive database of food safety tests from largely independent regulatory agencies, and are able to understand the data in the context of the supply chain. This comprehensive dataset yields several clear recommendations for locationally-targeted food safety testing in China.

Somya Singhvi

Advisor(s): Retsef Levi, Y. Karen Zheng

Data-Driven market design to improve efficiency in India’s agricultural markets

This research effort is focused on agriculture supply chains and markets in India. The goal of this project is to develop, test and evaluate a range of interventions and decision support tools to improve the efficiency and safety of agriculture supply chains, as well as increasing the welfare of farmers and consumers. The MIT team collaborates with partner organizations on the ground in India to utilize available field data, by employing broad range of expertise, such as predictive analytics and optimization, supply chain management, market design and economics. The team is 1) conducting data-driven analysis to improve the efficiency of Unified Market Platform (UMP) in Karnataka; 2) analyzing the impact of different government interventions (inventory release policies, consumer subsidy schemes etc.) on prices of essential commodities; 3) developing a recommendation tool that identifies optimal agricultural target markets for farmers; 4) developing analytical tools to mitigate risks related to economically motivated adulteration in food supply chains.

Matthew Sobiesk

Advisor(s): Dimitris Bertsimas, Rahul Mazumder

Optimal Classification Trees with Hyperplanes are as Powerful as Classification Neural Networks

We investigate the modeling power of neural networks in comparison with optimal classification trees with hyperplanes (OCT-Hs). We prove that a variety of neural networks can be transformed to decision trees with hyperplanes, showing that OCT-Hs are at least as powerful as neural networks. Conversely, we show that a given classification OCT-H can be transformed to a classification neural network with the perception activation function, showing that these neural networks are at least as powerful as OCT-Hs, that is, OCT-Hs and these neural networks are equivalent in terms of modeling power.

Bradley Sturt

Advisor(s): Dimitris Bertsimas

A Tractable, Data-Driven Approach for Multi-Stage Optimization

Stochastic optimization can effectively describe many decision problems where uncertainty is revealed over multiple stages. A central question is how to estimate the joint distribution of the uncertain parameters, in order to find policies with good out-of-sample average performance. In addition, any such approach must be solvable in practical computation times using off-the-shelf software. In this work, we develop a novel data-driven framework for multi-stage decision making, in which we choose the policies that perform best when the historical data realizations are slightly perturbed. First, we establish asymptotic optimality guarantees for near-optimal policies, which hold when the uncertain parameters are (arbitrarily) correlated between stages. Second, we develop an approximation algorithm for finding near-optimal policies in mixed-integer settings based on piecewise linear decision rules. By exploiting duality, these policies are found by solving a compact optimization problem which scales very efficiently with the size of the dataset. We demonstrate the practical impact of the proposed approach via computational examples on an inventory control problem with short lifecycle products.

Yuchen Wang

Advisor(s): Dimitris Bertsimas

Optimal Nonlinear Trees for Regression

MARS is a greedy method for constructing a decision tree with nonlinear prediction functions in the leaves. We show that we can formulate the process of building such trees as a global optimization problem, which gives rise to our new method, Optimal Nonlinear Trees. We show in a collection of real-world datasets that our Optimal Nonlinear Trees improve substantially over MARS. Optimal Nonlinear Trees also perform better than tree-based boosting methods like xgboost on small size datasets and have comparable performance on larger datasets.

Holly Wiberg

Advisor(s): Dimitris Bertsimas

Interpretable Clustering: An Optimal Trees Approach

State-of-the-art clustering algorithms use heuristics to partition the feature vectors while providing little insight regarding the interpretability of each cluster. We present a new Optimal Clustering Trees algorithm that leverages mixed-integer optimization (MIO) techniques to generate interpretable and globally optimal clustering tree models. We demonstrate that our method achieves superior performance on both synthetic and real-world datasets and scales even when the number of observations is high.

Joint work with Agni Orfanoudaki.

Julia Yan

Advisor(s): Dimitris Bertsimas

Data-Driven Transit Network Design at Scale

Transit authorities across the country have been re-designing their bus networks in an attempt to boost ridership. However, their analysis appears to rely on ad-hoc methods, for example, considering each bus line in isolation and using manual incremental adjustments with backtracking. We provide a holistic approach to optimally design a transit network using column-and-constraint generation. Our approach is tractable, scaling to hundreds of stops, and we demonstrate its usefulness on a case study with real data from Boston.

Ilias Zadik

Advisor(s): David Gamarnik

High Dimensional Linear Regression using Lattice Basis Reduction

In this talk, we focus on the high dimensional linear regression problem where the goal is to efficiently recover the unknown vector of coefficients from noisy linear observations of the features of the model. Unlike most of the literature on this model we make no sparsity assumption on the coefficients of the model. Instead we adopt a regularization based on assuming that the underlying coefficients have rational entries with the same denominator Q. We call this the Q-rationality assumption.

We propose a new polynomial-time algorithm for this task which is based on the seminal Lenstra-Lenstra-Lovasz (LLL) lattice basis reduction algorithm. We establish that under the Q-rationality assumption, our algorithm recovers exactly the vector of coefficients for a large class of distributions for the feature matrix and non-zero noise level. We prove that it is successful under small noise, even when the learner has access to only one observation. Furthermore, we prove that in many parameter regimes our algorithm tolerates a nearly optimal information-theoretic level of the noise.

Joint work with David Gamarnik.

Renbo Zhao

Advisor(s): Robert Freund

Primal-Dual Algorithms for Doubly Stochastic Three-Composite Bilinear Saddle Point Problems

We develop primal-dual first-order algorithms for a class of doubly stochastic three-composite bilinear saddle point problems. We consider both convex-concave and strongly-convex-concave settings. In the former setting, the (first-order) oracle complexity of our method not only improves upon the existing methods, but also matches a lower bound derived for all first-order methods that solve this problem. In the latter setting, by leveraging the primal strong convexity, we design a restart scheme that significantly improves the oracle complexity as compared to the convex-concave case. In addition, we investigate the problem under primal relative smoothness and primal relative strong convexity, and develop and analyze algorithms for these settings accordingly.

Poster Sessions