
Author  Title and Abstract 

Lennart Baardman Advisor(s): Georgia Perakis 
Learning Optimal Online Advertising Portfolios with Periodic Budgets The tremendous growth in online personalized data and ad channels enables publishers to auction off targeted advertising slots to advertisers. Every day, advertisers bid on a portfolio of targets to maximize advertising revenue within their budget constraint. Since the advertiser must learn the value of targets, we use an exploration-exploitation framework to model this online advertising portfolio optimization problem. We solve this problem using an optimistic-robust learning (ORL) algorithm based on UCB algorithms and robust optimization. Theoretically, we prove that its expected regret is bounded, and computationally, we show its strong performance on real-world data. 
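As a concrete (and much simplified) illustration of the optimism principle behind UCB-style portfolio learning, the sketch below combines a UCB index with a greedy knapsack over a per-round budget. The reward and cost model is hypothetical, and this is not the ORL algorithm from the abstract, only the UCB building block it starts from.

```python
import math
import random

def ucb_portfolio_round(counts, means, t, costs, budget):
    """Pick a budget-feasible set of ad targets by a greedy knapsack
    over UCB indices (optimism in the face of uncertainty)."""
    ucb = [m + math.sqrt(2 * math.log(max(t, 2)) / max(n, 1))
           for m, n in zip(means, counts)]
    # Greedy knapsack on index-to-cost ratio -- a simplification of the
    # full portfolio optimization described in the abstract.
    order = sorted(range(len(ucb)), key=lambda i: ucb[i] / costs[i], reverse=True)
    chosen, spent = [], 0.0
    for i in order:
        if spent + costs[i] <= budget:
            chosen.append(i)
            spent += costs[i]
    return chosen

def run(true_values, costs, budget, rounds=2000, seed=0):
    """Simulate the bandit loop: bid, observe noisy rewards, update estimates."""
    rng = random.Random(seed)
    k = len(true_values)
    counts, means = [0] * k, [0.0] * k
    for t in range(1, rounds + 1):
        for i in ucb_portfolio_round(counts, means, t, costs, budget):
            reward = true_values[i] + rng.gauss(0, 0.1)
            counts[i] += 1
            means[i] += (reward - means[i]) / counts[i]
    return means
```

Over many rounds the estimated means concentrate on the true target values, with most of the budget spent on the best target.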
Hari Bandi Advisor(s): Dimitris Bertsimas, Rahul Mazumder

Statistical Hypothesis Testing via Robust Optimization In this work, we develop a novel framework for statistical hypothesis testing when given access to i.i.d. samples that are valid under the corresponding hypotheses. In classical hypothesis testing, the uncertainty in the sample is modeled using probability distributions, and the rejection region for the null hypothesis or the p-value is then computed based on the distribution of the chosen test statistic. In contrast, here we model the uncertainty in the sample using Wasserstein-distance-bounded uncertainty sets and design tests that maximally separate the two hypotheses using an affine combination of statistics, obtained by solving a sample robust optimization problem. As in classical hypothesis testing, our framework can compute the p-value for significance testing and confidence intervals for any parameters of interest. Furthermore, we develop a novel bootstrap distributionally robust test that minimizes the maximum cumulative risk over all distributions that are close to the bootstrap distribution under the Wasserstein distance metric. 
Max Biggs Advisor(s): Georgia Perakis 
Optimizing over Objective Functions Determined from Random Forests We examine a data-driven optimization approach in which an uncertain objective function is estimated using a random forest. We explore how to make optimal decisions, as evaluated by the random forest, when decisions may be constrained by an arbitrary polyhedral set that models business constraints. We model this optimization problem as a Mixed Integer Linear Program (MILP) and show that this formulation can be solved to optimality efficiently using Benders cuts. For large problem instances, we consider a random forest approximation that consists of only a subset of the trees, and we prove analytical guarantees establishing that it gives rise to near-optimal solutions. This is joint work with Rim Hariss and Prof. Georgia Perakis. 
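The core reduction can be seen on a single tree: each leaf of an axis-aligned tree is a box with a predicted value, so optimizing the tree's prediction over a box-shaped feasible region reduces to scanning the leaves whose box intersects the region; this is the logic the MILP encodes at forest scale. A hand-coded toy with hypothetical splits and leaf values:

```python
# Each leaf is (bounds, value), where bounds is a per-feature list of
# (lo, hi) intervals describing the leaf's box.

def best_leaf(leaves, feasible_lo, feasible_hi):
    """Return the best-value leaf whose box intersects the feasible box."""
    best = None
    for bounds, value in leaves:
        # Boxes intersect iff the intervals overlap along every axis.
        if all(max(lo, feasible_lo[j]) <= min(hi, feasible_hi[j])
               for j, (lo, hi) in enumerate(bounds)):
            if best is None or value > best[1]:
                best = (bounds, value)
    return best

# Toy tree on one feature x in [0, 10], split at 4 and 7.
leaves = [
    ([(0.0, 4.0)], 1.0),
    ([(4.0, 7.0)], 5.0),
    ([(7.0, 10.0)], 3.0),
]
# Business constraint: x must lie in [6, 10]. The middle leaf still
# overlaps the feasible region at [6, 7] and has the best value.
print(best_leaf(leaves, [6.0], [10.0]))
```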
Zachary Blanks Advisor(s): Rahul Mazumder

Label Reduction: A Novel Technique for Hierarchical Classification Hierarchical classification is a strategy for multiclass classification that exploits either a known or an unknown label hierarchy to improve classifier performance by leveraging label similarity. We focus on the case of an unknown label hierarchy and propose to find it by solving a mixed-integer linear optimization problem. Using the structure of the formulation, we employ a heuristic local search that allows us to scale to problem sizes of interest. In case studies on benchmark datasets, we demonstrate that our approach yields a statistically significant improvement in classifier performance relative to the traditional multiclass approach. 
Louis Chen Advisor(s): David Simchi-Levi

Distributionally Robust Max Flows Given an arbitrary flow network with stochastic arc capacities, we study how to find the worst-case coupling of the arc capacities that yields the smallest expected max-flow value. We approach this problem via a primal-dual pair of linear programs that together yield both the smallest expected max-flow value and an optimal coupling. 
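The deterministic max flow is the inner building block of such computations: the expected max flow under a coupling averages this value over joint realizations of the arc capacities. A minimal, self-contained Edmonds-Karp sketch (not the authors' LP-based approach, which works with the primal-dual pair directly):

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max flow. cap is a dict {u: {v: capacity}}."""
    # Build the residual graph, adding zero-capacity reverse arcs.
    residual = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path from s to t.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Bottleneck capacity along the path, then augment.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
```

For example, on a four-node network with arcs s-a (3), s-b (2), a-t (2), a-b (1), b-t (3), the max flow is 5.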
Christopher Coey Advisor(s): Juan Pablo Vielma

Practical Conic Optimization in Julia and Mixed-Integer Extensions Conic optimization research and solver implementations have almost exclusively focused on the symmetric cones (linear, second-order, and positive semidefinite cones), which are not sufficient for expressing many convex optimization problems of practical interest. We implement a new algorithm for conic optimization with generic convex cones for which a primal barrier function is known (for example, the sum-of-squares cones useful for polynomial and moment optimization). We also implement an outer-approximation-based mixed-integer conic optimization solver that combines the power of our conic solver with commercial branch-and-cut MILP solvers in order to provably converge to a global solution. 
Tamar Cohen Advisor(s): Georgia Perakis 
Detecting Customer Trends for Optimal Promotion Targeting Detecting trends in fashion can help retailers determine effective personalized promotion plans more easily. Social media data is valuable for understanding these trends, but it is often unavailable. We introduce a personalized demand model that captures customer trends from transaction data instead. We construct an efficient estimation algorithm based on instrumental variables, which allows us to draw causal conclusions about the effects of targeted promotions. The promotion targeting problem is NP-hard to solve, so we develop an efficient and provably good greedy approach. Finally, we estimate our customer trend model and test the promotion targeting algorithm. 
Ryan Cory-Wright Advisor(s): Dimitris Bertsimas 
A Scalable Algorithm for Sparse and Robust Portfolios A central problem in financial economics concerns investing capital to maximize a portfolio's expected return while minimizing its variance, subject to an upper bound on the number of positions, minimum investment constraints, and linear inequality constraints, at scale. Existing approaches to this problem do not provide provably optimal portfolios for real-world problem sizes with more than 300 securities. In this work, we present a cutting-plane method that solves problems with thousands of securities to provable optimality, by exploiting a dual representation of the continuous Markowitz problem to obtain a closed-form representation of the problem's subgradients. We improve the performance of the cutting-plane method in three ways. First, we implement a local-search heuristic that applies our subgradient representation to obtain high-quality warm starts. Second, we embed the local-search heuristic within the cutting-plane method to accelerate recovery of an optimal solution. Third, we exploit a correspondence between the convexified sparse Markowitz problem and a rotated second-order cone problem to obtain a tight lower bound that is sometimes exact. Finally, we establish that the cutting-plane method is 3-4 orders of magnitude more efficient than existing methods, and we construct provably optimal sparsity-constrained frontiers for the S&P 500, Russell 1000, and Wilshire 5000. 
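A toy version of the local-search warm-start idea can be sketched as follows. The closed-form support objective below uses a simplified unconstrained mean-variance model (no budget or minimum-investment constraints), so it only illustrates the swap search over k-sparse supports, not the paper's full method.

```python
import numpy as np

def support_objective(mu, Sigma, S, gamma=1.0):
    """Optimal mean-variance value restricted to support S:
    max_w mu_S' w - (gamma/2) w' Sigma_S w = mu_S' Sigma_S^{-1} mu_S / (2 gamma)."""
    mu_S = mu[S]
    Sig_S = Sigma[np.ix_(S, S)]
    return mu_S @ np.linalg.solve(Sig_S, mu_S) / (2 * gamma)

def local_search(mu, Sigma, k, gamma=1.0):
    """First-improvement swap search over k-sparse supports; the objective
    strictly increases at each accepted swap, so the loop terminates."""
    n = len(mu)
    S = list(range(k))
    best = support_objective(mu, Sigma, S, gamma)
    improved = True
    while improved:
        improved = False
        for i in list(S):
            for j in range(n):
                if j in S:
                    continue
                cand = [x for x in S if x != i] + [j]
                val = support_objective(mu, Sigma, cand, gamma)
                if val > best + 1e-12:
                    S, best, improved = cand, val, True
                    break
            if improved:
                break
    return sorted(S), best
```

With an identity covariance the objective is just the sum of squared means over the support, so the search recovers the securities with the largest |mu_i|.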
Mathieu Dahan Advisor(s): Saurabh Amin

Defending Critical Infrastructure Networks against Correlated Failures We focus on the design of monitoring and inspection strategies to improve the resilience of critical infrastructure networks to correlated failure events. For strategic planning, we consider the problem of optimally allocating monitoring resources to ensure a target detection performance against worst-case attacks. This problem involves solving a large strategic game between the network operator and the attacker. Our solution approach is based on combinatorial optimization and game-theoretic arguments, and it provides good optimality guarantees. For operational response, we develop a modeling approach that utilizes failure predictions to design optimal network inspection strategies. This problem is posed as a stochastic orienteering problem with probing constraints. We exploit the problem structure to develop integer programming formulations that can be solved for large-scale instances. Finally, we discuss how our integrated design can be used to defend real-world natural gas pipeline networks against security attacks and earthquakes. 
Arthur Delarue and Sebastien Martin Advisor(s): Dimitris Bertsimas

From School Buses to Bell Times: Driving Policy with Optimization Spreading start times allows school districts to reduce transportation costs by reusing buses between schools. However, assigning each school a start time involves both estimating the impact on transportation costs and reconciling additional competing objectives. These challenges force many school districts to make myopic decisions, leading to an expensive and inequitable status quo. For instance, most American teenagers start school before 8 a.m., despite evidence of significant associated health issues. We propose the first algorithm to jointly solve the school bus routing and bell time assignment problems. Our application in Boston led to $5 million in yearly savings (maintaining service quality despite a 50-bus fleet reduction) and to the unanimous approval of the first school start time reform in thirty years. 
Julia Gaudio Advisor(s): David Gamarnik, Patrick Jaillet 
Attracting Random Walks We introduce the Attracting Random Walks model, an attractive interacting particle system. In the model, particles move between adjacent vertices of a graph G, with transition probabilities that depend positively on particle counts at neighboring vertices. From an applied standpoint, the model captures systems in which quantity is attractive. We mention some possible application areas, but the focus is on developing the properties of the associated Markov chain. The phase transition of the stationary distribution is of particular interest. When G is the complete graph, the model is a projection of the Potts model, whose phase transition is known. We show the existence of a phase transition on all graphs. Joint work with David Gamarnik, Patrick Jaillet, Yury Polyanskiy, Yuval Peres, Eyal Lubetzky, and Reza Gheissari. 
Emma Gibson Advisor(s): Jonas Jonasson

Optimal Routing and Scheduling in Sample Transportation for Diagnostic Networks Due to limited resources, diagnostic and disease monitoring services in sub-Saharan Africa are delivered through a network of clinics and laboratories. An ongoing challenge is to develop cost-effective sample transportation systems that ensure short turnaround times for results. Using data from Riders for Health in Malawi, we develop an algorithm for the daily scheduling and route optimization of couriers in a diagnostic network and evaluate its impact on turnaround times and distance driven. 
Rim Hariss Advisor(s): Georgia Perakis, Y. Karen Zheng 
Pricing for Heterogeneous Products: Analytics for Ticket Reselling In collaboration with a major secondary ticket exchange, we study a trading strategy for buying and reselling tickets for events. We classify whether each individual ticket will sell at a given price, in the presence of confounding factors. We construct an optimization-based ticket trading strategy that the reseller is piloting, as well as a novel method for heterogeneous treatment effect estimation for classification. On synthetic datasets, this approach beats state-of-the-art machine learning approaches for estimating treatment effects. We show that using our approach to trade on NBA ticket listings yields seller revenue of up to $9.6 million (a 27% return) for a single season. This is joint work with Michael Alley, Max Biggs, Charles Herrmann, Michael Li, and Prof. Georgia Perakis. 
Hussein Hazimeh Advisor(s): Rahul Mazumder 
Fast Algorithms for Best Subset Selection We consider the L0-regularized least squares problem, which is generally perceived as the "gold standard" in many sparse learning regimes. In spite of worst-case computational intractability results, recent work has shown that advances in mixed integer optimization can be used to obtain near-optimal solutions to this problem for instances where the number of features is on the order of thousands. While these methods lead to estimators with excellent statistical properties, there is often a price to pay in terms of a steep increase in computation times, especially when compared to highly efficient popular algorithms for sparse learning (e.g., those based on L1-regularization). In this work, we propose a hierarchy of necessary optimality conditions for the problem, which are more restrictive than the conditions commonly considered in the literature. We develop very fast algorithms, based on coordinate descent and local combinatorial search, that are guaranteed to satisfy the proposed optimality conditions. We show empirically that our framework can handle problem instances with p ~ 10^6 and performs well in terms of both optimization and statistical properties compared to state-of-the-art sparse learning algorithms. Our new sparse learning toolkit, L0Learn, achieves up to a threefold speedup over popular toolkits such as glmnet and ncvreg. 
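As a point of reference for what such methods improve upon, the simplest algorithm for cardinality-constrained least squares is iterative hard thresholding. The sketch below is that baseline, not L0Learn's coordinate descent with local combinatorial search.

```python
import numpy as np

def iht(X, y, k, steps=500):
    """Iterative hard thresholding for min ||y - X b||^2 s.t. ||b||_0 <= k.
    Alternates a gradient step with projection onto k-sparse vectors."""
    n, p = X.shape
    b = np.zeros(p)
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1/L with L = ||X||_2^2
    for _ in range(steps):
        b = b + step * X.T @ (y - X @ b)     # gradient step
        keep = np.argsort(np.abs(b))[-k:]    # keep the k largest coordinates
        mask = np.zeros(p, dtype=bool)
        mask[keep] = True
        b[~mask] = 0.0
    return b
```

On a well-conditioned noiseless instance (e.g., orthonormal feature columns), this recovers the true sparse coefficient vector exactly.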
Michael Hu Advisor(s): Retsef Levi

Evaluating the Importance of Non-Facetime Work in Patient Panel Complexity Primary care physicians (PCPs) have traditionally been paid according to activity-based compensation models. Health systems are now moving to patient panel-based payments, which requires determining the workload associated with different patient panels. Until now, panels have been evaluated using risk scores that measure clinical complexity. However, risk scores do not capture non-facetime work, such as visit documentation and staff communication, which consumes substantial PCP time. We demonstrate that risk scores do not accurately measure the workload of patient panels. We then create prediction models that reliably estimate the workload required to manage different patients. 
Haihao (Sean) Lu Advisor(s): Robert Freund

Scalable Linear Programming via First-Order Methods Traditional methods for solving linear programs use either the simplex method or interior point methods. For the huge-scale problems arising in the big-data era, neither works well, because both require solving a system of linear equations at each iteration, which is hard to implement in a distributed setting. We design and implement a first-order method for solving huge-scale linear programs. Numerical experiments show the power of the new approach on huge-scale problems that traditional solvers cannot solve. This is joint work with Miles Lubin and David Applegate from Google. 
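The flavor of such first-order LP methods can be illustrated with primal-dual hybrid gradient (PDHG) iterations, which use only matrix-vector products, no linear system solves. This is a bare-bones sketch on an equality-form LP, not the authors' implementation (production solvers add restarts, step-size tuning, presolve, and averaging).

```python
import numpy as np

def pdhg_lp(c, A, b, iters=20000):
    """PDHG for  min c'x  s.t.  Ax = b, x >= 0.
    Each iteration costs two matrix-vector products and a projection."""
    m, n = A.shape
    x, y = np.zeros(n), np.zeros(m)
    tau = sigma = 0.9 / np.linalg.norm(A, 2)   # tau*sigma*||A||^2 < 1
    for _ in range(iters):
        x_new = np.maximum(0.0, x - tau * (c - A.T @ y))  # primal step + projection
        y = y + sigma * (b - A @ (2 * x_new - x))         # dual step, extrapolated primal
        x = x_new
    return x

# Tiny example:  min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0  has optimum (1, 0).
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
print(pdhg_lp(c, A, b))
```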
Christopher McCord Advisor(s): Dimitris Bertsimas

Optimization over Continuous and Multidimensional Decisions with Observational Data We consider the optimization of an uncertain objective over continuous and multidimensional decision spaces in problems in which we are only provided with observational data. We propose a novel algorithmic framework that is tractable, asymptotically consistent, and superior to comparable methods on example problems. Our approach leverages predictive machine learning methods and incorporates information on the uncertainty of the predicted outcomes for the purpose of prescribing decisions. We demonstrate the efficacy of our method on examples involving both synthetic and real data sets. 
Emily Meigs Advisor(s): Asuman Ozdaglar 
Information and Learning in Traffic Routing We give a broad overview of two related projects: 1) how a simple distributed learning algorithm used by individual users can converge to the Wardrop equilibrium almost surely; and 2) a model of how routing applications (e.g., Waze, Google Maps) should optimally provide information to users to encourage necessary experimentation, while achieving a lower cost than the cost under equilibrium and maintaining incentive compatibility. 
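The first project's dynamic can be illustrated on a hypothetical two-route network with made-up linear latencies: users shift flow toward the currently cheaper route, and the flows settle at the Wardrop equilibrium, where the latencies of used routes equalize. This simple deterministic update is a stand-in for the distributed learning rule studied in the abstract.

```python
def wardrop_dynamics(steps=1000, eta=0.01):
    """Two parallel routes carrying total flow 1, with latencies
    l1(f1) = f1 and l2(f2) = 0.5 + 0.5*f2 (hypothetical).
    Equalizing latencies gives the Wardrop equilibrium f1 = 2/3."""
    f1 = 0.5
    for _ in range(steps):
        l1 = f1
        l2 = 0.5 + 0.5 * (1 - f1)
        f1 += eta * (l2 - l1)          # shift flow toward the cheaper route
        f1 = min(1.0, max(0.0, f1))    # keep the flow split feasible
    return f1
```

The update is a contraction toward f1 = 2/3, so the iterates converge geometrically.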
Konstantina Mellou Advisor(s): Patrick Jaillet

Dynamic Redistribution of Bike Sharing Systems Spatial and temporal imbalances in the user demand of bike sharing systems often lead to empty and full bike stations. Since reliable service is essential for the viability of these systems, operators use a fleet of vehicles to redistribute bikes across the network. We propose a mixed-integer programming formulation for the dynamic version of the problem and, combined with heuristics and decomposition techniques, we are able to solve large real-world instances. Since accurate estimation of user demand is essential for planning efficient rebalancing, we also develop approaches that account for lost and shifted demand, which are very often not taken into account. 
Agni Orfanoudaki Advisor(s): Dimitris Bertsimas

Personalized Treatment for Coronary Artery Disease Patients: A Machine Learning Approach Current clinical practice guidelines for managing coronary artery disease (CAD) account for general cardiovascular risk factors but do not present a framework that considers personalized, patient-specific factors. We developed a data-driven algorithm for personalized CAD management that significantly improves health outcomes relative to the standard of care. We used electronic medical records for 21,617 patients who received treatment for CAD at Boston Medical Center (BMC). Leveraging each patient's medical history and clinical examination results, we developed predictive and prescriptive algorithms based on machine learning methods that provide personalized treatment recommendations to maximize the time from diagnosis to a potential adverse event (myocardial infarction or stroke). We improve the expected time to adverse event after diagnosis by 15.3%, from 73.4 to 84.6 months for the overall population. Our prediction model detects with 81.5% accuracy whether a patient will experience an adverse event due to CAD within a 10-year time frame. The algorithm performs particularly well for the female (17.5% improvement) and Hispanic (24.8% improvement) subpopulations. We observe a positive shift in the percentage of elderly and young patients assigned to coronary artery bypass graft (CABG) surgery and in the proportion of middle-aged patients assigned to percutaneous coronary intervention (PCI). A personalized approach with modern machine learning techniques yields substantial improvements in the management of CAD relative to the standard of care, providing physicians with an interpretable, accurate, readily implementable, and effective tool. 
Elisabeth Paulson Advisor(s): Retsef Levi, Georgia Perakis 
Fresh Food: Reducing Waste and Increasing Consumption Elisabeth's work with Retsef and Georgia centers on access to, and consumption of, fresh food. The work thus far focuses on two interconnected problems: (i) interventions to increase consumption of fruits and vegetables among low-income households, and (ii) reducing farm-level food waste in order to both improve farmers' economic viability and increase the supply of fruits and vegetables to people in need. In the first stream of work, Elisabeth has done empirical research to better understand the drivers of consumer decision-making with respect to food purchasing, and she has developed a mathematical model to determine a central planner's optimal budget allocation between various interventions, with the goal of maximizing consumers' consumption of fresh food. The second stream of work includes empirical estimation of farm-level food waste, understanding the behavioral and economic drivers of farm-level food waste, and developing interventions to reduce waste and increase the throughput of fruits and vegetables, particularly to low-income areas, in a way that is financially viable for the farmer. This ongoing work is in collaboration with Deishin Lee and the Boston Area Gleaners (a nonprofit whose mission is to rescue surplus crops from farms and distribute them to people in need). 
Jean Pauphilet Advisor(s): Dimitris Bertsimas 
Sparse Regression: Scalable Algorithms and Empirical Performance In this work, we review state-of-the-art methods for feature selection in statistics with an application-oriented eye. We compare the empirical behaviors of five methods (the cardinality-constrained formulation, its Boolean relaxation, Lasso regularization, and two methods with nonconvex penalties) in various regimes of noise and correlation. In terms of accuracy (the number of correct features selected), we demonstrate that nonconvex formulations should be preferred over Lasso, since they do not require the mutual incoherence condition to hold and are generally more accurate. In terms of false detection (the number of incorrect features selected), cardinality-constrained formulations have a clear edge over both Lasso and nonconvex regularizations. In addition, apart from the discrete optimization approach, which requires substantial yet often affordable computation time, all methods terminate in times comparable with the GLMNet package for Lasso. We released code for methods that were not publicly implemented. Jointly considered, accuracy, false detection, and computational time provide a comprehensive assessment of each feature selection method and shed light on alternatives to Lasso regularization that are not yet as popular in practice. 
Nicholas Renegar Advisor(s): Retsef Levi 
Analytic and Network Approaches to Food Safety Regulation Despite extensive regulatory efforts, food safety has remained a significant problem both in the US and abroad. This presentation gives an overview of two pieces of work developing models and tools to aid food safety regulatory efforts. The first is aimed at identifying risky US importers of shrimp, based on the understanding that various networks (shipping, product, website) give significant indicators of risky behavior. The second is aimed at informing domestic food safety regulatory policy in China. Through analytics, applied machine learning, and automated scraping algorithms, we create a comprehensive database of food safety tests from largely independent regulatory agencies and are able to understand the data in the context of the supply chain. This comprehensive dataset yields several clear recommendations for locationally targeted food safety testing in China. 
Somya Singhvi Advisor(s): Retsef Levi, Y. Karen Zheng

Data-Driven Market Design to Improve Efficiency in India’s Agricultural Markets This research effort is focused on agricultural supply chains and markets in India. The goal of this project is to develop, test, and evaluate a range of interventions and decision support tools to improve the efficiency and safety of agricultural supply chains, as well as to increase the welfare of farmers and consumers. The MIT team collaborates with partner organizations on the ground in India to utilize available field data, employing a broad range of expertise spanning predictive analytics and optimization, supply chain management, market design, and economics. The team is 1) conducting data-driven analysis to improve the efficiency of the Unified Market Platform (UMP) in Karnataka; 2) analyzing the impact of different government interventions (inventory release policies, consumer subsidy schemes, etc.) on the prices of essential commodities; 3) developing a recommendation tool that identifies optimal agricultural target markets for farmers; and 4) developing analytical tools to mitigate risks related to economically motivated adulteration in food supply chains. 
Matthew Sobiesk Advisor(s): Dimitris Bertsimas, Rahul Mazumder

Optimal Classification Trees with Hyperplanes are as Powerful as Classification Neural Networks We investigate the modeling power of neural networks in comparison with optimal classification trees with hyperplanes (OCT-Hs). We prove that a variety of neural networks can be transformed into decision trees with hyperplanes, showing that OCT-Hs are at least as powerful as these neural networks. Conversely, we show that a given classification OCT-H can be transformed into a classification neural network with the perceptron activation function, showing that these neural networks are at least as powerful as OCT-Hs; that is, OCT-Hs and these neural networks are equivalent in terms of modeling power. 
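The simplest instance of this equivalence is depth one: a single hyperplane split is exactly a one-unit perceptron with a step activation. A small sketch with illustrative weights and labels (not the constructions from the work itself):

```python
# Depth-1 tree: points with w'x <= b go to the left leaf, else right.
def tree_predict(x, w, b, left_label, right_label):
    return left_label if sum(wi * xi for wi, xi in zip(w, x)) <= b else right_label

# Equivalent perceptron: step activation on w'x - b selects the label.
def perceptron_predict(x, w, b, left_label, right_label):
    activation = sum(wi * xi for wi, xi in zip(w, x)) - b
    step = 1 if activation > 0 else 0          # step ("perceptron") activation
    return right_label if step == 1 else left_label

# The two constructions agree on every input by design.
w, b = [1.0, -2.0], 0.5
points = [[0.0, 0.0], [3.0, 1.0], [1.0, 1.0], [-1.0, -2.0]]
assert all(tree_predict(x, w, b, "A", "B") == perceptron_predict(x, w, b, "A", "B")
           for x in points)
```

The equivalence in the abstract extends this idea to arbitrary depths by stacking threshold layers that trace the tree's root-to-leaf paths.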
Bradley Sturt Advisor(s): Dimitris Bertsimas 
A Tractable, Data-Driven Approach for Multi-Stage Optimization Stochastic optimization can effectively describe many decision problems in which uncertainty is revealed over multiple stages. A central question is how to estimate the joint distribution of the uncertain parameters in order to find policies with good out-of-sample average performance. In addition, any such approach must be solvable in practical computation times using off-the-shelf software. In this work, we develop a novel data-driven framework for multi-stage decision making, in which we choose the policies that perform best when the historical data realizations are slightly perturbed. First, we establish asymptotic optimality guarantees for near-optimal policies, which hold even when the uncertain parameters are (arbitrarily) correlated between stages. Second, we develop an approximation algorithm for finding near-optimal policies in mixed-integer settings based on piecewise linear decision rules. By exploiting duality, these policies are found by solving a compact optimization problem that scales very efficiently with the size of the dataset. We demonstrate the practical impact of the proposed approach via computational examples on an inventory control problem with short life-cycle products. 
Yuchen Wang Advisor(s): Dimitris Bertsimas

Optimal Nonlinear Trees for Regression MARS is a greedy method for constructing a decision tree with nonlinear prediction functions in the leaves. We show that we can formulate the process of building such trees as a global optimization problem, which gives rise to our new method, Optimal Nonlinear Trees. We show on a collection of real-world datasets that Optimal Nonlinear Trees improve substantially over MARS. Optimal Nonlinear Trees also perform better than tree-based boosting methods such as XGBoost on small datasets and have comparable performance on larger datasets. 
Holly Wiberg Advisor(s): Dimitris Bertsimas

Interpretable Clustering: An Optimal Trees Approach State-of-the-art clustering algorithms use heuristics to partition the feature vectors while providing little insight into the interpretability of each cluster. We present a new Optimal Clustering Trees algorithm that leverages mixed-integer optimization (MIO) techniques to generate interpretable and globally optimal clustering tree models. We demonstrate that our method achieves superior performance on both synthetic and real-world datasets and scales well even when the number of observations is high. Joint work with Agni Orfanoudaki. 
Julia Yan Advisor(s): Dimitris Bertsimas 
Data-Driven Transit Network Design at Scale Transit authorities across the country have been redesigning their bus networks in an attempt to boost ridership. However, their analysis appears to rely on ad hoc methods, for example, considering each bus line in isolation and using manual incremental adjustments with backtracking. We provide a holistic approach to optimally design a transit network using column-and-constraint generation. Our approach is tractable, scaling to hundreds of stops, and we demonstrate its usefulness on a case study with real data from Boston. 
Ilias Zadik Advisor(s): David Gamarnik 
High Dimensional Linear Regression using Lattice Basis Reduction In this talk, we focus on the high-dimensional linear regression problem, where the goal is to efficiently recover the unknown vector of coefficients from noisy linear observations of the features of the model. Unlike most of the literature on this model, we make no sparsity assumption on the coefficients. Instead, we adopt a regularization based on assuming that the underlying coefficients have rational entries with the same denominator Q. We call this the Q-rationality assumption. We propose a new polynomial-time algorithm for this task based on the seminal Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm. We establish that under the Q-rationality assumption, our algorithm recovers exactly the vector of coefficients for a large class of distributions of the feature matrix and a nonzero noise level. We prove that it succeeds under small noise, even when the learner has access to only one observation. Furthermore, we prove that in many parameter regimes our algorithm tolerates a nearly optimal information-theoretic level of noise. Joint work with David Gamarnik. 
Renbo Zhao Advisor(s): Robert Freund 
Primal-Dual Algorithms for Doubly Stochastic Three-Composite Bilinear Saddle Point Problems We develop primal-dual first-order algorithms for a class of doubly stochastic three-composite bilinear saddle point problems. We consider both convex-concave and strongly-convex-concave settings. In the former setting, the (first-order) oracle complexity of our method not only improves upon that of existing methods but also matches a lower bound derived for all first-order methods that solve this problem. In the latter setting, by leveraging the primal strong convexity, we design a restart scheme that significantly improves the oracle complexity compared to the convex-concave case. In addition, we investigate the problem under primal relative smoothness and primal relative strong convexity, and we develop and analyze algorithms for these settings accordingly. 
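For intuition, the sketch below runs plain deterministic primal-dual gradient iterations on a toy strongly-convex-strongly-concave instance of the three-composite form; the abstract's algorithms additionally handle stochastic oracles and achieve optimal complexity, which this sketch does not attempt.

```python
import numpy as np

def primal_dual(A, steps=2000, eta=0.1):
    """Simultaneous gradient descent-ascent for the bilinear saddle point
        min_x max_y  0.5 ||x||^2 + x' A y - 0.5 ||y||^2,
    whose unique saddle point is (0, 0). The quadratic terms make the
    problem strongly convex in x and strongly concave in y, so the
    plain iterates contract to the saddle point."""
    n, m = A.shape
    x, y = np.ones(n), np.ones(m)
    for _ in range(steps):
        gx = x + A @ y          # gradient of the Lagrangian in x
        gy = A.T @ x - y        # gradient of the Lagrangian in y
        x, y = x - eta * gx, y + eta * gy
    return x, y

A = np.array([[0.5, 0.2], [0.1, 0.3]])
x, y = primal_dual(A)
print(np.linalg.norm(x), np.linalg.norm(y))  # both shrink toward 0
```

Dropping the strong convexity (pure bilinear coupling) makes these plain iterates rotate instead of contract, which is one reason extrapolation, averaging, and restart schemes like those in the abstract matter.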