Data Mining Papers

Information Retrieval and Set Completion

Benjamin Letham, Cynthia Rudin and Katherine Heller.

Growing a List. Data Mining and Knowledge Discovery (ECML-PKDD journal track).

Featured on Boston Public Radio (WGBH): "A New Way To Google"

Meeting Analysis

Been Kim and Cynthia Rudin.

Learning About Meetings. Accepted, Data Mining and Knowledge Discovery.

Shorter version:

Been Kim and Cynthia Rudin. Machine Learning for Meeting Analysis. In AAAI 2013 Late Breaking Track.

This paper was the topic of several popular press articles.

Crime Pattern Detection

Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri.

Learning to Detect Patterns of Crime. Proceedings of ECML, 2013.

Shorter Version:

Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri. Detecting Patterns of Crime with Series Finder. Proceedings of AAAI Late Breaking Track, 2013.

This paper was the topic of several popular press articles.

Jonathan Huggins and Cynthia Rudin.

A Statistical Learning Theory Framework for Supervised Pattern Discovery. Proceedings of SIAM Conference on Data Mining (SDM), 2014.

Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri.

Finding Patterns with a Rotten Core: Data Mining for Crime Series with Core Sets. Technical Report, 2014.

Collective Intelligence

Seyda Ertekin, Haym Hirsh, Cynthia Rudin.

Approximating the Wisdom of the Crowd. Proceedings of the Second Workshop on Computational Social Science and the Wisdom of Crowds (NIPS 2011).

Seyda Ertekin, Haym Hirsh, Cynthia Rudin.

Learning to Predict the Wisdom of Crowds. Proceedings of Collective Intelligence, 2011.

Longer Version

Energy Grid Projects

Cynthia Rudin, David Waltz, Roger N. Anderson, Albert Boulanger, Ansaf Salleb-Aouissi, Maggie Chow, Haimonti Dutta, Philip Gross, Bert Huang, Steve Ierome, Delfina Isaac, Arthur Kressner, Rebecca J. Passonneau, Axinia Radeva, Leon Wu.

Machine Learning for the New York City Power Grid. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, No. 2, February 2012. (Spotlight Paper for the February 2012 issue.)

Cynthia Rudin, Rebecca Passonneau, Axinia Radeva, Haimonti Dutta, Steve Ierome, Delfina Isaac.

A Process for Predicting Manhole Events in Manhattan. Machine Learning, Volume 80, pages 1-31, 2010.

Cynthia Rudin, Rebecca Passonneau, Axinia Radeva, Steve Ierome, Delfina Isaac.

21st-Century Data Miners Meet 19th-Century Electrical Cables. IEEE Computer, Volume 44, No. 6, pages 103-105, June 2011. (One of three articles featured on the cover.)

Cynthia Rudin, Seyda Ertekin, Rebecca Passonneau, Axinia Radeva, Ashish Tomar, Boyi Xie, Stanley Lewis, Mark Riddle, Debbie Pangsrivinij, Tyler McCormick.

Analytics for Power Grid Distribution Reliability in New York City. Interfaces, Accepted, 2014.

Seyda Ertekin, Cynthia Rudin, and Tyler McCormick.

Predicting Power Failures with Reactive Point Processes. Proceedings of AAAI Late Breaking Track, 2013.

Longer Version

Boyi Xie, Rebecca J. Passonneau, Haimonti Dutta, Jing-Yeu Miaw, Axinia Radeva, Ashish Tomar, and Cynthia Rudin.

Progressive Clustering with Learned Seeds: An Event Categorization System for Power Grid. Proceedings of the 24th International Conference on Software Engineering & Knowledge Engineering (SEKE), pages 100-105, 2012.

Rebecca Passonneau, Cynthia Rudin, Axinia Radeva, Ashish Tomar and Boyi Xie.

Treatment Effect of Repairs to an Electrical Grid: Leveraging a Machine Learned Model of Structure Vulnerability. Proceedings of the KDD Workshop on Data Mining Applications in Sustainability (SustKDD), 17th Annual ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2011.

Rebecca Passonneau, Cynthia Rudin, Axinia Radeva, Zhi An Liu.

Reducing Noise in Labels and Features for a Real World Dataset: Application of NLP Corpus Annotation Methods. Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing, 2009.

Axinia Radeva, Cynthia Rudin, Rebecca Passonneau and Delfina Isaac.

Report Cards for Manholes: Eliciting Expert Feedback for a Machine Learning Task. Proceedings of the International Conference on Machine Learning and Applications, 2009. (Winner of Best Poster Award.)

Haimonti Dutta, Cynthia Rudin, Becky Passonneau, Fred Seibel, Nandini Bhardwaj, Axinia Radeva, Zhi An Liu, Steve Ierome, Delfina Isaac.

Visualization of Manhole and Precursor-Type Events for the Manhattan Electrical Distribution System. Workshop on GeoVisualization of Dynamics, Movement and Change, 11th AGILE International Conference on Geographic Information Science, 2008.

Leon Wu, Timothy Teravainen, Gail Kaiser, Roger Anderson, Albert Boulanger, and Cynthia Rudin.

Estimation of System Reliability Using a Semiparametric Model. Proceedings of IEEE EnergyTech, 2011.

Leon Wu, Gail Kaiser, Cynthia Rudin, David Waltz, Roger Anderson, Albert Boulanger, Ansaf Salleb-Aouissi, Haimonti Dutta, and Manoj Poolery.

Evaluating Machine Learning for Improving Power Grid Reliability. Proceedings of the ICML 2011 Workshop on "Machine Learning for Global Challenges," International Conference on Machine Learning, 2011.

Leon Wu, Gail Kaiser, Cynthia Rudin, Roger Anderson.

Data Quality Assurance and Performance Measurement of Data Mining for Preventive Maintenance of Power Grid. Proceedings of the KDD Workshop on Data Mining for Service and Maintenance (KDD4Service), 17th Annual ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2011.

This project is the winner of the 2013 INFORMS Innovative Applications in Analytics Award.

Popular press articles about this topic:

- Energy Daily article: MIT Sloan Professor's Ranking of Manholes Prioritizes Repairs and Maintenance

- Science News article: Machine vs. Manhole, appearing also in U.S. News and World Report, WIRED Science, Slashdot, Discovery News / Discovery Channel

- CIO Magazine: "Don't blow your top," Finish section, Sept 1 issue, 2010

- Featured in the book Big Data: The Data Revolution by Viktor Mayer-Schonberger and Kenneth Cukier

Machine Learning and Decision Making

Cynthia Rudin and Gah-Yi Vahn. The Big Data Newsvendor: Practical Insights from Machine Learning Analysis.

Working Paper.

Theja Tulabandhula and Cynthia Rudin.

Machine Learning with Operational Costs. Journal of Machine Learning Research, Volume 14, pages 1989-2028, 2013.

Shorter version:

Theja Tulabandhula and Cynthia Rudin. The Influence of Operational Cost on Estimation. Proceedings of the International Symposium on Artificial Intelligence and Mathematics (ISAIM), 2012.

Theja Tulabandhula and Cynthia Rudin.

On Combining Machine Learning with Decision Making. Submitted, 2014.

Theja Tulabandhula and Cynthia Rudin.

Robust Optimization using Machine Learning for Uncertainty Sets. Proceedings of the International Symposium on Artificial Intelligence and Mathematics (ISAIM), 2014.

Theja Tulabandhula and Cynthia Rudin.

Generalization Bounds for Learning with Linear and Quadratic Side Knowledge. Proceedings of the International Symposium on Artificial Intelligence and Mathematics (ISAIM), 2014.

Theja Tulabandhula, Cynthia Rudin, Patrick Jaillet.

The Machine Learning and Traveling Repairman Problem. Proceedings of the Second International Conference on Algorithmic Decision Theory (ADT), 2011.

Longer Version

Interpretable Modeling and Modeling with Rules

Benjamin Letham, Cynthia Rudin, Tyler McCormick and David Madigan.

Building Interpretable Classifiers with Rules using Bayesian Analysis. Available as a tech report.

Python code (zip)

- Winner of Data Mining Best Student Paper Award, INFORMS 2013.

- Winner of Student Paper Competition sponsored by the Statistical Learning and Data Mining section (SLDM) of the American Statistical Association, 2014.

Shorter Version:

Benjamin Letham, Cynthia Rudin, Tyler McCormick and David Madigan. An Interpretable Stroke Prediction Model using Rules and Bayesian Analysis. Proceedings of AAAI Late Breaking Track, 2013.

Berk Ustun, Stefano Traca, and Cynthia Rudin.

Supersparse Linear Integer Models for Interpretable Classification. Available as a tech report.

Shorter Version:

Berk Ustun, Stefano Traca, and Cynthia Rudin. Supersparse Linear Integer Models for Predictive Scoring Systems. Proceedings of AAAI Late Breaking Track, 2013.

Siong Thye Goh and Cynthia Rudin.

Box Drawings for Learning with Imbalanced Data. arXiv, 2014.

Fulton Wang, Tyler McCormick, and Cynthia Rudin.

Modeling Recovery Curves With Application to Prostatectomy. Tech report, 2014.

Allison Chang, Cynthia Rudin, and Dimitris Bertsimas.

Ordered Rules for Classification: A Discrete Optimization Approach to Associative Classification. Available on DSPACE as paper OR 386-11.

Cynthia Rudin, Benjamin Letham and David Madigan.

Learning Theory Analysis for Association Rules and Sequential Event Prediction. Journal of Machine Learning Research (JMLR), Volume 14, pages 3385-3436, 2013.

Shorter Version:

Cynthia Rudin, Benjamin Letham, Ansaf Salleb-Aouissi, Eugene Kogan and David Madigan. Sequential Event Prediction with Association Rules. Proceedings of the 24th Annual Conference on Learning Theory (COLT), 2011.

Tyler McCormick, Cynthia Rudin, David Madigan.

A Hierarchical Model for Association Rule Mining of Sequential Events: An Approach to Automated Medical Symptom Prediction. Annals of Applied Statistics, 2012.

Supervised Ranking

Benjamin Letham, Cynthia Rudin, and David Madigan. Sequential Event Prediction. Machine Learning, Volume 93, pages 357-380, 2013.

Allison Chang, Cynthia Rudin, Michael Cavaretta, Robert Thomas and Gloria Chou.

How to Reverse-Engineer Quality Rankings. Machine Learning, Volume 88, Issue 3, pages 369-398, September 2012.

Popular press articles about this topic:

- Businessweek article: How to Improve Product Rankings

- MIT Sloan Experts blog article: Product quality ratings: New research shows secret formulas yield questionable results

Seyda Ertekin and Cynthia Rudin.

On Equivalence Relationships Between Classification and Ranking Algorithms. Journal of Machine Learning Research, Volume 12, pages 2905-2929, 2011.

Dimitris Bertsimas, Allison Chang, Cynthia Rudin.

A Discrete Optimization Approach to Supervised Ranking. Proceedings of the 5th INFORMS Workshop on Data Mining and Health Informatics (DM-HI 2010), 2010.

Longer version on DSPACE as paper OR 388-11.

Finalist for Data Mining Best Student Paper Award, INFORMS 2011.

Cynthia Rudin.

The P-Norm Push: A Simple Convex Ranking Algorithm that Concentrates at the Top of the List. Journal of Machine Learning Research, Volume 10, pages 2233-2271, 2009.

Cynthia Rudin.

Ranking with a P-Norm Push. Proceedings of the Nineteenth Annual Conference on Computational Learning Theory (COLT), pages 589-604, 2006.

Heng Ji, Cynthia Rudin, Ralph Grishman.

Re-Ranking Algorithms for Name Tagging. In Proc. Human Language Technology Conference - North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT-NAACL) Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing, 2006.

Cynthia Rudin and Robert E. Schapire.

Margin-Based Ranking and an Equivalence Between AdaBoost and RankBoost. Journal of Machine Learning Research, Volume 10, pages 2193-2232, 2009.

Cynthia Rudin, Corinna Cortes, Mehryar Mohri, Robert E. Schapire.

Margin-Based Ranking and Boosting Meet in the Middle. Proceedings of the Eighteenth Annual Conference on Computational Learning Theory (COLT), pages 63-78, 2005.

Convergence of Boosting Algorithms

Cynthia Rudin, Ingrid Daubechies, Robert E. Schapire.

Does AdaBoost Always Cycle? JMLR: Workshop and Conference Proceedings, published as a COLT Open Problem.

Indraneel Mukherjee, Cynthia Rudin, and Robert Schapire.

The Rate of Convergence of AdaBoost. Proceedings of the Twenty-fourth Annual Conference on Learning Theory (COLT), 2011.

Indraneel Mukherjee, Cynthia Rudin, and Robert Schapire.

The Rate of Convergence of AdaBoost. (Longer version of COLT paper.) Journal of Machine Learning Research, Volume 14, pages 2315-2347, August 2013.

Cynthia Rudin, Ingrid Daubechies, Robert E. Schapire.

The Dynamics of AdaBoost: Cyclic Behavior and Convergence of Margins. Journal of Machine Learning Research, 5 (Dec): 1557-1595, 2004.

Cynthia Rudin, Ingrid Daubechies, and Rob Schapire.

On the Dynamics of Boosting. Advances in Neural Information Processing Systems (NIPS) 16, 2003.

Cynthia Rudin, Robert E. Schapire, Ingrid Daubechies.

Analysis of Boosting Algorithms using the Smooth Margin Function. Annals of Statistics, Volume 35, Number 6, pages 2723-2768, 2007.

Cynthia Rudin, Robert E. Schapire, and Ingrid Daubechies.

Precise Statements of Convergence for AdaBoost and arc-gv. In Proc. AMS-IMS-SIAM Joint Summer Research Conference: Machine Learning, Statistics, and Discovery, pages 131-145, 2007.

Cynthia Rudin, Robert E. Schapire, and Ingrid Daubechies.

Boosting Based on a Smooth Margin. Proceedings of the Seventeenth Annual Conference on Computational Learning Theory (COLT), 2004.

Other Papers

Cynthia Rudin and Kiri L. Wagstaff.

Machine Learning for Science and Society. Machine Learning, 2013.

Cynthia Rudin.

Teaching "Prediction: Machine Learning and Statistics." Proceedings of the ICML Workshop on Teaching ML, 2012.

Ryan Roth, Owen Rambow, Nizar Habash, Mona Diab, and Cynthia Rudin.

Arabic Morphological Tagging, Diacritization, and Lemmatization Using Lexeme Models and Feature Ranking. The 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL/HLT), 2008.

Cynthia Rudin.

Stability of Learning Algorithms. Computer Science arXiv.

Cynthia Rudin and Brian Spencer.

Equilibrium Island Arrays in Strained Solid Films. Journal of Applied Physics, Volume 86, Issue 10, pages 5530-5536, November 15, 1999.

Edited Collections

Eds. Peter Qian, Yilu Zhou, and Cynthia Rudin. Proceedings of the 2011 INFORMS Data Mining and Health Informatics (DM-HI) Workshop