Class Times: Monday and Wednesday, 10:30-12:00
Units: 3-0-9 H,G
Location: 46-5193
Instructors: Tomaso Poggio (TP), Lorenzo Rosasco (LR), Charlie Frogner (CF), Guille D. Canas (GC)
Office Hours: Friday 1-2 pm in 46-5156, CBCL lounge (by appointment)
Email Contact: 9.520@mit.edu
Previous Class: SPRING 11

Course description
The class introduces the theory and algorithms of computational learning in the framework of statistics and functional analysis. It gives an in-depth discussion of state-of-the-art machine learning algorithms for regression and classification, variable selection, manifold learning, and transfer learning. The class focuses on the unifying role of regularization.
Many problems in applied science are inverse problems, and most inverse problems are ill-posed: the solution fails to satisfy the basic requirements of existence, uniqueness, and stability. As it turns out, most sensory problems are inverse and ill-posed. In a sense, intelligence is the ability to solve inverse problems effectively. Probably the most interesting inverse and ill-posed problem -- and the one at the very core of intelligence -- is the problem of learning from experience.
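To make the terminology precise (a standard definition, added here for reference rather than taken from the original course text): consider recovering an unknown $f$ from data $g$ through an operator equation

$$ A f = g. $$

Following Hadamard, the problem is well-posed if (1) a solution exists for every admissible $g$, (2) the solution is unique, and (3) the solution depends continuously on the data $g$; a problem violating any of these conditions is ill-posed.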
The theory and algorithms of regularization provide principled ways to solve ill-posed problems and restore well-posedness. Not surprisingly, most successful machine learning systems, such as MobilEye's vision system for cars and recent systems for intelligent assistants, are based on regularization techniques.
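As a concrete illustration -- a standard formulation that the class develops in detail (e.g., Classes 05 and 06 below), not a quotation from the original text -- Tikhonov regularization selects, given $n$ training examples $(x_i, y_i)$, the function

$$ f_\lambda = \arg\min_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} V\big(f(x_i), y_i\big) + \lambda \|f\|_{\mathcal{H}}^2, $$

where $\mathcal{H}$ is the hypothesis space (typically a reproducing kernel Hilbert space), $V$ is a loss function (the square loss yields regularized least squares, the hinge loss yields support vector machines), and the parameter $\lambda > 0$ trades fidelity to the data against smoothness of the solution, restoring the stability that the unregularized problem lacks.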
The goal of this class is to provide students with the knowledge needed to use and develop effective computational learning solutions to challenging problems.

Prerequisites
We will make extensive use of linear algebra, basic functional analysis (the essentials are covered in class and during the math camp), and basic concepts in probability theory and concentration of measure (also covered in class and during the math camp). Students are expected to be familiar with Matlab.
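As an example of the kind of concentration-of-measure result covered in the math camp (a standard inequality, stated here for reference): Hoeffding's inequality says that for i.i.d. random variables $Z_1, \dots, Z_n$ taking values in $[a, b]$,

$$ \mathbb{P}\left( \left| \frac{1}{n} \sum_{i=1}^{n} Z_i - \mathbb{E}[Z_1] \right| \geq \epsilon \right) \leq 2 \exp\left( -\frac{2 n \epsilon^2}{(b - a)^2} \right), $$

the basic tool behind many of the generalization bounds discussed in class.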
Grading

Requirements for grading (other than attending lectures) are: scribing one lecture, two problem sets, and a final project.

Scribe Notes
In this class there will be three to five unscribed lectures. Of the remaining lectures, new scribe notes will be created for classes #5, #6, #9, and #11, while the notes for lectures #2-#8, #10, #12, and #14-#18 will be edited from existing notes. Each student taking the class for credit will be required to improve and update, or create, the scribe notes of one lecture. Scribe notes should be a natural integration of the presentation given in lecture with the material in the slides. The lecture slides are available on this website for your reference. Good scribe notes are important both for your grade and for the other students who will read them. In particular, please make an effort to present the material in a clear, concise, and comprehensive manner.
Scribe notes must be prepared with LaTeX, using the provided template. Scribe notes (the .tex file and all additional files) should be submitted to 9.520@mit.edu no later than one week after the class. Please make sure to proofread the notes carefully before submitting. We will review the scribe notes to check the technical content and quality of writing, give feedback, and ask for a revised version if necessary. Completed scribe notes will be posted on this website as soon as possible.
You can sign up here to scribe a lecture. If you have problems opening or editing the page, please send us an email at 9.520@mit.edu. If you have any questions or concerns about the scribing requirement, please also feel free to email us.
Problem Sets
Problem set #1: due Monday, March 19th. Data
Problem set #2: due Wednesday, April 25th. Data

Projects
Project abstract submission due Monday, April 2nd.
Final project due Friday, May 18th.
The final project can be either a Wikipedia entry or a research project (we recommend a Wikipedia entry).
We envision two kinds of research project:
- Applications: evaluate an algorithm on an interesting problem of your choice;
- Theory and Algorithms: study, theoretically or empirically, a new machine learning algorithm or problem.
For the Wikipedia entry, we suggest a short article using the standard Wikipedia article format; for the research project, you should use this template. Reports should be at most 8 pages, including references; additional material can be included in an appendix.
Updated project suggestions: Spring 2012 projects

Syllabus
Follow the link for each class to find a detailed description, suggested readings, and class slides. Some of the later classes may be subject to reordering or rescheduling.
Class | Date | Title | Instructor(s)
Class 01 | Wed 08 Feb | The Course at a Glance | TP
Class 02 | Mon 13 Feb | The Learning Problem and Regularization | TP
Class 03 | Wed 15 Feb | Reproducing Kernel Hilbert Spaces | LR
-- | Mon 20 Feb | President's Day (no class) | --
Class 04 | Tue 21 Feb | Mercer Theorem and Feature Maps | LR
Class 05 | Wed 22 Feb | Tikhonov Regularization and the Representer Theorem | LR
Class 06 | Mon 27 Feb | Regularized Least Squares and Support Vector Machines | LR
Class 07 | Wed 29 Feb | Generalization Bounds, Intro to Stability | LR
Class 08 | Mon 05 Mar | Stability of Tikhonov Regularization | LR
Class 09 | Wed 07 Mar | Regularization Parameter Choice: Theory and Practice | LR
Class 10 | Mon 12 Mar | Bayesian Interpretations of Regularization | CF
Class 11 | Wed 14 Mar | Nonparametric Bayesian Methods | CF
Class 12 | Mon 19 Mar | Spectral Regularization | LR
Class 13 | Wed 21 Mar | Loose Ends, Project Discussions | --
-- | Mon 26 - Fri 30 Mar | Spring break (no class) | --
Class 14 | Mon 02 Apr | Manifold Regularization | LR
Class 15 | Wed 04 Apr | Sparsity-Based Regularization | LR
Class 16 | Mon 09 Apr | Regularization with Multiple Kernels | LR
Class 17 | Wed 11 Apr | Regularization for Multi-Output Learning | LR
-- | Mon 16 Apr | Patriots' Day (no class) | --
Class 18 | Wed 18 Apr | On-line Learning | LR
Class 19 | Mon 23 Apr | Hierarchical Representation for Learning: Visual Cortex | TP
Class 20 | Wed 25 Apr | Hierarchical Representation for Learning: Mathematics | LR
Class 21 | Mon 30 Apr | Hierarchical Representation for Learning: Computational Model | TP
Class 22 | Wed 02 May | Learning Data Representation with Regularization | GC/LR
Class 23 | Mon 07 May | TBA | --
Class 24 | Wed 09 May | Machine Learning for Humanoid Robotics | Giorgio Metta (IIT)
Class 25 | Mon 14 May | Project Presentations | --
Class 26 | Wed 16 May | Project Presentations | --
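To connect the syllabus to the Matlab prerequisite, here is a minimal sketch of regularized least squares with a Gaussian kernel, the algorithm treated in Class 06. This is an illustrative implementation under our own naming and parameter choices, not code from the course materials; it assumes Matlab R2016b or later for implicit expansion.

% rls_predict.m -- minimal regularized least squares (RLS) sketch.
% Xtr: n x d training inputs, Ytr: n x 1 labels, Xte: m x d test inputs.
% lambda (regularization) and sigma (kernel width) are illustrative choices,
% in practice selected by cross-validation (see Class 09).
function Yte = rls_predict(Xtr, Ytr, Xte, lambda, sigma)
    n = size(Xtr, 1);
    K = gauss_kernel(Xtr, Xtr, sigma);        % n x n kernel matrix
    c = (K + lambda * n * eye(n)) \ Ytr;      % solve (K + lambda*n*I) c = Ytr
    Yte = gauss_kernel(Xte, Xtr, sigma) * c;  % f(x) = sum_j c_j k(x, x_j)
end

function K = gauss_kernel(X1, X2, sigma)
    % Pairwise squared Euclidean distances via implicit expansion,
    % then the Gaussian kernel k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)).
    D = sum(X1.^2, 2) + sum(X2.^2, 2)' - 2 * (X1 * X2');
    K = exp(-D / (2 * sigma^2));
end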
Math Camp: Mon 13 Feb (7-9pm). Functional analysis: slides, notes. Probability theory: notes.
Old Math Camp Slides XX: Functional analysis
Old Math Camp Slides XX: Probability theory

Reading List
There is no textbook for this course. All the required information will be presented in the slides associated with each class. The books and papers listed below are useful general reference reading, especially from the theoretical viewpoint. A list of suggested readings will also be provided separately for each class.

Primary References
- O. Bousquet, S. Boucheron, and G. Lugosi. Introduction to Statistical Learning Theory. In Advanced Lectures on Machine Learning, Lecture Notes in Artificial Intelligence 3176, 169-207, O. Bousquet, U. von Luxburg, and G. Ratsch (Eds.), Springer, Heidelberg, Germany, 2004.
- F. Cucker and S. Smale. On The Mathematical Foundations of Learning. Bulletin of the American Mathematical Society, 2002.
- F. Cucker and D-X. Zhou. Learning theory: an approximation theory viewpoint. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge, 2007.
- L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1997.
- T. Evgeniou, M. Pontil, and T. Poggio. Regularization Networks and Support Vector Machines. Advances in Computational Mathematics, 2000.
- T. Poggio and S. Smale. The Mathematics of Learning: Dealing with Data. Notices of the AMS, 2003.
- I. Steinwart and A. Christmann. Support vector machines. Springer, New York, 2008.
- V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
- V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
- N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.
Background Mathematics References
- A. N. Kolmogorov and S. V. Fomin, Introductory Real Analysis, Dover Publications, 1975.
- A. N. Kolmogorov and S. V. Fomin, Elements of the Theory of Functions and Functional Analysis, Dover Publications, 1999.
- D. G. Luenberger, Optimization by Vector Space Methods, Wiley, 1969.
Neuroscience Related References
- Serre, T., L. Wolf, S. Bileschi, M. Riesenhuber and T. Poggio. "Object Recognition with Cortex-like Mechanisms", IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 3, 411-426, 2007.
- Serre, T., A. Oliva and T. Poggio. "A Feedforward Architecture Accounts for Rapid Categorization", Proceedings of the National Academy of Sciences (PNAS), Vol. 104, No. 15, 6424-6429, 2007.
- S. Smale, L. Rosasco, J. Bouvrie, A. Caponnetto, and T. Poggio. "Mathematics of the Neural Response", Foundations of Computational Mathematics, Vol. 10, 1, 67-91, June 2009.