Antonio Torralba


Computer Science and Artificial Intelligence Laboratory
Dept. of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

Office: 32-D462
Address: 32 Vassar Street, Cambridge, MA 02139
Assistant: Fern Deolivera

My research is in the areas of computer vision, machine learning, and human visual perception. I am interested in building systems that can perceive the world like humans do. Although my work focuses on computer vision, I am also interested in other modalities such as audition and touch. A system able to perceive the world through multiple senses might be able to learn without requiring massive curated datasets. Other interests include understanding neural networks, common-sense reasoning, computational photography, building image databases, ..., and the intersections between visual art and computation.

Lab Members

Jonas Wulff

Jun-Yan Zhu

Adrià Recasens
Grad. student

Bolei Zhou
Grad. student

David Bau
Grad. student

Hang Zhao
Grad. student

Manel Baradad
Grad. student

Shuang Li
Grad. student

Wei-Chiu Ma
Grad. student

Xavier Puig Fernandez
Grad. student

Yunzhu Li
Grad. student

Past students and postdocs

Carl Vondrick (Graduated 2017), Javier Marin (Postdoc), Yusuf Aytar (Postdoc), Andrew Owens (Graduated 2016), Aditya Khosla (Graduated 2016), Agata Lapedriza (Visiting professor, UOC), Joseph J. Lim (Graduated 2015), Lluis Castrejon (Visiting student, 2015), Hamed Pirsiavash (Postdoc), Zoya Gavrilov (Grad. student), Josep Marc Mingot Hidalgo (Visiting student), Tomasz Malisiewicz (Postdoc), Jianxiong Xiao (Graduated 2013), Dolores Blanco Almazan (Visiting student, 2012), Biliana Kaneva (Graduated 2011), Jenny Yuen (Graduated 2011), Tilke Judd (Graduated 2011), Myung "Jin" Choi (Graduated 2011), James Hays (Postdoc), Hector J. Bernal (Visiting student), Gunhee Kim (Visiting student), Bryan C. Russell (Graduated 2008).


MIT Quest for Intelligence: I have been named inaugural director of the MIT Quest for Intelligence. The Quest is a campus-wide initiative to discover the foundations of intelligence and to drive the development of technological tools that can positively influence virtually every aspect of society.

Network dissection: Quantifying Interpretability of Deep Visual Representations, CVPR 2017 paper, and Code release. Also related to: Object Detectors Emerge in Deep Scene CNNs.

Auditory scene analysis: using vision to teach audition. NIPS paper by Yusuf and Carl. Check also Andrew's ECCV paper on using audition to teach vision.

Multimodal scene recognition. The data for this work includes thousands of line drawings and textual descriptions of scenes, produced by AMT workers. The dataset is organized with the same categories as the Places database.

Aligning books and movies. Learning to see and read by watching movies and reading books. Check also the MovieQA dataset: MovieQA: Story Understanding Benchmark.

Gaze following demo, and dataset. It follows the gaze of the people inside a picture or video and predicts what they are looking at. In this video, frames are first processed independently and then the output is smoothed temporally.

Places database and scene recognition demo. More details about the demo appear in: "Learning Deep Features for Scene Recognition using Places Database," B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. NIPS 2014 (pdf). The Places database has two releases: Places release 1 contains 205 scene categories and 2.5 million images. Places release 2 contains 400 scene categories and 10 million images. Pre-trained models available here.

Check the LabelMe App for iPhone and iPad. The app connects with your LabelMe account online and allows you to take pictures and label them on the device. You can then recover the images and annotations with the LabelMe MATLAB toolbox. Developed by Josep Marc Mingot Hidalgo, Dolores Blanco, Aina Torralba, David Way and Antonio Torralba.

Cool news

The Late Show with Stephen Colbert on the work by Carl and Hamed, Anticipating Visual Representations from Unlabeled Video. CVPR 2016.

The Marilyn Monroe/Albert Einstein hybrid image by Aude Oliva on BBC.

German TV science show on accidental cameras. Details about accidental cameras and some of our videos are available here.


ADE20K dataset. 22,210 fully annotated images with objects, many also with parts. Check the scene parsing challenge website.

Places database. The database contains more than 10 million images comprising 400+ scene categories. The dataset features 5,000 to 30,000 training images per class.

360-SUN Database. A database of 360-degree panoramas organized along the SUN categories. Xiao et al., CVPR 2012. (pdf)

CMPlaces. CMPlaces is designed to train and evaluate cross-modal scene recognition models. It covers five different modalities: natural images, sketches, clip-art, text descriptions, and spatial text images. (pdf)

Out of context objects. The database contains 218 fully annotated images, each with at least one out-of-context object. Can you detect the out-of-context object? Project page

3D IKEA dataset. Dataset of IKEA 3D models and aligned images. J. Lim, H. Pirsiavash, and A. Torralba. ICCV 2013.

80 Million tiny images: explore a dense sampling of the visual world. A portion of this dataset was used to create the CIFAR datasets. By the way, since the web page went online, we have been collecting annotations for a portion of the dataset. We haven't used them for anything yet, but you can download them here and here. The annotations contain all the users' votes, as {1,0,-1} corresponding to {correct, undefined, incorrect}. A very simple visualization of the annotations is available here.
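
As a rough illustration of the vote encoding described above, here is a minimal Python sketch that tallies the votes for a single image. It assumes the votes have already been loaded into a plain list of integers; the actual layout of the downloadable annotation files is not described here, and the function name summarize_votes and the tie handling are hypothetical choices made only for this example.

```python
# Vote encoding taken from the text: 1 = correct, 0 = undefined, -1 = incorrect.
VOTE_MEANING = {1: "correct", 0: "undefined", -1: "incorrect"}


def summarize_votes(votes):
    """Tally the votes for one image (hypothetical helper).

    `votes` is assumed to be an in-memory list of ints drawn from {1, 0, -1}.
    Returns (counts, majority), where majority is None when there is a tie
    or when no votes are given.
    """
    counts = {name: 0 for name in VOTE_MEANING.values()}
    for v in votes:
        counts[VOTE_MEANING[v]] += 1

    top = max(counts.values())
    winners = [name for name, n in counts.items() if n == top]
    majority = winners[0] if top > 0 and len(winners) == 1 else None
    return counts, majority


if __name__ == "__main__":
    counts, majority = summarize_votes([1, 1, -1, 0])
    print(counts, majority)
```

Returning None on ties is one simple option; a stricter filter could instead require a minimum number of "correct" votes before trusting a label.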

Indoor Scene Recognition Database: 67 indoor scene categories. A. Quattoni and A. Torralba. CVPR 2009.