Antonio Torralba


Computer Science and Artificial Intelligence Laboratory
Dept. of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

Office: 32-D462
Address: 32 Vassar Street, Cambridge, MA 02139
Assistant: Fern Deolivera

My research is in the areas of computer vision, machine learning and human visual perception. I am interested in building systems that can perceive the world as humans do. Although my work focuses on computer vision, I am also interested in other modalities such as audition and touch. A system able to perceive the world through multiple senses might be able to learn without requiring massive curated datasets. Other interests include understanding neural networks, common-sense reasoning, computational photography, building image databases, ..., and the intersections between visual art and computation.


Network dissection: Quantifying Interpretability of Deep Visual Representations, CVPR 2017 paper, and Code release. Also related to: Object Detectors Emerge in Deep Scene CNNs.

Auditory scene analysis: using vision to teach audition. NIPS paper by Yusuf and Carl. Check also Andrew's ECCV paper on using audition to teach vision.

ADE20K dataset. 22,210 fully annotated images with objects, many also with parts. A scene parsing challenge is being held jointly with ILSVRC'16. Winners will be invited to present at the ILSVRC and COCO joint workshop at ECCV 2016. Check the scene parsing challenge website.

Multimodal scene recognition. The data for this work contains thousands of line drawings and textual descriptions of scenes, produced by AMT workers. The dataset is organized with the same categories as the Places database.

Aligning books and movies. Learning to see and read by watching movies and reading books. Check also the MovieQA dataset: MovieQA: Story Understanding Benchmark.

Gaze following demo, and dataset. It follows the gaze of the people inside a picture or video and predicts what they are looking at. In this video, frames are first processed independently and then the output is smoothed temporally.

Places database and scene recognition demo. More details about the demo appear in: "Learning Deep Features for Scene Recognition using Places Database," B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. NIPS 2014 (pdf). The Places database has two releases: Places release 1 contains 205 scene categories and 2.5 million images; Places release 2 contains 400 scene categories and 10 million images. Pre-trained models available here.

Check the LabelMe App for iPhone and iPad. The app connects with your LabelMe account online and allows you to take pictures and label them on the device. You can then recover the images and annotations with the LabelMe Matlab toolbox. Developed by Josep Marc Mingot Hidalgo, Dolores Blanco, Aina Torralba, David Way and Antonio Torralba.

Cool news

The Late Show with Stephen Colbert on the work by Carl and Hamed, Anticipating Visual Representations from Unlabeled Video. CVPR 2016.

The Marilyn Monroe/Albert Einstein hybrid image by Aude Oliva on BBC.

German TV science show on accidental cameras. Details about accidental cameras and some of our videos are available here.

Lab Members

Adrià Recasens (Grad. student), Bolei Zhou (Grad. student), David Bau (Grad. student), Hang Zhao (Grad. student), Javier Marin (Postdoc), Xavier Puig Fernandez (Grad. student), Yunzhu Li (Grad. student)

Past students and postdocs

Carl Vondrick (Graduated 2017), Yusuf Aytar (Post-doctoral Fellow), Andrew Owens (Graduated 2016), Aditya Khosla (Graduated 2016), Agata Lapedriza (Visiting professor, UOC), Joseph J. Lim (Graduated 2015), Lluis Castrejon (Visiting student, 2015), Hamed Pirsiavash (Post-doctoral Fellow), Zoya Gavrilov (Grad. student), Josep Marc Mingot Hidalgo (Visiting student), Tomasz Malisiewicz (Post-doctoral Fellow), Jianxiong Xiao (Graduated 2013), Dolores Blanco Almazan (Visiting student, 2012), Biliana Kaneva (Graduated 2011), Jenny Yuen (Graduated 2011), Tilke Judd (Graduated 2011), Myung "Jin" Choi (Graduated 2011), James Hays (Post-doctoral Fellow), Hector J. Bernal (Visiting student), Gunhee Kim (Visiting student), Bryan C. Russell (Graduated 2008).


Places database. The Places database contains 205 scene categories and 2.5 million images.

3D IKEA dataset. Dataset of IKEA 3D models and aligned images. J. Lim, H. Pirsiavash, and A. Torralba. ICCV 2013.

SUN Database. Scene UNderstanding Database. A database for scene recognition (900 scene categories) and multiclass object detection (>15,000 fully segmented images). Xiao et al., CVPR 2010. (pdf)

360-SUN Database. A database of 360-degree panoramas organized along the SUN categories. Xiao et al., CVPR 2012. (pdf)

Out of context objects. The database contains 218 fully annotated images with at least one object out-of-context. Can you detect the out of context object? Project page

LabelMe: the open annotation tool. Explore the online query tool, Matlab toolbox, WordNet hierarchy, and the 3D LabelMe toolbox.

LabelMe video: Jenny Yuen et al., ICCV 2009. (pdf)

80 Million tiny images: explore a dense sampling of the visual world. Antonio Torralba, Rob Fergus, William T. Freeman.

Indoor Scene Recognition Database: 67 indoor scene categories. A. Quattoni and A. Torralba. CVPR 2009.


Scene Understanding Symposium (SUnS). Aude Oliva, Thomas Serre, Antonio Torralba. 2006, 2007, 2008, 2009, 2011.

The context challenge: How far can you go before having to run an object detector?

Gallery: A selection of some of the images that I like the most resulting from the research.


Gist, scene recognition.

A simple object detector with boosting.

Eye movements and attention.

SIFT Flow.

HOGgles: Visualizing Object Detection Features.

Talks and tutorials

Course on Recognizing and Learning Object Categories. Li Fei-Fei, Rob Fergus, Antonio Torralba. ICCV 2005, CVPR 2007, ICCV 2009.