Antonio Torralba


Computer Science and Artificial Intelligence Laboratory
Dept. of Electrical Engineering and Computer Science
Massachusetts Institute of Technology

Assistant: Fern Deolivera
Office: 32-D530
Address: 32 Vassar Street, Cambridge, MA 02139

My research is in the areas of computer vision, machine learning and human visual perception. I am interested in scene and object recognition, among other things.

Scene and object recognition are two related visual tasks that are generally studied separately. However, by devising systems that solve these tasks in an integrated fashion, I believe it is possible to build more efficient and robust recognition systems.


A scene parsing challenge is being held jointly with ILSVRC'16. Winners will be invited to present at the ILSVRC and COCO joint workshop at ECCV 2016. Check the scene parsing challenge website.

Multimodal scene recognition. The data for this work includes thousands of line drawings and textual descriptions of scenes, produced by AMT workers. The dataset is organized with the same categories as the Places database.

Aligning books and movies. Learning to see and read by watching movies and reading books. Check also the MovieQA dataset: MovieQA: Story Understanding Benchmark.

Gaze following demo and dataset. It follows the gaze of the people inside a picture or video and predicts what they are looking at. In this video, frames are first processed independently and then the output is smoothed temporally.

Places2 scene classification challenge, held in conjunction with ILSVRC at ICCV 2015.

Interactive visualization of deep networks: Places-CNN and ImageNet-CNN. For more details on understanding the internal representation built by a CNN trained for scene recognition: Object Detectors Emerge in Deep Scene CNNs.

Places database and scene recognition demo. The Places database contains 205 scene categories and 2.5 million images. More details about the demo appear in: "Learning Deep Features for Scene Recognition using Places Database," B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. NIPS 2014 (pdf).

Saliency benchmark. It has 300 test images, each with fixations from 39 viewers. These fixations are not public, so no model can be trained on this data set.

Check the LabelMe App for iPhone and iPad. The app connects with your LabelMe account online and allows you to take pictures and label them on the device. You can then recover the images and annotations with the LabelMe Matlab toolbox. Developed by Josep Marc Mingot Hidalgo, Dolores Blanco, Aina Torralba, David Way and Antonio Torralba.

Cool news

The Late Show with Stephen Colbert on the work by Carl and Hamed, Anticipating Visual Representations from Unlabeled Video. CVPR 2016.

The Marilyn Monroe/Albert Einstein hybrid image by Aude Oliva on BBC.

German TV science show on accidental cameras. Details about accidental cameras and some of our videos are available here.

Lab Members

Aditya Khosla (Grad. student), Adria Recasens (Grad. student), Agata Lapedriza (Visiting professor, UOC), Andrew Owens (Grad. student with Bill Freeman), Bolei Zhou (Grad. student), Carl Vondrick (Grad. student), Xavier Puig Fernandez (Visiting student), Yusuf Aytar (Post-doctoral Fellow)

Past students and visitors

Joseph J. Lim (Graduated 2015), Lluis Castrejon (Visiting student, 2015), Hamed Pirsiavash (Post-doctoral Fellow), Zoya Gavrilov (Grad. student), Josep Marc Mingot Hidalgo (Visiting student), Tomasz Malisiewicz (Post-doctoral Fellow), Jianxiong Xiao (Graduated 2013), Dolores Blanco Almazan (Visiting student, 2012), Biliana Kaneva (Graduated 2011), Jenny Yuen (Graduated 2011), Tilke Judd (Graduated 2011), Myung "Jin" Choi (Graduated 2011), James Hays (Post-doctoral Fellow), Hector J. Bernal (Visiting student), Gunhee Kim (Visiting student), Bryan C. Russell (Graduated 2008).


Places database. The Places database contains 205 scene categories and 2.5 million images.

3D IKEA dataset. Dataset of IKEA 3D models and aligned images. J. Lim, H. Pirsiavash, and A. Torralba. ICCV 2013.

SUN Database. Scene UNderstanding Database. A database for scene recognition (900 scene categories) and multiclass object detection (>15000 fully segmented images). Xiao et al., CVPR 2010. (pdf)

360-SUN Database. A database of 360-degree panoramas organized along the SUN categories. Xiao et al., CVPR 2012. (pdf)

Out-of-context objects. The database contains 218 fully annotated images, each with at least one out-of-context object. Can you detect the out-of-context object? Project page

LabelMe: the open annotation tool. Explore the online query tool, Matlab toolbox, WordNet hierarchy, and the 3D LabelMe toolbox.

LabelMe video: Jenny Yuen et al., ICCV 2009. (pdf)

80 Million tiny images: explore a dense sampling of the visual world. Antonio Torralba, Rob Fergus, William T. Freeman.

Indoor Scene Recognition Database: 67 indoor scene categories. A. Quattoni and A. Torralba. CVPR 2009.


Scene Understanding Symposium (SUnS). Aude Oliva, Thomas Serre, Antonio Torralba. 2006, 2007, 2008, 2009, 2011.

The context challenge: How far can you go before having to run an object detector?

Gallery: A selection of the images I like the most from our research.


Gist, scene recognition.

A simple object detector with boosting.

Eye movements and attention.

SIFT Flow.

HOGgles: Visualizing Object Detection Features.

Talks and tutorials

Course on Recognizing and Learning Object Categories. Li Fei-Fei, Rob Fergus, Antonio Torralba. ICCV 2005, CVPR 2007, ICCV 2009.