Carl Vondrick

Ph.D. Student, MIT
Email: vondrick@mit.edu
Office: 32-D475B

Resume | Github | Scholar | Twitter

About Me

My research focuses on computer vision and machine learning. The goal is to develop models that understand images and videos well enough to anticipate what may happen in the future.

I am a Ph.D. student at MIT, where I am advised by Antonio Torralba. I completed my bachelor's degree at UC Irvine, advised by Deva Ramanan. I spent some fun summers at Google and Google X.

Thank you to Google and the NSF for fellowships that support my research!

Papers

Predictive Vision

Although they lack annotations, unlabeled video and text are abundantly available and contain rich signals about the world. How do we use this resource to develop more powerful perceptual systems?

Generating the Future with Adversarial Transformers New!
Carl Vondrick, Antonio Torralba
CVPR 2017
Paper

Generating Videos with Scene Dynamics New!
Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
NIPS 2016
Paper Project Page Code NBC Scientific American New Scientist MIT News

SoundNet: Learning Sound Representations from Unlabeled Video New!
Yusuf Aytar*, Carl Vondrick*, Antonio Torralba
NIPS 2016
Paper Project Page Code NPR New Scientist Week Junior MIT News

Anticipating Visual Representations with Unlabeled Video
Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
CVPR 2016
Paper Project Page NPR CNN AP Wired Stephen Colbert MIT News

Predicting Motivations of Actions by Leveraging Text
Carl Vondrick, Deniz Oktay, Hamed Pirsiavash, Antonio Torralba
CVPR 2016
Paper Dataset

Cross-Modal Transfer

Objects and events manifest in many modalities (e.g., natural images, cartoons, sound, text). How can we represent concepts agnostic to their modality? How can we transfer between modalities?

Cross-Modal Scene Networks New!
Yusuf Aytar*, Lluis Castrejon*, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
In Submission
Paper Project Page

Learning Aligned Cross-Modal Representations from Weakly Aligned Data
Lluis Castrejon*, Yusuf Aytar*, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
CVPR 2016
Paper Project Page Demo

See also: SoundNet: Learning Sound Representations from Unlabeled Video


Action Understanding

The ability to visually understand people is important for human-machine interaction. How can we train machines to better understand people's activities and intentions?

Following Gaze Across Views New!
Adria Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba
In Submission
Paper Videos

Who is Mistaken? New!
Benjamin Eysenbach, Carl Vondrick, Antonio Torralba
In Submission
Paper Project Page

Where are they looking?
Adria Recasens*, Aditya Khosla*, Carl Vondrick, Antonio Torralba
NIPS 2015
Paper Project Page Demo

Assessing the Quality of Actions
Hamed Pirsiavash, Carl Vondrick, Antonio Torralba
ECCV 2014
Paper Project Page

See also: Anticipating Visual Representations with Unlabeled Video

See also: Predicting Motivations of Actions by Leveraging Text


Visualization

To improve computer vision models, it is instructive to understand and diagnose their failures. We are interested in analyzing and visualizing computer vision models. How much training data do we need? What bottlenecks prevent us from effectively capitalizing on big data?

Visualizing Object Detection Features
Carl Vondrick, Aditya Khosla, Hamed Pirsiavash, Tomasz Malisiewicz, Antonio Torralba
IJCV 2016
Paper Project Page Slides MIT News

Do We Need More Training Data?
Xiangxin Zhu, Carl Vondrick, Charless C. Fowlkes, Deva Ramanan
IJCV 2015
Paper Dataset

Learning Visual Biases from Human Imagination
Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba
NIPS 2015
Paper Project Page Technology Review

HOGgles: Visualizing Object Detection Features
Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba
ICCV 2013
Paper Project Page Slides MIT News

Do We Need More Training Data or Better Models for Object Detection?
Xiangxin Zhu, Carl Vondrick, Deva Ramanan, Charless C. Fowlkes
BMVC 2012
Paper Dataset


Video Annotation

Large labeled datasets have enabled significant advances in image understanding. However, there has been less progress in video understanding, possibly because video is much more expensive to annotate. We seek to develop better methods to annotate video efficiently.

Efficiently Scaling Up Crowdsourced Video Annotation
Carl Vondrick, Donald Patterson, Deva Ramanan
IJCV 2012
Paper Project Page

Video Annotation and Tracking with Active Learning
Carl Vondrick, Deva Ramanan
NIPS 2011
Paper Project Page

A Large-scale Benchmark Dataset for Event Recognition
Sangmin Oh, et al.
CVPR 2011
Paper Project Page

Efficiently Scaling Up Video Annotation with Crowdsourced Marketplaces
Carl Vondrick, Deva Ramanan, Donald Patterson
ECCV 2010
Paper Project Page

* indicates equal contribution

Media and Talks

A few talks about our work:


Visual Anticipation, RE-WORK 2016

Visualizing Object Detection Features, ICCV 2013