I am a Ph.D. student at MIT in computer science.
My research studies computer vision and machine learning. Scene understanding models excel when given large amounts of labeled data, but labeling is expensive to scale. My work explores how to efficiently leverage human effort and unlabeled data to create more powerful perception systems.
There is an abundance of unlabeled data available that contains rich signals, and we hope to leverage this resource to expand the capabilities of perceptual models. Our prediction system below has watched 600 hours of unlabeled video in order to learn to anticipate some human actions one second into the future.
Anticipating Visual Representations with Unlabeled Video
Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
Predicting Motivations of Actions by Leveraging Text
Carl Vondrick, Deniz Oktay, Hamed Pirsiavash, Antonio Torralba
Learning Aligned Cross-Modal Representations from Weakly Aligned Data
Lluis Castrejon*, Yusuf Aytar*, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
The ability to visually understand people is crucial for fluid human-machine interaction. We are interested in developing visual models that allow machines to reason about people's activities.
Where are they looking?
Adria Recasens*, Aditya Khosla*, Carl Vondrick, Antonio Torralba
Assessing the Quality of Actions
Hamed Pirsiavash, Carl Vondrick, Antonio Torralba
To improve computer vision models, it is instructive to understand and diagnose their failures. We are interested in analyzing and visualizing computer vision models. How much training data do we need? What bottlenecks prevent us from effectively leveraging big data?
Visualizing Object Detection Features
Carl Vondrick, Aditya Khosla, Hamed Pirsiavash, Tomasz Malisiewicz, Antonio Torralba
Do We Need More Training Data?
Xiangxin Zhu, Carl Vondrick, Charless C. Fowlkes, Deva Ramanan
Learning Visual Biases from Human Imagination
Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba
HOGgles: Visualizing Object Detection Features
Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba
Do We Need More Training Data or Better Models for Object Detection?
Xiangxin Zhu, Carl Vondrick, Deva Ramanan, Charless C. Fowlkes
Large labeled datasets have enabled significant advances in image understanding. However, there has been less progress in video understanding, possibly because video is much more expensive to annotate. We seek to develop better methods to annotate video efficiently. Our research produced a system that can annotate massive video datasets at a fraction of the typical cost.
Efficiently Scaling Up Crowdsourced Video Annotation
Carl Vondrick, Donald Patterson, Deva Ramanan
Video Annotation and Tracking with Active Learning
Carl Vondrick, Deva Ramanan
A Large-scale Benchmark Dataset for Event Recognition in Surveillance Video
Sangmin Oh, et al.
Efficiently Scaling Up Video Annotation with Crowdsourced Marketplaces
Carl Vondrick, Deva Ramanan, Donald Patterson
I'm normally not a praying man, but if you're up there, please save me, Superman. — Homer Simpson