The grammar of vision: probabilistic grammar-based models for visual scene understanding and object categorization

One-day workshop at NIPS 2007

December 8, 2007

Organizers: Josh Tenenbaum (jbt@mit.edu), Virginia Savova (savova@mit.edu), Alan Yuille (yuille@stat.ucla.edu), Leslie Kaelbling (lpk@csail.mit.edu) 


Overview

The human ability to acquire a visual concept from a few examples, and to recognize instances of that concept in a complex scene, poses a central challenge to the fields of computer vision, cognitive science, and machine learning. Representing visual objects and scenes as the human mind does is likely to require structural sophistication: something akin to a grammar for image parsing, with multiple levels of hierarchy and abstraction, rather than the "flat" feature vectors that are standard in most statistical pattern recognition. Grammar-based approaches to vision have been slow to develop, largely because effective methods for learning and inference under uncertainty were lacking. However, recent advances in machine learning and in statistical models of natural language have renewed interest in structural representations of visual objects, categories, and scenes. The result is an emerging body of research in computational visual cognition that combines sophisticated probabilistic methods for learning and inference with classical grammar-based approaches to representation.

The goal of our workshop is to explore these new directions in the context of several interdisciplinary connections that converge distinctively at NIPS. We will focus on three challenges:

How can we learn better probabilistic grammars for machine vision by drawing on state-of-the-art methods in statistical machine learning or natural language learning?
What can probabilistic grammars for machine vision tell us about human visual cognition?
How can human visual cognition inspire new developments in computational vision and machine learning?

Schedule