The context challenge

How far can you go before running an object detector?

[Figure: a car and a person? The blob on the right is identical to the blob on the left after a 90° rotation.]

In the absence of sufficient local evidence about an object's identity, scene structure and prior knowledge of world regularities provide additional information for recognizing and localizing the object. Even when objects can be identified from intrinsic information alone, context can simplify discrimination by reducing the number of object categories, scales, and positions that need to be considered.

The context challenge consists of trying to detect an object using contextual cues exclusively. Using context features alone, can you predict whether the object is present or absent? Can you predict its location and scale in the image?

How can this work? There is a relationship between the appearance of the objects in a scene and the appearance of the scene itself. For instance, the viewpoint of cars is correlated with the orientation of the street, and the location of the ground in the scene is correlated with the locations of the objects it supports. The goal is to devise algorithms that exploit these (and other) scene cues to predict the locations of objects of interest.
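As a minimal sketch of the idea (not the actual system described in the papers below), a coarse global scene descriptor can be regressed onto object locations. Here a simplified, gist-like feature — the mean gradient energy in each cell of a coarse grid — is mapped to the expected vertical position of an object with a linear least-squares fit; the function names and toy setup are illustrative assumptions.

```python
import numpy as np

def gist_like_features(image, grid=4):
    """Coarse global scene descriptor: mean gradient energy in each
    cell of a grid x grid partition (a crude stand-in for the gist)."""
    gy, gx = np.gradient(image.astype(float))
    energy = gy ** 2 + gx ** 2
    h, w = image.shape
    feats = [energy[i * h // grid:(i + 1) * h // grid,
                    j * w // grid:(j + 1) * w // grid].mean()
             for i in range(grid) for j in range(grid)]
    return np.array(feats)

def fit_location_prior(features, locations):
    """Linear least-squares map from global features to object location."""
    X = np.hstack([features, np.ones((len(features), 1))])  # add bias term
    W, *_ = np.linalg.lstsq(X, locations, rcond=None)
    return W

def predict_location(W, feats):
    """Expected object location given one global feature vector."""
    return float(np.append(feats, 1.0) @ W)
```

For instance, if the horizon line in a set of street scenes predicts where cars sit, such a regression learns to place the expected car position a fixed offset below wherever the global features say the horizon is, without ever running a car detector.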

Face detection in context: This image shows the output of our algorithm when trained to detect faces. Because the algorithm is based only on contextual features (there is no local face detection), the selected regions are large. However, the algorithm successfully reduces the search space by selecting the image regions most likely to contain the target.

Car detection in context: This movie illustrates how context is used to select the image regions likely to contain cars; a car detector is then evaluated only at those locations.
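A minimal sketch of that pipeline (an illustration under assumed inputs, not the implementation from the papers below): a context map scores every pixel, only the top fraction is kept as candidate regions, and the local detector's score map is consulted only inside those candidates.

```python
import numpy as np

def candidate_mask(context_map, keep=0.2):
    """Keep only the fraction `keep` of pixels with the highest contextual support."""
    thresh = np.quantile(context_map, 1.0 - keep)
    return context_map >= thresh

def detect_with_context(local_scores, context_map, keep=0.2):
    """Pick the best local-detector response, but only inside the
    context-selected candidate region. Returns the (row, col) of the
    winner and the fraction of the image actually searched."""
    mask = candidate_mask(context_map, keep)
    masked = np.where(mask, local_scores, -np.inf)
    loc = np.unravel_index(int(np.argmax(masked)), masked.shape)
    return loc, float(mask.mean())
```

Context here plays the role of a prior: a spurious high detector response in an implausible region (a "car" in the sky, say) is never even considered, and only a small fraction of the image needs to be scanned.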

The dataset used to train the context-based car detector contains 2,680 scenes, both indoor and outdoor, with distant as well as close-up views of faces and cars. Only heads and cars (from any viewpoint) are systematically labeled with rectangular bounding boxes. All images are 256x256 pixels and come from several sources (digital cameras, the web, TV, etc.).

This dataset is available through LabelMe.

LabelMe: the open annotation tool
Help build a large database of annotated images.
Bryan Russell, Antonio Torralba and William T. Freeman


Details of our approach can be found here:

Object detection and localization using local and global features
K. Murphy, A. Torralba, D. Eaton, W. T. Freeman
Lecture Notes in Computer Science (unrefereed). Sicily workshop on object recognition, 2005.

Contextual priming for object detection
A. Torralba
International Journal of Computer Vision, Vol. 53(2), 169-191, 2003.

Statistical context priming for object detection
A. Torralba, P. Sinha
Proceedings of the International Conference on Computer Vision, pp. 763-770, Vancouver, Canada, 2001.

More papers are available here.