We parse an input image (left) by recombining elements of a labeled dictionary of scenes (middle) to form a collage (right).
To quickly synthesize complex scenes, digital artists often collage together visual elements from multiple sources: for example, mountains from New Zealand behind a Scottish castle with wisps of Saharan sand in front. In this paper, we propose to use a similar process in order to parse a scene. We model a scene as a collage of warped, layered objects sampled from labeled, reference images. Each object is related to the rest by a set of support constraints. Scene parsing is achieved through analysis-by-synthesis. Starting with a dataset of labeled exemplar scenes, we retrieve a dictionary of candidate object segments that match a query image. We then combine elements of this set into a "scene collage" that explains the query image. Beyond just assigning object labels to pixels, scene collaging produces much richer information, such as the number of each type of object in the scene, how they support one another, the ordinal depth of each object, and, to some degree, occluded content. We exploit this representation for several applications: image editing, random scene synthesis, and image-to-anaglyph.
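To make the analysis-by-synthesis loop concrete, below is a minimal, hypothetical Python sketch of the greedy "retrieve then combine" idea. All names (CandidateSegment, greedy_collage, the per-pixel penalty) are illustrative assumptions, not the authors' implementation; the actual system also handles segment warping, layer ordering, and support constraints, which are omitted here.

```python
import numpy as np

class CandidateSegment:
    """An object segment retrieved from a labeled exemplar scene (assumed structure)."""
    def __init__(self, label, mask, appearance):
        self.label = label            # semantic class, e.g. "mountain"
        self.mask = mask              # boolean HxW mask of the segment
        self.appearance = appearance  # pixels predicted under the mask (same shape as query)

def reconstruction_error(query, collage, covered):
    """Squared error on covered pixels plus a penalty for unexplained ones."""
    err = np.sum(((query - collage) ** 2)[covered])
    penalty = np.sum(~covered) * 1.0  # assumed cost per unexplained pixel
    return err + penalty

def greedy_collage(query, dictionary, max_layers=10):
    """Greedily layer candidate segments so the collage explains the query image."""
    h, w = query.shape[:2]
    collage = np.zeros_like(query)
    covered = np.zeros((h, w), dtype=bool)
    chosen = []
    best = reconstruction_error(query, collage, covered)
    for _ in range(max_layers):
        best_seg, best_state = None, None
        for seg in dictionary:
            # Tentatively paint this segment on top of the current collage.
            trial = collage.copy()
            trial[seg.mask] = seg.appearance[seg.mask]
            trial_cov = covered | seg.mask
            e = reconstruction_error(query, trial, trial_cov)
            if e < best:
                best, best_seg, best_state = e, seg, (trial, trial_cov)
        if best_seg is None:
            break  # no remaining segment improves the explanation
        collage, covered = best_state
        chosen.append(best_seg)
    return chosen, collage
```

Because segments are added as layers, the order in which they are chosen directly yields the ordinal depth and occlusion relationships that the paper exploits for its applications.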
Isola, P. and Liu, C. Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers. ICCV, 2013.
Supplemental materials
Demo of image-to-2.5D
Full results on LabelMe Outdoor Dataset
@inproceedings{SceneCollaging,
  author    = "Phillip Isola and Ce Liu",
  title     = "Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers",
  booktitle = "ICCV",
  year      = "2013"
}