Role of learning in 3D form perception

One of the most enduring questions about human vision is how we perceive three-dimensional structure in two-dimensional images, even in the absence of motion, stereo, shading and texture cues. Traditionally, researchers have posited innately specified brain mechanisms, such as a preference for simplicity. To test these ideas, we have developed a computational system that recovers 3D structures from single 2D line drawings using a fixed set of constraints that partly capture the notion of perceptual simplicity. Although the system mimics human performance on a small set of inputs, it exhibits significant limitations when analyzing natural imagery. To account for these shortcomings, we have proposed a learning-based theory and have gathered experimental data that provide strong evidence for a role of object-specific learning in the perception of 3D structure. Together, the computational and experimental studies provide a foundation for building a more comprehensive account of 3D shape perception from single 2D images.

Sinha, P. & Adelson, E. H. (1993). Recovering reflectance and illumination in a world of painted polyhedra. Proceedings of the Fifth International Conference on Computer Vision, Berlin, Germany.

Sinha, P. & Poggio, T. (1996). The role of learning in 3-D form perception. Nature, Vol. 384, No. 6608, pp. 460-463.


Influence of learning on stereo-depth perception

The interaction between depth perception and learning processes has important implications for the nature of mental object representations and for models of the hierarchical organization of visual processing. It is often believed that the computation of depth influences subsequent high-level object recognition processes, and that depth processing is an early vision task that is largely immune to learned object-specific influences. We have found experimental evidence that challenges both of these assumptions in the specific context of stereoscopic depth perception. Our results suggest that observers can not only recognize depth-scrambled 3D objects, but are also perceptually unaware of the depth anomalies. The first result points to the limited contribution of depth information to recognition, while the second indicates a top-down, recognition-based influence whereby learned expectations about an object’s 3D structure can overwhelm true stereoscopic information.

Bülthoff, I., Bülthoff, H. H. & Sinha, P. (1998). Top-down influences on stereoscopic depth-perception. Nature Neuroscience, Vol. 1, No. 3, pp. 254-257.


A computational approach for incorporating learning in early vision

Perceptual tasks such as estimation of three-dimensional structure, edge detection and image segmentation are considered to be low-level or mid-level vision problems and are traditionally approached in a bottom-up, generic and hard-wired way. However, as described above, we have found experimental evidence that points instead to a top-down, learning-based scheme. To complement our empirical results, we have developed a simple computational model for incorporating learned expectations in perceptual tasks. When tested on edge-detection and view-prediction tasks for three-dimensional objects, the model produces results that are consistent with human perception and more tolerant of input degradations than those of conventional bottom-up strategies. This lends support to the idea that even some of the supposedly ‘hard-wired’ perceptual skills in the human visual system might, in fact, incorporate learned top-down influences.

Jones, M. J., Sinha, P., Vetter, T. & Poggio, T. (1997). Top-down learning of low-level vision tasks. Current Biology, Vol. 7, pp. 991-994.

Sinha, P. & Poggio, T. (2002). High-level learning of early perceptual tasks. In Perceptual Learning, Ed. Manfred Fahle, MIT Press, Cambridge, MA.