Acquiring Visual Classifiers from Human Imagination

Carl Vondrick Hamed Pirsiavash Aude Oliva Antonio Torralba
Massachusetts Institute of Technology



The human mind can remarkably imagine objects that it has never seen, touched, or heard, all in vivid detail. Motivated by the desire to harness this rich source of information from the human mind, this paper investigates how to extract classifiers from the human visual system and leverage them in a machine. We introduce a method that, inspired by well-known tools in human psychophysics, estimates the classifier that the human visual system might use for recognition, but in computer vision feature spaces. Our experiments are surprising, and suggest that classifiers from the human visual system can be transferred into a machine with some success. Since these classifiers seem to capture favorable biases in the human visual system, we present a novel SVM formulation that constrains the orientation of the SVM hyperplane to agree with the human visual system. Our results suggest that transferring this human bias into machines can help object recognition systems generalize across datasets. Moreover, we found that people's culture may subtly vary the objects that people imagine, which influences this bias. Overall, human imagination can be an interesting resource for future visual recognition systems.

To learn more, download our paper and scroll down for some results.

Acquiring Classifiers

Human psychophysics researchers have developed procedures to estimate and interpret the classifiers that the human visual system uses for recognition. We build on these tools to extract classifiers from the human visual system and transfer them into a machine.

We describe our method in detail in our paper. At a high level, our approach works by showing humans white noise in feature space and asking them to indicate whether they perceive an object in the noise. By applying basic statistics to their responses, we can estimate the decision boundary that the human visual system uses for discrimination. Since we use white noise, and not real images, we avoid priming the subjects, and instead capture subtle details from people's imagination.
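To make the "basic statistics" step concrete, here is a minimal sketch in the spirit of classification-image techniques from psychophysics. It assumes the estimate is the mean of the noise samples a subject accepted minus the mean of the samples they rejected; the exact estimator is given in our paper and may differ. The names `estimate_classifier` and `simulate_observer`, the simulated observer, and its hidden `template` are all hypothetical and stand in for real human responses.

```python
import numpy as np


def estimate_classifier(noise, responses):
    """Difference-of-means estimate of a linear decision direction.

    noise:     (n, d) array of white-noise feature vectors shown to a subject.
    responses: (n,) boolean array, True where the subject reported the object.
    Assumption: this simple estimator is illustrative, not the paper's exact one.
    """
    noise = np.asarray(noise, dtype=float)
    responses = np.asarray(responses, dtype=bool)
    if responses.all() or not responses.any():
        raise ValueError("need both positive and negative responses")
    return noise[responses].mean(axis=0) - noise[~responses].mean(axis=0)


def simulate_observer(noise, template, rng, noise_level=1.0):
    """Hypothetical stand-in for a human: says 'yes' when a hidden linear
    template, corrupted by internal noise, responds positively."""
    scores = noise @ template + noise_level * rng.standard_normal(len(noise))
    return scores > 0


rng = np.random.default_rng(0)
d = 50                                      # feature dimension (illustrative)
template = rng.standard_normal(d)           # hidden "mental" classifier
noise = rng.standard_normal((20000, d))     # white noise in feature space
responses = simulate_observer(noise, template, rng)

c = estimate_classifier(noise, responses)
similarity = c @ template / (np.linalg.norm(c) * np.linalg.norm(template))
print(f"cosine similarity to hidden template: {similarity:.3f}")
```

With enough trials, the recovered direction aligns closely with the hidden template in this simulation, which is the intuition behind recovering a decision boundary from responses to pure noise.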

Figure: Noise in Feature Space; Human Visual System Classifier for Car

Crowdsourced Imagination

We extracted classifiers that the visual systems of workers on Amazon Mechanical Turk might use to recognize objects. We visualize some of their imaginary classifiers below:

Although the visualizations are blurry, structure emerges in many cases. In the car classifier, we can see a vehicle-like object in the center sitting on top of a dark road and light sky. The television shows a rectangular structure, and the fire hydrant reveals a red hydrant with two arms on the side.


Our paper tries to scientifically understand how well we can acquire classifiers from the human visual system and leverage them computationally. To quantify this, we evaluated the imaginary classifiers on their ability to recognize objects. We show some of the images that the imaginary classifiers scored highly:

The full quantitative results in our paper suggest that the imaginary classifiers are capturing some signal from the human visual system. In nearly every case, these classifiers do contain some discriminative power.

Transferring into Machines

Since the imaginary classifiers are estimated from the human visual system, we expect them to capture good biases about the visual world. We are able to transfer this human bias into SVMs by constraining the hyperplane \(w \in \mathbb{R}^d\) to be oriented at an angle of at most \(\cos^{-1}(\theta)\) away from the imaginary classifier \(c \in \mathbb{R}^d\):

$$
\begin{aligned}
\min_{w,b,\xi} \enspace &\|w\|_2^2 + \lambda\sum_{i=1}^n \xi_i \\
\mathrm{s.t.} \enspace &y_i \left(w^T x_i + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \\
&\theta \le \frac{w^T c}{\|w\|_2 \|c\|_2}
\end{aligned}
$$

The above is a standard linear SVM, except that the last constraint forces the solution to be oriented similarly to the classifier acquired from the human visual system.
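For \(0 < \theta \le 1\), the orientation constraint above says that \(w\) must lie in a circular cone around \(c\) with half-angle \(\cos^{-1}(\theta)\). The sketch below checks that constraint and computes the Euclidean projection of an arbitrary \(w\) onto the cone, which is one possible building block for a projected-gradient solver. This is an illustration under our own assumptions, not the optimization procedure used in our paper; the function names `cosine` and `project_onto_cone` are hypothetical.

```python
import numpy as np


def cosine(w, c):
    """Cosine of the angle between w and c (undefined for zero vectors)."""
    return float(w @ c / (np.linalg.norm(w) * np.linalg.norm(c)))


def project_onto_cone(w, c, theta):
    """Euclidean projection of w onto {w : theta <= cos(angle(w, c))}.

    Assumes 0 < theta <= 1, so the feasible set is a convex circular cone
    around c. Illustrative only; not the paper's solver.
    """
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must lie in (0, 1]")
    w = np.asarray(w, dtype=float)
    c_hat = np.asarray(c, dtype=float) / np.linalg.norm(c)

    a = float(w @ c_hat)             # component of w along c
    r = w - a * c_hat                # component of w orthogonal to c
    s = float(np.linalg.norm(r))
    k = np.sqrt(1.0 - theta**2) / theta   # tan of the cone half-angle

    if s <= k * a:                   # already inside the cone
        return w.copy()
    if k * s <= -a:                  # inside the polar cone: nearest point is 0
        return np.zeros_like(w)
    a_new = (a + k * s) / (1.0 + k**2)    # nearest point on the cone surface
    return a_new * c_hat + (k * a_new / s) * r
```

For example, with \(c = (1, 0)\) and \(\theta = \cos 45^\circ\), projecting \(w = (0, 1)\) yields \((0.5, 0.5)\), which sits exactly on the boundary of the cone, while a vector already within 45 degrees of \(c\) is returned unchanged.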

The experiments in our paper suggest that transferring the bias in the imaginary classifiers into machine learning may help object recognition systems generalize across datasets.

Mental Images

Our culture and experiences can subtly influence the objects that we imagine. We created classifiers for sports balls for people in different geographic locations:

Figure: sports ball classifiers for workers in India and in the United States

Even though both sets of workers were labeling noise from the same distribution, Indian workers seemed to imagine red balls, while American workers tended to imagine orange/brown balls. Remarkably, the most popular sport in India is cricket, which is played with a red ball, and popular sports in the United States are American football and basketball, which are played with brown/orange balls. We hypothesize that Americans and Indians may have different mental images of sports balls, biased by their culture.

Learn More

We encourage readers to see our paper to learn more. Our results may surprise you!