Jodi Davenport <jodi@mit.edu>

Research project:
Examine the view-independent and view-dependent hypotheses using the same tasks for both between-class and within-class recognition, to see whether a single model could account for the different stimuli.

Short answer:
Briefly outline the view-dependent and view-independent hypotheses for object recognition. Give at least 2 claims of each model.

Which of the following can be determined by viewing point-light walkers (i.e. biological motion sequences)?
a) gender
b) the activity being engaged in
c) the amount of weight being lifted
d) all of the above

--------------------------------------------------------------------------

Keith Thoresz <thorek@ai.mit.edu>

1. The Bülthoff et al. study of biological motion recognition using
impoverished stimuli argued in favor of 2D view-based recognition. Is
it possible that the use of impoverished stimuli was partly responsible
for the results?

2. From the same study, two reasons were given for why the subjects
might have ignored the depth-scrambled sequences. The subjects may or
may not have been perceptually aware of the scrambling. Elaborate on one
of these.

3. According to Marr, in which order does the human visual system
process scenes?
a. early processing => primal sketch => 2 ½ D => 3D
b. early processing => 2 ½ D => primal sketch => 3D
c. primal sketch => early processing => 2 ½ D => 3D

--------------------------------------------------------------------------

Kennet Belenky <kbelenky@mit.edu>

Open question:
All the models we discussed in class perform recognition once the object
has been separated from the background and its features have been determined.
(Both of these steps, I believe, are at least as difficult as
recognition.) Investigate ways of using side-information from the
recognition process (information other than a simple object classification,
like "It looks like a cheetah, minus one leg") to provide top-down
information to the feature-detector and object-outlining layers. This would
most likely create a network that requires several iterations to settle
into a final answer, as information from one layer modifies the output of
the other two layers.

Short Answer:
Discuss the tradeoffs in terms of biological plausibility, computational
power, and accuracy of the three different recognition systems we talked
about in class.

Multiple Choice:
Of the following visual confounders, which can the human visual system
handle but the artificial systems we discussed in class cannot?
1) Opaque occluding patterns.
2) Altered angle of the perspective field of view.
3) Low-intensity white noise masking.

a) None of the above
b) 1
c) 1&2
d) 1&3
e) All of the above.

--------------------------------------------------------------------------

Janice Chen <kanile@mit.edu>

1. In my own experience, repeated exposure to a visual illusion seems to decrease the effectiveness of the display. An interesting study might expose subjects to novel illusions and then test them repeatedly (for example, with the Ebbinghaus illusion test described in Goodale's article) to determine their susceptibility to each illusion after experience.
2. Describe the method by which subjects were tested on their perception of size in the Ebbinghaus illusion.
3. Which of these is NOT a perceptual problem of patient DF?
a. She was unable to associate visual stimuli with meaningful semantic information.
b. She could not discriminate between objects of different orientations.
c. She could not locate stable grasp points when picking up an object. (correct)
d. She could not indicate the width of an object by opening her index finger and thumb.

--------------------------------------------------------------------------

"Amrys O. Williams" <amrys@mit.edu>

Research Question: 
Do humans instinctively see pictures as representations of real things?

Short Answer Question: 
Describe one experiment which supports the 2D, view-based model of
recognition.

Multiple Choice Question: 
David Marr's model of object recognition includes all of the following except:
a) 2D memory representation
b) 3D memory representation
c) primal sketch
d) 2 1/2-D sketch


--------------------------------------------------------------------------

Andrew Yip <ayip@mit.edu>

Open research question/project idea:
The motion sequences that have been used are simple skeletal models, which seem to contain relatively little depth information. Perhaps testing subjects with simple polygonal humanoid motion sequences would show greater recognition of non-canonical views, suggesting a corresponding increase in the use of depth information.

Short answer question:
Why is inverted script so difficult to read?

Multiple choice question:
Recognition of biological motion sequences is most dependent on
a) level of view familiarity
b) depth information
c) whether the points used are at joints or on the limbs themselves

--------------------------------------------------------------------------

Liina Pylkkanen <liina@mit.edu> 

1. Open question/project idea

(i) Are differences in the recognition of different types of
motion sequences qualitative, i.e. due to the different types (such as
locomotive or social), or could they be frequency effects? Would a
locomotive sequence be faster to recognize than a frequency-matched
social or instrumental sequence? (I don't know how one would
control for frequency in this domain, though...)

(ii) If 2D perception is the default, or is more automatic than 3D perception,
2D similarity should cause priming effects in paradigms where "late"
processing is blocked, such as a masked priming paradigm, whereas
3D similarity shouldn't. To test this we'd need stimuli in which 2D and
3D similarity are not correlated.

2. Short answer question

What's the significance of Dittrich's result that inter-joint
Johansson displays show only slightly impaired motion recognition?

3. Multiple choice:

Stereo depth information makes biological motion perception
(a) more viewpoint dependent
(b) less viewpoint dependent
(c) neither (stereo depth does not affect it)

--------------------------------------------------------------------------

"Jennifer C. Shieh" <jcshieh@mit.edu>

1. What type of experiment could establish whether or not
humans use feature descriptions for recognition? And if it is
established that humans use feature descriptions, how could the salient
features be determined?
2. Describe the difference between explicit and implicit naming tasks,
providing examples of each.
3. Multiple-answer multiple choice (choose all that are correct):
What are some problems with using familiar objects to test object recognition?
a. Priming effect
b. Difficulty controlling subjects’ prior exposure to objects
c. Does not allow for investigation of entry-level categorization 
d. Difficulty controlling types of features available to distinguish
between different objects

--------------------------------------------------------------------------

Charisse Massay <charisse@mit.edu>

1. Open research question: I'm sorry, I couldn't think of one. I had a
good one involving gorillas until I realized that it would be better suited
for the next day's class so I sent it to Jia instead. It kind of crosses
over both days so here is a copy of the text: 
What happens if you remove a primate from its natural environment and then
expose it to certain objects from that environment? For example, we remove
a gorilla from an interactive zoo environment (the Bronx Zoo, for instance,
has great environments) and then take certain aspects of those
surroundings, like a specific tree or rock, and test how the gorilla reacts
to them. If one can find the gnostic cell, then the experimenter can test
other cells in that area through a simplification task. This would test
whether or not the IT complex is organized in a column-like arrangement.

2. Describe the concept of a gnostic cell. Why is this sometimes called
the "grandmother cell"?

3. Multiple choice: Which of the following is evidence for a
two-dimensional view-based model of recognition?
a. a steady level of recognition as a 3D object rotates
b. the ability to recognize 3D objects
c. a falloff in recognition performance with deviation from the training
view
d. those cool magic eye things

** Because I did not come up with an open research question for this class
period, I thought I would add another short answer question:
If the model for recognition in the human brain is two-dimensional, then
why can we recognize certain objects from all angles (e.g. a teapot, a cube,
a person)?

--------------------------------------------------------------------------

Richard Russell <rrussell@mit.edu> 

1) open research question/project idea:

Studies of human object recognition have primarily used objects such as
abstract blobs or 'paper clips' in order to avoid having the subjects map
the novel objects onto representations of similar looking familiar objects.
However, it is possible that this strategy is commonly used by people when
encountering novel objects, whether consciously ('looks like___') or
unconsciously. A research project might investigate this by
looking at how similarities between novel objects and familiar objects
affect object learning.

2) Short answer:

If 3D object representations require more front-end image processing, how
might 2D object representations be computationally more expensive?

3) Multiple choice:
The evidence from anthropological studies of pictorially illiterate
cultures for innateness of the perception of 2D images as depictions: 
A) indicates that pictures are universally perceived as depictions of
real 3D objects
B) is quite mixed and often methodologically flawed
C) suggests that images are inherently perceived as magical
D) suggests that the perception of 2D images as depicting 3D objects is
learned

--------------------------------------------------------------------------

Janice Chen <kanile@mit.edu>

1. The problem of storing an infinite number of views does not seem to be well stated. In a movie, 24 frames per second is enough for the perception of smooth movement. A QuickTime VR movie, which allows the user to manipulate 2D views of a 3D object (very similar to mental imaging of a 3D object), requires only 20-30 views for a single panoramic latitude. If only this many, or even 10 times that number, were necessary for accurate recognition of an object, the "combinatorial explosion" problem of look-up tables would be drastically reduced. Perhaps a study of novel object recognition in which subjects are shown an increasing number of different views could determine whether recognition performance plateaus at a certain number of views.

2. Describe the "grandmother neuron theory" of object recognition, and the problems with it that Poggio identified.

3. About how many synapses does the human brain have?

a. 10^5
b. 10^14
c. 10^25
d. 10^100