1. Vision
Input-Output Processing is at the heart of Human Intelligence
Some research in Artificial Intelligence has been oriented mainly
towards building a general computational theory of intelligence, of
which human intelligence would be only one instance. However, some
recent research in Artificial Intelligence has been directed
primarily towards understanding human intelligence, taking the
results of neuroscience and psychology into careful consideration.
Our research fits in the second category, as part of the effort to
understand general intelligence through a study of animal and
particularly human intelligence, the only form of intelligence we
know. We use the methods of Computer Science, and especially Computer
Engineering, as tools to build on the facts revealed by biology,
psychology, and neuroscience. Most importantly, we believe that
perception and interaction with the real world are at the heart of
intelligence.
In this perspective, our research aims to express, with
computational tools, a model of human perceptual information
processing. In particular, this project is a proposal to explore an
attention-based model for auditory processing, inspired by the
model proposed by Ullman [1] and Rao [2] for visual spatial
perception.
Why Audition, Why Attention?
A great deal of work has been done in fields related to visual
information processing, for example computer vision and character
recognition. However, less work has been done in auditory information
processing outside of language processing. Furthermore, only a small
fraction of the work on vision is directly aimed at understanding, in
practical terms, how the human brain does it. Similarly, little work
has been done on understanding how we listen and react to sounds
other than speech, for instance, music. How do we understand,
interpret, and recognize music? Moreover, why does it affect our
emotions and our mood?
For the human visual system, Ullman [1] proposed a theory of
cognitive routines to account for the flexibility and versatility of
our visual system in performing various complex information-extraction
tasks. Rao [2] proposed a model of how these routines can be performed
and learned using a generic architecture built around attention.
Our task in this project was to try out a model similar to Rao's
for musical perception. We chose music/audition instead of vision to
see whether the generic model for spatial information processing is
also a generic model for other senses, as Rao speculates. The concept
of an attention-based model is attractive for two related reasons:
- It is what makes Rao's model generic across various visual tasks. It
reduces their complexity by organizing them into serial routines,
i.e., sequences of attentional states. If we can apply it here, it
would make the model generic to all senses.
- Attention is at the center of the mystery of the human mind. It
goes hand in hand with the unresolved issue of how a machine could
have consciousness and be self-aware. Here, our engineer's approach is
the following: if we include manifestations of attention as a
computational tool in our model, perhaps that will give us insights
into the consciousness issue.
Our Goals, for this project and beyond
- Understand the human auditory system
- Unify the workings of the brain
: explain audition and vision under the same architecture, and understand the hierarchy of the brain
- Extend Rao's language of attention to auditory perception
- Extend the notion of implicit memory (routines) to understand explicit memory
- Attempt to recreate music in a learned style
Our Focus for this project
- Perception of music
: to narrow down the extent of this
first project, we abstracted out the lower-level auditory mechanisms
and made reasonable assumptions about them. In vision, the evidence
from neuroscience and psychology shows that the brain separates the
task into two: object recognition and spatial information
processing. This allowed Rao to abstract out object recognition
correctly while he focused on spatial information processing.
Similarly, we abstract out the sound recognition process (which,
among other things, lets us recognize a particular instrument or a
spoken language) and focus on the perception of music. To this end,
we made a set of reasonable assumptions: that the lower-level
mechanisms provide our system with a notion of pitch, volume, and
duration.
- Learning
: we will focus on learning musical
routines. Understanding music composition will be left for later
work.
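The pitch/volume/duration abstraction above can be sketched as a minimal data structure. This is only an illustration of the assumption; the `NoteEvent` name, fields, and units are ours, not part of any published model.

```python
from dataclasses import dataclass

# A minimal sketch of the assumed output of the lower-level auditory
# mechanisms: each event carries only pitch, volume, and duration.
# Names and units here are illustrative assumptions.
@dataclass(frozen=True)
class NoteEvent:
    pitch: int       # e.g. a MIDI-style note number
    volume: float    # relative loudness in [0.0, 1.0]
    duration: float  # seconds

# A short melody, as our system would receive it after abstraction:
melody = [
    NoteEvent(pitch=60, volume=0.8, duration=0.5),  # middle C
    NoteEvent(pitch=62, volume=0.7, duration=0.5),
    NoteEvent(pitch=64, volume=0.9, duration=1.0),
]
```

Everything below the level of such events (spectral analysis, instrument timbre, source separation) is treated as a black box in this project.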
Our Contributions
- Auditory World in terms of Attentional Patterns
: we proposed a new understanding of the auditory world in terms of
changes in attentional patterns. Applying Rao's work to audition was
no easy task, since at first sight the two worlds seem to have
nothing in common.
- Periodicity is Learned, not Hard-Coded
: a new theory of how periodicity emerges from matching successively
larger patterns, and of how emergent patterns are understood.
- Language of Attention
: provided a language of attention for the understanding of musical
pieces, adding a new category to Rao's system and filling in the
auditory equivalents of Rao's visual routines.
- Unifying the Visual and Auditory Worlds
: the same architecture of attention can be used to explain both the
visual and the auditory worlds. This has implications for the
evolution of these behaviors, and for the separation and
specialization of our brain into the different tasks it accomplishes
today.
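To illustrate the periodicity contribution above: the claim is that a period can be discovered by trying to match successively larger patterns against the stream, rather than being built in. The following toy sketch shows that idea on a sequence of pitches; it is an illustration under our own simplifying assumptions, not the project's actual learning mechanism.

```python
# Toy sketch: discover the period of a sequence by matching
# successively larger candidate patterns, smallest first.
# Illustrative only; not the model's actual mechanism.
def find_period(events):
    n = len(events)
    for p in range(1, n // 2 + 1):  # grow the candidate pattern
        # Does the pattern events[:p] explain the whole sequence?
        if all(events[i] == events[i % p] for i in range(n)):
            return p
    return None  # no repetition found within half the sequence

# A melody of pitches repeating every three notes:
pitches = [60, 64, 67, 60, 64, 67, 60, 64, 67]
print(find_period(pitches))  # prints 3
```

In the attentional framing, each candidate pattern would correspond to a sequence of attentional states, and matching a larger pattern would correspond to chunking smaller routines into an emergent one.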