1. Vision
Input-Output Processing is at the heart of Human Intelligence
Some research in Artificial Intelligence has been oriented mainly
towards building a general computational theory of intelligence, of
which human intelligence would be only one instance. However, some
recent research in Artificial Intelligence has been directed
primarily towards understanding human intelligence, taking the
results of neuroscience and psychology into careful consideration.
Our research fits in the second category, as part of the effort to
understand general intelligence through a study of animal and
particularly human intelligence, the only form of intelligence we
know. We use the methods of Computer Science, and especially Computer
Engineering, as tools to build on the facts revealed by biology,
psychology, and neuroscience. Most importantly, we believe that
perception and interaction with the real world are at the heart of
intelligence.
In this perspective, our research aims to express, with
computational tools, a model of human perceptual information
processing. In particular, this project is a proposal to explore an
attention-based model for auditory processing, inspired by the
model proposed by Ullman [1] and Rao [2] for visual spatial
perception.
Why Audition, Why Attention?
A great deal of work has been done in fields related to visual
information processing, for example computer vision and character
recognition. However, less work has been done in auditory information
processing outside of language processing. Furthermore, only a small
fraction of the work on vision is directly aimed at understanding, in
practical terms, how the human brain does it. Similarly, little work
has been done on understanding how we listen and react to sounds
other than speech, for instance, music. How do we understand,
interpret, and recognize music? Moreover, why does it affect our
emotions and our mood?
For the human visual system, Ullman [1] proposed a theory of
cognitive routines to account for the flexibility and versatility of
our visual system in performing various complex information-extraction
tasks. Rao [2] proposed a model of how these routines can be performed
and learned using a generic architecture built around attention.
Our task in this project was to try out a model similar to Rao's
for musical perception. We chose music/audition instead of vision to
see whether the generic model for spatial information processing is
also a generic model for other senses, as Rao speculates. The concept
of an attention-based model is attractive for two related reasons:
- It is what makes Rao's model generic across various visual tasks. It
reduces their complexity by organizing them into serial routines,
i.e., sequences of attentional states. If we can apply it here, it
would make the model generic to all senses.
- Attention is at the center of the mystery of the human mind. It
goes hand in hand with the unresolved issue of how a machine could
have consciousness and be self-aware. Here, our engineer's approach is
the following: if we include manifestations of attention as a
computational tool in our model, perhaps that will give us insights
into the consciousness issue.
Our Goals, for this project and beyond
- Understand the human auditory system
- Unify the workings of the brain
: explain audition and vision under the same architecture, and understand the hierarchy of the brain
- Extend Rao's language of attention to auditory perception
- Extend the notion of implicit memory (routines) to understand explicit memory
- Attempt to recreate music in a learned style
Our Focus for this project
- Perception of music
: to narrow down the extent of this
first project, we abstracted out the lower-level auditory mechanisms
and made reasonable assumptions about them. In vision, the evidence
from neuroscience and psychology shows that the brain separates the
task into two: object recognition and spatial information
processing. This allowed Rao to abstract out object recognition
correctly while he focused on spatial information processing.
Similarly, we abstract out the sound recognition process (which,
among other things, lets us recognize a particular instrument or a
spoken language) and focus on the perception of music. To this end,
we made a set of reasonable assumptions: that the lower-level
mechanisms provide our system with a notion of pitch, volume, and
duration.
- Learning
: we will focus on learning musical
routines. Understanding music composition will be left for later
work.
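The pitch/volume/duration abstraction above can be sketched as a minimal data structure. This is only an illustration of the assumption; the `NoteEvent` name, fields, and units are ours, not part of any published model.

```python
from dataclasses import dataclass

# A minimal sketch of the assumed output of the lower-level auditory
# mechanisms: each event carries only pitch, volume, and duration.
# Names and units here are illustrative assumptions.
@dataclass(frozen=True)
class NoteEvent:
    pitch: int       # e.g. a MIDI-style note number
    volume: float    # relative loudness in [0.0, 1.0]
    duration: float  # seconds

# A short melody, as our system would receive it after abstraction:
melody = [
    NoteEvent(pitch=60, volume=0.8, duration=0.5),  # middle C
    NoteEvent(pitch=62, volume=0.7, duration=0.5),
    NoteEvent(pitch=64, volume=0.9, duration=1.0),
]
```

Everything below the level of such events (spectral analysis, instrument timbre, source separation) is treated as a black box in this project.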
Our Contributions
- Auditory World in terms of Attentional Patterns
: we proposed a new understanding of the auditory world in terms of
changes in attentional patterns. Applying Rao's work to audition was
no easy task, since at first sight the two worlds seem to have
nothing in common.
- Periodicity is Learned, not Hard-Coded
: a new theory of how periodicity emerges from matching successively
larger patterns, and of how emergent patterns are understood.
- Language of Attention
: provided a language of attention for the understanding of musical
pieces, adding a new category to Rao's system and filling in the
auditory equivalents of Rao's visual routines.
- Unifying the Visual and Auditory Worlds
: the same architecture of attention can be used to explain both the
visual and the auditory worlds. This has implications for the
evolution of these behaviors, and for the separation and
specialization of our brain into the different tasks it accomplishes
today.
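To illustrate the periodicity contribution above: the claim is that a period can be discovered by trying to match successively larger patterns against the stream, rather than being built in. The following toy sketch shows that idea on a sequence of pitches; it is an illustration under our own simplifying assumptions, not the project's actual learning mechanism.

```python
# Toy sketch: discover the period of a sequence by matching
# successively larger candidate patterns, smallest first.
# Illustrative only; not the model's actual mechanism.
def find_period(events):
    n = len(events)
    for p in range(1, n // 2 + 1):  # grow the candidate pattern
        # Does the pattern events[:p] explain the whole sequence?
        if all(events[i] == events[i % p] for i in range(n)):
            return p
    return None  # no repetition found within half the sequence

# A melody of pitches repeating every three notes:
pitches = [60, 64, 67, 60, 64, 67, 60, 64, 67]
print(find_period(pitches))  # prints 3
```

In the attentional framing, each candidate pattern would correspond to a sequence of attentional states, and matching a larger pattern would correspond to chunking smaller routines into an emergent one.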