1. Vision

Input-Output Processing is at the heart of Human Intelligence

Some research in Artificial Intelligence has been oriented mainly towards building a general computational theory of intelligence, of which human intelligence would be only one instance. However, some recent research in Artificial Intelligence has been directed primarily towards understanding human intelligence, drawing heavily on results from neuroscience and psychology.

Our research falls into the second category, as part of the effort to understand general intelligence through the study of animal, and particularly human, intelligence, the only form of intelligence we know. We use methods from Computer Science, and especially Computer Engineering, as tools to exploit the facts revealed by biology, psychology and neuroscience. Most importantly, we believe that perception and interaction with the real world are at the heart of intelligence.

From this perspective, our research aims to express, with computational tools, a model of human perceptual information processing. In particular, this project is a proposal to explore an attention-based model for auditory processing, inspired by the models proposed by Ullman [1] and Rao [2] for visual spatial perception.

Why Audition, Why Attention?

A great deal of work has been done in fields related to visual information processing, such as computer vision and character recognition. Much less work has been done on auditory information processing outside of language processing. Furthermore, only a small fraction of the work on vision is directly aimed at understanding how the human brain actually does it. Similarly, little work has been done on understanding how we listen and react to sounds other than speech, music for instance. How do we understand, interpret, and recognize music? Moreover, why does it affect our emotions and our mood?

For the human visual system, Ullman [1] proposed a theory of cognitive routines to account for the flexibility and versatility of our visual system in performing a variety of complex information extraction tasks. Rao [2] proposed a model of how these routines can be performed and learned using a generic architecture built around attention.

Our task in this project is to try out a model similar to Rao's for musical perception. We chose music and audition instead of vision to see whether the generic model for spatial information processing is also a generic model for other senses, as Rao speculates. The concept of an attention-based model is attractive for two related reasons:

  1. It is what makes Rao's model generic across various visual tasks: it reduces their complexity by organizing them into serial routines, sequences of attentional states (a minimal sketch follows this list). If we can apply the same idea here, it could make the model generic across senses.
  2. Attention is at the center of the mystery of the human mind. It goes together with the unresolved issue of how a machine could have consciousness and be self-aware. Here, our engineer's approach is the following: if we include manifestations of attention as a computational tool in our model, maybe that will give us insights into the consciousness issue.
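
To make the notion of a serial routine concrete, the following is a minimal toy sketch in Python, under assumed names (run_routine, AttentionalState, and the example signal are our own illustrative inventions, not Rao's or Ullman's formulation). It treats a routine as an ordered sequence of attentional states, each pairing a focus of attention with an elementary operation applied to the attended part of the input.

    from typing import Callable, List, Tuple

    # An attentional state: (where to attend, what elementary operation to apply).
    AttentionalState = Tuple[slice, Callable[[List[float]], float]]

    def run_routine(signal: List[float], routine: List[AttentionalState]) -> List[float]:
        """Execute a routine serially: attend to one window at a time,
        apply its elementary operation, and collect the results."""
        results = []
        for window, operation in routine:
            attended = signal[window]             # focus attention on one part of the input
            results.append(operation(attended))   # elementary operation on the attended part
        return results

    # Example: a two-step routine over a 1-D "signal" (e.g. an amplitude envelope).
    signal = [0.1, 0.4, 0.9, 0.3, 0.2, 0.8, 0.7, 0.1]
    routine = [
        (slice(0, 4), max),                           # step 1: attend to the first half, find its peak
        (slice(4, 8), lambda xs: sum(xs) / len(xs)),  # step 2: attend to the second half, average it
    ]
    print(run_routine(signal, routine))  # [0.9, 0.45]

Each step narrows attention to one portion of the input and applies a simple operation to it; the complexity of the overall task is carried by the serial composition of such steps rather than by any single operation.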

Our Goals, for this project and beyond

Our Focus for this project

Our Contributions