The central goal of the Artificial Intelligence Laboratory is to develop a computational theory of intelligence extending from the manifestation of behavior to operations at the neural level. Current work focuses especially on understanding the role of vision, language, and motor computation in general intelligence.
The Laboratory also seeks to develop high-impact practical applications in areas such as information transportation, enhanced reality, the human-computer interface, modeling and simulation, image understanding, decision analysis and impact prediction, model-based engineering design, medicine, and supporting computing technology.
Professor Patrick H. Winston works on the problem of learning from precedents. Professor Robert C. Berwick studies fundamental issues in natural language, including syntactic and semantic acquisition. Professor Lynn Andrea Stein works on integrated architectures for intelligence. Professor W. Eric L. Grimson, Professor Berthold K. P. Horn, Professor Tomaso A. Poggio, and Professor Paul Viola do research in computer vision. Professor Rodney A. Brooks, Professor Tomas Lozano-Perez, Professor Gill Pratt, and Dr. J. Kenneth Salisbury work on various aspects of robotics. Professor Randall Davis and Dr. Howard E. Shrobe work on expert systems that use both functional and physical models. Professor Carl E. Hewitt studies distributed problem-solving and parallel computation. Dr. Thomas Knight and Professor William J. Dally work on new computing architectures. Professor Gerald J. Sussman and Professor Harold Abelson lead work aimed, in part, at creating sophisticated problem-solving partners for scientists and engineers studying complex dynamic systems.
The Laboratory's 168 members include 16 faculty members, 24 academic staff, 35 research and support staff, and 93 graduate students active in research activities funded by the Advanced Research Projects Agency, Air Force Office of Scientific Research, Alphatech, Inc., Army Research Office, Fujitsu, International Business Machines, Jet Propulsion Laboratory, Korean Atomic Energy Research Institute, Loral Systems Company, Mitsubishi, NASA, National Science Foundation, Office of Naval Research, and Sumitomo Metal Industries.
Professor Grimson's group has developed a system that performs Image Guided Surgery, by registering video images with information produced by Magnetic Resonance Imaging machines. This reality-enhancing system, developed collaboratively with researchers at the Brigham and Women's Hospital, makes it possible to superimpose previously recorded Magnetic Resonance Imaging information with a current view of the head of a person with a brain tumor. Extensions to the system include a trackable surgical probe, which enables the surgeon to touch points in the operating field and see, in real time, cross-sectional slices of MR data showing the position of the probe. Neurosurgeons at Brigham and Women's are now routinely using this system for intraoperative guidance and navigation. Similar methods are under development for other surgical applications, such as sinus surgery, endoscopic surgery, and surgeries performed under surgical microscopes.
Professor Grimson's group also works on methods for recognizing objects in cluttered, noisy, and unstructured environments. Recent efforts have focused on the role of visual attention in recognition, and on the development of efficient methods for indexing into large libraries of objects. These efforts have been integrated into a system that uses a movable eye-head to find objects hidden in a room by focusing attention on interesting points in the room, and then using grouping and recognition methods to identify such objects. Recent thrusts include applying existing methods to automatic target recognition and visual database search. The latter approach involves a novel method for classifying images on the basis of their content, such as "snowy mountain," "field," "waterfall," and so on. This method uses relative photometric relationships between image regions connected in a flexible template, and supports classification and indexing in large databases, independent of illumination, spatial layout, and color shift.
Professor Horn and his students work on problems in motion vision. Because recovery of information about the world from a single cue such as motion parallax, binocular stereo disparity, or shading in images tends to be unreliable, Professor Horn works on the integration of information from multiple cues at a low level, building systems that interlace iterations of different schemes for recovering shape. Preliminary results in integrating motion vision and shape from shading, and in integrating binocular stereo and shape from shading, both show great promise.
On another front, a new special-purpose early-vision analog VLSI chip will be completed soon. This chip determines an image's focus of expansion---that point in the image towards which the camera appears to be moving. It does this without the need to detect image features. The result can be used to compute time to impact and possibly to recover shape information. The chip is expected to operate at 1000 frames per second. Although it has just a 32 x 32 array of sensors and processing elements, it is expected to be able to recover the focus of expansion with sub-pixel resolution. Work is also starting on the development of a chip that can deal with arbitrary combinations of translational and rotational motions, provided that the scene being viewed is approximately planar. This chip will be an order of magnitude more complex than the previous one and require considerable innovation before the circuitry can be fitted into the available space.
As part of an effort in the "Intelligent Highway" arena, the focus-of-expansion chip may be adapted to become a component in a time-to-impact warning system. A time-to-impact chip differs from a focus-of-expansion chip in that the focus-of-expansion chip factors out distances and velocities to recover the direction of motion, whereas a time-to-impact chip takes the direction of motion to be of less interest, while distances and velocities are considered significant. The time-to-impact chip will not be able to determine distances to objects or velocities in absolute terms, but it will obtain the ratio of these quantities, which yields the time to impact. Using such a chip, a warning system will be useful particularly in warning drivers of rapidly approaching objects in their so-called blind spot.
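Why a ratio suffices can be seen in a minimal sketch. It assumes a pinhole camera and an object of fixed size approaching head-on; these are assumptions made for illustration, not a description of the chip's circuitry. Under them, the object's image size divided by its rate of growth equals distance divided by closing speed, so neither absolute quantity is needed. All function and parameter names are hypothetical.

```python
import math


def projected_size(focal_length, object_size, distance):
    """Image size of an object under an assumed pinhole camera model."""
    return focal_length * object_size / distance


def projected_growth_rate(focal_length, object_size, distance, closing_speed):
    """Rate of change of image size when distance shrinks at closing_speed."""
    return focal_length * object_size * closing_speed / distance ** 2


def time_to_impact(image_size, image_growth_rate):
    """Ratio of image size to its growth rate; infinite if not approaching."""
    if image_growth_rate <= 0.0:
        return math.inf
    return image_size / image_growth_rate


# Two hypothetical scenes that differ in absolute scale but share the same
# distance-to-speed ratio (20 m at 10 m/s, and 200 m at 100 m/s).
near = time_to_impact(projected_size(1.0, 2.0, 20.0),
                      projected_growth_rate(1.0, 2.0, 20.0, 10.0))
far = time_to_impact(projected_size(1.0, 2.0, 200.0),
                     projected_growth_rate(1.0, 2.0, 200.0, 100.0))
```

Both scenes yield a time to impact of about two seconds, although the image measurements alone cannot tell the two scenes apart in absolute terms.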
Professor Horn is also exploring diaphanography, the science of recovering the spatial distribution of absorbing material in a highly scattering medium from optical measurements on the surface of a volume. This is a third domain of image understanding, which stands alongside the usual optical domain of opaque objects in a transparent medium (where pin-hole optics rules) and tomography (where the observed quantities are line integrals of absorbing density).
In highly scattering media, the directionality of injected photon flux is quickly lost and the situation is best modeled as a diffusional process. The governing equation is Poisson's equation, and convenient physical models include heat flow in a solid and current in a conductive medium that also has leakage paths to ground. Forward analysis---computing how a photon flux will distribute itself given a distribution of absorbing material---is not hard to carry out numerically (although it is computationally expensive). For simple situations, closed form solutions are also available. The inverse problem---recovering the distribution of absorbing material from a surface image---presents a challenge, however. Under certain combinations of scattering constant, average background absorption, and spatial scale, the problem is not well posed mathematically, and Professor Horn is trying to discover under what circumstances the problem is better posed, and how spatial resolution in the reconstruction is likely to vary with depth.
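To make the forward problem concrete, the following is a minimal one-dimensional sketch, not the group's method. It assumes a steady-state diffusion model with an absorption term (in the spirit of the leaky-conductor analogy), discretized on a unit-spaced grid with zero flux density held just outside both ends, and it solves the resulting tridiagonal system directly. The grid size, absorption values, and source placement are hypothetical.

```python
def solve_flux(absorption, source):
    """Solve -(u[i-1] - 2*u[i] + u[i+1]) + absorption[i]*u[i] = source[i].

    Flux density is taken to be zero just outside both ends of the grid.
    Uses the Thomas algorithm for tridiagonal systems.
    """
    n = len(absorption)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    diag = 2.0 + absorption[0]
    c_prime[0] = -1.0 / diag
    d_prime[0] = source[0] / diag
    # Forward elimination (sub- and super-diagonal entries are both -1).
    for i in range(1, n):
        denom = (2.0 + absorption[i]) + c_prime[i - 1]
        c_prime[i] = -1.0 / denom
        d_prime[i] = (source[i] + d_prime[i - 1]) / denom
    # Back substitution.
    flux = [0.0] * n
    flux[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        flux[i] = d_prime[i] - c_prime[i] * flux[i + 1]
    return flux


n = 40
source = [0.0] * n
source[1] = 1.0                      # light injected near the left surface
background = [0.01] * n              # uniform weak absorption
with_absorber = list(background)
for i in range(18, 22):
    with_absorber[i] = 0.5           # hypothetical absorbing inclusion

clear_reading = solve_flux(background, source)[-2]
blocked_reading = solve_flux(with_absorber, source)[-2]
```

The reading near the far surface is lower when the inclusion is present. The inverse problem asks how much about the inclusion can be recovered from such surface readings alone.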
Potential medical applications include mammography and the imaging of testicular cancers, thyroid tumors, and brain tumors in infants. Also, in some drug manufacturing and vaccine production, the techniques may enable the determination of whether an egg contains a live embryo. Because imaging using heat-flow analysis is a related problem, other potential applications include flaw detection in aircraft parts and geological mineral exploration.
Professor Berwick and his group have concentrated on the problem of learning a lexicon. In particular, they have come up with a general framework for language acquisition in which linguistic parameters like words are built by perturbing a composition of existing parameters, and they have related it to "traditional" coding schemes like Lempel-Ziv. The important idea is to get the learned representation to match linguistic intuitions, rather than some arbitrary code. They illustrate the power of the representation via several examples in text segmentation and compression, acquisition of a lexicon from raw speech, and the acquisition of mappings between text and artificial representations of meaning.
Professor Berwick and his group have also developed a new compression scheme for natural language. The algorithm was run as a compressor on a lower-case version of the Brown corpus with spaces and punctuation left in. All bits necessary for exactly reproducing the input were counted. Compression performance is 2.12 bits/char, significantly lower than that of popular algorithms like gzip (2.95 bits/char). This is the best text compression result on this corpus that they are aware of, and should not be confused with lower figures that do not include the cost of parameters. Furthermore, because the compressed text is stored in terms of linguistic units like words, it can be searched, indexed, and parsed without decompression.
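For readers unfamiliar with the bits/char measure, the sketch below shows how such a figure can be computed for a general-purpose compressor. It uses Python's `zlib` module, which implements the same DEFLATE algorithm as gzip but with slightly different framing overhead. The sample text is hypothetical and is not the Brown corpus, so the result does not reproduce the figures quoted above.

```python
import zlib


def bits_per_char(text, level=9):
    """Compressed size in bits divided by the number of input characters."""
    if not text:
        raise ValueError("text must be non-empty")
    compressed = zlib.compress(text.encode("utf-8"), level)
    return 8.0 * len(compressed) / len(text)


# Hypothetical, highly repetitive sample; real corpora compress far less.
sample = "the quick brown fox jumps over the lazy dog. " * 200
rate = bits_per_char(sample)
```

Because every byte of the compressed output is counted, this measure includes the cost of whatever the compressor needs to reconstruct the input exactly, which is the accounting the paragraph above insists on.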
Also, Professor Berwick and Dr. Partha Niyogi have been formalizing linguists' intuitions about language change by advancing a dynamical systems model for language change derived from a model for language acquisition. Rather than having to posit a separate model for diachronic change, as is typically done in the diachronic literature by drawing on assumptions from population biology, this new model dispenses with the need for these independent assumptions by showing how the behavior of individual language learners leads to emergent, global population characteristics of linguistic communities over several generations as the end product of two types of learning misconvergence. As the simplest case, they formalize the example of two grammars (languages) differing by exactly one binary parameter, and show that even this situation leads directly to a quadratic (nonlinear) dynamical system, including regions with chaotic behavior.
Under the supervision of Professors Brooks and Stein, the humanoid robot COG became provisionally operational during the last year, and graduate students have started carrying out experiments with the hardware.
Mr. Matthew Williamson has installed his six-degree-of-freedom series elastic actuator arm on COG. He has implemented a low-level control system to make the joints act like spring muscles, and on top of this has implemented the first robotic version of equilibrium point control, a theory developed by Professor Emilio Bizzi of the Department of Brain and Cognitive Sciences at MIT.
Mr. Brian Scassellati, Mr. Matthew Marjanovic, and Mr. Matthew Williamson have cooperated in building a system that is able to learn how to move COG's arm in reaching motions based on visual feedback.
Ms. Cynthia Ferrell has integrated a system for visual saccades with a head motion system so that COG can look around the room, much like a human does, coordinating eye and head motion to maintain a steady view of the world.
Professor Brooks has continued to work with NASA and JPL on developing technology for autonomous exploration of Mars.
Ms. Liana Lorigo took the Pebbles robot out to JPL and successfully tested her visual navigation algorithms in both their indoor Mars facility and their outdoor "Mars yard."
Ms. Chandana Paul implemented a control system for a small manipulator for Pebbles---this manipulator was designed and built the previous year by Mr. Matthew Williamson and makes use of series elastic actuators so that the arm can hammer, pry, and dig. Ms. Yoky Matsuoka built a new end-effector for this arm.
Mr. James McLurkin developed an operational 30 gram micro-rover. He commenced design work on a 15 gram rover, designed to look for evidence of micro fossils on the surface of Mars.
Professor Brooks and Dr. Una-May O'Reilly worked on evolutionary techniques to develop programs automatically. The first domain in which they are working is face detection. This was chosen because there has been a lot of work done over the last few years in trying to write programs that can do this well, and the results leave a lot to be desired. Thus if progress can be demonstrated with evolutionary techniques it will validate the techniques as being able to succeed on hard problems that have eluded human efforts.
Professor Pratt's group has been working on robotic systems that are specialized for interaction with natural environments. Unlike traditional industrial robots, these robots are far less concerned with positional accuracy and maximal slew rate. Instead, they must have smooth and stable force-controlled interactions with unknown environments, including tolerance to unexpected shock. Such requirements have led Professor Pratt and his students to develop the concept of series-elastic actuators, which differ from conventional actuators in their purposeful placement of mechanical elasticity between the transmission and the load. This has both been shown theoretically, and demonstrated in prototype hardware, to provide a significant improvement in terminal characteristics. Best of all, this series-elastic design methodology may soon allow for an order of magnitude reduction in the cost of high-quality force actuators because inexpensive gear trains may be used. Given the high number of degrees of freedom required for natural robots, this cost reduction is important.
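One consequence of placing a spring between the transmission and the load is that the spring's deflection gives a direct measure of output force, so force control can be approached as position control of the motor side. The following minimal sketch illustrates that idea under simplifying assumptions: a linear spring, a load held fixed, and a motor whose velocity can be commanded directly. It is not the group's controller, and the stiffness, gain, and time step are hypothetical values.

```python
def spring_force(stiffness, motor_position, load_position):
    """Force transmitted through the series spring (Hooke's law)."""
    return stiffness * (motor_position - load_position)


def simulate_force_control(desired_force, stiffness=1000.0, gain=0.01,
                           dt=0.001, steps=2000):
    """Drive the motor so that the measured spring force tracks a target.

    The load is held fixed at position 0 (for example, pressing on a wall).
    The motor velocity is set proportional to the force error.
    """
    motor_position = 0.0
    load_position = 0.0
    for _ in range(steps):
        force = spring_force(stiffness, motor_position, load_position)
        motor_position += gain * (desired_force - force) * dt
    return spring_force(stiffness, motor_position, load_position)


final_force = simulate_force_control(desired_force=5.0)
```

With these values the force error shrinks by one percent per step, so the output force settles at the target without any direct force sensor at the load.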
Robots that interact with natural environments also need control systems that differ from those used in industrial robots, and developing such control systems is a second thrust of Professor Pratt's group. Path planning and accuracy are unimportant, as the environment is dynamic and unpredictable. Rather, what is needed is an appropriate behavioral abstraction to reduce control bandwidth and increase smoothness. In much the same way that early vision extracts discrete features from, and lowers the bandwidth of, visual information, this "late motor" processing converts discrete behaviors into smooth interactions and expands the bandwidth of information used to modulate motor action. Professor Pratt's research in this area is presently focused on Virtual Model Control, where simulations of physical systems are used as the processing mechanism for control. Recent efforts have shown this method to be superior to conventional inverse kinematics for describing the control of robot posture.
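The idea of using a simulated physical component as the controller can be sketched as follows. The example assumes a planar two-link arm and a single virtual spring-damper connecting the arm's endpoint to a target point; the force this imaginary component would exert is mapped to joint torques with the Jacobian transpose. The arm geometry, gains, and function names are hypothetical, and the sketch is an illustration of the general approach rather than the group's implementation.

```python
import math


def endpoint_position(q1, q2, l1=1.0, l2=1.0):
    """Forward kinematics of a planar two-link arm."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


def jacobian(q1, q2, l1=1.0, l2=1.0):
    """Partial derivatives of the endpoint position with respect to the joints."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def virtual_spring_damper_torques(q, q_dot, target, stiffness=10.0, damping=1.0):
    """Joint torques that make the endpoint behave as if a spring-damper
    connected it to the target point."""
    x, y = endpoint_position(*q)
    jac = jacobian(*q)
    # Endpoint velocity from joint velocities.
    vx = jac[0][0] * q_dot[0] + jac[0][1] * q_dot[1]
    vy = jac[1][0] * q_dot[0] + jac[1][1] * q_dot[1]
    # Force the virtual component would exert on the endpoint.
    fx = stiffness * (target[0] - x) - damping * vx
    fy = stiffness * (target[1] - y) - damping * vy
    # Map the endpoint force to joint torques with the Jacobian transpose.
    tau1 = jac[0][0] * fx + jac[1][0] * fy
    tau2 = jac[0][1] * fx + jac[1][1] * fy
    return tau1, tau2


torques = virtual_spring_damper_torques(
    q=(0.0, math.pi / 2), q_dot=(0.0, 0.0), target=(1.0, 2.0))
```

In this configuration the virtual spring pulls the endpoint straight along the forearm, so the first joint receives about 10 units of torque and the second essentially none. No joint-angle solution for the target is computed at any point.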
To support experimental work, Professor Pratt and his students have been constructing a bipedal walking robot and an arm for COG.
Dr. Salisbury's group focuses on three areas: sensor guided grasping, study of human and robot hands, and the development of haptic (touch) interfaces and rendering techniques.
In conjunction with Professor Slotine from the Department of Mechanical Engineering, Dr. Salisbury's group has developed a system that can grasp stationary and freely moving objects. The system is being applied to NASA efforts to enable remote planetary probes to autonomously collect geological samples.
Dr. Salisbury's study of human and robot hands, in collaboration with Dr. Srinivasan at the Research Laboratory of Electronics, has focused on the development of touch perception algorithms to enable robots to deduce contact conditions from simple force-sensing fingertips. Work is currently in progress on a new force-controllable multi-finger hand, and on planning algorithms that enable the hand to continuously reorient objects held in its fingertips.
The group's research in the field of haptics has focused on developing algorithms for rendering the feel of movable rigid objects, and for rendering compliant visco-elastic objects, both using the PHANToM Haptic interface. Additional haptic investigations address "palpating in" texture and compliant properties to populate their object representations, and the development of heat flow and vibratory stimulators. Applications currently include Naval training tasks, surgical simulation, and robotic reactor maintenance modeling.
The Intelligent Room is a new facility constructed during the past year for work on Human Computer Interaction (HCI). The work aims to develop a computational infrastructure that is aware of people, their actions, and their goals through the use of vision and sound understanding. During the first year, students of Professors Brooks, Lozano-Perez, and Stein have developed systems that enable presenters to use speech and gesture to navigate through a presentation prepared on the World Wide Web and to use gesture to assign video sources to monitors. Other software from Dr. Victor Zue's group at the Laboratory for Computer Science and from Jim Hollan's group at the University of New Mexico has been integrated into the Room.
Professor Stein's group also works on software and information agents. Mr. Michael Coen's SodaBot/SodaBotL is a software agent environment and construction system designed to facilitate the rapid prototyping and deployment of personal and application-based software agents, as well as communities of semi-autonomous associate systems.
Prior to SodaBot, building agents has generally involved a multi-layered approach ranging from low-level "system-hacking" (e.g., of mailers, networks, etc.) to high-level application development (e.g., a meeting scheduler), and everything in between. Each layer can require a substantial amount of independent implementation and debugging. Additionally, it can be difficult to distribute new agents: they tend to be site-specific in intricate ways, and disconnecting them from their local dependencies can be technically involved.
Mr. Mark Torrance's "Active Notebook" is a tool that organizes collected information according to a user's personal conceptual taxonomy. Over the next year, Active Notebook will be expanded to better facilitate groupware applications.
Professor Winston's group has concentrated recently on developing representations that enable learning and reasoning by analogy. One important component of this research concerns the representation of change in a qualitative manner, such that a remembered sequence of changes can be used as a precedent for understanding how some subsequent situation is evolving. This work, led by Dr. Gary Borchardt, is grounded in the key insight that there is much to be gained by viewing the world from a transition-centered perspective, rather than a state-centered perspective. In principle, a world's state embodies enough knowledge of the world to determine its future evolution, but this state-centered perspective has led to all sorts of practical obstacles (including the "frame" problem and the "context" problem). In contrast, from the transition-centered perspective, transitions cause transitions. Accordingly, the transition space representation focuses on what is changing rather than on the static properties of things.
During the past year, the core functionality of version 2.0 of the IMPACT system was completed. This system uses the transition space representation to support interactive development, modification and tracking of plans on the basis of real-world observables. Users of IMPACT work with a "spreadsheet for events" which integrates the collection and reconciliation of reports from the field, provides a graphic presentation of that information, propagates constraints, and supplies "what if" reasoning functionality regarding possible events and their consequences.
Work on the IMPACT system forms one component of a larger effort called the Infolab Project, which is focused on the development of systems that help humans solve problems on human terms, interacting through the concentrated use of simple language, images, diagrams, and other modes of expression that are intrinsically meaningful to humans and engage intuitive human problem-solving skills. To this end, Dr. Boris Katz has been developing the START natural language query retrieval system. During the past year, members of Professor Winston's group have assisted Dr. Katz in constructing the START Bosnia Information Server, a World Wide Web resource that provides access to multi-media information in response to natural language questions about the U.S. mission in Bosnia-Herzegovina.
The Project for Mathematics and Computation (Project MaC), under the direction of Professors Abelson and Sussman, is working to demonstrate breakthrough applications that exploit new computer representations and reasoning mechanisms that its members have developed. These mechanisms enable intelligent systems to autonomously design, monitor, and understand complex physical systems through appropriate mixtures of numerical computing, symbolic computing, and knowledge-based methods. They call this mixed approach intelligent simulation.
Systems incorporating intelligent simulation can automatically prepare numerical experiments from high-level domain descriptions. They automatically select and configure appropriate numerical methods. They actively monitor numerical and physical experiments. They automatically analyze the results of such experiments, using domain knowledge to interpret the numerical results, and they report these results to their human users in high-level qualitative terms. In favorable cases intelligent simulation programs can automatically configure special-purpose hardware for efficient execution of computationally demanding numerical experiments.
The group has demonstrated the basic capabilities of intelligent simulation systems. They have implemented computer programs that interpret numerical simulations of nonlinear systems, automatically producing summary descriptions similar to those in the published literature.
Recently, the group has also demonstrated that intelligent simulation can help in creating dynamically stabilized structures. Such structures will be sensitive and active, incorporating networks of high-performance controllers. They have constructed and demonstrated a prototype column that is actively stabilized by piezo-electric actuators. This column supports 5.6 times more load than a passive column of the same size could support. They have also demonstrated a truss bridge that uses actively stabilized members to support greater loads than would be possible without active control.
Over the past four years, five of the group's recent graduates have received National Young Investigator awards, largely based on their work on the development and application of intelligent simulation technology. Professor Jack Wisdom, a collaborator with the group, was awarded a MacArthur Fellowship, partly on the basis of work done here.
Professor Davis, Dr. Shrobe, and their associates are building knowledge-based systems that use models of structure, function, and causality to perform a wide range of problem solving and reasoning tasks. Their systems reason about how devices work and how they fail in a manner similar to an experienced engineer. This is an important advance in the art of knowledge-based systems construction, because it provides the system with a more fundamental understanding of the device than is possible using traditional approaches.
Recent work is focused on understanding how things work in domains that include simple mechanical devices and mechanistic explanations of biological phenomena. Examples of understanding include the ability to produce descriptions of device behavior from a description of their structure, the ability to predict behavior under unusual circumstances, and the ability to redesign to fit those new circumstances.
Professor Davis has also been leading the Intelligent Information Infrastructure project, which is concerned with the next generation of ideas and software to support the National Information Infrastructure. The basic assumption is that the National Information Infrastructure should have intelligence embedded into it, allowing it to understand the information it is carrying and enabling it to provide the foundation for new ways to gather, organize, and transmit knowledge, as well as new ways to operate organizations to take advantage of new knowledge structures.
The members of the project have built a variety of systems, including the publication/distribution system used by the White House Office of Media Affairs, in routine use since January 20, 1993 to distribute OMA publications nationally and internationally, and an on-line surveying system used to determine the size and character of the audience receiving the documents. They have also developed and used the START system to provide a natural-language-based information resource.
The Symbolic Parallel Architecture group, under the direction of Dr. Knight, has been developing technology for the next generation of parallel computer systems. The group is developing, for example, compilers that automatically feed back experience in running code for use in the layout and compilation process. Work on extremely low power computing using reversible logic approaches also continues, with funding in place for implementing the first fully reversible computer system using the group's low power technology. The Abacus SIMD vision processor component has been fabricated, and is undergoing initial testing. Abacus is designed for high-speed processor-per-pixel image handling for early vision applications. Finally, chip-to-chip freespace signaling technologies are under development for use in novel self-assembling arrays of processors. Such processor chip arrays are under study for applications in sensor and effector systems, as well as for implementing simple, local communication for modeling physical systems.
The Concurrent VLSI Architecture Group, under the direction of Professor Dally, develops techniques for applying VLSI technology to solve information-processing problems. The group has been developing the M-Machine, an experimental parallel computer that tests new concepts for the control of multiple arithmetic units, interprocessor communication, and memory addressing. During the past year, the group has completed most of the register-transfer-level design of the M-Machine's multi-ALU processor (MAP) chip. Circuit design and layout of portions of the chip have been performed in collaboration with the Microelectronics Center of North Carolina (MCNC). With collaborators at the California Institute of Technology, they have been adapting the Multiflow compiler to generate parallel code for the M-Machine, and they have been writing an operating system that provides coherent shared memory in software.
The group has also been developing the Reliable Router, a multicomputer network component. The Reliable Router demonstrates new algorithms for adaptive routing and fault tolerance in interconnection networks. It also demonstrates new circuit techniques for simultaneous bidirectional signalling (sending bits in both directions simultaneously over one wire) and plesiochronous synchronization.
The study of participation in Multi-Agent Systems (MAS) can help develop scalable, plural, open information infrastructures, comprised of humans, equipment, and services. Participation in MAS depends on a synergy of interdependent, overlapping, and mutually supporting information infrastructures: availability infrastructures and participatory infrastructures. Availability infrastructures impact development of participatory infrastructures: e.g., mobile wireless infrastructure platforms can be immediately informed on-site about current whereabouts and participation, providing information needed for message screening infrastructures to manage interruptions. Participatory infrastructures impact development of availability infrastructures: e.g., message screening infrastructures reduce unwanted interruptions of the sort that cannot be tolerated in wireless communications.
Availability Infrastructures make information services available, enabling MAS participants. Availability infrastructures provide accessibility, immediacy, responsiveness, and transparency. Participatory infrastructures provide services for participating and accounting for participation. They provide services which semantically link activities for operating, predicting, assessing, and planning.
Professor Hewitt's group is developing foundations to address how participation is described and processed using telecomputer services. They call their approach Participatory Semantics. Their approach concentrates on the how, when, who, what, where, etc. of participation. They take participation as distributed, open (multi-agent) activity. They take semantics of participation broadly: it includes influences on all subsequent participation. Criteria they have identified for participatory semantics are concurrency, scalability, plurality, and openness.
Patrick Henry Winston
MIT Reports to the President 1995-96