MIT Reports to the President 1996-97

LABORATORY FOR COMPUTER SCIENCE

The MIT Laboratory for Computer Science (LCS) is an interdepartmental laboratory whose principal goal is research in computer science and technology toward a better life for all people.

Founded as Project MAC in 1963, the Laboratory developed one of the world's earliest time-shared computer systems. This early research (1960-1970) on the Compatible Time Sharing System (CTSS) and its successor, Multics, laid the foundation for many of today's commonplace software systems and approaches, such as operating systems written in high-level programming languages, virtual memory, tree directories, on-line scheduling algorithms, line and page editors, secure operating systems, concepts and techniques for access control, computer-aided design, and two of the earliest computer games, space wars and computer chess. Our partner in the Multics effort, AT&T, used many of the early ideas in their design of Unix.

These early developments laid the foundation for the Laboratory's work on knowledge-based systems (such as the Macsyma program for symbolic mathematics), natural language understanding, and (with BBN) the development and use of packet networks in the Arpanet. In the late 1970s, Project MAC, renamed the MIT Laboratory for Computer Science, embarked on research in clinical decision making, public-key cryptography, distributed systems and languages, and parallel systems. These led to the RSA encryption algorithm, data abstractions which served as foundations of object-oriented programming, the Clu and Argus distributed systems, the dataflow principle and associated languages and architectures of parallel systems (Monsoon, Id and StarT), local area networks, program specification and workstation development, where the Laboratory contributed the earliest UNIX ports and compilers and the Nubus architecture. This research also led to the X Window System, a computer intercommunication and user interface system which was further developed by the Laboratory's X-Consortium and was widely used in over one thousand different software products.

The Laboratory's current research falls into four principal categories: Information Infrastructure and Distributed Systems; Human Interaction/Intelligent Systems; Computationally Intense Systems; and Theory. The principal goals of these four categories are as follows:

In the area of Information Infrastructure and Distributed Systems, we wish to understand principles and pursue technologies for the architecture and use of highly scalable information infrastructures. Transactions among such distributed systems will involve the purchase, sale and free exchange of information and information work toward electronic commerce and shopping, health care, education, business, government and many other uses. We wish to explore new emerging forces such as groupwork across space and time and automation of computer-to-computer actions. This research is expected to have a broad impact on future systems because virtually every machine will be connected to some information infrastructure and such infrastructures are expected to last for a very long time. The Laboratory's World Wide Web Consortium, which, with the help of its participating companies, helps set the standard for a continuously evolving Web, is a major focus of our work in this area.

In the Human Interaction/Intelligent Systems area, our technical goals are to understand and construct programs and machines that have greater and more useful sensory and cognitive capabilities so that they may communicate with one another and with people toward useful ends. On the human-interaction side, the two principal areas of our focus are conversational spoken dialogue systems between people and machines and graphics systems used predominantly for output. On the Intelligent Systems side, we focus on programs that reason about clinical issues and help in clinical decision making and health care.

In the Computationally Intense Systems area, we strive to harness the power and economy of numerous processors working on the same task. Research in the area involves the analysis and construction of various parallel hardware architectures, programming languages and operating systems that yield cost-performance improvements of several orders of magnitude relative to single processors. We are also carrying out research on the uses of computationally intense systems in several application areas for the purpose of improving architectures and programs that we develop, based on actual experience and utility.

Taken together, these three thrusts in infrastructure, human interaction/intelligent systems and computationally intense systems define the Laboratory's overarching goal: development, understanding and better human communication with tomorrow's information systems.

In the Laboratory's fourth category of research, Theory, we strive to discover and understand the fundamental forces, rules, and limits of Information Science and Technology. As a result, theoretical work permeates our research efforts in the other three areas; for example, in the pursuit of parallel algorithms, fault tolerant computer networks, and privacy and authentication of communications. Theory also touches on the logic of programs, the inherent complexity of computations, and the use of cryptography and randomness in the formal characterization of knowledge. The Laboratory expends a great deal of effort in theoretical computer science because its impact upon our world is expected to continue its past record of improving our understanding and helping us pursue new frontiers with new models, concepts, methods, and algorithms.

Research highlights during the reporting period are as follows:

The World Wide Web Consortium (W3C): As of this report, 180 organizations have joined this consortium in order to participate in and contribute to the orderly evolution of the World Wide Web (W3). The team is currently very close to its planned size of some 15 full-time equivalents at each site, plus students. We have signed an agreement with Keio University in Japan similar to that with INRIA so that we may have Asian representation and contributions to the evolution of W3. The Consortium's PICS effort for parental control of Internet/Web sites based on third parties' ratings has stimulated the concept of metadata, which is now being considered for inclusion in the Web standard. With metadata, people and machines will be able to represent and, therefore, write and read characterizations of information, such as its quality, veracity, and appropriateness for designated purposes.

Spoken Language Systems: Our Spoken Language Systems Group has expanded and strengthened the capabilities of its demonstration systems through continued improvements of human language technologies, including speech recognition/synthesis and language understanding/generation. For speech, we now focus our attention almost exclusively on telephone input. At the system level, we continue to address multilingual issues, with particular attention paid to Mandarin and Spanish, and to "display-less" dialogue for human-computer interactions. During the past year, we introduced WebGalaxy, a conversational interface that is completely embedded in a web browser. More recently, we have also developed Jupiter, a telephone-based weather information service that users can call to obtain on-line weather information via a toll-free number.

Multiprocessor Architectures: The group has had a busy year building the StarT family of parallel computers. The group will be completing its flagship StarT-Voyager machine at the end of 1997. StarT-Voyager consists of 32 IBM workstations connected to the Arctic switch fabric designed at the Laboratory. Each workstation will be equipped with a network interface unit (NIU) on its memory bus in an aggressive design intended to provide low latency and high bandwidth. The Voyager NIU is programmable, making the machine an ideal platform to study distributed shared memory protocols. The first Arctic routing chips have become operational in the past few months and have been tested using the StarT-JR NIU completed last year. The StarT-JR NIU offers easy access to the Arctic network to any machine with a standard PCI bus, though at lower performance than the StarT-Voyager NIU. Still, the great speed (30 MB/sec) of the StarT-JR NIU relative to existing commercial networking technology has led us to plan its incorporation into the Xolas, Digital, and future Intel clusters at the Laboratory for Computer Science. The group has also been active in all aspects of parallel software research through the exploration of the implicitly parallel programming languages pH and Id. Two new optimizing compilers, based on novel code-generation techniques, will be released this year and are expected to provide greatly improved execution performance.

Theory of Computation: Hot spots arise in the World Wide Web (and other distributed systems) when a single item or site becomes extremely popular. The server responsible becomes so heavily loaded that it is unable to respond to all the requests. This can lead to extremely long delays or even crash the unfortunate server. Caching is a scheme that can be used to eliminate hot spots -- other lightly loaded servers are given copies of the popular items and serve some of the requests for them, reducing the load faced by the original server. To make caching work, one needs good rules for deciding which caches are going to serve which hot items. This is especially complicated in a domain like the Internet, where machines are added and removed with great frequency. We developed a new scheme, consistent hashing, that provides the load-balancing benefits of standard hashing but allows for the kind of changing and incomplete information about global state (e.g., which machines are presently up) that is typical of an environment like the Web. Combined with a particular randomized embedding of hierarchies of caches in the network, it lets us give provably good bounds on the load experienced by any cache, and guarantee that no hot spots will arise.
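
As an illustration only, the sketch below shows one common way to realize the idea of consistent hashing: caches and items are hashed onto the same ring, and each item is served by the first cache point at or after its own position. This is a minimal sketch under stated assumptions, not the Laboratory's implementation; the class HashRing, the helper _point, the use of SHA-256, the replicas parameter, and the cache and page names are all hypothetical, and the randomized hierarchy of caches mentioned above is not modeled.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Map a string to a 64-bit position on the ring (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Illustrative ring-based consistent hashing; not the report's own code."""

    def __init__(self, replicas: int = 100):
        self.replicas = replicas  # ring points per cache (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> cache name

    def add(self, cache: str) -> None:
        """Place `replicas` points for this cache on the ring."""
        for i in range(self.replicas):
            p = _point(f"{cache}#{i}")
            if p in self._owner:  # 64-bit collision: extremely unlikely, skip
                continue
            bisect.insort(self._points, p)
            self._owner[p] = cache

    def remove(self, cache: str) -> None:
        """Drop every ring point that belongs to this cache."""
        self._points = [p for p in self._points if self._owner[p] != cache]
        self._owner = {p: c for p, c in self._owner.items() if c != cache}

    def lookup(self, item: str) -> str:
        """Return the cache owning the first point after the item's position."""
        if not self._points:
            raise LookupError("no caches on the ring")
        i = bisect.bisect_right(self._points, _point(item))
        return self._owner[self._points[i % len(self._points)]]


if __name__ == "__main__":
    ring = HashRing()
    for name in ("cache-a", "cache-b", "cache-c"):
        ring.add(name)

    items = [f"page-{n}" for n in range(1000)]
    before = {it: ring.lookup(it) for it in items}

    ring.remove("cache-b")  # one machine goes down
    after = {it: ring.lookup(it) for it in items}

    moved = [it for it in items if before[it] != after[it]]
    # Only items that lived on the removed cache are reassigned.
    assert all(before[it] == "cache-b" for it in moved)
    print(f"{len(moved)} of {len(items)} items moved")
```

The property worth noticing is that removing or adding one cache changes the assignment only for items that cache served; with ordinary modular hashing, a change in the number of machines would reassign nearly every item.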

Computational Biology: Professor Berger's virus shell assembly work has led to the ongoing development of an integrated system for the discovery of a new class of antivirals called Capsid Assembly Targeted (CAT) antivirals. CAT antivirals are chemotherapeutic agents targeted directly at the protein/protein interactions required for viral capsid assembly. It is anticipated that CAT antivirals will be highly effective, specific, and robust, while at the same time having a short lead time for discovery. The system for their discovery consists of novel computational tools, as well as novel applications of cutting-edge biotechnology. This work was reported in a Genetic Engineering News article. The group has also developed programs for identifying new potential coiled coils in protein sequences. Over the course of the last year, their PairCoil program has been run or downloaded over the Web by more than a thousand different people. Their PairCoil and LearnCoil programs have been useful in identifying coiled coil regions in HIV and other viruses such as influenza and Moloney murine leukemia virus, where the coiled coil is thought to be the mechanism by which the virus binds to the cell membrane during infection. The structures of these viruses have since been solved by Prof. Peter S. Kim's lab, whose work on HIV was widely reported in the press, including The New York Times, ABC News, and Science News.

Software Devices and Systems: Dr. Tennenhouse, leader of the Telemedia, Networks and Systems Group, has gone to DARPA where he will direct the Information Technology Office. His group has been merged with Professor Guttag's group, the Systematic Program Development Group. The combined group, called Software Devices and Systems, headed by Professor Guttag, is already well on its way with the development of Spectrumware -- a totally digital approach to the detection and processing of communication signals, from the antenna that receives them to the final function that they are intended to perform.

In June 1997, Business Week's special issue on "The Digital Frontiers" conducted a Delphi Poll that rated research laboratories worldwide. We are proud that LCS was ranked second.

During this reporting period, the Laboratory's Distinguished Lecturer Series included presentations by Dr. Whitfield Diffie, Distinguished Engineer, Sun Microsystems, Inc.; Dr. Anita K. Jones, Director, Defense Research and Engineering, DOD; Dr. John Warnock, President and CEO, Adobe Systems; and Professor David R. Cheriton, Stanford University. The Laboratory is organized into 14 research groups, an administrative unit, and a computer service support unit. The Laboratory's membership comprises a total of 459 people, including 79 faculty and research staff; 141 graduate students; 129 undergraduate students; 75 visitors, affiliates, and postdoctoral associates and fellows; and 35 support staff. The academic affiliation of most of the Laboratory's faculty and students is with the Department of Electrical Engineering and Computer Science (EECS).

About 60% of the Laboratory's funding comes from the US Government's Advanced Research Projects Agency. The Laboratory is also funded by and has extensive links with industrial organizations. These include partnerships for the construction of major hardware systems, consortia for the development and maintenance of standards, such as the World Wide Web, and joint studies on research areas of common concern.

More information about the Laboratory can be found on the World Wide Web at http://www.lcs.mit.edu

Michael L. Dertouzos
