Received: from ATHENA-AS-WELL.MIT.EDU by po7.MIT.EDU (5.61/4.7) id AA17564; Tue, 22 Mar 94 10:12:06 EST
Received: from MARINARA.MIT.EDU by MIT.EDU with SMTP id AA02062; Tue, 22 Mar 94 10:12:05 EST
From: rdshydur@MIT.EDU
Received: by marinara.MIT.EDU (5.57/4.7) id AA08345; Tue, 22 Mar 94 10:12:04 -0500
Message-Id: <9403221512.AA08345@marinara.MIT.EDU>
To: eddie@media.MIT.EDU
Cc: rdshydur@MIT.EDU
Subject: full-text of _automating image services_ ...
Date: Tue, 22 Mar 94 10:12:03 EST

Automating Image Services

Within the next decade, the majority of data carried over telecommunications links is likely to be visual material. At present, however, the technologies for storing, organising, searching, and presenting images for use in multimedia applications are still in their infancy. With the support of British Telecom, Professor Alex Pentland and his colleagues at MIT's Media Laboratory are working to develop mathematical and computer models that will allow images to be stored, searched, and edited easily and cost-effectively.

In order to retrieve video sequences, the computer must have some way of knowing what is in the video. Researchers have experimented with stream-based annotation, in which specific attributes describing what is happening in the video over time are identified and linked to the relevant portions of a stream or shot. But because images are so dense with data, it is often impossible to annotate an image database completely. For instance, comparing 1,000 pictures of people by appearance would require nearly 500,000 database entries. It would be much better if the computer could "see" what is in the images, so that it could answer questions by actually looking at the pictures themselves.

The challenge is to express the content of the image compactly. Pentland, Professor Rosalind Picard, and their students are exploiting notions of semantic bandwidth compression to build Photobook, a powerful image database tool.
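The article does not say how the figure of nearly 500,000 entries was reached. One reading that reproduces it is that every unordered pair of pictures needs its own similarity entry, which gives n(n-1)/2 entries for n pictures. The short sketch below works through that arithmetic under that assumption; the function name `pairwise_entries` is hypothetical.

```python
def pairwise_entries(n: int) -> int:
    """Entries needed if every unordered pair of n pictures gets one similarity record.

    Assumption (not stated in the article): one entry per pair, i.e.
    n choose 2 = n * (n - 1) / 2.
    """
    return n * (n - 1) // 2


if __name__ == "__main__":
    # 1,000 pictures -> 499,500 pairwise entries, i.e. "nearly 500,000"
    print(pairwise_entries(1000))
```

Because this count grows with the square of the collection size, doubling the database to 2,000 pictures roughly quadruples the table (1,999,000 entries). A compact description per image, compared on demand, avoids storing that table at all.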
This tool takes measurements of image features -- brightness, edges, texture measures, etc. -- and then uses transform coding techniques to obtain an optimal description of the set of images in terms of their salient characteristics. The resulting representation has been found to be more robust than the original images for search and recognition tasks, and it provides a highly compact code for image compression.

Photobook has been used to search a database of 7,562 images of about 3,000 people. Asked to find someone who looks like the person in a photo supplied by the operator, the system first uses Framer -- a persistent knowledge-representation language developed at MIT -- to narrow the search to the most likely candidates, based on logical groupings of attributes such as age, gender, and race. It then searches through the images themselves, displaying them in descending order of similarity to the original. Possible applications for Photobook range from customs, security, and criminal investigation to dating services.

MIT researchers are also developing tools to analyse textures in images. Their work is being incorporated into Photobook for very different sorts of applications. In the fashion and retailing industries, for example, a designer or buyer might browse a large database of designs and fabrics while factoring material composition and manufacturing costs into the search.

Existing mathematical models fall short of capturing the variety that texture takes on in the natural world. To address this problem, Picard and her students are developing a new model based on Wold's Theorem, which allows a regular texture to be broken into three mutually orthogonal components. The model is the first to address the three most important perceptual components of texture (periodicity, directionality, and randomness) simultaneously in one mathematical framework.
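The article names neither the transform nor the exact features. A standard transform-coding choice that is optimal in the mean-squared-error sense for a fixed number of coefficients is the Karhunen-Loève transform (principal component analysis), and the sketch below assumes it. It also assumes each image has already been reduced to a fixed-length feature vector, uses a plain attribute filter as a stand-in for the Framer screening step (this is not Framer's interface), and then ranks the surviving candidates by the distance between their compact codes. All function names and the attribute-dictionary format are hypothetical; the data in the demo is random.

```python
import numpy as np


def fit_kl_basis(features: np.ndarray, k: int):
    """Return the mean vector and the top-k principal directions (Karhunen-Loeve basis).

    `features` holds one feature vector per image, one image per row.
    """
    mean = features.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:k]


def encode(features: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project feature vectors onto the basis, giving a compact k-number code per image."""
    return (features - mean) @ basis.T


def search(query, db_codes, db_attrs, wanted, mean, basis):
    """Screen by attributes, then rank survivors from most to least similar to the query.

    Hypothetical stand-in for the two-stage search: `wanted` is a dict such as
    {"gender": "f"}; only images whose attributes match every key are ranked.
    """
    keep = np.array(
        [i for i, attrs in enumerate(db_attrs)
         if all(attrs.get(key) == value for key, value in wanted.items())],
        dtype=int,
    )
    if keep.size == 0:
        return keep
    q = encode(query[None, :], mean, basis)[0]
    # Smaller Euclidean distance between codes means more similar.
    dists = np.linalg.norm(db_codes[keep] - q, axis=1)
    return keep[np.argsort(dists)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db = rng.normal(size=(50, 64))           # 50 made-up images, 64 features each
    mean, basis = fit_kl_basis(db, k=8)      # keep 8 coefficients per image
    codes = encode(db, mean, basis)
    attrs = [{"gender": "f" if i % 2 else "m"} for i in range(50)]
    ranking = search(db[7], codes, attrs, {"gender": "f"}, mean, basis)
    print(ranking[:5])                       # image 7 itself should come first
```

Keeping only k coefficients is what makes the code compact: in the demo each 64-number feature vector shrinks to 8 numbers, and similarity is computed when a query arrives rather than stored for every pair of images.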
Another problem for image database tools like Photobook is how best to describe the shape of objects so that they can be identified and matched from view to view. Camera motion, non-rigid objects, and noise all make this difficult. Pentland and graduate student Stan Sclaroff are developing Modal Matching, a new mathematical method for describing and comparing object shape that characterises objects by their generalised symmetries, as defined by the object's physical vibration or deformation modes. The resulting "modal descriptions" can be used as a new type of image coding in which similarities and differences are coded in terms of shape rather than pixel brightness. The method is also useful for fusing data from different sensors, or for comparing data obtained at different times or under different conditions.

lifted w/o permission from _the mit report_ february '94, page 3

--------
- r
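The modal idea can be illustrated with a deliberately simplified sketch. Sclaroff and Pentland's method builds a physical (finite-element) model of the shape; the sketch below swaps in a much simpler Gaussian proximity matrix over a shape's feature points, takes its leading eigenvectors as the "modes", and matches points between two views by comparing their coordinates in those modes. Because the matrix depends only on distances between points, the descriptors do not change when the shape is rotated or translated, which is the kind of view-to-view stability the paragraph describes. The function names, the `sigma` width, and the sign convention for eigenvectors are assumptions for illustration, and the sketch assumes no repeated eigenvalues (a highly symmetric shape would break it).

```python
import numpy as np


def modal_descriptors(points: np.ndarray, sigma: float, k: int) -> np.ndarray:
    """Return a k-number modal descriptor for each 2-D feature point (one per row).

    Simplification: a Gaussian proximity matrix stands in for the finite-element
    stiffness and mass matrices of the original method.
    """
    diff = points[:, None, :] - points[None, :, :]
    h = np.exp(-(diff ** 2).sum(axis=-1) / (2.0 * sigma ** 2))
    _, vecs = np.linalg.eigh(h)          # eigenvalues come back in ascending order
    modes = vecs[:, ::-1][:, :k]         # keep the k modes with the largest eigenvalues
    # Eigenvectors are defined only up to sign; fix each mode's sign with a
    # rule that does not depend on point order (make the sum of cubes positive).
    signs = np.sign((modes ** 3).sum(axis=0))
    signs[signs == 0] = 1.0
    return modes * signs


def match_points(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    """For each point of shape B, the index of the shape-A point with the nearest descriptor."""
    cost = ((desc_b[:, None, :] - desc_a[None, :, :]) ** 2).sum(axis=-1)
    return cost.argmin(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = rng.uniform(size=(10, 2))                # made-up feature points, view 1
    t = 0.7
    rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    perm = rng.permutation(10)
    b = a[perm] @ rot.T + np.array([3.0, -1.0])  # view 2: shuffled, rotated, shifted
    found = match_points(modal_descriptors(a, 0.3, 4), modal_descriptors(b, 0.3, 4))
    print((found == perm).all())
```

In the demo under `__main__`, the second view is the first one shuffled, rotated and shifted, so the matching should recover the shuffle exactly. Real views with noise or non-rigid deformation would need a more robust assignment step than a per-point nearest neighbour.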