  Ph.D. Projects (2004/2005)  
  Project abstracts can be viewed on the enclosed CD-ROM or on the SMA website (http://www.sma.nus.edu.sg).  
     
     
  CS Programme  
     
 
Information Retrieval in Peer-to-Peer Networks
     
Student :
Mihai Lupu
     
Thesis Advisor (Singapore) :
Prof Ooi Beng Chin
     
Thesis Advisor (MIT) :
Prof Stuart Madnick
     
 
 

Project Abstract:

The recent exponential growth in the amount of data available in electronic form has boosted research in two areas. One is Information Retrieval, driven by the need to find the data one is looking for, where techniques from artificial intelligence, natural language processing and related fields are applied. The other is Peer-to-Peer (P2P) networks, driven by storage requirements. Centralised data solutions place high demands on hard disk space and network bandwidth, and on the resilience and availability of data. In the peer-to-peer concept, every computer connected to the network is a user as well as a provider of information and services. P2P networking offers a new solution to the cost and administration problems, but it also brings new protocols and, most importantly for us, new opportunities for development. Our research focuses mainly on derivations of the vector space model for Information Retrieval on structured peer-to-peer networks. In the most basic form of this model, every document is represented as a vector of numbers, one for each term in the document. Many applications are possible; we are studying applications of Latent Semantic Indexing and, more recently, wavelet decompositions. The main problems we face are the need for a bijective mapping between terms and numbers, the high dimensionality of the search space, the acquisition of global information such as the IDF (inverse document frequency), and the synonymy and polysemy issues that are specific to IR techniques based on the vector space model. Wavelets seem to provide a solution in which different levels of approximation of the functions describing documents can be used, according to the bandwidth available between two nodes. Our aim is to develop a system in which a query is answered in constant time (i.e. independent of the number of nodes in the network) while keeping the amount of communication and storage at a reasonable level.
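
The basic vector space model mentioned in the abstract can be illustrated with a minimal, centralised sketch. It is not the project's P2P system: it assumes all documents sit on one machine, uses whitespace tokenisation, and weights terms with a plain TF-IDF scheme. The function names (`build_vocabulary`, `idf`, `vectorize`, `cosine`, `rank`) are hypothetical and chosen only for illustration. Note that `idf` needs document counts over the whole collection, which is exactly the kind of global information that is hard to obtain in a peer-to-peer setting.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Assign each distinct term a unique index (a one-to-one term <-> number mapping)."""
    terms = sorted({t for d in docs for t in d.lower().split()})
    return {t: i for i, t in enumerate(terms)}


def idf(docs, vocab):
    """Inverse document frequency per term; requires a view of the whole collection."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d.lower().split()))  # count each term once per document
    return {t: math.log(n / df[t]) for t in vocab}


def vectorize(text, vocab, idf_weights):
    """Represent a text as a vector with one TF-IDF component per vocabulary term."""
    counts = Counter(text.lower().split())
    vec = [0.0] * len(vocab)
    for term, count in counts.items():
        if term in vocab:  # terms outside the vocabulary are ignored
            vec[vocab[term]] = count * idf_weights[term]
    return vec


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vocab = build_vocabulary(docs)
    weights = idf(docs, vocab)
    q = vectorize(query, vocab, weights)
    scores = [(cosine(q, vectorize(d, vocab, weights)), i) for i, d in enumerate(docs)]
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]
```

For example, calling `rank("wavelet signals", docs)` on a small list of document strings returns the indices of those documents, best match first. The vector length equals the vocabulary size, which shows why the dimensionality of the search space grows quickly with the collection.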

 
     