SMA - Annual Report 2004/2005



Project abstracts can be viewed on the enclosed CD-ROM or on the SMA website (http://www.sma.nus.edu.sg).

CS Programme

Information Retrieval in Peer-to-Peer Networks

Student	:	Mihai Lupu

Thesis Advisor (Singapore)	:	Prof Ooi Beng Chin

Thesis Advisor (MIT)	:	Prof Stuart Madnick

Project Abstract:

The recent exponential growth in the amount of data available in electronic form has stimulated research in two areas. One is Information Retrieval, driven by the need to find the data one is looking for, where techniques from artificial intelligence, natural language processing and related fields are applied. The other is Peer-to-Peer (P2P) networking, driven by storage requirements. Centralised data solutions come under heavy strain in terms of hard disk space, network bandwidth, resilience and availability of data. The peer-to-peer concept makes every computer connected to a network both a user and a provider of information and services. P2P networking offers a new solution to the cost and administration problems, but it also brings new protocols and, most importantly for us, new opportunities for development. Our research focuses mainly on derivations of the vector space model for Information Retrieval on structured peer-to-peer networks. In this model, every document is represented as a vector of numbers; in the most basic form of the method, there is one number for each term in the document. Many applications are possible, and we are studying applications of Latent Semantic Indexing and, more recently, wavelet decompositions. The main problems we face are the need for a bijective mapping between terms and numbers, the high dimensionality of the search space, the acquisition of global information such as the IDF (inverse document frequency), and the synonymy and polysemy issues that are specific to IR techniques based on the vector space model. Wavelets seem to provide a solution in which different levels of approximation of the functions describing documents can be used, according to the bandwidth available between two nodes. Our aim is to develop a system in which a query is answered in constant time (i.e. independent of the number of nodes in the network) while keeping the amount of communication and storage at a reasonable level.
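As an illustration of the vector space model mentioned in the abstract, the following is a minimal single-machine sketch in Python. It is not the project's system: the function names, the sample documents, the whitespace tokenisation and the choice of one dimension per term in the whole collection are all assumptions made for this example. It shows the term-to-index mapping, the global IDF statistic and ranking by cosine similarity.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Assign each distinct term a unique index (a one-to-one term/number mapping)."""
    terms = sorted({term for doc in docs for term in doc.lower().split()})
    return {term: index for index, term in enumerate(terms)}


def inverse_document_frequency(docs, vocab):
    """Compute IDF, which needs a global count: how many documents contain each term."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc.lower().split()))
    return {term: math.log(n_docs / doc_freq[term]) for term in vocab}


def tfidf_vector(text, vocab, idf):
    """Represent a text as a vector with one TF-IDF weight per vocabulary term."""
    counts = Counter(text.lower().split())
    vector = [0.0] * len(vocab)
    for term, count in counts.items():
        if term in vocab:  # terms unseen in the collection are ignored
            vector[vocab[term]] = count * idf[term]
    return vector


def cosine(a, b):
    """Cosine similarity of two vectors; defined as 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vocab = build_vocabulary(docs)
    idf = inverse_document_frequency(docs, vocab)
    q = tfidf_vector(query, vocab, idf)
    scores = [(cosine(q, tfidf_vector(doc, vocab, idf)), i) for i, doc in enumerate(docs)]
    return [i for _, i in sorted(scores, key=lambda pair: (-pair[0], pair[1]))]


# Hypothetical sample collection, for illustration only.
DOCS = [
    "peer to peer networks share storage",
    "wavelets approximate functions at several levels",
    "information retrieval ranks documents for a query",
]

if __name__ == "__main__":
    print(rank("retrieval of documents", DOCS))
```

In this sketch the vocabulary and the document frequencies are computed from the whole collection in one place. In a peer-to-peer setting no single node holds that information, which is why the abstract lists the term mapping and the IDF among the main problems.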
