Project Abstract:
The recent exponential growth in the amount of data available in
electronic format has boosted research in two areas. On one side, it
has boosted Information Retrieval, because of the need to find the
data one is looking for; here, techniques from artificial
intelligence, natural language processing and related fields are
applied. On the other side, it has boosted research in Peer-to-Peer
networks, because of the storage requirements.
Centralised data solutions place heavy demands on hard disk space
and network bandwidth, and raise concerns about the resilience and
availability of data. The peer-to-peer concept turns every computer
connected to the network into both a user and a provider of
information and services. P2P networking offers a new solution to
cost and administration problems, but it also brings new protocols
and, most importantly for us, new opportunities for development. Our
research focuses mainly on derivations of the vector space model
for Information Retrieval on structured peer-to-peer networks.
In this model, every document is represented as a vector of
numbers; in the most basic form of the method, there is one number
for each term in the document (for example, its frequency). Many
applications are possible; we are studying Latent Semantic Indexing
and, more recently, wavelet decompositions. The main problems we face
are the need for a bijective mapping between terms and numbers, the
high dimensionality of the search space, the acquisition of global
information such as the IDF (inverse document frequency), and the
synonymy and polysemy issues that are specific to IR techniques based
on the vector space model. Wavelets seem to provide a solution in
which different levels of approximation of the functions describing
documents can be used, according to the bandwidth available between
two nodes. Our aim is to develop a system in which a query is
answered in constant time (i.e. independent of the number of nodes in
the network) while keeping communication and storage costs
reasonable.
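The basic vector space model and the IDF statistic mentioned above can be illustrated with a minimal single-machine sketch in Python. It rests on several assumptions that are not taken from the project:

- documents are already tokenised into lists of terms;
- a dictionary assigns consecutive integers to terms, which is one simple way to get a bijective term-to-number mapping when the whole vocabulary is known;
- each component is weighted as term frequency times log(N/df);
- documents are ranked by cosine similarity to the query.

The function names and the toy corpus are hypothetical. Computing `df` here requires the whole collection, which is exactly the kind of global information that is hard to obtain on a peer-to-peer network.

```python
import math
from collections import Counter


def build_vocabulary(docs: list[list[str]]) -> dict[str, int]:
    """Hypothetical helper: give each distinct term the next free integer (a bijection)."""
    vocab: dict[str, int] = {}
    for doc in docs:
        for term in doc:
            vocab.setdefault(term, len(vocab))
    return vocab


def idf(docs: list[list[str]], vocab: dict[str, int]) -> list[float]:
    """log(N / df(t)) per vocabulary term; needs global knowledge of every document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = [0.0] * len(vocab)
    for term, i in vocab.items():
        weights[i] = math.log(n / df[term])  # df >= 1 because vocab comes from docs
    return weights


def tfidf_vector(doc: list[str], vocab: dict[str, int], idf_weights: list[float]) -> list[float]:
    """One component per vocabulary term: raw term frequency times IDF; unknown terms are ignored."""
    vec = [0.0] * len(vocab)
    for term, tf in Counter(doc).items():
        if term in vocab:
            i = vocab[term]
            vec[i] = tf * idf_weights[i]
    return vec


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy corpus (hypothetical data)
docs = [
    ["peer", "network", "storage"],
    ["wavelet", "signal", "network"],
    ["peer", "peer", "retrieval"],
]
vocab = build_vocabulary(docs)
weights = idf(docs, vocab)
doc_vectors = [tfidf_vector(d, vocab, weights) for d in docs]
query = tfidf_vector(["peer", "retrieval"], vocab, weights)
ranking = sorted(range(len(docs)), key=lambda i: cosine(query, doc_vectors[i]), reverse=True)
print(ranking)
```

The vectors have one dimension per vocabulary term. This dimensionality is what grows with the collection and motivates reduction techniques such as Latent Semantic Indexing.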
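Bandwidth-dependent wavelet approximation can be sketched with the orthonormal Haar transform, the simplest wavelet. This is an assumption: the abstract does not say which wavelet family is used or how a document becomes a function. The sketch makes the following choices:

- a document's term-weight vector, with length a power of two, is treated as a sampled signal;
- the coefficients are ordered from coarse to fine;
- a hypothetical `budget` parameter, standing in for the available bandwidth, decides how many of the coarsest coefficients are sent.

The receiver zero-fills the missing coefficients and reconstructs an approximation.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal: list[float]) -> list[float]:
    """Full orthonormal Haar decomposition, coefficients ordered coarse-to-fine:
    [overall approximation, coarsest detail, ..., finest details]."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    coeffs = [float(x) for x in signal]
    length = n
    while length > 1:
        half = length // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        coeffs[:length] = approx + detail  # finer details already stored stay in place
        length = half
    return coeffs


def haar_inverse(coeffs: list[float]) -> list[float]:
    """Undo haar_forward one level at a time, starting from the coarsest."""
    out = list(coeffs)
    length = 1
    while length < len(out):
        merged = []
        for a, d in zip(out[:length], out[length : 2 * length]):
            merged.append((a + d) / SQRT2)
            merged.append((a - d) / SQRT2)
        out[: 2 * length] = merged
        length *= 2
    return out


def approximate(vector: list[float], budget: int) -> list[float]:
    """Hypothetical transfer: send only the `budget` coarsest coefficients,
    then reconstruct on the receiving side with the rest set to zero."""
    coeffs = haar_forward(vector)
    sent = coeffs[:budget]
    return haar_inverse(sent + [0.0] * (len(coeffs) - len(sent)))


doc_vector = [4.0, 2.0, 5.0, 7.0, 0.0, 0.0, 1.0, 3.0]  # toy term weights
for budget in (1, 2, 8):
    print(budget, [round(x, 3) for x in approximate(doc_vector, budget)])
```

With `budget=1` the receiver gets only the overall average. Each doubling of the budget adds one level of detail, and sending every coefficient reconstructs the vector exactly. A link with low bandwidth could therefore exchange a coarse version of a document and refine it later.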
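On a structured peer-to-peer network, each term (or its index entries) must live at a well-defined node. The following sketch uses Chord-style consistent hashing, a technique the abstract does not name:

- node names and terms are hashed onto the same 32-bit identifier ring;
- a term is assigned to the first node whose identifier is at or after the term's key, wrapping around past the largest identifier.

`Ring`, `key_of` and `ID_BITS` are hypothetical names. Hashing is not a bijection, because distinct terms can collide, so this placement does not by itself provide the bijective term-to-number mapping the abstract calls for.

```python
import bisect
import hashlib

ID_BITS = 32  # hypothetical identifier-space size: 2**32 positions on the ring


def key_of(text: str) -> int:
    """Hash a string onto the ring using the first ID_BITS bits of its SHA-1 digest."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[: ID_BITS // 8], "big")


class Ring:
    """Hypothetical consistent-hashing ring that knows every node (no multi-hop routing)."""

    def __init__(self, node_names: list[str]) -> None:
        if not node_names:
            raise ValueError("the ring needs at least one node")
        # (identifier, name) pairs sorted by position on the ring
        self._nodes = sorted((key_of(name), name) for name in node_names)
        self._ids = [node_id for node_id, _ in self._nodes]

    def responsible_node(self, term: str) -> str:
        """Successor of the term's key; wraps to the first node past the largest id."""
        idx = bisect.bisect_left(self._ids, key_of(term))
        return self._nodes[idx % len(self._nodes)][1]


ring = Ring(["n1", "n2", "n3"])
for term in ("peer", "wavelet", "retrieval"):
    print(term, "->", ring.responsible_node(term))
```

When a node joins, only the terms that now fall just before its identifier move to it. This locality is the usual argument for consistent hashing. The local lookup above assumes every peer knows the full membership list. A real structured overlay such as Chord routes a lookup through several hops instead, so the sketch does not by itself achieve the constant-time query goal.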