MIT Search

 

Ultraseek Fact Sheet


Introduction
MIT recently purchased Ultraseek to serve as the main search engine for all of the campus' web content. This includes the main campus web server, web.mit.edu, as well as individual department servers.

Ultraseek was chosen for MIT because of its extensive functionality, including:

Full Text, Intelligent Searching
Ultraseek fully indexes the text of documents, allowing users to search for any word or phrase. The index has proper name and phrase recognition built in and supports natural language queries, so users can ask questions like "What is MIT's grading policy?" Ultraseek also incorporates Xerox Lexical Technology, which analyzes the query and picks out key phrases for greater precision.

Searching by Field, such as site: and url:
Ultraseek can search by certain fields, such as title, URL, or hyperlink. Ultraseek recognizes "standard" meta tags, as well as custom fields a web publisher might create using meta tags.

Users may restrict their searches to a particular site using the site: field search. However, since much of MIT's content is on one "site", web.mit.edu, users will also want to use the url: field search, which lets the query term include more of the address than just web.mit.edu.
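The difference between the two field searches can be illustrated with a minimal sketch. The helper field_query is hypothetical, the path web.mit.edu/registrar is made up for illustration, and the exact way Ultraseek expects a field prefix to be combined with other terms is an assumption here; only the site: and url: prefixes come from this fact sheet.

```python
def field_query(field: str, value: str, terms: str = "") -> str:
    """Build a query string that prefixes a field restriction to free-text terms.

    Hypothetical helper: it only concatenates strings and does not contact
    any search server.
    """
    restriction = f"{field}:{value}"
    return f"{restriction} {terms}".strip()


# Restrict the search to a single host.
site_only = field_query("site", "web.mit.edu", "grading policy")

# Narrow further by including part of the address after the host name
# (the path shown is an illustrative assumption).
by_url = field_query("url", "web.mit.edu/registrar", "grading policy")

print(site_only)
print(by_url)
```

Because so much content lives on web.mit.edu, the site: form alone narrows very little; the url: form is the one that can single out one area of that server.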

Advanced (Assisted) Search
An important user feature is the easy-to-use Advanced search, an assisted search that allows complex queries to be built without any special syntax. Users do not need to understand exactly how Boolean operators work with this index, as the assisted search guides the user through creating a query.

Through the advanced search, users also have the ability to search their current results only, refining their search query. Users can also search for documents published within a certain range of dates, as well as sort the results by date or relevance.

Spider and Index Features

Document Types
Ultraseek supports the indexing of multiple document types, including


Ultraseek ignores certain files (for example, .tar files) because indexing such files would cause unnecessary load on the server, and the information is not "web content".
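The idea of skipping files by type can be sketched as a simple extension check. This is an illustration only, not Ultraseek's actual filter: the function name should_index and the set SKIPPED_EXTENSIONS are hypothetical, and .tar is the only extension named in this fact sheet.

```python
import posixpath
from urllib.parse import urlparse

# Only .tar is named in the fact sheet; a real skip list is an assumption.
SKIPPED_EXTENSIONS = {".tar"}


def should_index(url: str) -> bool:
    """Return False when the URL path ends in a skipped file extension."""
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension not in SKIPPED_EXTENSIONS


for candidate in (
    "http://web.mit.edu/index.html",
    "http://web.mit.edu/dist/archive.tar",
):
    print(candidate, should_index(candidate))
```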

Indexing Frames
Ultraseek's spider will follow links to content it can discover in framesets, but will return only the document that contains the information that was searched for, not the entire frameset. If a document is part of a frameset, the frames will be ignored and only that single document will be displayed. Because of the complex coding required for frames, CWIS still does not recommend using them to display content.

Scoring Search Results
For each query, Ultraseek gives every returned document a percentage value that indicates how relevant the document is to the query. A document with a score close to 100 contains most, if not all, of the query terms (including any phrases extracted by the linguistic processing), and those terms are found in only a relatively small number of documents.
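The two ingredients described above, how many query terms a document contains and how rare those terms are across the index, can be combined in a toy score. This is not Ultraseek's formula, which is not given here; toy_score, its parameters, and the logarithmic rarity weight are all assumptions chosen only to show the behavior.

```python
import math


def toy_score(query_terms, doc_terms, doc_freq, total_docs):
    """Return a 0-100 score: rarity-weighted share of query terms in the document.

    query_terms: list of terms in the query
    doc_terms:   set of terms found in the document
    doc_freq:    dict mapping a term to the number of documents containing it
    total_docs:  number of documents in the index
    """
    if not query_terms or total_docs <= 0:
        return 0.0
    max_weight = math.log(1 + total_docs)
    matched = 0.0
    for term in query_terms:
        if term not in doc_terms:
            continue
        # Clamp so a bad count can never produce a negative weight.
        freq = min(doc_freq.get(term, 0), total_docs)
        # Weight is 1.0 for a term found nowhere else, 0.0 for one found everywhere.
        matched += math.log((1 + total_docs) / (1 + freq)) / max_weight
    return 100.0 * matched / len(query_terms)


# Illustrative counts only; these are not real index statistics.
frequencies = {"grading": 10, "policy": 50}
both = toy_score(["grading", "policy"], {"grading", "policy"}, frequencies, 1000)
one = toy_score(["grading", "policy"], {"policy"}, frequencies, 1000)
print(round(both, 1), round(one, 1))
```

Under these assumptions, a document that matches every query term outscores one that matches only some, and matching a rare term adds more to the score than matching a common one.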

Scoring documents takes a number of factors into account:



Comments to cwis-help@mit.edu
$Date: 2002/01/03 04:26:12 $