Search Guide for Web Publishers



The MIT Search Engine uses the Ultraseek server from Inktomi to index and search all world-readable web pages in the mit.edu domain. Information Systems has been running this engine since 1998, and its usage has grown, especially as site maintainers create custom search forms that restrict searches to their own pages. The index is maintained by a spider, which continuously browses MIT web sites to find new and updated pages. New pages are added to the index when they are linked from another page that is already in the index (e.g. the MIT Home Page).


Check for your Web Pages in the Search Engine
There are several ways to check whether your pages are in the search engine. You can search for keywords describing your content, or you can look for pages matching your URL pattern. The url: search gives the most accurate view of which of your pages are in the index: in the search box, enter url: followed by your site address (e.g. url:web.mit.edu/network). If your pages are not in the engine, you can have them included.


Include your Web Pages in the Search Engine
The search spider runs continuously, periodically revisiting pages and indexing new web content. The indexer is "smart" in that it revisits pages that change frequently more often than pages that rarely change.

If you do make significant changes to your existing pages, or you create new pages that you want indexed, you can submit a request for indexing by contacting search@mit.edu with your web site address. To have the site stay in the engine, it will need to be linked from somewhere else on the MIT site, such as from the MIT Home Page or the MIT Community Home Pages.


Tips for Labeling your Web Pages
The search engine considers many factors when ranking web pages, including the page title, the keywords and description embedded in meta tags, and how often a page is linked from other web pages. The search engine cannot guarantee that a page will rank above others, but a page has a better chance of being found if it is well labeled and linked from various MIT web sites.
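
As a rough illustration of the labeling described above, a page head might look like the following minimal sketch. The title, description, and keyword values are hypothetical placeholders, not taken from any actual MIT page, and how much weight the engine gives each field is not specified here.

```html
<head>
<!-- Hypothetical example values: replace with text that describes your own page -->
<title>Network Services at MIT</title>
<meta name="description" content="How to connect to and use the campus network.">
<meta name="keywords" content="network, ethernet, wireless, connectivity">
</head>
```

A short, specific title and a one-sentence description are usually easier to recognize in a results list than generic labels such as "Home Page".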

For additional tips on labeling web pages, see the W3C's recommendations in the HTML 4.0 specification for helping search engines index your web site.


Exclude your Web Pages from the Search Engine
There are a few ways to exclude your pages from the MIT search engine and other search engines. Search engines and other automated web page visitors are referred to as "robots", and this term recurs in the mechanisms used to exclude pages from search engines. MIT web publishers can restrict different levels of content: an individual web page, a directory, or an entire web server.

To exclude pages one at a time, use a robots meta tag in the header. This tells all compliant robots not to index your page or follow its links:

<head> <meta name="robots" content="noindex,nofollow"> </head>

To exclude a file or directory on web.mit.edu, put the phrase 'dontindex' somewhere in the URL. This is especially useful if you are developing a page or site and don't want it indexed yet. This will work with MIT's search engine, but not necessarily with public search engines such as go.com or AltaVista. For example, the following would not be indexed by the MIT Search Engine:

http://web.mit.edu/lockername/dontindex/
http://web.mit.edu/lockername/DontIndex/anyoldfile.html
http://abcxyz.mit.edu/dontindexme.html

To exclude an entire web server from both the MIT search engine and other robots, you can create a robots.txt file on the server. Documentation is available from a number of sources, including The Web Robots Pages and the W3C. To prevent all robots from indexing your site, this text file usually needs only two lines.
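
For reference, the conventional two-line form of such a file, following the robots exclusion standard, is sketched below. It assumes the file is served from the top level of the web server (for example /robots.txt), which is where robots look for it.

```text
User-agent: *
Disallow: /
```

The first line addresses all robots, and the second asks them to skip every path on the server. Compliance is voluntary, so this keeps well-behaved robots out but is not an access control.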

Web servers that restrict access via certificates are not included in the index. These restricted pages have an "s" in the url, such as https://mit.edu ("s" stands for "secure").


Create a Custom Search Form
MIT's search engine allows anyone to create a web form that searches specific areas of content. Sample custom forms include MIT Employment Opportunities, Libraries Search, and the top page of RLSLP: Living at MIT.

A simple search form has a single text box that searches a specific web site, such as this one for searching the IS web site:





    HTML CODE:
<form method="get" action="http://search.mit.edu/query.html">
<b>Search IS Web Site</b><br>
<input type="hidden" name="qp" value="url:mit.edu/is">
<input type="text" name="qt" size="35" value="" maxlength="500">
<input type="submit" value="search" name="submit">
</form>

Note the hidden "qp" variable in this form: its value, "url:mit.edu/is", restricts the search to web pages whose URLs match that pattern.

To create more complex forms, refer to the FAQ: Writing Custom Query Forms. If you need additional assistance, please contact search@mit.edu with a description of the type of search form you want to create, and the maintainers will guide you.



Comments to search@mit.edu