Search Guide for Web Publishers
The MIT Search Engine uses the Ultraseek server from Inktomi to index and search all world-readable web pages in the mit.edu domain. Information Systems has been running this engine since 1998, and its usage has grown, especially as site maintainers create custom search forms that restrict content searches to their own pages. The index is maintained by a spider, which continuously browses MIT web sites to find new and updated pages. New pages are added to the index when they are linked to from another page that is already in the index (e.g. the MIT Home Page).
Check for your Web Pages in the Search Engine
There are a number of ways to check if your pages are in the search engine. You
can search for your site by entering keywords that describe your content, or you
can look for pages matching your URL pattern. The url: search provides the most
accurate view of which of your pages are in the index. To see if your web pages
are in the search engine, enter url: followed by your site address in the search
form (e.g. url:web.mit.edu/network). If your pages are not in the engine, you
can have them included.
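As a minimal sketch of the url: search described above, the following Python helper turns a site address into a url: search term. The function name url_query is hypothetical, and dropping the http:// scheme and the trailing slash is an assumption based on the example address in this guide, not documented search engine behavior.

```python
from urllib.parse import urlsplit


def url_query(site_address: str) -> str:
    """Build a url: search term such as 'url:web.mit.edu/network'."""
    address = site_address.strip()
    # Assumption: the scheme is dropped, mirroring the guide's example
    # address (web.mit.edu/network), which has no "http://" prefix.
    if "://" in address:
        parts = urlsplit(address)
        address = parts.netloc + parts.path
    # Assumption: a trailing slash is not needed for the pattern match.
    return "url:" + address.rstrip("/")


print(url_query("http://web.mit.edu/network/"))
```

The resulting term is what you would type into the search box to list the indexed pages under that address.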
Include your Web Pages in the Search Engine
The search spider runs continuously, periodically revisiting pages and indexing
new web content. The indexer is "smart", in that it revisits pages that change
frequently more often than pages that rarely change.
If you make significant changes to your existing pages, or you create new pages that you want indexed, you can submit a request for indexing by contacting search@mit.edu with your web site address. For the site to stay in the engine, it must be linked from somewhere else on the MIT site, such as the MIT Home Page or the MIT Community Home Pages.
Tips for Labeling your Web Pages
The search engine looks at many factors when ranking web pages, including the
page title, the keywords and description embedded in meta tags, and how often a
page is linked from other web pages. The search engine cannot "guarantee"
putting a page above others, but a page has a better chance of being found if
it is well-labeled and linked to from various MIT web sites.
For additional tips on labeling web pages, the W3C has recommendations for helping search engines index your web site in the HTML 4.0 specification.
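One way to check the labels mentioned above (page title, meta keywords, and meta description) is a short script that reports which ones a page is missing. This is a sketch using Python's standard html.parser module. The class LabelChecker, the function missing_labels, and the sample page are all hypothetical, and the check says nothing about how the search engine actually weighs these labels.

```python
from html.parser import HTMLParser


class LabelChecker(HTMLParser):
    """Collect the <title> text and the keywords/description meta tags."""

    def __init__(self):
        super().__init__()
        self.labels = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name in ("keywords", "description"):
            self.labels[name] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append each one.
        if self._in_title:
            self.labels["title"] = self.labels.get("title", "") + data


def missing_labels(html: str) -> list:
    """Return the names of labels that are absent or empty in the page."""
    checker = LabelChecker()
    checker.feed(html)
    wanted = ("title", "keywords", "description")
    return [name for name in wanted if not checker.labels.get(name, "").strip()]


# Hypothetical page: it has a title and a description but no keywords.
SAMPLE_PAGE = """<html><head>
<title>Example Lab at MIT</title>
<meta name="description" content="Projects and contacts for an example lab.">
</head><body><p>Welcome.</p></body></html>"""

print(missing_labels(SAMPLE_PAGE))
```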
Exclude your Web Pages from the Search Engine
There are a few ways to exclude your pages from the MIT search engine and other
search engines. Search engines and other automated web page visitors are
referred to as "robots", and this term persists in the names of the mechanisms
used to exclude pages from them. MIT web publishers can restrict different levels
of content: an individual web page, a directory, or an entire web server.
To exclude on a page-by-page basis, use a robots meta tag in the page header. This tells all robots not to index your page or follow its links:
<head>
<meta name="robots" content="noindex,nofollow">
</head>
To exclude a file or directory on web.mit.edu, put the phrase 'dontindex' somewhere in the URL. This is especially useful if you are developing a page or site and don't want it indexed yet. This will work with MIT's search engine, but not necessarily with public search engines such as go.com or AltaVista. For example, the following would not be indexed by the MIT Search Engine:
http://web.mit.edu/lockername/dontindex/
http://web.mit.edu/lockername/DontIndex/anyoldfile.html
http://abcxyz.mit.edu/dontindexme.html
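The 'dontindex' rule can be sketched as a simple substring test. The function name is hypothetical, and the case-insensitive match is an assumption inferred from the examples above, which include both 'dontindex' and 'DontIndex'. The actual matching is done by the MIT search engine's own configuration.

```python
def is_excluded_by_dontindex(url: str) -> bool:
    """Return True if the URL contains the phrase 'dontindex'."""
    # Assumption: the match ignores case, since the guide lists both
    # .../dontindex/ and .../DontIndex/... as excluded.
    return "dontindex" in url.lower()


for url in (
    "http://web.mit.edu/lockername/dontindex/",
    "http://web.mit.edu/lockername/DontIndex/anyoldfile.html",
    "http://abcxyz.mit.edu/dontindexme.html",
    "http://web.mit.edu/lockername/index.html",
):
    print(url, is_excluded_by_dontindex(url))
```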
To exclude an entire web server from both the MIT search engine and other robots, you can create a robots.txt file on the server. Documentation is available from a number of sources, including The Web Robots Pages and the W3C. To prevent all robots from indexing your site, this text file usually needs only two lines.
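For illustration, the conventional two-line robots.txt that asks every robot to stay out of a whole server is "User-agent: *" followed by "Disallow: /". The sketch below holds that file in a string and checks it with Python's standard urllib.robotparser module. The helper name allows, the robot name ExampleBot, and the page path are hypothetical. Robots honor this file voluntarily, so the check only shows what a compliant robot would conclude.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt that excludes all robots from the whole server.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allows(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a compliant robot may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "ExampleBot" and the page path are made-up names for this sketch.
print(allows(ROBOTS_TXT, "http://abcxyz.mit.edu/any/page.html"))
```

The file itself goes at the top level of the web server, so that it is served as /robots.txt.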
Web servers that restrict access via certificates are not included in the index. The URLs of these restricted pages begin with "https" rather than "http", such as https://mit.edu (the "s" stands for "secure").
Create a Custom Search Form
MIT's search engine allows anyone to create a web form that searches specific
areas of content. Sample custom forms include MIT
Employment Opportunities, Libraries
Search, and the top page of RLSLP: Living
at MIT.
A simple search form has a single text box for searching a specific web site,
such as this one for searching the IS web site:
HTML CODE:
<form method="GET" action="http://search.mit.edu/query.html">
<b>Search IS Web Site</b><br>
<input type="hidden" name="qp" value="url:mit.edu/is">
<input type="text" name="qt" size="35" value="" maxlength="500">
<input type="submit" value="search" name="submit">
</form>
Note the hidden variable in this form: the "qp" variable restricts the search to only those web pages whose URL matches the value "url:mit.edu/is". The "qt" variable carries the text the user types into the search box.
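Because the form uses the GET method, submitting it amounts to requesting the action URL with qp and qt appended as query parameters. The following sketch builds that URL in Python. The function name restricted_search_url and the sample query text are hypothetical, and the form's submit field is left out on the assumption that it does not affect the results.

```python
from urllib.parse import urlencode

# Action URL taken from the sample form in this guide.
SEARCH_ACTION = "http://search.mit.edu/query.html"


def restricted_search_url(query_text: str, url_prefix: str) -> str:
    """Build the GET URL for a search restricted to pages under url_prefix."""
    params = {
        "qp": "url:" + url_prefix,  # hidden restriction, as in the form
        "qt": query_text,           # text the user would type in the box
    }
    return SEARCH_ACTION + "?" + urlencode(params)


print(restricted_search_url("laptop recommendations", "mit.edu/is"))
```

Changing the url_prefix argument gives the equivalent of a custom form for a different site.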
To create more complex forms, refer to the FAQ: Writing Custom Query Forms. If you need additional assistance, please contact search@mit.edu with a description of the type of search form you want to create, and the maintainers will guide you.