DRAFT DRAFT DRAFT


NOTE: Some minor changes to Step 3 of the search algorithm are not reflected in this document. These changes were made after November 1998 and appear in the mitidsearch.c source code file.

UPDATED: 11/30/1998
CREATED: 11/16/1998
BY: Jonathan A. Ives

MITID Search Algorithm

This document outlines the MITID V2.0 search algorithm. Differences between V1.0 and V2.0 are identified where appropriate. Comments and questions should be emailed to mitid@mit.edu.

SEARCH OVERVIEW

The search algorithm follows a multi-step process to identify records that either match or possibly match the given search criteria. The minimal required search criteria are first and last names, or SSN only, or (in V2.0) MITID only. Date of birth and SSN are not required if first and last names are provided; however, it is best to provide this data when it is available.
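The minimal-criteria rule above can be sketched as a small validation check. This is an illustration only, not code from mitidsearch.c; the `SearchCriteria` type, its field names, and the `version` parameter are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchCriteria:
    # Field names are hypothetical; the real request format is not shown here.
    first_name: Optional[str] = None
    middle_name: Optional[str] = None
    last_name: Optional[str] = None
    ssn: Optional[str] = None
    dob: Optional[str] = None
    mit_id: Optional[str] = None


def has_minimal_criteria(c: SearchCriteria, version: float = 2.0) -> bool:
    """Return True if the criteria satisfy one of the minimal combinations."""
    if c.first_name and c.last_name:
        return True  # first and last names together
    if c.ssn:
        return True  # SSN only
    # Searching by MITID alone is only available from V2.0.
    return bool(c.mit_id) and version >= 2.0
```

Under these assumptions a search with only a first name is rejected, while a search with only an SSN is accepted.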

The algorithm provides some tolerance for misspelled names as well as for first and middle initials. Reversed first and last names, and maiden names taken as middle names, are supported. People who have received multiple MITIDs (usually from non-integrated systems) may be "cleaned up" in the database, which then provides a mapping from old to new MITIDs. Searches on old MITIDs are mapped to the new MITIDs.

The first step of the search algorithm is to identify all of the potential matches by using selection criteria that retrieve a generous subset of the records from the ID database. Then each field of these potential matches is compared with the corresponding field of the search criteria. The field comparison results are then used to determine whether the record matches the search criteria. Finally, the records are sorted and evaluated to determine if a match was found. The following sections detail each step of the algorithm.

STEP 1 FIND POTENTIAL MATCHES - (Cast Wide Net)

In this step, many records are retrieved from the database based upon the following selection criteria:
 
Selection Criteria | Notes
Matching SSN | Original to V1.0.
Matching soundex on 'lastName' | Original to V1.0.
Matching soundex on 'maybe_lastName' | Original to V1.0. "maybe_lastName" includes phonetically similar names.
'lastName' matches 'firstName' | Added for V2.0 to catch reversed names.
'firstName' matches 'lastName' | Added for V2.0 to catch reversed names.
'middleName' matches 'lastName' | Added for V2.0 to catch maiden names taken as middle name.
Matching MIT_ID | Added for V2.0 when search by MITID is enabled.
New MIT_ID | Added for V2.0. If the search criteria specify an OLD_MITID, that ID will be mapped to the NEW_MITID if one has been specified in the database.
Matching first_name | Removed as of V2.0, since this selection criterion does not add any records that are not already covered by the above cases AND that would result in either a MATCH_POSSIBLE or MATCH_EXACT for the record.
Matching Date of Birth MMDD | Removed as of V2.0, since this selection criterion does not add any records that are not already covered by the above cases AND that would result in either a MATCH_POSSIBLE or MATCH_EXACT for the record.
Matching Date of Birth DDMM (for date mix-ups) | Removed as of V2.0, since this selection criterion does not add any records that are not already covered by the above cases AND that would result in either a MATCH_POSSIBLE or MATCH_EXACT for the record.
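The V2.0 selection criteria above can be sketched as a single predicate applied to each database record. This is a hedged illustration, not the query used by mitidsearch.c. It assumes standard American Soundex, case-insensitive name comparison, dictionary-shaped records with hypothetical key names, and that the left-hand name in each table row refers to the search criteria and the right-hand name to the record. The 'maybe_lastName' criterion is omitted because this document does not define how it is derived.

```python
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name):
    """Standard American Soundex: first letter plus three digits."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (code + "000")[:4]


def same(a, b):
    """Case-insensitive equality that is False when either side is empty."""
    return bool(a) and bool(b) and a.strip().upper() == b.strip().upper()


def is_candidate(criteria, record, old_to_new_id):
    """Return True if `record` satisfies any V2.0 selection criterion."""
    c, r = criteria, record
    wanted_sdx = soundex(c.get("last_name") or "")
    search_id = c.get("mit_id")
    return any((
        same(c.get("ssn"), r.get("ssn")),                      # matching SSN
        bool(wanted_sdx) and wanted_sdx == soundex(r.get("last_name") or ""),
        same(c.get("last_name"), r.get("first_name")),         # reversed names
        same(c.get("first_name"), r.get("last_name")),         # reversed names
        same(c.get("middle_name"), r.get("last_name")),        # maiden name
        same(search_id, r.get("mit_id")),                      # matching MIT_ID
        same(old_to_new_id.get(search_id), r.get("mit_id")),   # old ID mapped to new
    ))
```

With a stored record for "Jane Smith", a search for "Jane Smyth" is retrieved through the soundex test, and a search for "Smith Jane" is retrieved through the reversed-name tests.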
 

STEP 2 FIELD MATCH EVALUATION

As of V2.0, each field of every record returned by STEP 1 is evaluated to determine how well that field matches the corresponding search criteria field. Each field is given one of the following values based upon the logic outlined in the table below:
 
Field Match Value | Field Match Logic
MISSING_DATA | If either the search criteria field or the corresponding target record field is empty, the field is flagged as missing data.
MATCH_EXACT | The following cases result in an exact match for the field:
  • CHARACTER/POSITION % EXACT - An exact field match occurs when 90% of the characters/digits match AND 90% of the positions match AND the field lengths match. (Percentages are server configurable.)
  • CHARACTER TRANSPOSED - An exact field match occurs when the number of matching characters is equal to the number in the search criteria and one character is transposed.
  • INITIALS OR MAIDEN NAMES - An exact field match occurs for First, Middle, and Last name fields if all characters and positions in the search criteria field match, even if the search criteria field only contains part of the full field.
MATCH_POSSIBLE | The following case results in a possible match for the field:
  • CHARACTER/POSITION % POSSIBLE - A possible field match occurs when the sum of 1/2 of the character percent match and 1/2 of the position percent match exceeds the minimal threshold for the given field, regardless of field lengths. The threshold is server configurable for each field; current values are 70% for first, middle, and last name fields and 90% for SSN, dob, and id fields.
MATCH_NONE | When none of the above cases is met, the field is considered non-matching.
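The field rules above can be sketched as follows. This document does not define exactly how the character and position percentages are computed, so the sketch makes several assumptions that are not confirmed by mitidsearch.c: the character percentage counts shared characters regardless of position, the position percentage counts identical characters at identical indexes, both are divided by the longer field length, a transposition means two adjacent characters swapped, and comparison is case-insensitive. The field names and the `evaluate_field` helper are hypothetical.

```python
from collections import Counter
from enum import Enum, auto


class FieldMatch(Enum):
    MISSING_DATA = auto()
    MATCH_EXACT = auto()
    MATCH_POSSIBLE = auto()
    MATCH_NONE = auto()


NAME_FIELDS = {"first_name", "middle_name", "last_name"}
EXACT_PCT = 90.0  # server configurable in the real system
POSSIBLE_PCT = {"first_name": 70.0, "middle_name": 70.0, "last_name": 70.0,
                "ssn": 90.0, "dob": 90.0, "mit_id": 90.0}


def char_pct(a, b):
    """Percent of characters shared by a and b, ignoring position (assumed)."""
    shared = sum((Counter(a) & Counter(b)).values())
    return 100.0 * shared / max(len(a), len(b))


def pos_pct(a, b):
    """Percent of positions holding the same character (assumed)."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def is_single_transposition(a, b):
    """True if b is a with exactly one pair of adjacent characters swapped."""
    if len(a) != len(b):
        return False
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    return (len(diff) == 2 and diff[1] == diff[0] + 1
            and a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]])


def evaluate_field(field, wanted, actual):
    a = (wanted or "").strip().upper()
    b = (actual or "").strip().upper()
    if not a or not b:
        return FieldMatch.MISSING_DATA
    cp, pp = char_pct(a, b), pos_pct(a, b)
    if len(a) == len(b) and cp >= EXACT_PCT and pp >= EXACT_PCT:
        return FieldMatch.MATCH_EXACT
    if is_single_transposition(a, b):
        return FieldMatch.MATCH_EXACT
    # Initials or partial names: the criteria are a prefix of the stored name.
    if field in NAME_FIELDS and b.startswith(a):
        return FieldMatch.MATCH_EXACT
    if 0.5 * cp + 0.5 * pp > POSSIBLE_PCT[field]:
        return FieldMatch.MATCH_POSSIBLE
    return FieldMatch.MATCH_NONE
```

Under these assumptions, "J" against "Jonathan" is an exact first-name match (initial), "Smiht" against "Smith" is exact (transposition), and "Smithe" against "Smith" is only a possible match.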

STEP 3 RECORD MATCH EVALUATION

As of V2.0, the field evaluation information from Step 2 is used to evaluate how well each record matches the search criteria. Each record is assigned one of the following values:

   MATCH_EXACT
   MATCH_POSSIBLE
   MATCH_NONE
 
The following tests are applied to each record in the order specified:
 

Test Case | Evaluates To | Description
ID CHANGED (V2.0) | MATCH_POSSIBLE | This is a special case for when an OLD_MITID is entered and a NEW_MITID has been specified for that record. When this happens the entire record is marked as MATCH_POSSIBLE regardless of the other fields.
SSN OR MITID MATCH | MATCH_EXACT or MATCH_POSSIBLE | If SSN or MITID is MATCH_EXACT, then if no fields are MATCH_NONE this record will be a MATCH_EXACT; otherwise it will be a MATCH_POSSIBLE. No additional data is required for MATCH_EXACT.
FIRST AND LAST EXACT | MATCH_EXACT or MATCH_POSSIBLE | If FIRST and LAST are MATCH_EXACT, then if no fields are MATCH_NONE and either SSN or DOB is NOT MISSING_DATA this record will be a MATCH_EXACT; otherwise it will be a MATCH_POSSIBLE.
FIRST AND LAST REVERSED | MATCH_EXACT or MATCH_POSSIBLE | If FIRST and LAST are reversed, then if no other fields are MATCH_NONE and either SSN or DOB is NOT MISSING_DATA this record will be a MATCH_EXACT; otherwise it will be a MATCH_POSSIBLE.
FIRST AND LAST POSSIBLE | MATCH_POSSIBLE or MATCH_NONE | If FIRST or LAST is MATCH_EXACT and the other is MATCH_POSSIBLE, then if no other fields are MATCH_NONE this record will be a MATCH_POSSIBLE; otherwise it will be a MATCH_NONE.
FIRST OR LAST POSSIBLE | MATCH_POSSIBLE or MATCH_NONE | If FIRST or LAST is MATCH_EXACT and the other is MATCH_POSSIBLE or MATCH_NONE, then if no other fields are MATCH_NONE and either SSN or DOB is NOT MISSING_DATA this record will be a MATCH_POSSIBLE; otherwise it will be a MATCH_NONE.
MATCH_NONE | MATCH_NONE | None of the above cases met.
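The ordered record tests above can be sketched as a chain of checks over the Step 2 field values. This illustration follows the table as written in this document, so it does not include the later Step 3 changes made in mitidsearch.c. It assumes that the first applicable test decides the outcome, that "other fields" means every field except first and last name, and that the caller supplies the hypothetical `id_changed` and `names_reversed` flags.

```python
from enum import Enum, auto


class Match(Enum):
    MISSING_DATA = auto()
    MATCH_EXACT = auto()
    MATCH_POSSIBLE = auto()
    MATCH_NONE = auto()


ALL_FIELDS = ("first_name", "middle_name", "last_name", "ssn", "dob", "mit_id")
OTHER_FIELDS = tuple(f for f in ALL_FIELDS if f not in ("first_name", "last_name"))


def evaluate_record(fields, id_changed=False, names_reversed=False):
    """Map Step 2 field values (field name -> Match) to a record value."""
    def value(name):
        return fields.get(name, Match.MISSING_DATA)

    def any_none(names):
        return any(value(n) is Match.MATCH_NONE for n in names)

    has_ssn_or_dob = any(value(n) is not Match.MISSING_DATA for n in ("ssn", "dob"))
    pair = {value("first_name"), value("last_name")}

    if id_changed:  # ID CHANGED
        return Match.MATCH_POSSIBLE
    if Match.MATCH_EXACT in (value("ssn"), value("mit_id")):  # SSN OR MITID MATCH
        return Match.MATCH_POSSIBLE if any_none(ALL_FIELDS) else Match.MATCH_EXACT
    if pair == {Match.MATCH_EXACT}:  # FIRST AND LAST EXACT
        ok = not any_none(ALL_FIELDS) and has_ssn_or_dob
        return Match.MATCH_EXACT if ok else Match.MATCH_POSSIBLE
    if names_reversed:  # FIRST AND LAST REVERSED
        ok = not any_none(OTHER_FIELDS) and has_ssn_or_dob
        return Match.MATCH_EXACT if ok else Match.MATCH_POSSIBLE
    if pair == {Match.MATCH_EXACT, Match.MATCH_POSSIBLE}:  # FIRST AND LAST POSSIBLE
        return Match.MATCH_NONE if any_none(OTHER_FIELDS) else Match.MATCH_POSSIBLE
    if Match.MATCH_EXACT in pair and pair & {Match.MATCH_POSSIBLE, Match.MATCH_NONE}:
        ok = not any_none(OTHER_FIELDS) and has_ssn_or_dob  # FIRST OR LAST POSSIBLE
        return Match.MATCH_POSSIBLE if ok else Match.MATCH_NONE
    return Match.MATCH_NONE
```

Because the tests run in order, the FIRST OR LAST POSSIBLE branch in this sketch is only reached when one name is exact and the other is MATCH_NONE; the exact-plus-possible combination is already decided by the preceding test.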
 
 

STEP 4 RESULT SET EVALUATION

All results are sorted so that MATCH_EXACT records appear first in the result set. The following tests are then applied to the result set to determine whether a match for the given search criteria was found:
 
Result Set Value | Test Description | Result Set Returns
MATCH_NONE | No MATCH_EXACT or MATCH_POSSIBLE records found. | No records will be returned to the client.
MATCH_EXACT | One MATCH_EXACT record found. | One record will be returned to the client.
MATCH_POSSIBLE (multiple exact records) | Two or more MATCH_EXACT records found. | All MATCH_EXACT records returned to the client.
MATCH_POSSIBLE (no exact records) | One or more MATCH_POSSIBLE records found. | All MATCH_POSSIBLE records returned to the client.
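The result set rules above can be sketched as a function over the per-record values from Step 3. This is an illustration under stated assumptions: records arrive as hypothetical `(record, value)` pairs, and partitioning by value stands in for the sort that places MATCH_EXACT records first.

```python
from enum import Enum, auto


class Match(Enum):
    MATCH_EXACT = auto()
    MATCH_POSSIBLE = auto()
    MATCH_NONE = auto()


def evaluate_result_set(evaluated):
    """Return (result set value, records returned to the client)."""
    exact = [rec for rec, value in evaluated if value is Match.MATCH_EXACT]
    possible = [rec for rec, value in evaluated if value is Match.MATCH_POSSIBLE]
    if len(exact) == 1:
        return Match.MATCH_EXACT, exact
    if len(exact) >= 2:
        # Several exact records are ambiguous, so the set is only "possible".
        return Match.MATCH_POSSIBLE, exact
    if possible:
        return Match.MATCH_POSSIBLE, possible
    return Match.MATCH_NONE, []
```

Note that when at least one MATCH_EXACT record exists, any MATCH_POSSIBLE records are left out of the returned set, as the table specifies.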