DRAFT DRAFT DRAFT
MIT Information Technology
MIT ID Database V2.0 System Design
mitid@mit.edu
(http://web.mit.edu/mitid/www/v2)
Last modified: March 31, 1998
Table of Contents
- Overview
- MIT ID Database Design Overview
- Issues
Overview
This System Design document outlines planned design changes to the MIT ID
Database service as described in the Version 2.0 Project Charter. The new
Graduate Admissions End User Application, which is planned to begin rollout
to departmental offices during the spring of 1998, will be used by departmental
admissions offices to maintain applicant data within the MITSIS Mid-tier
database. Once all required applicant data is received, a Graduate Admissions
administrative application will be used to "clean up" the data in the Mid-tier
before migrating it to the MITSIS Backend database (see Graduate Admissions
Overview).
Version 2.0 changes to the MIT ID Database service include those required to
enable the Graduate Admissions administrative application to use the MIT ID
lookup and assignment functionality of the MIT ID Database. Specifically, they
include the following functional changes:
- Add previous student records in MITSIS to the Data Warehouse
- Improve API usability within a batch processing environment
- Implement Multiple ID Identification, Resolution, and Notification
- Port DLL to 32 bit (VC++)
- Port standalone application to 32 bit (PB 5.0)
- Port standalone application to Kerberos 5
- Maintain compatibility for existing clients, if reasonable.
MIT ID Database Design Overview
The MIT ID Database service consists of the following primary components:
- MITID DLL
- MITID Server
- MITID Database
The MITID DLL (Dynamic Link Library) implements the MITID API (Application
Programmer Interface) to be integrated with Departmental Systems. A standalone
application that uses the DLL is also provided. The standalone application
serves as a sample application and lets people use the service before it is
fully integrated into their departmental systems.
The MIT ID Server consists of a server daemon process handling transactional
lookup and assignment requests. All applications that use the MIT ID Database
do so through the MIT ID Server as it is this server that actually implements
the search algorithm.
The MITID Database is the Institute system of record for MIT ID numbers.
It is not the system of record for all information about people.
It includes a minimal set of biographical information about people in order
to support its lookup functionality. This information is entered when the
ID is created. Changes to the biographical information are applied via
a feed from the Data Warehouse.
The following diagram depicts the dataflow between the different components
of the MIT ID Database service.
This high level design will not change in V2.0; however, the implementation
of each component will. These changes are described in the following
sections.
DLL Design Changes
The MIT ID Application Programmer Interface (API), implemented via a Dynamic
Link Library (DLL), is the mechanism through which departmental systems
gain access to the MIT ID Database.
Port to 32 bit
The V1.0 DLL (people.dll) is a 16 bit DLL developed for use under Windows
3.1; it also works for 16 bit applications under Windows 95 and Windows
NT. The V2.0 DLL (mitid32.dll) will be a 32 bit port for Windows NT.
Windows specific code will be used only where necessary so that the DLL
may be ported to additional platforms in the future.
Port to Kerberos V5
The version 1.0 DLL uses Kerberos V4 to authenticate the end user.
The new version will add support for Kerberos V5 using the GSSAPI.
Port to PowerBuilder 5.0
The MITID standalone application will be ported to 32 bit PowerBuilder
5.0 to take advantage of the V2.0 changes. In addition, the application
will be modified to accept a formatted file of records and to process
this batch so as to determine which records are an EXACT_MATCH, a NO_MATCH
or a POSSIBLE_MATCH under the new algorithm.
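As a rough illustration of the batch mode described above, the following Python sketch groups a batch of input records by result status. The `lookup` callable, the record layout and the function name are hypothetical stand-ins; this document does not define the batch file format or the calls the application will make.

```python
from collections import defaultdict

# Result statuses named in this document.
EXACT_MATCH = "EXACT_MATCH"
NO_MATCH = "NO_MATCH"
POSSIBLE_MATCH = "POSSIBLE_MATCH"
VALID_STATUSES = (EXACT_MATCH, NO_MATCH, POSSIBLE_MATCH)


def partition_batch(records, lookup):
    """Group records by the status that lookup(record) reports.

    `lookup` is a hypothetical stand-in for a query to the MIT ID server;
    it must return one of the three status strings above.
    """
    groups = defaultdict(list)
    for record in records:
        status = lookup(record)
        if status not in VALID_STATUSES:
            raise ValueError(f"unexpected status: {status!r}")
        groups[status].append(record)
    return dict(groups)
```

Under this arrangement only the POSSIBLE_MATCH group would need manual review.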
Improve API Usability
The API will undergo substantial re-design to enable better batch processing
support and to simplify its use while making it more configurable and extendable.
The V1.0 API was designed to work primarily in an interactive mode where
the displayed result list would be manually reviewed for each query to
determine if there was a match or not. The V2.0 matching algorithm will
be improved so that only questionable results will require a manual review.
The V1.0 API includes a number of data structures that each client application
must declare and utilize. V2.0 will combine these structures into a single
structure to be referred to as the MITID_OBJECT. The MITID_OBJECT will
encompass all of the data to be used by the DLL. The V2.0 API will include
access functions for applications to read and write data to and from the
MITID_OBJECT.
The MITID_OBJECT will include references to the following structures:
PROCESS_INFO, QUERY_INFO, RESULTS_INFO, VERSION_INFO. Some of these structures
have existing counterparts in V1.0 but will be extended to include additional
information such as "host name" and "port id" for the server, connection_status,
and query_options. The V2.0 MITID_OBJECT structure will replace the following
1.0 structures: PROCESS_STRUCTURE (a.k.a. CONTEXT), PERSON_STRUCTURE, QUERY_PERSON_TYPE.
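To make the shape of the combined structure concrete, here is a minimal Python sketch of a MITID_OBJECT that holds the four sub-structures and is read and written through access functions. It is illustrative only: the real DLL is a Windows library, and every field name, default value and accessor below is an assumption, including which sub-structure carries the host name, port id, connection_status and query_options.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessInfo:
    # Assumed home for the server connection details.
    host_name: str = ""
    port_id: int = 0
    connection_status: str = "DISCONNECTED"


@dataclass
class QueryInfo:
    fields: dict = field(default_factory=dict)          # e.g. last name, SSN
    query_options: dict = field(default_factory=dict)   # assumed placement


@dataclass
class ResultsInfo:
    status: str = "NO_MATCH"
    records: list = field(default_factory=list)


@dataclass
class VersionInfo:
    api_version: str = "2.0"


@dataclass
class MitidObject:
    """Single object holding all data used by the (hypothetical) client."""
    process: ProcessInfo = field(default_factory=ProcessInfo)
    query: QueryInfo = field(default_factory=QueryInfo)
    results: ResultsInfo = field(default_factory=ResultsInfo)
    version: VersionInfo = field(default_factory=VersionInfo)


def set_query_field(obj, name, value):
    """Write one query field into the object."""
    obj.query.fields[name] = value


def get_query_field(obj, name, default=None):
    """Read one query field from the object."""
    return obj.query.fields.get(name, default)
```

Routing all reads and writes through accessors means a client would declare one object instead of several structures, and the internal layout could change without affecting callers.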
Since the 2.0 server will continue to support backward compatibility
for 16 bit clients using the 1.0 API and DLL, the 2.0 API need
not be backward compatible. NOTE: Future server updates will make
the 1.0 API obsolete so all applications should utilize the 2.0 API. The
following table illustrates the migration of the API from 1.0 to 2.0.
Specific changes to the Version 2.0 API can be found in the document MIT
ID 2.0 Application Programmer Interface.
Server Design Changes
The server component of the MIT ID Database is an Oracle Pro*C program
used in conjunction with the INET daemon (inetd), which enables the server
program to communicate with any Internet client application via a socket.
The server authenticates its clients via Kerberos and then determines
whether they are authorized to use the service. Authentication is
currently handled via Kerberos V4, but the server will be changed to also
handle Kerberos V5 clients. Authorizations are currently implemented via
the MIT ID Database but may in the future move to the Roles Database; no
major changes in the authorization mechanism are planned for this release.
Additional changes may be made to improve the maintainability and operability
of the server application.
Server backward compatibility is required for this release so that existing
deployed V1.0 clients do not need to be replaced until they are ready to
go to this new 32 bit implementation. While the server will continue to
support its current functionality to V1.0 clients, new functionality will
only be available to the V2.0 clients. This includes improvements to the
matching algorithm and results feedback.
The following tables summarize the V2.0 Server Connection Process and
Search Algorithm.
V2.0 Server Connection Process
Step | V2.0 change
Client connects | No change
Determine client version | Per-connection configurations enable simultaneous connections from both V1.0 and V2.0 clients.
Client connection authenticated (Kerberos) | Either V4 or V5
Parse connection request (search, assign) | V2.0 clients will be able to specify additional options
Search/Assignment performed | See the Search Algorithm below
Results returned | The V2.0 result set as well as each individual result record will be identified as EXACT_MATCH, NO_MATCH or POSSIBLE_MATCH
Connection closes | No change
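The per-connection configuration step can be pictured with a short Python sketch. It assumes the client version is available as the string "1.0" or "2.0" and that options sent by a V1.0 client are ignored rather than rejected; neither detail is specified in this document, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionConfig:
    client_version: str
    accepts_search_options: bool   # only V2.0 clients may pass options
    per_record_status: bool        # V2.0 labels each result record


def config_for(client_version):
    """Return the per-connection configuration for a client version."""
    if client_version == "1.0":
        return ConnectionConfig("1.0", False, False)
    if client_version == "2.0":
        return ConnectionConfig("2.0", True, True)
    raise ValueError(f"unsupported client version: {client_version!r}")


def parse_request(config, request_type, options=None):
    """Validate a request against the connection's configuration."""
    if request_type not in ("search", "assign"):
        raise ValueError(f"unknown request type: {request_type!r}")
    if options and not config.accepts_search_options:
        # Assumption: options from a V1.0 client are dropped silently.
        options = None
    return request_type, dict(options or {})
```

Because the configuration is chosen per connection, V1.0 and V2.0 clients can be served at the same time without affecting each other.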
Search Algorithm
Cast Wide Net
Existing in V1.0: Get any/all remotely matching records from the database:
- Matching SSN
- Matching first name
- Matching soundex on 'lastName'
- Matching soundex on 'maybe_lastName'. "maybe_lastName" includes phonetically similar names:
  - PH -> F
  - F -> PH
  - KN -> N
  - N -> KN
  - C -> K
  - K -> C
- Matching MMDD
- Matching DDMM (for date mix-ups)
V2.0 Changes: Clients will be able to specify search options to control
the size of this initial data set. (Specific options are yet to be determined
but may include options to omit soundex records or suspected obsolete records.
An OBSOLETE_RECORD indicator may be used by the database to flag records
to be ignored by the search algorithm. See database changes for more
details.)

Evaluate Results
Existing in V1.0: Assign a numeric value for each field and sum for the record:
- SSN - MatchDigitPercent
- DOB - MatchDigitPercent
- LastName - matchAlphaPercentage
- MiddleName - matchAlphaPercentage
- FirstName - matchAlphaPercentage
MatchAlphaPercentage is based on:
- matching characters
- matching positions
- target length
MatchDigitPercentage is based on:
- number of matching digits
- matching positions
- target length
V2.0 Changes: Version 1.0 clients will continue to use the existing
matching algorithm, while V2.0 will use a different search algorithm which
can more specifically identify the individual results as EXACT_MATCH, NO_MATCH
or POSSIBLE_MATCH. (Specific changes yet to be determined.)

Results Returned and Displayed
Existing in V1.0:
- SearchPersonMatch - One Exact Match (100.0)
- SearchPersonList - Multiple Results Found
- SearchPersonNone - No Results Found
V2.0 Changes: (Specific changes yet to be determined.)
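The wide-net and scoring steps can be sketched in Python as follows. This is not the server's Pro*C code, and it rests on several assumptions: standard American Soundex is used, the phonetic substitutions are applied only at the start of the name (where they would change the Soundex first letter), and the match percentage is simply the number of position-wise matches divided by the target length. The document lists the inputs to the score but not the formula.

```python
# Standard American Soundex letter codes.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit

# Substitution pairs listed for "maybe_lastName".
SUBSTITUTIONS = [("PH", "F"), ("F", "PH"), ("KN", "N"),
                 ("N", "KN"), ("C", "K"), ("K", "C")]


def soundex(name):
    """Return the four-character Soundex code of a name."""
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""
    result = name[0]
    prev = _CODES.get(name[0], "")
    for ch in name[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "HW":          # H and W do not separate equal codes
            prev = code
    return (result + "000")[:4]


def maybe_last_names(last_name):
    """Phonetic variants of a last name (assumed: prefix substitutions only)."""
    name = last_name.upper()
    return [new + name[len(old):]
            for old, new in SUBSTITUTIONS if name.startswith(old)]


def date_keys(month, day):
    """MMDD and DDMM keys, to catch swapped month and day."""
    return {f"{month:02d}{day:02d}", f"{day:02d}{month:02d}"}


def match_percent(candidate, target):
    """Assumed score: percentage of target positions matched exactly."""
    if not target:
        return 0.0
    hits = sum(1 for a, b in zip(candidate, target) if a == b)
    return 100.0 * hits / len(target)
```

For example, a query for "Knight" would also search the Soundex codes of "NIGHT" and "CNIGHT", and a birth date of March 12 would match records keyed on either 0312 or 1203.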
Database Design Changes
The only significant changes to the database in V2.0 will be to assist
with the cleanup of MIT ID data problems (see MIT ID Data Problems) and
to improve the data feeds between the MIT ID Database and the Warehouse.
Data Cleanup
When multiple MIT IDs are found for a single person, human interaction
will be required to determine which record (and therefore which MIT ID)
is to be used by the search algorithm in the server application. There
will be a manual mechanism to set an OBSOLETE_RECORD indicator that the
search algorithm can use to filter out the other records. By default the
OBSOLETE_RECORD indicator will be zero (0). Any other value will either
reference the correct MIT ID or indicate that the correct ID is unknown.
Keeping obsolete records in the database for historical reasons will prevent
their MIT IDs from accidentally being assigned again.
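A small Python sketch shows how the indicator might be used. The dictionary layout, the `obsolete_record` key and the sentinel for "correct ID unknown" are all hypothetical; the document only states that zero means the record is current.

```python
UNKNOWN_CORRECT_ID = -1   # hypothetical sentinel; not defined in this document


def searchable(records):
    """Keep only records whose OBSOLETE_RECORD indicator is zero."""
    return [r for r in records if r.get("obsolete_record", 0) == 0]


def resolve(records_by_id, mit_id):
    """Follow OBSOLETE_RECORD references to the current MIT ID.

    Returns None when the correct ID is marked as unknown.
    """
    seen = set()
    while True:
        if mit_id in seen:
            raise ValueError("cycle in OBSOLETE_RECORD references")
        seen.add(mit_id)
        marker = records_by_id[mit_id].get("obsolete_record", 0)
        if marker == 0:
            return mit_id
        if marker == UNKNOWN_CORRECT_ID:
            return None
        mit_id = marker
```

The obsolete rows stay in the table, so their IDs remain reserved, but they never reach the search results.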
Data Feeds
The data feeds between the Warehouse and the MIT ID Database ensure that
the data in these two databases remains synchronized. Updates are needed
in both directions. The current feed updates all records in the MIT ID
Database even if there are no changes and does not recognize when records
have disappeared from the feed. The new feed will only update changed records
and will flag records that have disappeared from the feed using the OBSOLETE_RECORD
indicator. Updating only changed records will be important as the number
of records is expected to increase significantly when past students are
added to the Warehouse.
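The behavior of the new feed can be sketched in Python as a comparison between the current database contents and the incoming feed. Both are modeled here as dictionaries keyed by MIT ID, which is an assumption made for illustration; records that appear only in the feed are left out because this document does not say how they are handled.

```python
def plan_feed_update(current, feed):
    """Work out what an incremental feed run would change.

    current, feed: dicts mapping MIT ID -> biographical record (a dict).
    Returns (updates, disappeared):
      updates      records present in both whose contents differ
      disappeared  sorted IDs in the database but missing from the feed,
                   i.e. candidates for the OBSOLETE_RECORD indicator
    """
    updates = {
        mit_id: record
        for mit_id, record in feed.items()
        if mit_id in current and current[mit_id] != record
    }
    disappeared = sorted(mit_id for mit_id in current if mit_id not in feed)
    return updates, disappeared
```

Unchanged records produce no work at all, which is what keeps the run time manageable as the record count grows.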
Currently there is no data feed from the MIT ID Database back to the
Warehouse; this will need to change in V2.0 so that changes made in either
of these two systems are propagated to the other.
Additionally, a set of scripts is needed to assist with the manual
cleanup of data problems as they are identified. In particular, a mechanism
is needed to better identify and resolve Multiple ID issues, that is, cases
where multiple entries for a single person result in multiple IDs.
With the addition of the OBSOLETE_RECORD indicator, these scripts do not
need to be in production immediately; however, being able to identify multiple
IDs proactively and to eliminate them from search results in
a timely manner is critical to the success of the MIT ID Database.
Open Issues and Questions
ISSUE/QUESTION | STATUS | DATE
No support for previous names: There is currently no mechanism to allow searches on previous or alternative names; however, some departmental systems do provide this capability. | OPEN | 03/19/1998
Kerberos V4 and V5: Will we need to support both Kerberos V4 and V5 in the client? | OPEN | 03/19/1998
Server Search Options and algorithm changes: Need to solidify the Server Search Options to be available in this release, as well as the changes to the search algorithm that determine result status. | OPEN | 03/19/1998
Data Feed: Need to solidify changes. | OPEN | 03/19/1998
Vendor Product Integration: Need to specify how vendor products not using the MITID DLL will be handled. | OPEN | 03/19/1998