=================================
CREATED: 1994, Scott Thorne, MIT Data Administrator
EMAILED: TO: isdir, FROM: thorne DATE: Aug 17, 1994
=================================


                     People Related Projects

 Introduction:

        Several closely related ideas and projects are underway. This report
attempts to identify all related activities, explain how these activities
relate, highlight issues and problems, and recommend common solutions.
Known activities are:

1) MIT ID assignment:
        Transition away from the use of the Social Security number as the MIT
ID, for privacy reasons.

2) MIT ID card:
        Provide an MIT ID card which could be used to gain access to
different services around MIT.

3) Username assignment:
        Establish a single username for a person for access to
computer systems.

4) Person status information:
        Provide mechanisms by which systems that need current status
information on an individual can obtain it.  For example: employment status,
system registration status, etc.

5) Person information in Warehouse:
        Provide mechanisms for authorized individuals to access  information 
about people at MIT from a warehouse.
        

     The MIT ID assignment, MIT ID card, and Username assignment all share the 
two-fold problem of both uniquely identifying an individual
and of assigning an identifier.   A number, a Username, and a card will all
be used as identification. The MIT ID number will be the most
universal identification. If the assignment of MIT ID numbers is done 
correctly, then the MIT ID card and a Username assignment process can draw on 
this information when making their assignment of a unique identifier.

     The definition of the set of people who should be assigned an MIT ID,
and/or get an MIT ID card, and/or secure a Username for system access is also
a shared problem.

     Students and staff are obvious members of the set of individuals who
require an MIT ID, an MIT Card and a Username for system access.  Other
categories of people affiliated with the Institute are more difficult to define.
For example, are contractors, voucher employees, Lincoln Lab employees,
Library users, and other people affiliated with the Institute a part of this
set of people who should be tracked?  Defining and getting agreement on this
set of people is fundamental.  An MIT ID cannot function as a universal person
identifier if it cannot be used for all possible people.

    Once the set of people has been defined, the custodian of the data about 
these people needs to be established. Employees and students have clear 
custodians. As other people in the set are identified,  their custodians must 
also be identified.  

Recommendations:        


*  Centrally Track all People defined to be a part of the set  (see People Set 
chart attached).  

*       IS will build and maintain the appropriate database.   Data custodians 
will be responsible for the integrity of the data.  

*   The Steering Committee will establish a policy which requires that all  
people defined as part of the set go to a predefined location to secure an MIT 
ID and MIT ID Card.

*   Establish the MIT Card Office as the custodian of all information about
people who are not already maintained in Personnel's or the Registrar's
systems.   As people such as voucher employees or contractors begin to work
with a department, each would be required to go to the MIT Card Office where a
small amount of information would be entered to establish that person's MIT ID
number. Through this process, an MIT ID card would be issued and a Username
established.  It would be the MIT Card Office's responsibility to keep this
individual's status current. This could be done in a variety of ways, but it
would involve contacting the local department periodically to make sure the
person's status has remained the same.
        
Another variation of this idea would be to merge the Card Office with User 
Accounts to create an identification office.  This variation would provide "one 
stop shopping" for the establishment of an MIT identity including the MIT ID 
number, the MIT card and a Username and account if needed. 
        

Benefits:

Centrally tracking this set of people will allow us to:

1) Answer specific questions about MIT human resources.  

2) Provide information on granting and revoking the following:  keys, computer 
access, library privileges,  etc.  

3) Give Campus Police a mechanism to determine if a person should be on
campus.

Process to Establish a People Database:

        The service we are providing is really an Identification service.  It 
assigns and stores identifiers of people needed by systems across the Institute.
        IS should build and maintain a "People" database. This
database would hold minimal information about all people who have some
affiliation with MIT. It would serve the following functions:

1) Provide a universal and unique way to identify all people
affiliated with the Institute without using Social Security number;  

2) Allow systems to query the People database to determine whether
a particular person already has an MIT ID number before generating a
unique MIT ID for a person;  

3) Facilitate the integration of information between systems; 

4) Provide status information to business people around the
Institute - for example, MIT Card, System Administrators, Libraries,
Campus Police, Athletics, etc.

        The People database will not keep track of each person's specific access
rights.  Nor will the database control these rights. Rather, it will provide a 
mechanism that another service can use to find out an individual's status, such 
as his/her employment status. The individual services will still make the 
determination about what people will have access to their service. 
        The People Database can directly solve MIT ID assignment and access to
person status information.  Further, it would provide the means to make MIT ID
card and Username assignment much more straightforward. It would not attempt to
address general data access about people at MIT. This function would be achieved
through the data warehouse.
        The actual "owner" of this service would be a small group
representing the organizations which have custodial responsibility for
the different sets of people. For example, Personnel, the Registrar's
Office, and whoever is determined custodian for the remaining
population. It would be this group who makes policy decisions such as
what systems can have access to the status information, and what
groups should be able to assign MIT IDs.
        IS would be the implementor and be responsible for the day to
day operation of this service. This responsibility would include
maintaining the machine, DBMS, database, and software interfaces as
well as doing regular backups and otherwise ensuring reliable service.
        The reason for this model is that at the moment there is no
one logical place where this service might fit. IS is well
suited to be the service provider but should not be put in the
position of making decisions about data for which someone else has 
custodial responsibility.
        The interface to this system for assigning IDs would be callable C
routines made available on all platforms which need this function.
These C routines would be written by IS and would communicate in real time with
the People database over the network.

Design Issues:

        Having a central system assign IDs solves many problems, but
it is not without risks. It creates dependencies, so that major
operational systems rely on this service. The machine, database,
software and network must all be reliable for this service to be a
success.
        This design assumes that we can hook the proper code into
the existing systems so they can interface with this service.
In the future as we replace systems, they also must have the
capability to partake in this scheme. If they were unable to participate, 
design modifications would need to be made to accommodate
them.
        A particularly sticky issue is the need to determine the minimal amount
of information needed to assure a match with the correct person. SSN is one of
the best ways to uniquely identify an individual. But asking for SSNs gives the
impression of tracking by SSN, which is one of the reasons we are moving away
from SSN in the first place. Mother's maiden name and date of birth are both
considered sensitive, so we plan to use a combination of name and
birthday to match with and hope this will be good enough. Because of data
entry errors and misspellings, the matching algorithms will be quite complex.
The Registrar's Office already has to do this in its system and has
given us the specification for its algorithm.
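
        As a rough illustration of matching on name and birthday in the
presence of misspellings, the sketch below uses the standard Soundex code
of the last name together with the birthday and first initial. This is NOT
the Registrar's algorithm, whose specification is not reproduced in this
report. The record layout, the "MM-DD" birthday format, the sample records
and the function names are all assumptions made for illustration.

```python
# Illustrative only: Soundex-based candidate matching on name + birthday.
# Field names, the "MM-DD" birthday format and the sample data are assumed.

SOUNDEX_CODES = {}
for digit, letters in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                       ("4", "L"), ("5", "MN"), ("6", "R")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(name):
    """Return the 4-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        # H and W do not separate repeated codes; vowels do.
        if ch not in "HW":
            prev = digit
    return (code + "000")[:4]


def find_candidates(people, first_name, last_name, birthday):
    """Return records whose last name sounds alike and whose birthday
    and first initial agree with the query."""
    key = soundex(last_name)
    return [
        p for p in people
        if soundex(p["last_name"]) == key
        and p["birthday"] == birthday
        and p["first_name"][:1].upper() == first_name[:1].upper()
    ]


# Hypothetical sample records.
people = [
    {"mit_id": "900000001", "first_name": "Dana",
     "last_name": "Ashcraft", "birthday": "03-04"},
    {"mit_id": "900000002", "first_name": "Dana",
     "last_name": "Lee", "birthday": "03-04"},
]

# A misspelled last name still finds the first record.
matches = find_candidates(people, "Dana", "Ashcroft", "03-04")
```

        A scheme like this only narrows the field to likely candidates.
Deciding between several candidates, or deciding that none is the same
person, would still need additional rules or a human check.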

        Once the system establishes an MIT ID for an individual in
real time, there would be batch processes which feed the system with
status information. This way the name, directory information and
status information would always be current. Other systems needing this
type of information, if approved for access, could get the information
from one place. The nature of these feeds would need to be worked out,
since currently there are many systems getting this information in a
variety of ways: tape, floppy, file transfer, etc.

        Lincoln Lab just finished creating an identification scheme. If we want 
to include Lincoln Lab's people in the People Database, then we will need to 
take its implementation into consideration.

        
ATTRIBUTES of MIT ID:

        Moving to a new MIT ID that is not based on SSN brings up
several issues. First, the purpose of the MIT ID is to have a
universal and unique way to identify people (a key) who have some connection to 
the Institute. This method facilitates the integration of information among 
systems. The attributes of this ID might be:

Unique -- As a key it is imperative that this be true. This also implies that
numbers not be reused, because there is no knowing how long a system might need
to reference a person.

Doesn't change -- Once assigned, a person's ID must not change. Again, if the
number could change over time it wouldn't serve well as a key.

No meaning -- The ID should only have the function of an identifier and
should not have other information coded into it. One reason for this is that
information could change which would require changing the ID.

Public -- The ID needs to be public, since it is going to be passed among
systems. It should not be used as an authenticator.

        The numbers should be truly random to reduce the chance that
a typo will bring up an incorrect record.

        The identifiers cannot overlap with SSNs. If alpha
characters could be used the number of available identifiers would be
much greater (but we need to find out whether all systems could handle
this). The Registrar's System uses 9 digit numbers starting with
95, 96, or 97 in order to handle this requirement.
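
        The attributes above (unique, never reused, no meaning, random)
could be met by something like the following sketch. The prefixes simply
echo the Registrar's convention mentioned above and are an assumption
here, not a decision; the function name and the in-memory set of issued
IDs are likewise hypothetical. A real service would record issued IDs in
the People database.

```python
# Illustrative only: random, never-reused 9-digit identifiers.
# ASSUMED_PREFIXES and generate_mit_id are hypothetical names.
import secrets

ASSUMED_PREFIXES = ("95", "96", "97")


def generate_mit_id(issued, prefixes=ASSUMED_PREFIXES):
    """Return a new 9-digit ID not present in `issued`, and record it.

    The ID carries no meaning: a prefix chosen at random plus seven
    random digits. IDs are never removed from `issued`, so they are
    never handed out twice.
    """
    while True:
        candidate = secrets.choice(prefixes) + f"{secrets.randbelow(10**7):07d}"
        if candidate not in issued:
            issued.add(candidate)
            return candidate


issued_ids = set()
new_id = generate_mit_id(issued_ids)
```

        Drawing the digits at random, rather than counting upward, means
that two valid IDs are unlikely to differ by a single keystroke, which is
the point made above about typos.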


DESIGN Details

API
=====================
assign_id       input:  firstname, lastname , middle initial, birthday
                return code:
                output: mitid or list of person info records
?? Should we have optional "tie-breaker" arguments?
check_status    input: mitid
                output: is_student, is_staff

get_person_info input: mitid
                output: base person info + list of affiliations with info
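
        To make the three calls above concrete, here is a small in-memory
sketch. The return codes ("FOUND", "AMBIGUOUS", "ASSIGNED"), the exact-match
rule, the placeholder ID generator and the add_affiliation helper are all
assumptions for illustration; the outline above leaves the return codes
open and the real routines would be C calls talking to the server.

```python
# Illustrative only: an in-memory stand-in for the People database API.
# Return codes and matching rules here are assumed, not specified.
import secrets


class PeopleService:
    def __init__(self):
        self.people = {}        # mitid -> base person info
        self.affiliations = {}  # mitid -> list of affiliation types

    def _new_mitid(self):
        # Placeholder: random 9-digit string. Does not address the
        # requirement that IDs must not overlap with SSNs.
        while True:
            mitid = f"{secrets.randbelow(10**9):09d}"
            if mitid not in self.people:
                return mitid

    def assign_id(self, firstname, lastname, middle_initial, birthday):
        """Return (code, mitid) or (code, list of person info records)."""
        same_name_day = [
            p for p in self.people.values()
            if p["lastname"].lower() == lastname.lower()
            and p["birthday"] == birthday
        ]
        exact = [
            p for p in same_name_day
            if p["firstname"].lower() == firstname.lower()
            and p["middle_initial"].lower() == middle_initial.lower()
        ]
        if len(exact) == 1:
            return "FOUND", exact[0]["mitid"]
        if same_name_day:
            # Possible matches: let the caller decide, assign nothing.
            return "AMBIGUOUS", same_name_day
        mitid = self._new_mitid()
        self.people[mitid] = {
            "mitid": mitid, "firstname": firstname, "lastname": lastname,
            "middle_initial": middle_initial, "birthday": birthday,
        }
        self.affiliations[mitid] = []
        return "ASSIGNED", mitid

    def add_affiliation(self, mitid, kind):
        # Hypothetical helper standing in for a custodian's batch feed.
        self.affiliations[mitid].append(kind)

    def check_status(self, mitid):
        kinds = self.affiliations[mitid]
        return {"is_student": "student" in kinds, "is_staff": "staff" in kinds}

    def get_person_info(self, mitid):
        return dict(self.people[mitid]), list(self.affiliations[mitid])


service = PeopleService()
code, mitid = service.assign_id("Dana", "Ashcraft", "Q", "03-04")
service.add_affiliation(mitid, "staff")
status = service.check_status(mitid)
```

        Asking for the same person a second time returns the existing ID
rather than a new one, which is the behavior function 2 of the People
database calls for.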

PROTOCOL
==============
        Since the logic for the generation of numbers and the "fuzzy"
name matching required might go beyond what can be easily coded in the
stored procedures of Sybase, it might be better to write our own
simple protocol. While this adds a little complexity it should make it
easier to port to anywhere we might need it. We are only talking about
a half dozen functions, and the database calls can be run from a
server process instead of directly from the client machines.
        The protocol could be an ASCII based request/response
protocol of which we have many examples.
        The message header from server to client might be something like:
return code, message type, number of records, number of bytes, followed by a
new line. The data portion would follow and end with CRLF. The data portion
would be different depending on the record type. The client to server protocol
could be an ASCII character representing the request type followed by
parameters, with everything delimited with ':'. It would end with a new line.
We could modify existing code to implement the communication piece quickly.
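
        A minimal sketch of how such messages might be encoded and decoded
follows. Several details are assumptions filled in for illustration: the
header fields are separated with ':' like the request fields, records in
the data portion are separated by a line feed, the byte count excludes the
closing CRLF, and "A" as a request type is made up.

```python
# Illustrative only: one possible reading of the ASCII request/response
# format. Field separators inside the header and between records,
# and the meaning of the byte count, are assumptions.
FIELD_SEP = ":"


def encode_request(request_type, *params):
    """Client to server: type character, ':'-delimited params, newline."""
    if len(request_type) != 1:
        raise ValueError("request type must be a single character")
    for value in params:
        if FIELD_SEP in value or "\n" in value or "\r" in value:
            raise ValueError("parameters may not contain ':' or line breaks")
    return (FIELD_SEP.join((request_type,) + params) + "\n").encode("ascii")


def decode_request(raw):
    text = raw.decode("ascii")
    if not text.endswith("\n"):
        raise ValueError("request must end with a newline")
    request_type, *params = text[:-1].split(FIELD_SEP)
    return request_type, params


def encode_response(return_code, message_type, records):
    """Server to client: header line, then data portion ending in CRLF."""
    data = "\n".join(records).encode("ascii")
    header = FIELD_SEP.join(
        [str(return_code), message_type, str(len(records)), str(len(data))]
    )
    return header.encode("ascii") + b"\n" + data + b"\r\n"


def decode_response(raw):
    header, _, rest = raw.partition(b"\n")
    return_code, message_type, n_records, n_bytes = (
        header.decode("ascii").split(FIELD_SEP)
    )
    if not rest.endswith(b"\r\n"):
        raise ValueError("data portion must end with CRLF")
    data = rest[:-2]
    if len(data) != int(n_bytes):
        raise ValueError("byte count does not match data portion")
    records = data.decode("ascii").split("\n") if data else []
    if len(records) != int(n_records):
        raise ValueError("record count does not match data portion")
    return int(return_code), message_type, records


request = encode_request("A", "Dana", "Ashcraft", "", "03-04")
response = encode_response(0, "ID", ["900000001"])
```

        Carrying both a record count and a byte count in the header lets
the client detect a truncated reply before it tries to interpret the data
portion.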

        One thing to note is that some of the systems reside on the secure LAN. 
This raises a few problems in the implementation of this service.

#  Possible structures for a people database (DRAFT)

# there would be only one of these person records per person.
table person(
        LAST_NAME               char(30),
        FIRST_NAME              char(30),
        MIDDLE_INITIAL          char(1),
        MIT_ID                  char(10),
        SOUNDEX_NAME            char(30),
        SSN                     char(10),
        BIRTHDAY                char(5),
        USERNAME                char(20),   # useful for mapping between
                                            # auth and MITID
        EMAIL                   char(50)    # not necessary, but where are
                                            # contractors' EMAIL addresses
                                            # going to be kept?
)

# There could be 0 to several records here for a person.
table prior_name(
        LAST_NAME               char(30),
        FIRST_NAME              char(30),
        MIDDLE_INITIAL          char(1),
        MIT_ID                  char(10)
)

#There could be several of these records per person, but maybe one primary one?
# Would we need to keep more than one affiliation record of the same
# type (student, staff etc.) for a person? history??

table affiliation(
        TYPE                    char(20),    # student, staff, others...
        MIT_ID                  char(10)
)

#There should be a corresponding status info record for each affiliation record
table staff_status_info(
        MIT_ID                  char(10),
        DEPT_NUM                char(6),
        TITLE                   char(70),
        IS_ACTIVE               boolean
)

table student_status_info(
        MIT_ID                  char(10),
        MAJOR                   char(30),
        LAST_TERM_REG           char(15),
        REGISTERABLE            boolean
)

# There will probably be status records for other types of people like
# contractors once we have defined this further.

#######################################
Directory information -- Is this useful to have here? save for later?

# several of these records are possible for an individual
table phone(
        MIT_ID                  char(10),
        PHONE                   char(10),
        PHONE_TYPE              char(15)        # EX: office, lab, fax, pager..
)

# several of these records are possible for an individual
table address(
        MIT_ID                  char(10),
        ADDRESS_TYPE            char(20),       # EX: office, lab, home, mail
        MIT_LOCATION            char(12),       # building and room format
#          or
        STREET
        CITY
        STATE
        ZIP
)