=================================
CREATED: 1994, Scott Thorne, MIT Data Administrator
EMAILED: TO: isdir, FROM: thorne
DATE: Aug 17, 1994
=================================

People Related Projects

Introduction:

There are several ideas and projects underway which are closely related. This report attempts to: identify all related activities; explain the relation of these activities; highlight issues and problems; and recommend common solutions.

Known activities are:

1) MIT ID assignment: Transition away from the use of the Social Security number as the MIT ID, for privacy reasons.
2) MIT ID card: Provide an MIT ID card which could be used to gain access to different services around MIT.
3) Username assignment: Establish a single username for a person for access to computer systems.
4) Person status information: Provide mechanisms for systems which need current status information on an individual to secure this information. For example: employment status, system registration status, etc.
5) Person information in Warehouse: Provide mechanisms for authorized individuals to access information about people at MIT from a warehouse.

The MIT ID assignment, MIT ID card, and Username assignment all share the two-fold problem of uniquely identifying an individual and of assigning an identifier. A number, a Username, and a card will all be used as identification. The MIT ID number will be the most universal identification. If the assignment of MIT ID numbers is done correctly, then the MIT ID card and Username assignment processes can draw on this information when assigning their own unique identifiers.

The definition of the set of people who should be assigned an MIT ID, and/or get an MIT ID card, and/or secure a Username for system access is also a shared problem. Students and staff are obvious members of the set of individuals who require an MIT ID, an MIT Card and a Username for system access.
Other categories of people affiliated with the Institute are more difficult to define. For example, are contractors, voucher employees, Lincoln Lab employees, Library users, and other people affiliated with the Institute a part of this set of People who should be tracked? Defining and getting agreement on this set of people is fundamental. An MIT ID cannot function as a universal person identifier if it cannot be used for all possible people.

Once the set of people has been defined, the custodian of the data about these people needs to be established. Employees and students have clear custodians. As other people in the set are identified, their custodians must also be identified.

Recommendations:

* Centrally Track all People defined to be a part of the set (see People Set chart attached).
* IS will build and maintain the appropriate database. Data custodians will be responsible for the integrity of the data.
* The Steering Committee will establish a policy which requires that all people defined as part of the set go to a predefined location to secure an MIT ID and MIT ID Card.
* Establish the MIT Card Office as the custodian of all information about people who are not already maintained in the Personnel or Registrar's systems.

As people such as voucher employees or contractors begin to work with a department, each would be required to go to the MIT Card Office, where a small amount of information would be entered to establish that person's MIT ID number. Through this process, an MIT ID card would be issued and a Username established. It would be the MIT Card Office's responsibility to keep this individual's status current. This could be done in a variety of ways, but it would involve contacting the local department periodically to confirm that the person's status has not changed.

Another variation of this idea would be to merge the Card Office with User Accounts to create an identification office.
This variation would provide "one stop shopping" for the establishment of an MIT identity, including the MIT ID number, the MIT card, and a Username and account if needed.

Benefits:

Tracking this set of people will allow us to:

1) Answer specific questions about MIT human resources.
2) Provide information on granting and revoking the following: keys, computer access, library privileges, etc.
3) Give Campus Police a mechanism to determine whether a person should be on campus.

Process to Establish a People Database:

The service we are providing is really an Identification service. It assigns and stores identifiers of people needed by systems across the Institute.

IS should build and maintain a "People" database. This database would hold minimal information about all people who have some affiliation with MIT. It would serve the following functions:

1) Provide a universal and unique way to identify all people affiliated with the Institute without using the Social Security number;
2) Allow systems to query the People database to determine whether a particular person already has an MIT ID number before generating a unique MIT ID for that person;
3) Facilitate the integration of information between systems;
4) Provide status information to business people around the Institute - for example, MIT Card, System Administrators, Libraries, Campus Police, Athletics, etc.

The People database will not keep track of each person's specific access rights, nor will it control these rights. Rather, it will provide a mechanism that another service can use to find out an individual's status, such as his/her employment status. The individual services will still determine which people have access to their service.

The People Database can directly solve MIT ID assignment and access to person status information. Further, it would provide the means to make MIT ID card and Username assignment much more straightforward.
It would not attempt to address general data access about people at MIT. This function would be achieved through the data warehouse.

The actual "owner" of this service would be a small group representing the organizations which have custodial responsibility for the different sets of people: for example, Personnel, the Registrar's Office, and whoever is designated custodian for the remaining population. This group would make policy decisions such as which systems can have access to the status information and which groups should be able to assign MIT IDs.

IS would be the implementor and be responsible for the day-to-day operation of this service. This responsibility would include maintaining the machine, DBMS, database, and software interfaces, as well as doing regular backups and otherwise ensuring reliable service.

The reason for this model is that at the moment there is no one logical place where this service might fit. IS is well suited to be the service provider but should not be put in the position of making decisions about data for which someone else has custodial responsibility.

The interface to this system for assigning IDs would be callable C routines made available on all platforms which need this function. These C routines would be written by IS and would communicate in real time with the People database over the network.

Design Issues:

Having a central system assign IDs solves many problems, but it is not without risks. It creates dependencies, so that major operational systems rely on this service. The machine, database, software and network must all be reliable for this service to be a success.

This design assumes that we can hook the proper code into the existing systems so they can interface with this service. In the future, as we replace systems, the new systems must also be able to take part in this scheme. If they were unable to participate, design modifications would need to be made to accommodate them.
A particularly sticky issue is the need to determine the minimal amount of information needed to assure a match with the correct person. SSN is one of the best ways to uniquely identify an individual. But asking for SSNs gives the impression of tracking by SSN, which is one of the reasons we are moving away from SSN in the first place. Mother's maiden name and date of birth are both considered sensitive, so we plan to match on a combination of name and birthday and hope this will be good enough. Because of data entry errors and misspellings, the matching algorithms will be quite complex. The Registrar's Office already has to do this in its system and has given us the specification for its algorithm.

Once the system establishes an MIT ID for an individual in real time, batch processes would feed the system with status information. This way the name, directory information and status information would always be current. Other systems needing this type of information, if approved for access, could get the information from one place. The nature of these feeds would need to be worked out, since currently there are many systems getting this information in a variety of ways: tape, floppy, file transfer, etc.

Lincoln Lab just finished creating an identification scheme. If we want to include Lincoln Lab's people in the People Database, then we will need to take its implementation into consideration.

ATTRIBUTES of MIT ID:

Moving to a new MIT ID that is not based on SSN brings up several issues. First, the purpose of the MIT ID is to have a universal and unique way (a key) to identify people who have some connection to the Institute. This method facilitates the integration of information among systems. The attributes of this ID might be:

Unique -- As a key, it is imperative that this be true. This also implies that numbers not be reused, because there is no knowing how long a system might need to reference a person.
Doesn't change -- Once assigned, a person's ID must not change. Again, if the number could change over time it wouldn't serve well as a key.
No meaning -- The ID should only have the function of an identifier and should not have other information coded into it. One reason for this is that information could change, which would require changing the ID.
Public -- The ID needs to be public, since it is going to be passed among systems. It should not be used as an authenticator.

The numbers should be truly random to reduce the chance that a typo will bring up an incorrect record. The identifiers cannot overlap with SSNs. If alpha characters could be used, the number of available identifiers would be much greater (but we need to find out whether all systems could handle this). The Registrar's System uses 9-digit numbers starting with 95, 96, or 97 to meet the non-overlap requirement.

DESIGN Details

API
=====================

assign_id
    input: firstname, lastname, middle initial, birthday
    return code:
    output: mitid or list of person info records
    ?? Should we have optional "tie-breaker" arguments?

check_status
    input: mitid
    output: is_student, is_staff

get_person_info
    input: mitid
    output: base person info + list of affiliations with info

PROTOCOL
==============

Since the logic for the generation of numbers and the "fuzzy" name matching required might go beyond what can be easily coded in the stored procedures of Sybase, it might be better to write our own simple protocol. While this adds a little complexity, it should make it easier to port to anywhere we might need it. We are only talking about a half dozen functions, and the database calls can be run from a server process instead of directly from the client machines.

The protocol could be an ASCII based request/response protocol, of which we have many examples. The message header from server to client might be something like: return code, message type, number of records, number of bytes, followed by a new line.
The data portion would follow and end with CRLF. The data portion would be different depending on the record type. The client-to-server protocol could be an ASCII character representing the request type followed by parameters, with everything delimited with ':'. It would end with a new line. We could modify existing code to implement the communication piece quickly.

One thing to note is that some of the systems reside on the secure LAN. This raises a few problems in the implementation of this service.

# Possible structures for a people database (DRAFT)

# there would be only one of these person records per person.
table person(
    LAST_NAME       char(30),
    FIRST_NAME      char(30),
    MIDDLE_INITIAL  char(1),
    MIT_ID          char(10),
    SOUNDEX_NAME    char(30),
    SSN             char(10),
    BIRTHDAY        char(5),
    USERNAME        char(20),   # useful for mapping between auth and MIT_ID
    EMAIL           char(50)    # not necessary, but where are contractors' EMAIL addresses going to be kept?
)

# There could be 0 to several records here for a person.
table prior_name(
    LAST_NAME       char(30),
    FIRST_NAME      char(30),
    MIDDLE_INITIAL  char(1),
    MIT_ID          char(10)
)

# There could be several of these records per person, but maybe one primary one?
# Would we need to keep more than one affiliation record of the same
# type (student, staff etc.) for a person? history??
table affiliation(
    TYPE            char(20),   # student, staff, others...
    MIT_ID          char(10)
)

# There should be a corresponding status info record for each affiliation record
table staff_status_info(
    MIT_ID          char(10),
    DEPT_NUM        char(6),
    TITLE           char(70),
    IS_ACTIVE       boolean
)

table student_status_info(
    MIT_ID          char(10),
    MAJOR           char(30),
    LAST_TERM_REG   char(15),
    REGISTERABLE    boolean
)

# There will probably be status records for other types of people like
# contractors once we have defined this further.

#######################################
Directory information -- Is this useful to have here? save for later?
# several of these records are possible for an individual
table phone(
    MIT_ID          char(10),
    PHONE           char(10),
    PHONE_TYPE      char(15)    # EX: office, lab, fax, pager..
)

# several of these records are possible for an individual
table address(
    MIT_ID          char(10),
    ADDRESS_TYPE    char(20),   # EX: office, lab, home, mail
    MIT_LOCATION    char(12),   # building and room format
    # or STREET CITY STATE ZIP
)
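
As a rough illustration of how the check_status call from the API section might be answered from the draft person and affiliation structures, here is a minimal Python sketch using an in-memory SQLite database. SQLite stands in for Sybase only so the example can run anywhere. The affiliation type strings 'student' and 'staff', the helper names, the sample IDs, and the choice to return nothing for an unknown ID are all assumptions made for illustration, not part of the design.

```python
import sqlite3

# Assumption: SQLite stands in for the Sybase database, and only the columns
# needed for this illustration are created.
SCHEMA = """
CREATE TABLE person (
    MIT_ID     CHAR(10) PRIMARY KEY,
    LAST_NAME  CHAR(30),
    FIRST_NAME CHAR(30)
);
CREATE TABLE affiliation (
    TYPE   CHAR(20),   -- assumed values: 'student', 'staff', ...
    MIT_ID CHAR(10) REFERENCES person(MIT_ID)
);
"""


def open_people_db():
    """Create an in-memory database holding the two draft tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def check_status(db, mit_id):
    """Return (is_student, is_staff) for mit_id, or None if the ID is unknown."""
    known = db.execute(
        "SELECT 1 FROM person WHERE MIT_ID = ?", (mit_id,)
    ).fetchone()
    if known is None:
        return None
    rows = db.execute("SELECT TYPE FROM affiliation WHERE MIT_ID = ?", (mit_id,))
    types = {row[0] for row in rows}
    return ("student" in types, "staff" in types)


# Hypothetical sample data: one person who is both a student and on staff.
db = open_people_db()
db.execute("INSERT INTO person VALUES ('950000001', 'Doe', 'Pat')")
db.executemany(
    "INSERT INTO affiliation VALUES (?, ?)",
    [("student", "950000001"), ("staff", "950000001")],
)
print(check_status(db, "950000001"))  # (True, True)
print(check_status(db, "960000002"))  # None, since no such person exists
```

The function reports status only. In keeping with the design, the calling service would still decide for itself what access that status earns.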