EHS Training Interim System

Jim Repa - 9/20/2002

I. Two-phase plan

The EHS Training subteam decided on a two-phase plan for implementing systems for training. We expect that the long-term training delivery and record keeping system at MIT (Phase II) will be an integrated campus-wide system based either on software being developed under the Open Knowledge Initiative (OKI) or on the training module within SAP. However, neither OKI nor the SAP module was judged to be sufficiently mature to support the 2002-2003 requirements for EHS training, so an interim solution (Phase I) was necessary.

The two phases can be summarized as follows:

Phase I. (2002 until at least the summer of 2003)

For delivery of web-based courses, along with some simple record-keeping and reporting, use Traincasters, a system managed and operated for MIT by the vendor Netcasters in Topsfield, MA.
For user registration, needs assessment, and more comprehensive record-keeping, use EHSWEB, a Sun server at MIT with a small Oracle database and an Apache web server. (There is a seamless connection between EHSWEB and Traincasters, so most users should not be aware that there are two servers involved.)
For comprehensive reporting, use MIT's Data Warehouse to store data uploaded from EHSWEB.

Phase II. (some point after the summer of 2003, timing to be determined later)

We will eventually choose an integrated, campus-wide system for training delivery and record-keeping at MIT. This may be based on OKI or SAP.
Once a permanent training system has been selected, a decision will need to be made on when to migrate the training content and reimplement the training registration rules in that system.

The remainder of this document will discuss the Phase I (interim) training solution.

II. Why do we need a separate server (EHSWEB) in addition to Traincasters?

The EHSWEB server, housed and managed at MIT, is a front-end to the Traincasters system. Users go to EHSWEB first to record information about their affiliations with DLCs and PIs or supervisors, and to fill out "needs analysis" information, i.e., a list of activities they perform that are associated with training requirements. The EHSWEB server then determines each user's web-based training needs and automatically redirects the user to Traincasters to take the web-based courses.

We decided to use a separate front-end server at MIT in tandem with the Traincasters system for the following reasons:

Getting around records and reporting limitations in the Traincasters system
We know that it is common for people in the research community to work simultaneously in more than one lab, under the supervision of more than one PI and in more than one DLC. Training reports should reflect this, allowing a person to appear under more than one PI or more than one DLC where appropriate. The Traincasters system, however, tracks each person under a single "group", so supporting multiple affiliations there would require custom modifications.
We also expect to be merging records for classroom-based training and web-based training in our training reports. Handling this in Traincasters would also involve custom modifications.
We determined that it would cost MIT less to use a front-end server to keep track of people's multiple-PI or multiple-DLC connections, along with classroom-based training information, than to pay Netcasters for significant modifications to their system.
Avoiding unnecessary dissemination of MIT employee and student information to an outside vendor
The registration process for a user can leverage information available for that person in MIT's Data Warehouse, filling in information about the person's Email address, and at least partial information about the person's Department affiliations. The information about a person from the Warehouse can be put up on a screen automatically once a person connects to the EHSWEB server using their MIT certificate. If we did not have the front-end EHSWEB server and had users register directly at the Traincasters site, either (i) we would have to give the outside vendor access to employee and student information for all 25,000+ people at MIT, or (ii) we would need to have each person rekey his/her Email address, DLC affiliation, etc. from scratch in Traincasters. Option (i) is not desirable - generally we discourage dissemination of MIT information on all 25,000+ people in the MIT community to outside vendors. Option (ii) is also problematic because it requires trainees to enter data that is already available in the Data Warehouse.
(Note: We do send Email address, course requirements, and other items of information required by Traincasters, but only for registered users of Traincasters, not the whole MIT community.)
Limiting outside vendor custom development during the prototyping phase for the needs assessment
Starting in the summer of 2002 and extending over the next 3-6 months, we will be deciding on the rules for training, i.e., what factors trigger a person's need to get training, and what specific courses or other requirements each person must complete. During this period, we will be prototyping a system to manage training needs assessment. We determined that it would cost MIT less to do this prototyping work at MIT than to pay Netcasters to make continuing modifications to their system.
Getting a simple needs assessment and registration solution working quickly, yet allowing flexibility for phasing in a more sophisticated solution
Traincasters expects each user to be registered for the specific courses that they require. The decision about which courses a person needs is made in the local MIT front-end system (EHSWEB) as the person registers, and the information is passed on to Traincasters when the person is automatically redirected there. This mechanism was put in place in June 2002, so that trainees could start to register and take the first 3 training modules made available in Traincasters. We will continue to evolve the needs assessment and add other course modules in the coming months.
Having the needs assessment component of the software on MIT's local EHSWEB server allowed us to do a quick and simple needs assessment for the first few modules, and to continue to evolve the needs assessment within our own control, without having to schedule repeated software changes with corresponding charges from Netcasters. Thus, we have been able to economically install the first needs assessment system and to continue to evolve it with very low cost to MIT, and without the need to coordinate each prototype change with the outside vendor.
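
The first reason above, reporting on people who work under more than one PI or DLC, comes down to storing affiliations as a many-to-many relation rather than a single group per person. The following is only a minimal sketch in Python. The function names and sample rows are hypothetical and are not taken from EHSWEB (only the department code D_CHEME appears elsewhere in this document).

```python
from collections import defaultdict

# Hypothetical sample data: one (person, PI, DLC) row per affiliation.
# A person may appear in several rows; none of these rows come from EHSWEB.
AFFILIATIONS = [
    ("asmith", "pi_jones", "D_CHEME"),
    ("asmith", "pi_lee", "D_BIOLOGY"),
    ("bchan", "pi_jones", "D_CHEME"),
]


def people_by_pi(affiliations):
    """Group people under every PI they work for (many-to-many)."""
    report = defaultdict(set)
    for person, pi, _dlc in affiliations:
        report[pi].add(person)
    return {pi: sorted(people) for pi, people in report.items()}


def people_by_dlc(affiliations):
    """Group people under every DLC with which they are affiliated."""
    report = defaultdict(set)
    for person, _pi, dlc in affiliations:
        report[dlc].add(person)
    return {dlc: sorted(people) for dlc, people in report.items()}
```

In this sketch the person "asmith" is listed under both PIs and both DLCs, which a single-group model cannot express.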

III. How does the EHSWEB server work to support training registration and needs assessment?

The EHSWEB server includes a database; Perl scripts to get data from the Warehouse, the Roles Database, Traincasters and other sources; Perl scripts to send data to other systems; and a web server with various CGI scripts. Some CGI scripts are interfaces for updating the database; wherever the database is updated via a CGI script, PL/SQL stored procedures enforce authorization and data integrity rules.

Summary of components:

Oracle database
PL/SQL stored procedures (used to enforce authorization rules and data integrity rules when updating data in Oracle tables)
Perl data feed programs for data flowing into EHSWEB (See also data feeds to and from Traincasters.)
- ehs_person.pm
  Gets directory information on MIT employees, students, and others from the Warehouse (which in turn gets the information from HR, Student Information Systems, and Moira [the Athena user accounts database])
- ehs_dept.pm
  Gets list of departments from the Roles Database (later will get this information from the new Master Department Hierarchy). The information includes mappings from alphanumeric department codes (e.g., D_CHEME) to 6-digit HR numbers and academic course numbers.
- ehs_auth.pm
  Gets EHS authorizations information from the Roles Database
- ehs_pi.pm and ehs_pidept.pm
  Gets list of PIs and their department affiliations from the Warehouse.
- Soon to be implemented: Get from LDS data in the Warehouse a list of PIs or Supervisors, and a list of DLCs, for which each person works.
- To be implemented: Get from Traincasters data about people who have successfully completed web-based courses.
- To be implemented: Get from Kevin Cunninham's Filemaker database a list of people who have successfully completed classroom (live) courses.
Perl data feed programs for data flowing out of EHSWEB (See also data feeds to and from Traincasters.)
- Department data: Sends to Traincasters (via FTP after encrypting) a list of department codes and department names. Used in Traincasters for reports.
- PIs and Supervisors: Sends to Traincasters (via FTP after encrypting) a list of PIs and supervisors, with their Kerberos name and full name. Used for reports.
- Note: Data are also sent to Traincasters as hidden HTML form variables from the registration web pages. (See also data feeds to and from Traincasters.)
- To be implemented: Send to Traincasters nightly updates of registered people's Email address, course requirements, etc.
- To be implemented: Send to the Warehouse training-related data, to be used in Warehouse-based training reports.
Apache web server with Perl CGI scripts to do the following:
- train_reg.cgi
  User training registration and needs assessment. Users fill in DLC and PI/supervisor information, and answer questions about activities they perform. Script records this information in the database, determines training requirements, and passes the information to Traincasters in hidden form variables.
- train_report1.cgi
  Evolving training report. (See training administrators tools and reports.)
- ehs_pi_admin.cgi
  Web interface to add to the list of PIs and supervisors. (This is the list that is displayed on the training registration web pages.) The list automatically contains PIs known to OSP, but PIs and supervisors not involved with sponsored projects may need to be manually added by EHS Training people using this interface. (See training administrators tools and reports.)
- To be implemented: A web interface for displaying and maintaining rules for who needs training and what training modules or other requirements must be completed.
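
As an illustration of the hand-off described for train_reg.cgi, the sketch below shows one way a registration page could carry data to another site in hidden HTML form variables. It is written in Python for brevity, although the real scripts are Perl CGI. The function name, the field names, and the URL used in the usage note are assumptions made up for this example. They are not the actual Traincasters interface.

```python
from html import escape


def hidden_form(action_url, fields):
    """Render an HTML form that carries `fields` as hidden inputs."""
    lines = [f'<form method="post" action="{escape(action_url, quote=True)}">']
    for name, value in fields.items():
        # Escape names and values so user-supplied data cannot break the markup.
        lines.append(
            f'  <input type="hidden" name="{escape(name, quote=True)}" '
            f'value="{escape(str(value), quote=True)}">'
        )
    lines.append('  <input type="submit" value="Continue to training">')
    lines.append("</form>")
    return "\n".join(lines)
```

A call such as hidden_form("https://vendor.example/register", {"email": "user@mit.edu", "courses": "C1,C2"}) would produce a form whose submit button posts the registration data to the (hypothetical) vendor URL.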

IV. Decisions, analysis, and development in the interim system

A. The Data Model for training requirements, in a nutshell

We propose a data model for defining training requirements that includes the following entities:

Rules for who needs what certification types
There will be a set of rules, based on

Activities a person performs
DLC with which a person is affiliated
A person's proximity to certain hazard types
A person's job title (only practical within a few departments, such as Department of Facilities)
A student's enrollment in certain lab courses
Each rule will be of the form "If person X fits criterion Y (from the list of possible criteria above), then he/she must obtain certification type Z."
Certification types
A certification type may be "Certification to work with chemicals in a lab", or "Certification to work with lasers". There may be one or more criteria that require a person to get a given certification type, and there may be one or more courses or other requirements that a person must complete to get a certification type. Certification types may be hidden from end users, but they will be useful for grouping courses and requirements within the database.
Training and other requirements for each certification type
For each certification type, there will be a list of required course modules and/or other requirements. For example, for "Certification to work with lasers", the requirements will be (1) a laser safety course and (2) a baseline eye exam.
Course modules and other requirements
We will have a list of course modules (live and/or web-based), and other criteria that are associated with requirements for various certification types.
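
The chain from rules to certification types to requirements can be sketched as two lookup tables. This is only an illustrative Python sketch under stated assumptions: the identifiers, the criterion names, and the "chemical_hygiene_course" entry are hypothetical. Only the laser example (a laser safety course plus a baseline eye exam) comes from the text above.

```python
# Hypothetical rules: (criterion kind, criterion value) -> certification type.
RULES = {
    ("activity", "uses_lasers"): "laser",
    ("activity", "handles_chemicals"): "chemical_lab",
    ("dlc", "D_CHEME"): "chemical_lab",
}

# Requirements for each certification type (courses and other requirements).
REQUIREMENTS = {
    "laser": ["laser_safety_course", "baseline_eye_exam"],
    "chemical_lab": ["chemical_hygiene_course"],
}


def required_certifications(criteria):
    """Return the set of certification types triggered by a person's criteria."""
    return {RULES[c] for c in criteria if c in RULES}


def required_items(criteria):
    """Return the sorted, de-duplicated requirements for a person."""
    items = set()
    for cert in required_certifications(criteria):
        items.update(REQUIREMENTS[cert])
    return sorted(items)
```

Keeping certification types as an intermediate layer means that two different criteria can lead to the same certification type without the requirements being listed twice.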
In addition to the general rules for who needs to obtain various certification types and what requirements are associated with each certification type, we will need to keep track of data for each person, e.g.,

For each person, what criteria (activities, DLC, proximity to hazard types, job title, enrollment in certain courses) apply? These criteria are used to determine a list of certification types required by that person.
What certification types is a person required to complete? Which ones are currently completed or not completed?
What courses or other requirements, associated with a person's certification type requirements, have been completed or not completed? For those that have been completed, which ones will expire and need to be redone, and when?
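
The per-person questions above (what is completed, what is missing, what has expired) can be sketched as a small status check. This is a hedged Python illustration, not the planned implementation. The item names and the 365-day validity period are assumptions chosen for the example.

```python
from datetime import date, timedelta

# Hypothetical validity periods in days; None means the item never expires.
VALID_DAYS = {"laser_safety_course": 365, "baseline_eye_exam": None}


def requirement_status(required, completed, today):
    """Classify each required item as 'missing', 'expired', or 'current'.

    `completed` maps an item name to the date it was completed.
    """
    status = {}
    for item in required:
        done_on = completed.get(item)
        if done_on is None:
            status[item] = "missing"
            continue
        days = VALID_DAYS.get(item)
        if days is not None and done_on + timedelta(days=days) < today:
            status[item] = "expired"
        else:
            status[item] = "current"
    return status
```

A report built on such a check could list, for each person, the items that need to be taken for the first time and those that need to be redone.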

B. Separation of issues

I'd like to point out that the following three training-related issues, though related, are separate and do not all need to be resolved at the same time:

What are the general types of criteria that will be considered in determining a person's certification type requirements (and thus, a person's course or other requirements)?
Here, we are interested just in the general types, not all the specific requirements. So far, we expect to be looking at (i) activities a person performs, (ii) DLC affiliations, (iii) proximity to certain hazard types, (iv) job titles (for people in a few departments only), (v) enrollment in certain lab courses. Once we know that we've got a reasonable starting set of general kinds of requirements, we can build a prototype system to store and evaluate these requirements. To build the system, we do not need to know all the specific rules, but just what kinds of rules we need to support.

What are the specific rules for certification types, and the course or other requirements for each certification type?
Here we want to list the actual specific rules. Once there is a prototype system to store these rules, we can start putting them into the system and trying them out.

How are we going to obtain information about people, either by deducing it from other data or by directly collecting it, so that we can apply the rules and determine people's training requirements?
Once we have an idea what the criteria are for determining certification and training requirements, we need to determine how we will know who fits these criteria. There could be a combination of self-reported data, data collected or at least reviewed by supervisors or "C-level" people, and data deduced from other sources. To satisfy the spirit of the Consent Decree, and to satisfy auditors who will review our systems in the future, we need to be able to show reasons why our process for collecting or deducing this information is reliable and auditable.