Production: ~rolesdb/cronjobs/crontab.cloverleaf
Test: ~rolesdb/cronjobs/crontab.parsley
If you ever need to change the crontab file, do the following:

1. Connect to the server machine using telnet or SSH as user rolesdb.
2. Go to the cronjobs directory: cd cronjobs
3. Check out the crontab file from RCS using the alias "checkout": checkout crontab.cloverleaf (or) checkout crontab.parsley
4. Use emacs or vi to make the desired changes.
5. Check in the new crontab file into RCS using the alias "checkin": checkin crontab.cloverleaf (or) checkin crontab.parsley
6. **** Don't forget this step! **** Run the crontab command: crontab crontab.cloverleaf (or) crontab crontab.parsley

You can display the current crontab entry with the command
crontab -l
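For reference, each line in these crontab files uses the standard five-field cron schedule followed by a command. The entry below is illustrative only; the path and schedule are made up, and the real entries live in crontab.cloverleaf and crontab.parsley:

```shell
# minute hour day-of-month month day-of-week  command
# Hypothetical example: run the morning_jobs_early script at 5:30 AM daily
30 5 * * * /home/rolesdb/cronjobs/morning_jobs_early
```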
- morning_jobs_early - Extracts most data from the Warehouse, including most types of Qualifiers, and does some processing for the Master Department Hierarchy. (Runs in early morning)
- morning_jobs_late - Runs some steps that depend on PERSON data from the Warehouse, including (a) loading the PERSON table from krb_person@warehouse, (b) loading EHS-related Room Set data from the Warehouse into RSET qualifiers, and (c) processing externally-derived authorizations (table EXTERNAL_AUTH). (Runs in the morning, not quite so early)
- cron_run_exception_notify - Generates Email about Authorizations that still exist for people with deactivated Kerberos usernames. (Runs after morning_jobs_late)
- weekend_jobs - Runs a procedure that performs an Oracle ANALYZE on all the tables in the Roles DB. (Runs each Saturday)
- hourly_jobs - Currently runs only once a day, not hourly. Updates derived database tables ("shadow" tables) for the Master Department Hierarchy.
- cron_roles_cleanup_archive - Cleans up some old files in the archive directory.
- cron_run_sapfeed - Runs a job that creates several daily files about SAP-related authorizations and uses the scp command to copy them to a directory on one of the SAP servers, where they are picked up, further processed, and loaded into the appropriate SAP objects. The files built and sent are incremental, including authorization information only for those people whose "expanded" authorizations (after including all child qualifiers) have changed since the previous run. (Runs once each morning, only on the production Roles server)
- cron_pdorg_prog.sh - Compares SAP-related approver authorizations in the Roles DB with parallel information in the PD Org structures in SAP, and generates a file indicating differences. (Runs once each morning)
- cron_pddiff_feed.sh - Sends the file pdorg_roles.compare, generated by cron_pdorg_prog.sh, to the SAP dropbox. (Runs each morning after cron_pdorg_prog.sh)
- cron_ehs_extract - Runs ~rolesdb/bin/ehs/run_ehs_role_prog.pl to compare DLC-level EHS roles (e.g., DEPARTMENTAL EHS COORDINATOR) to their equivalent Authorizations in the Roles DB (which is the system of record), generates a differences file, and sends it to the SAP dropbox so the changes can be applied.
Directory | Description
---|---
~rolesdb/archive | Some compressed historical files from previous days' data feed runs |
~rolesdb/bin | Generic data feed perl scripts and other program files |
~rolesdb/bin/ehs | EHS-related data feed programs |
~rolesdb/bin/extract | Programs related to out-going data for DACCA, LDS (SAP component being phased out), Budget (failed SAP project), etc. |
~rolesdb/bin/pdorg | Programs related to out-going data for updating PD Org entries in SAP related to APPROVER authorizations |
~rolesdb/bin/repa_feed | Temporary or test versions of programs |
~rolesdb/bin/roles_feed | Most data feed programs for data coming into the Roles DB |
~rolesdb/data | Data files used by data feed programs. Most data files are temporary, but some, such as roles_person_extra.dat, are permanent. |
~rolesdb/doc | Miscellaneous notes and documentation |
~rolesdb/extract | Empty |
~rolesdb/lib | A few generic perl modules, and some config files |
~rolesdb/log | Most recent log files from data feed programs |
~rolesdb/misc | Miscellaneous notes and working files |
~rolesdb/sap_feed | Obsolete versions of Roles->SAP data feed programs |
~rolesdb/sql | SQL source files for creating tables, views, and stored procedures. Files for creating tables (new_schema*.sql) are preserved for documentation purposes and should NOT be rerun -- tables should never be dropped and recreated since we do not want to lose the data. Files for creating stored procedures and views can be modified and rerun. |
~rolesdb/sql/frequently_run_scripts | Special SQL scripts that are run periodically, e.g., to analyze tables |
~rolesdb/xsap_feed/bin | Programs for Roles->SAP data feed programs |
~rolesdb/xsap_feed/config | Config files for Roles->SAP data feed programs |
~rolesdb/xsap_feed/data | Nightly data for Roles->SAP data feed programs |
The steps do the following:
Extract | Extract a full set of data for a particular type of object from the external source, generally the Data Warehouse. The data are written to a flat file in the ~/data directory, generally a file with a name ending in ".warehouse". The "warehouse" suffix is used even if the source of the data is something other than the Warehouse. |
---|---|
Prepare | (a) Select a full set of parallel data for a particular type of object from Roles DB tables into a flat file in the ~/data directory with a name ending in ".roles". (b) Compare the data from the Extract and Prepare steps and produce a flat file "*.actions" listing actions to be applied to the Roles DB tables to synchronize them with the source data. |
Load | Apply the actions from the previous step's "*.actions" file to actually update the data in the Roles DB tables. |
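The Extract/Prepare/Load pattern above can be sketched as a toy shell pipeline. This is illustrative only: the real feeds are perl programs, and the data and file names below are invented, but the shape of the comparison -- two full snapshots diffed into an actions file -- is the same.

```shell
#!/bin/sh
# Toy sketch of the Extract / Prepare / Load pattern.
# File suffixes mimic the ~/data conventions (*.warehouse, *.roles, *.actions).

# Extract: full snapshot of the data from the source ("warehouse")
printf 'alice\nbob\ncarol\n' > demo.warehouse

# Prepare (a): parallel full snapshot from the Roles DB tables
printf 'alice\ndave\n' > demo.roles

# Prepare (b): compare the two sorted snapshots into an actions file.
# Lines only in the warehouse snapshot become inserts; lines only in
# the Roles snapshot become deletes.
comm -13 demo.roles demo.warehouse | sed 's/^/INSERT /'  > demo.actions
comm -23 demo.roles demo.warehouse | sed 's/^/DELETE /' >> demo.actions

# Load would then apply demo.actions to the database tables
cat demo.actions
```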
However, when debugging or correcting a problem, it is possible to run any or all steps manually, and check the results after each step. There are two ways to do this:
Example 1:
To run the steps of the EHS PIs, Room Sets, and Rooms feed program on the production Roles database, do the following:
- Connect to roles.mit.edu as user rolesdb
- From any directory, run test_procedure.pl (as mentioned above, different production data feed steps are available in run_procedure.pl and test_procedure.pl)
- Enter the number 22 to run the extract step
- Enter 23 to run the prepare step
- Enter 0 to stop
- cd data (to get to the data directory)
- tp (a little command wrapped around ls to list the most recent files in the current directory)
- Use emacs or vi to look at the files rset.actions, rset.warehouse, or rset.roles (the *.actions file lists transactions to be performed)
- Make adjustments in the rset.actions file, or use the web interface to change the MAX_RSET parameter
- Run test_procedure.pl again
- Enter 24 to run the load step
- Enter 0 to stop
Example 2:
To run the steps of the person feed program on the production Roles database, do the following:
- Connect to roles.mit.edu as user rolesdb
- From any directory, run run_procedure.pl
- Enter the number 16 to run the person extract step
- Enter 17 to run the prepare step
- Enter 0 to stop
- cd data (to get to the data directory)
- tp (a little command wrapped around ls to list the most recent files in the current directory)
- Use emacs or vi to look at the files roles_person.DELETE, roles_person.UPDATE, or roles_person.INSERT
- Make adjustments in one or more files, or use the web interface to change the MAX_PERSON parameter (max no. of person changes)
- Run run_procedure.pl again
- Enter 18 to run the load step
- Enter 0 to stop
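Since `tp` is described only as "a little command wrapped around ls", here is a hypothetical reconstruction of its shape; the actual alias or script on the server may differ.

```shell
#!/bin/sh
# Hypothetical stand-in for the `tp` helper: a thin wrapper around ls
# that lists the most recently modified files first, newest at the top.
tp() {
    ls -1t "${1:-.}" | head -10
}

# Usage: cd ~/data && tp
```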
There are three different ways that notification Email gets generated by the Roles DB nightly feed programs.
If an Authorization exists where the person (Kerberos_name column) represents a username that is no longer active (no longer included in the KRB_PERSON table from the Warehouse), then notification Email will be sent.
The people who should be notified are specified by setting up authorizations in the Roles Database where the function is "NOTIFICATION - INACTIVE USERS" and the qualifier represents the function_category (application area) for which the person should receive notification.
The program ~/bin/roles_feed/exception_mail.pl finds all categories (column function_category in the authorization table) for which authorizations exist for deactivated Kerberos_names. The program then sends Email to the appropriate recipients (based on NOTIFICATION - INACTIVE USERS authorizations), one piece of Email per recipient for each function_category where there are one or more authorizations for inactive Kerberos_names. The Email lists the inactive usernames and the number of authorizations in the category that should be deleted or reassigned.
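A toy sketch of the comparison exception_mail.pl performs. Assumptions to note: the real program is a perl script reading database tables, and the usernames, categories, and file names below are invented; only the shape of the logic (find authorizations whose username is missing from the active-person list, then count them per function_category) is taken from the description above.

```shell
#!/bin/sh
# Active usernames, standing in for KRB_PERSON from the Warehouse
printf 'alice\nbob\n' > krb_person.active

# Authorizations, as "username function_category" pairs (invented data)
printf 'alice FINANCIAL\ncharlie FINANCIAL\ndan FINANCIAL\ncharlie EHS\n' > auth.list

# Count authorizations per category whose username is no longer active
while read name category; do
    grep -qx "$name" krb_person.active || echo "$category"
done < auth.list | sort | uniq -c
```

Each category with a non-zero count would correspond to one piece of notification Email per recipient holding the matching NOTIFICATION - INACTIVE USERS authorization.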
Some errors detected by various data feed programs, usually in the LOAD step, result in Email being sent to a list of addresses stored in the file ~/lib/roles_notify. Currently the list includes warehouse@mit.edu and repa@mit.edu.
The full log from the LOAD step of most data feed programs, and the full logs from data feed programs that do not have a separate LOAD step, are sent to one or more Email recipients. Within the cronjobs directory, the scripts morning_jobs_early, morning_jobs_late, evening_jobs, and weekend_jobs include steps that send out this Email. Currently repa@mit.edu is the only recipient.
These log files do not need to be examined every day, but it is useful to look through them periodically for warning messages, and to have them available if a problem is detected. Usually, however, the Email sent for detected errors is sufficient for spotting problems.
- Most common errors
  * Too many changes in one night
    o More than the expected number of transactions, due to a batch load of student data, EHS data, or some other source with an unusually high volume of changes. Common examples:
      - The WHCOST feed exceeds its limit because a change has been made to all cost objects related to one of the profit centers under the Department of Facilities. A couple of profit centers under the Department of Facilities have tens or hundreds of thousands of cost objects, and all of these cost objects tend to have the same supervisor and addressee. If the supervisor or addressee changes, or if the room number or other info about the supervisor or addressee changes for all cost objects under one of these huge profit centers, then the number of changes exceeds the 10000-changes-per-night limit. The solution is to increase the MAX_WHCOST parameter from its default to a number high enough to accommodate the changes for the day, as described in the solution below.
      - The person feed exceeds its limit each year on the morning following the annual purge of old Kerberos principals. The solution is to increase the MAX_PERSON parameter from its default (3000) to a higher number that accommodates the day's set of changes, as described in the solution below.
      - The person feed exceeds its limit each spring (May or June) when the graduating seniors lose their student status. The solution is the same: increase the MAX_PERSON parameter from its default (3000) to a higher number that accommodates the day's set of changes.
      Solution: Examine the data changes (see the section "Running Extract, Prepare, and Load steps by hand"). Determine whether the changes are legitimate, due to source data, or reflect a system problem. If the changes are legitimate, increase the appropriate value in the ROLES_PARAMETERS table (see the section "Adjusting max-action per day in the ROLES_PARAMETERS table" below in this document), and either rerun the 3 data feed steps by hand or wait for tomorrow morning's cron job.
    o EXTRACT step failed due to a network or server problem, so there was no data for the PREPARE and LOAD steps.
  * Deactivated users have one or more authorizations
    Solution: No immediate action is necessary. Peggie takes care of these issues over time.
- Less frequent problems
  * Ran out of qualifier_ids in the range allocated for this type of qualifier. (This should now be very rare -- there should be lots of room for qualifiers of each type. However, each qualifier type has its own limited contiguous range of Qualifier IDs, so it is possible for any qualifier_type to overrun its limits. You'll see a unique index error if this happens.)
    Solution: See rolesweb.mit.edu/sys_admin_tasks.html for this task and others.
  * EXTRACT step failed because the format of the data in the Warehouse changed without corresponding changes in the Roles data feed.
    Solution: You might need to alter the data feed program in the Roles DB, in the bin/roles_feed directory.
From rolesweb.mit.edu, click "System Administrator tools", then click "Update Roles DB parameters for data feeds and other processes".
- The ROLES_PARAMETERS table contains records that define the maximum number of allowable changes for each of the incoming data feed programs. You can use the web UI to temporarily increase a value if it appears that the incoming data is valid.
- There is a web interface for maintaining data in the ROLES_PARAMETERS table. It can be found by starting at rolesweb.mit.edu, clicking "System administrators tasks", and then clicking "Update Roles DB parameters for data feeds and other processes".
- Pick the parameter you want to change, e.g., "MAX_PERSON", click "Change Parameter Value", and follow the instructions.
- There are two values for each parameter: the current "parameter value" and the "default value". If you change a "parameter value", it will be reset to its default in the evening after the next day's data feeds. An authorization for the Roles DB function "MAINTAIN PARAMETER VALUES" is required to change just the current value of a parameter; an authorization for "MAINTAIN ALL PARAMETER DATA" is required to change default values and other fields.
See rolesweb.mit.edu/sys_admin_tasks.html for documentation on various system administrators tasks.