Longjobs Discovery Report

Project Information

Sponsor

Vijay Kumar, Assistant Provost and Director, Academic Computing

Discovery Team

Consulting Assistance

URL

http://web.mit.edu/longjobs/

Email

longjobs-dev@mit.edu

Executive Summary

Rationale

The purpose of this project was to examine the need to execute long-running jobs in the Athena environment, and, if possible, to design a feasible service to satisfy that need. Under current rules of use, our customers have no reliable way to run procedures of long duration on an unattended Athena workstation. This has led to much frustration in the user community, and frequent abuse of the rules.

Recommendations

Conceptual Design

We proposed a service with the following characteristics:

Status

Next Steps

Business Case

Currently, our customers have no reliable way to execute long-running procedures, even those requiring no user interaction, without physically remaining at a workstation console. The Athena Rules of Use state that workstations may not be left unattended for longer than 20 minutes, and that such an unattended session can be logged out by another user. The inability to run such jobs has resulted in a steady demand for a "long job" service, from faculty who assign work requiring long-running computations, and from individual students, both graduate and undergraduate, as part of their academic work. The lack of such a service has inevitably led to frustration and abuse, and has required us to spend more resources on enforcement than is desirable.

In order to characterize the specific need further, we outlined the design of a service which, as conceived, would be compatible with the Athena environment, and prepared a questionnaire to solicit feedback on it from interested faculty and representative students. The results demonstrated that our envisioned solution would meet a preponderance of the expressed need, and provided a basis for quantifying that need.

We concluded that we should design a service which will give users the ability to run non-interactive, unattended jobs within the Athena environment, by providing a pool of dedicated, centrally-managed Athena server machines on which users can execute such non-interactive jobs remotely, in as regulated, ordered, and predictable a manner as possible.

The Proposed Solution

[The following is adapted from the Proposed test of Longjobs service document.]

The Goal

The intent is to provide a system that is as "transparent", and as compatible with the Athena interactive computing experience, as possible; a user should easily be able to submit a job as a script of the same shell commands that would be entered during a normal interactive login session. The execution machines would be comparable to normal Athena workstations, with the same facilities available, except for those which are only suitable for interactive use (e.g. a display).

General Architecture

Access to the service would be granted through a registration process administered by the appropriate support group (see Registration, Reservations, and Support). Usage would be limited by a group-based quota system, maintained on the master; a quota of job hours would be granted at registration time.
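As an illustration of the group-based quota described above, the following is a minimal sketch, not the prototype's implementation. The GroupQuota class, its field names, and the choice to charge the requested hours at submission time are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class GroupQuota:
    """Job-hour allowance for one group (hypothetical structure)."""
    granted_hours: float
    used_hours: float = 0.0

    def remaining(self) -> float:
        """Hours the group may still request."""
        return self.granted_hours - self.used_hours

    def charge(self, requested_hours: float) -> bool:
        """Charge a job against the quota; refuse it if the quota would be exceeded."""
        if requested_hours <= 0:
            raise ValueError("requested_hours must be positive")
        if requested_hours > self.remaining():
            return False
        self.used_hours += requested_hours
        return True


quota = GroupQuota(granted_hours=10.0)
accepted_first = quota.charge(8.0)   # fits within the 10 granted hours
accepted_second = quota.charge(4.0)  # would bring usage to 12 hours, so it is refused
```

A real service might instead charge the hours actually consumed when the job ends; the report does not specify which.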

A registered user could submit a job from any Athena workstation, specifying the type of Athena machine they want the job to run on, and other desired job parameters. The job would be directed immediately to a central "master" server, which would dispatch jobs to "slave" execution machines, as they become available. An execution slave would only run one job at a time, so a job would remain queued at the master until a suitable slave machine became available. Authentication and authorization would be based on the user's Kerberos principal.
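The dispatch behavior described above (jobs wait at the master until a slave of the requested type is free, and each slave runs one job at a time) could be sketched roughly as follows. The Master class and its method names are hypothetical and stand in for what PBS provides; authentication and quotas are left out.

```python
from collections import deque


class Master:
    """Toy master: queues jobs and hands each one to a free slave of the right type."""

    def __init__(self, slaves):
        # slaves: mapping of slave name -> machine type, e.g. {"sun-1": "sun"}
        self.machine_type = dict(slaves)
        self.running = {}        # slave name -> job id (at most one job per slave)
        self.pending = deque()   # (job id, wanted machine type), oldest first

    def submit(self, job_id, wanted_type):
        self.pending.append((job_id, wanted_type))
        self.dispatch()

    def finish(self, slave):
        """Called when the job on a slave ends; frees the slave for the next job."""
        self.running.pop(slave, None)
        self.dispatch()

    def dispatch(self):
        still_waiting = deque()
        for job_id, wanted_type in self.pending:
            # Find an idle slave of the requested machine type, if any.
            slave = next(
                (name for name, mtype in self.machine_type.items()
                 if mtype == wanted_type and name not in self.running),
                None,
            )
            if slave is None:
                still_waiting.append((job_id, wanted_type))
            else:
                self.running[slave] = job_id
        self.pending = still_waiting


master = Master({"sun-1": "sun", "sgi-1": "sgi"})
master.submit("job-a", "sun")
master.submit("job-b", "sun")   # must wait: the only Sun slave is busy
master.submit("job-c", "sgi")
```

Calling master.finish("sun-1") would then free that slave and start job-b on it.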

Queues and Scheduling

All submitted jobs will enter precisely one of several queues; the queues will distinguish jobs based on machine type and time limit, and will have access control lists. We recommend having queues with at least two different time limits, for relatively short or long jobs.

Scheduling of all jobs will be based on a modified first-in, first-out scheme. A slave machine will serve one or more queues (e.g. the "short" or "long" queue, or both) and may be migrated between queues to satisfy varying demands on load.
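To make the queue model concrete, here is a small sketch under stated assumptions. The JobQueue structure, the rule of placing a job in the shortest queue whose time limit covers it, and plain oldest-first selection across the queues a slave serves are illustrative choices only; the "modified" part of the first-in, first-out scheme is not specified here and is not modeled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class JobQueue:
    name: str
    machine_type: str
    limit_hours: float
    acl: set                                   # users allowed to submit to this queue
    jobs: deque = field(default_factory=deque)  # (sequence number, job id), oldest first


def choose_queue(queues, user, machine_type, needed_hours):
    """Pick the queue with the smallest sufficient time limit that the user may use."""
    candidates = [q for q in queues
                  if q.machine_type == machine_type
                  and needed_hours <= q.limit_hours
                  and user in q.acl]
    if not candidates:
        return None
    return min(candidates, key=lambda q: q.limit_hours)


def next_job(served_queues):
    """Take the oldest waiting job across all queues served by one slave."""
    nonempty = [q for q in served_queues if q.jobs]
    if not nonempty:
        return None
    oldest = min(nonempty, key=lambda q: q.jobs[0][0])
    return oldest.jobs.popleft()


short = JobQueue("sun-short", "sun", 2.0, {"alice", "bob"})
long_ = JobQueue("sun-long", "sun", 24.0, {"alice"})
queues = [short, long_]

quick_choice = choose_queue(queues, "alice", "sun", 1.0)
slow_choice = choose_queue(queues, "alice", "sun", 10.0)
denied_choice = choose_queue(queues, "bob", "sun", 10.0)  # bob is not on the long queue's ACL

long_.jobs.append((1, "job-a"))
short.jobs.append((2, "job-b"))
first_served = next_job([short, long_])  # a slave serving both queues
```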

Users could query the server to obtain job status, and cancel jobs that are no longer wanted. Users would optionally be notified, via e-mail and/or Zephyr, when a job begins and/or ends.

Job Execution

User accounts would be added, and home directories attached, on the execution machine only for the duration of the job. Jobs would be subject to a strict limit of elapsed time; any job exceeding that limit would be killed. Since a slave machine will only run one job at a time, the user is assured of having no contention for CPU cycles or other resources while the job is running, so that the time needed to complete the job will be as predictable as possible. At the end of each job, the slave will perform a cleanup and check procedure, to ensure system integrity. Standard output and error files would be written to the user's directory, or emailed to the user.
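The strict elapsed-time limit can be illustrated with a short sketch. This is not how PBS enforces limits; it only shows the idea using Python's standard subprocess module, and the function name run_with_limit is hypothetical. Note that this simple form kills only the directly launched process, not any children it may have started.

```python
import subprocess
import sys


def run_with_limit(command, limit_seconds):
    """Run a command; return (finished, stdout, stderr).

    If the command exceeds limit_seconds of elapsed time, it is killed
    and finished is False.
    """
    try:
        done = subprocess.run(
            command, capture_output=True, text=True, timeout=limit_seconds
        )
    except subprocess.TimeoutExpired:
        return False, "", ""
    return True, done.stdout, done.stderr


# A job that finishes well within its limit.
quick = run_with_limit([sys.executable, "-c", "print('done')"], 30)

# A job that would run for 30 seconds but is limited to 1 second.
slow = run_with_limit([sys.executable, "-c", "import time; time.sleep(30)"], 1)
```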

Tickets/Tokens

In order for the job to run with Kerberos credentials, as is required for most authentication and authorization purposes in the Athena environment, the user would optionally be able to acquire a long-lived, renewable Kerberos 5 ticket-granting ticket, which would be forwarded as part of the job. The master and slave servers would manage these tickets, renewing them as needed. In addition, the execution server would use this ticket to acquire Kerberos 4 tickets, and AFS tokens, the latter being most critical for users to be able to access their home and other directories from within the job context. Users who did not wish to forward a long-lived ticket could optionally choose to forward an existing short-lived ticket, or to run the job without any tickets or tokens. A renewable ticket could be maintained up to the maximum life permitted by our Kerberos configuration (currently one week).
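As a rough illustration of "renewing them as needed", the sketch below computes when a server might renew a forwarded ticket. The one-week renewable life comes from the text above; the 10-hour ticket lifetime, the one-hour safety margin, and the function name renewal_times are assumptions for the example, and no Kerberos calls are made.

```python
from datetime import datetime, timedelta


def renewal_times(issued, ticket_life, renewable_life, margin):
    """Return the instants at which a ticket should be renewed.

    Each renewal happens `margin` before the current ticket expires, and a
    renewed ticket never outlives issued + renewable_life.
    """
    if margin >= ticket_life:
        raise ValueError("margin must be shorter than the ticket lifetime")
    hard_stop = issued + renewable_life
    times = []
    expires = issued + ticket_life
    while expires < hard_stop:
        renew_at = expires - margin
        times.append(renew_at)
        # The renewed ticket gets a fresh lifetime, capped at the renewable limit.
        expires = min(renew_at + ticket_life, hard_stop)
    return times


issued = datetime(2001, 9, 10, 9, 0)
schedule = renewal_times(
    issued,
    ticket_life=timedelta(hours=10),    # assumed ticket lifetime
    renewable_life=timedelta(days=7),   # maximum renewable life stated above
    margin=timedelta(hours=1),          # assumed safety margin
)
```

Under these assumed values the ticket is renewed every nine hours, and the last renewal yields a ticket that expires exactly at the one-week limit.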

Implementation

We recommend using the Portable Batch System (PBS) as a basis for our system. PBS was developed at NASA Ames Research Center, and is now supported by Veridian Corporation. The open-source version, OpenPBS, is freely available; a Professional version, PBS Pro, is also available at very low cost, including source, to educational sites. Our prototype is based on the open-source version of PBS.

Athena Development will make the necessary enhancements and additions to PBS, including the support for Kerberos authentication and authorization, the management of renewable tickets and tokens, compatibility with the Athena login system, Zephyr support, and integration with a group-based quota system. The server machines will be administered by Athena Server Operations, and registrations by the appropriate support group (see below).

Registration, Reservations, and Support

The Faculty Liaisons will handle registrations, reservations, and support for course-based use of the service (through faculty and TAs). Athena Accounts and Consulting would handle registrations and support for other uses, according to whatever policies are developed for such use of the service.

To ensure that sufficient resources are available for their classes, instructors would be able to request a reservation from the Faculty Liaisons for a particular number and type of machines during a specific time period. This would be implemented via a separate queue mapped to an appropriate subset of machines in the pool, with an access control list restricted to the group for that class. We would require advance notice for such requests (in order to ensure sufficient machines are available, to transfer general queues to non-reserved machines, etc.). Similarly, reservations would probably need to be staggered to build in some free time for handling transitions.
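A reservation request could be checked against the pool along the following lines. This is only a sketch: the Reservation structure, the can_reserve function, and the conservative rule of counting every reservation that falls within the transition gap of the requested period are assumptions, not part of the proposed design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Reservation:
    group: str          # class group that the reserved queue is restricted to
    machine_type: str
    machines: int
    start: datetime
    end: datetime


def can_reserve(existing, request, pool_size, gap):
    """Conservatively decide whether a request fits in the pool.

    Any existing reservation of the same machine type that overlaps the
    requested period, or comes within `gap` of it, is counted as busy.
    """
    busy = sum(r.machines for r in existing
               if r.machine_type == request.machine_type
               and r.start < request.end + gap
               and request.start < r.end + gap)
    return busy + request.machines <= pool_size


day = datetime(2001, 10, 1)
existing = [Reservation("class-a", "sun", 3, day.replace(hour=9), day.replace(hour=12))]
gap = timedelta(hours=1)

too_close = Reservation("class-b", "sun", 3, day.replace(hour=12, minute=30), day.replace(hour=15))
smaller = Reservation("class-b", "sun", 2, day.replace(hour=12, minute=30), day.replace(hour=15))
staggered = Reservation("class-b", "sun", 3, day.replace(hour=13), day.replace(hour=15))

results = [can_reserve(existing, r, pool_size=5, gap=gap)
           for r in (too_close, smaller, staggered)]
```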

Registrations would include an automatic expiration date, which would typically be the end of the current semester.

Accounting

The system will produce accounting records for each job, containing user and account names, as well as other pertinent information. Initially, we will use these to track utilization; eventually, these records could be input to a billing system, if such a component were desired or needed to fund expansion of the service.
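Tracking utilization from per-job accounting records might look like the sketch below. The JobRecord fields other than the user and account names, and the hours_by_account function, are hypothetical; the actual record format would be whatever PBS emits.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    user: str
    account: str
    queue: str            # assumed field
    elapsed_hours: float  # assumed field


def hours_by_account(records):
    """Total the elapsed job hours charged to each account."""
    totals = defaultdict(float)
    for record in records:
        totals[record.account] += record.elapsed_hours
    return dict(totals)


records = [
    JobRecord("alice", "course-1", "short", 1.5),
    JobRecord("bob", "course-1", "long", 2.5),
    JobRecord("carol", "course-2", "short", 0.5),
]
utilization = hours_by_account(records)
```

The same totals could later feed a billing system, should one be wanted.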

Machines

We recommend that the machines in the slave pool be comparable to the workstations found in Athena clusters. If feasible, all platforms in the current Athena release should be represented in the slave pool. We recommend a two-year renewal cycle for the slaves, in order to remain comparable with the latest hardware found in the clusters. Some number of the older slaves could remain in use, perhaps serving special queues for jobs where top performance is not essential.

Design Issues

We identified the following issues concerning our proposed design:

Status

Following the report of our initial recommendations, at the request of the Sponsor, we proposed a feasibility test of the service. Upon approval, we completed the necessary development of the prototype, and launched the test for selected courses and individual students in the Spring 2001 term. Five SGI O2 and five Sun Ultra 5 machines were purchased for the test slave pool, plus another Ultra 5 to serve as the master.

At the end of the term, we reported the results of the test. While the results were generally favorable, we would have preferred to have had greater load, and to have tested the reservation system. We recommended continuing the test in the Fall 2001 term, and successfully launched it in September 2001, after porting the existing software to Athena 9.0 and adding a Moira DCM feed for group membership queries.

The test experience also helped to identify and clarify potentially needed future development work, which could be the basis for subsequent Delivery tasks.

Resources

Costs

Future Work

In the course of implementing and deploying the service prototype, we identified potential future development tasks. Generally, these can be categorized as follows:

Whether and when to do any of these tasks would depend on specific Delivery scope decisions. (See our notes on the issues involved in these tasks.)

Appendix

Key Documents

Other documents