Proposed Athena 'longjobs' service description

Contact information:	Public discussion: longjobs@mit.edu
	For notes to the longjobs team: longjobs-dev@mit.edu

Introduction	This document describes the current thinking on a system that would address the "longjobs" problem in the Athena environment. The goal is to ensure that we understand user needs and, if feasible, to devise a solution that meets those needs to the fullest practical extent. In this document, we present: a problem summary; an overview of the proposed solution; a list of issues and problems inherent in the proposal; and examples of using such a system. We welcome and encourage any comments and other responses, whether approving, critical, or otherwise.

Problem Summary	The major problem we are addressing is the requirement that users remain present at the console of an Athena workstation for the duration of a session; in public clusters, workstations cannot be left unattended for more than a few minutes. Users have no reliable way to run procedures of long duration, even jobs that require no user interaction, without physically remaining at the workstation console. In "traditional" UNIX environments, users can either run non-interactive procedures in the "background", or use the "at" facility to submit a script, or "job", to be run by the system. Neither of these options is supported in the current Athena environment. We thus see the need as the ability to run non-interactive, unattended jobs within the Athena environment.

Service Overview	We envision a system consisting of a pool of dedicated, centrally-managed Athena server machines on which users can execute such non-interactive jobs remotely.

Submitting and running jobs

Users could submit a job from any Athena workstation, specifying the type of Athena machine they want the job to run on. The job would be directed immediately to a central "master" server, which would dispatch jobs to "slave" execution machines. An execution slave would run only one job at a time, so a job would remain queued at the master until a suitable slave machine became available. Authentication and authorization would be based on the user's Kerberos principal.

The intent would be to provide a system that is as "transparent", and as compatible with the Athena interactive computing experience, as possible; a user should easily be able to submit a job as a script of the same shell commands that would be entered during a normal interactive login session. The execution machines would essentially be identical to normal Athena workstations, with the same facilities available, except for those facilities suitable only for interactive use.

Users would specify the resource requirements for the job, e.g. type of machine and maximum elapsed time, at submit time. Submitted jobs would enter a queue; scheduling would be done using a first-in, first-out scheme, probably adjusted for fairness. There would most likely be multiple queues, based on resource requirements and perhaps priority. Users could query the server to obtain job status, and cancel jobs that were no longer wanted. Users would optionally be notified, via email and/or Zephyr, when a job begins and/or ends.

User accounts would be added, and home directories attached, on the execution machine only for the duration of the job, similar to an Athena login session. The user's normal login shell would be executed, with the submitted script as input.
Standard output and error files would be written to the user's directory, or emailed to the user, at the user's option.

Jobs would be subject to a strict limit on elapsed time; any job exceeding that limit would be killed. Since a slave machine will never run more than one job at a time, the user is assured of having no contention for CPU cycles or other resources while the job is running, so that the time needed to complete the job will be as predictable as possible. At the end of each job, the slave will perform a cleanup and check procedure, to ensure system integrity.

There will most likely be a strict limit on the number of jobs a user can have queued and/or running at one time. We may also program the job scheduler to further prevent one user from unfairly monopolizing the service.

Please see the end of this document for examples of how such a system might be used.

Job credentials

In order for the job to run with Kerberos credentials, as is required for most authentication and authorization purposes in the Athena environment, the user would optionally be able to acquire a long-lived, renewable Kerberos 5 ticket-granting ticket, which would be forwarded as part of the job. The master and slave servers would manage these tickets, renewing them as needed. In addition, the execution server would use this ticket to acquire Kerberos 4 tickets and AFS tokens, the latter being most critical for users to be able to access their home and other directories from within the job context.

Users who did not wish to forward a long-lived ticket could optionally choose to forward an existing short-lived ticket, or to run the job without any tickets or tokens. A renewable ticket could be maintained up to the maximum life permitted by our Kerberos configuration (currently one week).
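To make the renewal bookkeeping concrete, the sketch below plans when a server might renew a job's ticket. It is not Kerberos code and does not describe an actual implementation: the names `next_renewal` and `renewal_schedule`, the 10-hour ticket lifetime, and the 30-minute safety margin are assumptions chosen for illustration; only the one-week maximum renewable life comes from the text above.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Assumed values for illustration only; the one-week cap is from the proposal.
TICKET_LIFETIME = timedelta(hours=10)     # hypothetical lifetime per renewal
RENEW_MARGIN = timedelta(minutes=30)      # hypothetical safety margin
MAX_RENEWABLE_LIFE = timedelta(weeks=1)   # "currently one week"


def next_renewal(expires: datetime, renew_until: datetime,
                 margin: timedelta = RENEW_MARGIN) -> Optional[datetime]:
    """Return when the ticket should be renewed, or None if it cannot be.

    A ticket must be renewed before it expires, and its lifetime can
    never be extended past renew_until.
    """
    if expires >= renew_until:
        return None  # the renewable life is used up
    return expires - margin


def renewal_schedule(issued: datetime) -> List[datetime]:
    """List every planned renewal from issue until the renewable life ends."""
    renew_until = issued + MAX_RENEWABLE_LIFE
    expires = min(issued + TICKET_LIFETIME, renew_until)
    schedule: List[datetime] = []
    when = next_renewal(expires, renew_until)
    while when is not None:
        schedule.append(when)
        # A renewed ticket gets a fresh lifetime, capped at renew_until.
        expires = min(when + TICKET_LIFETIME, renew_until)
        when = next_renewal(expires, renew_until)
    return schedule
```

Under these assumed numbers, a job that is still queued or running once the schedule is exhausted would lose its credentials, which is the situation the one-week maximum implies.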
Accounting

Experience with prototype Athena batch facilities, and with other services, leads us to be concerned that a completely free and open longjobs service will be oversubscribed to the extent that those with real needs will be frozen out and frustrated. For this reason, we are investigating an accounting and billing component. Billing has the advantage that it will discourage abuse, and that it will provide a rate base to fund expansion of a genuinely popular service. We will examine the following services as possible cost models: Athena cluster; Athena dialup; Athena printing; Electronic Classroom; Tether.

Charges for using the service may manifest (for example) as debits against a predetermined time allocation (e.g. for course work), against a pre-paid sum, or as charges on a monthly bill. Charges may be based on a per-hour charge, a flat rate with usage quota, rates varying by time, priority, etc., or some combination thereof. There may also be a charge for account activation. Users may have multiple accounts available to them. Users should have a way to query the system for their current accounting information.

Administration and support

The master and slave machines would be managed by Athena Server Operations. Mechanisms would be implemented for them to regulate the service as needed, e.g. limiting queue size, locking out misbehaving users, etc. The Academic Computing Support Team will provide user support for the service, and be able to control certain access to the queues, so that queues could potentially be reserved for special needs.
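The queueing behaviour outlined in this overview (first-in, first-out queues, one job per slave, and a limit on jobs per user) can be pictured with the toy model below. It is a sketch only: all class and method names and the limit of two jobs per user are hypothetical, and the fairness adjustments mentioned in the text are not modelled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class Job:
    job_id: int
    user: str
    machine_type: str  # e.g. "sun4"


@dataclass
class Master:
    """Toy master: one FIFO queue per machine type, one job per slave."""
    max_jobs_per_user: int = 2  # assumed limit, not taken from the proposal
    queues: Dict[str, Deque[Job]] = field(default_factory=dict)
    idle_slaves: Dict[str, List[str]] = field(default_factory=dict)
    running: Dict[str, Job] = field(default_factory=dict)  # slave -> job

    def _active(self, user: str) -> int:
        """Count the user's jobs that are queued or running."""
        queued = sum(j.user == user for q in self.queues.values() for j in q)
        return queued + sum(j.user == user for j in self.running.values())

    def add_slave(self, name: str, machine_type: str) -> None:
        self.idle_slaves.setdefault(machine_type, []).append(name)

    def submit(self, job: Job) -> bool:
        """Queue the job unless the user is already at the limit."""
        if self._active(job.user) >= self.max_jobs_per_user:
            return False
        self.queues.setdefault(job.machine_type, deque()).append(job)
        return True

    def dispatch(self) -> List[Tuple[str, int]]:
        """Start queued jobs on idle slaves of the right type, oldest first."""
        started = []
        for machine_type, queue in self.queues.items():
            idle = self.idle_slaves.get(machine_type, [])
            while queue and idle:
                slave, job = idle.pop(0), queue.popleft()
                self.running[slave] = job
                started.append((slave, job.job_id))
        return started

    def finish(self, slave: str) -> Job:
        """Mark the slave's job as done and return the slave to the idle pool."""
        job = self.running.pop(slave)
        self.idle_slaves.setdefault(job.machine_type, []).append(slave)
        return job
```

With a single sun4 slave, for example, a second submitted job stays queued at the master until `finish` is called for the first one and `dispatch` runs again.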

Design Issues and Problems	The following issues and potential problems are inherent in the approach outlined above:

When a user submits a job, there is no guarantee that it will be run before a certain time. However, unacceptable delays should not be a chronic problem. The system will be scalable; additional servers, master as well as slave, could be added as warranted. We will also strive to implement various policies and procedures to avoid system overload.

There is also no guarantee that any external resources the job might require, such as locker software, or a software license based on the number of active users, will be available when the job executes. We will consider the feasibility of ways for the service to keep jobs queued until the required resource is available.

Jobs in this environment will be completely non-interactive; there will be no X display or controlling terminal. Programs which require such an interactive component, even for a brief initialization stage, will not be supported. We will try to have a test facility, such as a queue with a short time limit, by which users could test that a program runs successfully in the non-interactive longjobs environment. We will strive to support running any software that runs on Athena platforms, if it is capable of running in a non-interactive mode, documenting that capability as needed.

Even if the user acquires a long-lived renewable Kerberos ticket for use by the job, there is no guarantee that the ticket will not expire while the job is waiting to execute. We might consider implementing a remote renew command, allowing a user to create a new ticket-granting ticket interactively for an already-queued job.

Using long-lived Kerberos tickets would be a serious problem if the system were ever compromised; currently there is no way for a stolen ticket to be revoked, even if reported.
Cautious users may prefer to submit jobs without a ticket; however, the job would then not be able to access the user's AFS directories (unless the directory was world-accessible). To alleviate this problem, we might consider implementing an option by which the execution server could acquire default AFS tokens for the job. One such possibility would be to use a special AFS group for the execution servers; the user would have to add this group to the directory's ACL, thus also making the directory accessible to other users of the service. Another possibility would be to create a per-user group to which the longjobs server had administrative rights; this is more secure than the first option, but raises significant overhead and support cost issues.

Due to the special security considerations of this system (particularly with respect to managing long-lived tickets), the execution servers will be configured to run whatever system management procedures are deemed necessary to ensure system integrity (similar to what is done now on Athena dial-up machines). Since this could involve a prohibitive amount of new development work for a particular platform, it is conceivable we may choose not to provide any execution servers for that platform. The goal, however, would be to have available all platforms which are part of the current Athena release, if feasible.

The intent is to provide execution machines which are essentially equivalent to workstations in the current Athena deployment. There are no plans to be able to provide users with additional computational capacities, beyond what is available on a typical Athena workstation; there will be no "supercomputer" machines, or machines with much more memory or disk space than might be found on a machine in an Athena cluster.

There are no plans to support "parallel" execution, i.e. a job requiring multiple execution machines.

The system would not provide a way to run jobs on, or submit jobs from, non-Athena platforms.
However, the system would be implemented in a way that would not preclude extending it to support other platforms in the future.

We do not envision supporting "very long jobs", i.e. those requiring many days or weeks to complete. The maximum time limit for any queue will probably be on the order of one or two days.

Because of security requirements, users will not be able to run jobs as root. There are no plans to support submitting or executing jobs except as the user's null-instance Kerberos principal, using the normal user ID as given by Hesiod.

Again, we would like to encourage your comments, questions, and any other responses concerning the solution outlined above.

Examples	The following are general examples of how a user might create and submit a job for the service. (The longjobs command names and user interfaces shown below are merely illustrative; they do not indicate actual commands.)

Creating a job script

To create a script for a job, the user would enter the appropriate shell commands into a script file named, for example, "my_script"; this file might look like:

    cd <my_working_directory>
    ./my_program > /tmp/raw_output
    ./crunch_data /tmp/raw_output > ./my_data

In this example, the script changes to the user's working directory, runs a program which writes raw output to a temp file, and then runs another program to process that output, with the resulting data written to the user directory. The last step would be necessary if, for example, the initial output was so large that it would exceed the user directory's quota.

Submitting a job

The user would then submit the script to the longjobs service, e.g.:

    athena% lj_submit --queue sun4-12hour --zephyr begin,end my_script
    Password:
    Job XXXX submitted.

[There may be a more user-friendly submit program which prompts for the submission parameters.]

In the example, the queue name, sun4-12hour, would be a pre-defined system entity, providing a way for the user to specify the resources required for the job, in this case, a sun4 machine, with a time limit of 12 hours. The --zephyr option specifies that the user should be sent a Zephyr message when the job begins and ends execution. (There would be a similar option to send mail.) The submit program would then prompt for the user's Kerberos password; this would be necessary to create a renewable ticket for the job. Additional options would exist for the user to forward an existing ticket, or to run the job without any ticket. Finally, when the job has been enqueued successfully, the submit program will output a unique job ID, which can be used to identify the job in subsequent status or delete commands.
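As one possible reading of the submit interface above, the following sketch parses the illustrative lj_submit command line. Like the command itself, it is purely hypothetical: the `--mail` option name, the `QUEUES` table, and the helper names are assumptions; only `--queue`, `--zephyr begin,end`, and the queue names sun4-12hour and sun4-2hour appear in this document.

```python
import argparse

# Hypothetical table of pre-defined queues: name -> (machine type, limit in hours).
QUEUES = {
    "sun4-12hour": ("sun4", 12),
    "sun4-2hour": ("sun4", 2),
}
EVENTS = {"begin", "end"}


def event_list(value: str) -> list:
    """Parse a comma-separated event list such as "begin,end"."""
    events = [e.strip() for e in value.split(",") if e.strip()]
    unknown = sorted(set(events) - EVENTS)
    if unknown:
        raise argparse.ArgumentTypeError(
            "unknown event(s): " + ", ".join(unknown))
    return events


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="lj_submit", description="Submit a script as a long job (sketch).")
    parser.add_argument("--queue", required=True, choices=sorted(QUEUES),
                        help="pre-defined queue, e.g. sun4-12hour")
    parser.add_argument("--zephyr", type=event_list, default=[],
                        help="send a Zephyr message on these events")
    # The text says only that a similar mail option would exist;
    # the name --mail is an assumption.
    parser.add_argument("--mail", type=event_list, default=[],
                        help="send mail on these events")
    parser.add_argument("script", help="path to the job script")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    machine_type, limit_hours = QUEUES[args.queue]
    print(f"would queue {args.script} on {machine_type}, limit {limit_hours}h")
```

Keeping the queues in a lookup table rather than parsing the queue name matches the description of a queue as a pre-defined system entity.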
Getting job status

The user will be able to list the status of the queue(s), for example:

    athena% lj_status
    Job ID  Username  Queue        Jobname    Limit  Status   Elapsed
    ------  --------  -----        -------    -----  ------   -------
    XXXX    jdoe      sun4-12hour  my_script  12:00  Running  01:37
    YYYY    jqpublic  sun4-2hour   foo        2:00   Queued   -

In this way, users will be able to monitor their jobs' progress, and get an indication of how busy the system is.

Removing a job

Finally, if the user decides to cancel the job:

    athena% lj_cancel XXXX

where XXXX denotes the job ID displayed when the job was submitted.

longjobs-dev@mit.edu
Last updated $Date: 2000/01/31 20:30:27 $ GMT