Proposed test of longjobs service
We propose to implement a test longjobs service, based on the service design we have outlined, and involving a small number of courses, and possibly some individual users. The broad goals of the test are:
Document Layout
This document:
The Problem
The major problem we are addressing is the requirement, as stated in the Athena Rules of Use, that users remain present at the console of an Athena workstation for the duration of a session. This presents a dilemma for users who need to execute long-running, typically non-interactive jobs for legitimate academic purposes, which has led to frustration and abuse. Our proposed service would give users the ability to run non-interactive, unattended jobs within the Athena environment, by providing a pool of dedicated, centrally-managed Athena server machines on which users can execute such non-interactive jobs remotely, in as regulated, ordered, and predictable a manner as possible.
The Proposed Solution
The goal
The intent would be to provide a system that is as "transparent", and as compatible with the Athena interactive computing experience, as possible; a user should easily be able to submit a job as a script of the same shell commands that would be entered during a normal interactive login session. The execution machines would be comparable to normal Athena workstations, with the same facilities available, except for those facilities which are only suitable for interactive use (e.g. a display).
General architecture
Access to the service would be granted through a registration process administered by the appropriate support group (see Registration, Reservations, and Support). Usage would be limited by a group-based quota system, maintained on the master; a quota of job hours would be granted at registration time.
A registered user could submit a job from any Athena workstation, specifying the type of Athena machine they want the job to run on, and other desired job parameters. The job would be directed immediately to a central "master" server, which would dispatch jobs to "slave" execution machines as they become available. An execution slave would run only one job at a time, so a job would remain queued at the master until a suitable slave machine became available. Authentication and authorization would be based on the user's Kerberos principal.
Queues and Scheduling
All submitted jobs will enter precisely one of several queues; the queues will distinguish jobs based on machine type and time limit, and will have access control lists. We recommend having queues with at least two different time limits, for relatively short or long jobs.
Scheduling of all jobs will be based on a first-in, first-out scheme. A slave machine will serve one or more queues (e.g. the "short" or "long" queue, or both) and may be migrated between queues to satisfy varying demand.
Users could query the server to obtain job status, and cancel jobs that
are no longer wanted. Users would optionally be notified, via
e-mail and/or Zephyr, when a job begins and/or ends.
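The queueing rules described above (each job enters exactly one access-controlled queue, each queue is first-in, first-out, and a slave runs one job at a time while serving one or more queues) can be illustrated with a minimal Python sketch. This is not PBS code and not the proposed implementation; the class names, queue names, and group names are hypothetical, and the rule that a slave checks its queues in listed order is an assumption made only for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: int
    owner: str   # Kerberos principal of the submitter
    group: str   # group name under which the job is submitted
    queue: str   # the single queue this job enters


@dataclass
class Slave:
    name: str
    queues: list                   # queues this slave serves, checked in this order (assumption)
    running: Optional[Job] = None  # a slave runs at most one job at a time


class Master:
    def __init__(self, acls):
        # acls maps queue name -> set of group names allowed to submit to it
        self.acls = acls
        self.queues = {name: deque() for name in acls}
        self.slaves = []

    def submit(self, job):
        # Enforce the queue's access control list before accepting the job.
        if job.group not in self.acls[job.queue]:
            raise PermissionError(f"group {job.group!r} may not use queue {job.queue!r}")
        self.queues[job.queue].append(job)

    def dispatch(self):
        """Give each idle slave the oldest job from the first non-empty queue it serves."""
        started = []
        for slave in self.slaves:
            if slave.running is not None:
                continue  # busy: the job stays queued at the master
            for qname in slave.queues:
                if self.queues[qname]:
                    slave.running = self.queues[qname].popleft()
                    started.append((slave.name, slave.running.job_id))
                    break
        return started

    def finish(self, slave_name):
        # Mark the named slave idle again once its job has ended.
        for slave in self.slaves:
            if slave.name == slave_name:
                slave.running = None


master = Master({"sun-short": {"course-a"}, "sun-long": {"course-a", "course-b"}})
master.slaves = [Slave("slave1", ["sun-short", "sun-long"])]
master.submit(Job(1, "alice", "course-a", "sun-long"))
master.submit(Job(2, "bob", "course-b", "sun-long"))
first = master.dispatch()    # slave1 starts job 1
second = master.dispatch()   # nothing starts; slave1 is busy
master.finish("slave1")
third = master.dispatch()    # slave1 starts job 2
```

Migrating a slave between queues, as described above, would amount to editing its `queues` list in this model.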
Tickets/tokens
In order for the job to run with Kerberos credentials, as is required for most authentication and authorization purposes in the Athena environment, the user would optionally be able to acquire a long-lived, renewable Kerberos 5 ticket-granting ticket, which would be forwarded as part of the job. The master and slave servers would manage these tickets, renewing them as needed. In addition, the execution server would use this ticket to acquire Kerberos 4 tickets and AFS tokens, the latter being most critical for users to be able to access their home and other directories from within the job context. Users who did not wish to forward a long-lived ticket could optionally choose to forward an existing short-lived ticket, or to run the job without any tickets or tokens. A renewable ticket could be maintained up to the maximum life permitted by our Kerberos configuration (currently one week).
Implementation
We will use the Portable Batch System (PBS) as a basis for our system. PBS was developed at NASA Ames Research Center, and is supported by MRJ Technology Solutions, recently acquired by Veridian Corporation. It is freely available in source form, and we are free to modify it, though not to redistribute it.
Athena Development will make the necessary enhancements and additions to PBS, including the support for Kerberos authentication and authorization, the management of renewable tickets and tokens, compatibility with the Athena login system, Zephyr support, and integration with a group-based quota system. The server machines will be administered by Athena Server Operations, and registrations by the appropriate support group (see below).
Registration, Reservations, and Support
The Faculty Liaisons will handle registrations, reservations, and support for course-based use of the service (through faculty and TAs). Athena Accounts and Consulting would handle registrations and support for other uses, according to whatever policies are developed for such use of the service.
To ensure that sufficient resources are available for their classes, instructors would be able to request a reservation from the Faculty Liaisons for a particular number and type of machines during a specific time period. This would be implemented via a separate queue mapping to an appropriate subset of machines in the pool, with an access control list restricting it to the group for that class. We would require advance notice for such requests (in order to ensure sufficient machines are available, to transfer general queues to non-reserved machines, etc.). Similarly, reservations would probably need to be staggered to build in some free time for handling transitions. See Over-subscription for more details on the registration process.
Concerns
Over-subscription
A low or non-existent cost for using the longjobs service may encourage more usage than can readily be supported. However, if the service is to be viable, the cost for the user needs to be low enough to encourage the use of the service instead of the already "free" service provided by the available cluster machines. In other words, it is important not to create an under-subscription problem in addressing this issue.
Methods to prevent over-subscription need not involve simply some charge for using the service. There may also be barriers to usage, or throttles that control the amount of usage. The methods available to this service involve registration, quotas that limit usage, billing for actual use, and the queueing system. For a service test, we propose the use of quotas and registrations. The issues surrounding billing are discussed under Open Issues.
The service will be available only to registered users. Registration will be based on the user's Kerberos principal, and will require some description of the intended use, and an estimate of the quota needed. (If we decide to implement billing, a billable account would also be required.) Registrations would typically be done on a group basis, by departments and/or faculty members on behalf of students in a class. Registered users will be assigned one or more group names, which are intended to classify the uses. A request for registration should be denied if the service is already at its subscription limit. The user will have to provide a valid (user, group-name) pair to be able to submit a job to the service.
Registrations will expire. The expiration time will either be specified/negotiated at registration time, imposed by policy (e.g. no registration may last more than the current year or current semester), or some combination thereof. Expiration of registrations is necessary for reasonably accurate estimates of quota likely to be used.
The quota available to a user is the sum total of quotas available for all of the (user, group-name) pairs for that user. The quota provided is not necessarily what is requested at registration; it may be less, due to constraints on the quota hours actually available. Quotas will be valid until a registration expires or is renewed; in the case of renewal, a new quota will be assigned. Quotas may be valid only on certain types of machines (e.g. SGI only, older sparcs only), to allow for better estimation of future service usage.
The queueing system helps keep the service from being over-subscribed by providing both positive and negative reinforcement to the user: specifically, it teaches users to plan to submit their jobs at non-peak times, so that their jobs start and finish sooner. Since delays in running jobs may manifest as the highest apparent cost to the user, there is an incentive on our part to limit registrations and quota assigned. This, however, will be at the expense of turning people away from the service and pushing them toward use of cluster machines. Balancing these costs will take some experience.
Costs
As our policy will state explicitly that execution machines will be comparable to cluster workstations, with no intention of providing super-computing capability, we estimate that the cost of purchasing a slave for the proposed test currently would be about $2800 per machine for a Sun (Ultra 5/400, 256MB RAM), $1700 for a Linux PC (Dell GX110, 256MB RAM), and $4800 for an SGI (rackmount O2, 128MB RAM).
For an eventual service roll-out, we propose a 2-year renewal cycle for the slaves, in order to keep pace with the cluster machines. Old slaves could be recycled for other services, or kept in the pool, with a factor to equate them with a job-hour on the faster slaves.
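As a rough illustration of the quota rule stated under Over-subscription (a user's quota is the sum over all of their (user, group-name) pairs) and of the job-hour factor suggested for older slaves, consider the following Python sketch. The function names, the sample grants, and the factor of 0.5 are assumptions chosen for the example, not values from this proposal.

```python
def available_quota(grants, user):
    """Sum the job-hour quota over every (user, group-name) pair held by `user`."""
    return sum(hours for (u, _group), hours in grants.items() if u == user)


def charged_hours(wall_hours, speed_factor=1.0):
    """Job-hours charged for a run.

    speed_factor is a hypothetical multiplier below 1.0 for an older, slower
    slave, so that an hour there counts as less than an hour on a current one.
    """
    return wall_hours * speed_factor


# Hypothetical registrations: quota in job-hours per (user, group-name) pair.
grants = {
    ("alice", "course-a"): 40.0,
    ("alice", "research"): 10.0,
    ("bob", "course-a"): 40.0,
}

total = available_quota(grants, "alice")      # 50.0 job-hours
cost = charged_hours(8.0, speed_factor=0.5)   # 4.0 job-hours for 8 hours on an older slave
```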
For the master server machine to be used in an eventual production service, our ballpark estimate is a cost of up to $15000; this includes the optional cost of some form of hardware-based redundancy. This would not be needed for the proposed test, for which we estimate a cost of about $3900 (Ultra 5/400, 512MB RAM). Finally, we estimate that the following staff costs would be incurred:
We will track the staff time spent maintaining the service, to make sure that it does not become an inordinate drain on IS resources.
Security
Scale, Robustness
Open Issues
Billing
The possibility of billing for use is a controversial issue. On the one hand, charging users directly for usage could help prevent the system from being over-subscribed (see above), and/or allow us to recover at least part of the costs of running the service. On the other hand, many of the users we have corresponded with seem to feel strongly that, just as with the use of "free" cluster workstations, they should not be billed for using the service for valid academic purposes. Also, currently there is no general billing infrastructure for handling academic computing services, so adding a billing component would likely incur a significant cost in staff resources.
Given the presumption that at least some fraction of the service will be subsidized, i.e. so that users would not be billed directly for such use, the lack of supporting infrastructure to do billing, and the controversy surrounding the issue for a service viewed as meeting a basic academic computing need, we recommend that no billing system be implemented for our proposed test of the service. IS could decide to add a billing component, possibly as part of an actual roll-out of the service, or later, as a basis for expanding the service, or to recover the costs of maintaining the service.
Regardless, the system will always write accounting records, containing any fields (e.g. account number, type) which might be required to interface to an eventual billing system. This data would be derived from the group under which the user submitted the request. Such records will also be critical in our determining how well the system performs; we will write tools to extract information such as average job and queue wait times. We expect that the proposed service test would help to suggest what eventual billing model, if any, might be appropriate.
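To make the idea of extracting average job and queue wait times from accounting records concrete, here is a minimal Python sketch. The record layout (a dictionary with `group`, `submitted`, `started`, and `ended` fields) and the timestamp format are assumptions made for illustration; they do not describe the actual PBS accounting format.

```python
from datetime import datetime
from statistics import mean

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed timestamp format, for this sketch only


def average_wait_and_run_hours(records):
    """Return (mean queue wait, mean run time) in hours over the given records."""
    waits, runs = [], []
    for rec in records:
        submitted = datetime.strptime(rec["submitted"], TIME_FORMAT)
        started = datetime.strptime(rec["started"], TIME_FORMAT)
        ended = datetime.strptime(rec["ended"], TIME_FORMAT)
        waits.append((started - submitted).total_seconds() / 3600)
        runs.append((ended - started).total_seconds() / 3600)
    return mean(waits), mean(runs)


# Hypothetical records; the group is the one under which the job was submitted.
records = [
    {"group": "course-a", "submitted": "2001-02-05 09:00",
     "started": "2001-02-05 10:00", "ended": "2001-02-05 14:00"},
    {"group": "course-a", "submitted": "2001-02-05 09:30",
     "started": "2001-02-05 12:30", "ended": "2001-02-05 14:30"},
]

avg_wait, avg_run = average_wait_and_run_hours(records)  # (2.0, 3.0)
```

Grouping the same calculation by the `group` field would give per-class figures of the kind an eventual billing model might need.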
SGI platform support
A disproportionate share of the expressed need for the service is to run jobs using software which only runs on the SGI platform, though we are striving to reduce the share of SGI machines in Athena clusters. For now, given that we intend to support SGI software on cluster machines until 2003, and the suggested 2-year renewal cycle of slave servers, we recommend that we support the SGI platform for the test. Our hope is that the migration path for the software in question will be clearer by the time we need to purchase the machines; if so, we could revise this recommendation accordingly.
Suppressing interactive capabilities
There is concern that the system should prevent users from running interactive shells in their jobs (e.g. by starting an xterm displaying back to the user's workstation), so that they would not be able to probe the execution machine for possible security weaknesses. While our Sun dialup servers do forbid running xterm (and other X client) processes which display remotely, via modifications to the kernel, that is done in the interest of minimizing system load; that is not an issue for a longjobs slave. It could be argued that a significant effort to make similar modifications to the other operating systems used by this service would not be worthwhile. (Note that any such process started by the user's job would be killed at the end of the job.)
Proposed service test
We propose a service test, restricted to a few classes and possibly a
small number of individual students, in order to examine the basic
feasibility of the proposed service to satisfy the core need: running
long, unattended jobs on Athena for academic purposes. The main goals
of this test will be to determine whether this approach would provide
users with an effective solution, and to gather data on its resource
requirements. Whether to actually establish a longjobs service, on
what scale and with what funding sources are not questions to be
determined by this team, but we believe the test will yield important
information needed as input to these decisions.
Questions to be answered by the test
Participants and Expectations
Selection of participants would be made closer to implementation time, when such details as platform support for SGI-only applications and needs for the specific term are clearer; some likely candidates are those whose estimates are shown in the section on scale, below. The number and type of machines allotted to the test will also be a factor in how many classes and of what sizes to include, but we would recommend distribution across several departments with the greatest perceived need, willingness to help test under the prescribed conditions, and best apparent chance of providing representative data.
We recommend also including some individual students with legitimate academic needs outside of classes, to gather as much data as possible on the continuing cases of students who appear to be violating the Rules of Use (against leaving a machine unattended or using multiple machines) in the clusters because they have no other means of running non-interactive jobs.
We will explain to the participants that we are conducting the test to determine whether such a service meets their needs, to answer technical questions, and to gather data on what such a service would cost IS as the basis for determining whether an acceptable billing model could be applied to it. All participants would be required to agree to some basic conditions:
Timeline
Realistically, spring 2001 is the earliest term when we could begin a test. The test should run for a full term, after which there should be a decision to do one of the following:
Scale Factors
Factors in considering the scale for the test include:
Peak Usage Calculation
Assuming students working on an assignment will tend to submit their jobs in close proximity, we look at the time span to complete the whole set for a given class, vs. how many machines are available to them.
Observations:
Case 1
This corresponds to estimates from Ceder (MatSci, 30 students x 20 hrs/job) and Rutledge (ChemEng, 25 students x 24 hrs/job), both on SGI.
Case 2
This corresponds to estimates from Cesnik (AeroAstro) and Gupta (Sloan), who each estimated 50 students for "a few hours", both on Sun.
Number of machines: Three Options
We can take one of several approaches:
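The peak usage reasoning for Case 1 and Case 2 can be worked through with a short Python sketch. It assumes that every student in a class submits one job at about the same time, that all jobs in the class take the same number of hours, and that each slave runs one job at a time, so the jobs complete in successive "waves". The machine count of 5 and the reading of "a few hours" as 3 hours are illustrative assumptions only, not figures from this proposal.

```python
import math


def completion_span_hours(students, hours_per_job, machines):
    """Hours needed to finish one job per student when each machine runs one job at a time."""
    waves = math.ceil(students / machines)  # rounds of jobs needed to cover the class
    return waves * hours_per_job


# Case 1 estimates (SGI), with a hypothetical pool of 5 machines.
ceder = completion_span_hours(30, 20, 5)      # 6 waves x 20 hrs = 120 hours
rutledge = completion_span_hours(25, 24, 5)   # 5 waves x 24 hrs = 120 hours

# Case 2 estimate (Sun), taking "a few hours" as 3 hours for illustration.
case2 = completion_span_hours(50, 3, 5)       # 10 waves x 3 hrs = 30 hours
```

Under these assumptions both Case 1 classes represent 600 job-hours of demand and would take five days to clear on 5 machines, while a Case 2 class would clear in a little over a day.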
Last modified: Thu Sep 27 18:53:24 EDT 2001