Subject: Facility for running long, unattended jobs on Athena
From: longjobs-dev
Reply-to: longjobs-dev

Hello.  Naomi Schmidt wrote to you last month asking for input on
whether Information Systems should develop a facility for students to
run long, unattended jobs on Athena.

As an outcome of the meeting where that feedback was presented, a team
was formed to examine customer needs in more detail, study the
technical issues, and finally present the business case for such a
service; IS will decide whether to move ahead with a pilot project
based on this work.  Our goal is to ensure that we understand user
needs, and, if feasible, devise a solution that meets those needs, to
the fullest practical extent.

As the next stage in this process, we have put together an overview of
the proposed service, and ask that you answer some questions about
whether you would find it to be a suitable solution.

If you are interested in a more detailed discussion of design
considerations and implementation issues, please see and feel free to
send any additional comments.

Thank you for your time,

The "longjobs" Discovery Team

============================================================================

Overview of Proposed Athena "longjobs" Service

Note: the primary focus for this service would be students doing
course work (homework, projects, or theses); if other types of use
would lead you to different answers for some questions, please specify
those.

Types of Jobs Supported
-----------------------

* any software available on Athena cluster machines which is capable
  of running in a non-interactive mode will be supported
* programs which require an X display will NOT be supported
* jobs will execute on machines comparable to cluster workstations

Questions:

- How much would the inability to run programs requiring an X display
  hurt your ability to make good use of the service?
- To what extent would you be satisfied with using standard cluster
  hardware (in particular, CPU speed, disk size, memory)?

User Interaction
----------------

* users could submit a job from any Athena workstation, via a typical
  UNIX command line interface
* a job would consist of a script of the same shell commands that
  would be entered during a normal login session
* jobs would execute from the user's Athena home directory
* log files would optionally be emailed or saved to user's directory
* users could monitor and control job status and be notified, via
  email and/or Zephyr when a job begins and/or ends
* authentication and authorization would be based on the user's
  Kerberos principal
* to deal with problems inherent in short-lived Kerberos tickets,
  options may include forwarding a long-lived renewable ticket
* for examples of basic usage, see
  http://web.mit.edu/longjobs/doc/service-description.html#examples

Questions:

- Do you have a strong preference for an interface which prompts for
  submission parameters?  If so, text-based or graphical?
- In the event a system failure prevents your job from completing,
  what would you expect for notification/resolution?
- To what extent would you be satisfied with the use of long-lived
  Kerberos tickets?  For technical details, see
  http://web.mit.edu/longjobs/doc/service-description.html#issues

Job Scheduling
--------------

* jobs would be queued, with positioning possibly adjusted for
  fairness and prevention of abuse
* each execution machine will run only one job at a time; no
  contention for CPU cycles or other resources while the job runs
* jobs would be subject to a strict limit of elapsed time, maximum to
  be determined but probably on the order of a day or two
* there may be a mechanism to reserve access for course work or other
  special needs

Questions:

- What would you consider an acceptable amount of time for a job to
  wait in a queue before starting?
- Would you want a way for the user to specify a job's priority?

Would there be an accounting or billing component?
--------------------------------------------------

It is an open question whether there would be an accounting or billing
component.  Experience with prototype Athena batch facilities and with
other services leads us to be concerned that a completely unrestricted
longjobs service will be overused to the extent that those with real
needs could be frozen out and frustrated.  Billing has the advantage
that it discourages abuse, and can provide a rate base to fund
expansion of a genuinely popular service.  For course work, an
allotted quota of free time may be deemed more appropriate.

Questions:

- What sort of accounting/billing model would be most appropriate?
    A. No fees; fund it by cutting back other IS services
    B. Allotted amount of free time, after which fees are charged
    C. Allotted free time for course work, charge for other use
    D. Departments pay for course work or other allotments, other
       uses charged to individuals
    E. Strictly pay as you go
    F. Other (please describe)
- Do you feel that a billing component would result in you choosing
  not to use the system?
- Do you believe that a billing component would be useful to improve
  system availability?  If not, are there other alternatives you can
  suggest?
- If fees are charged, what level would be acceptable?  (e.g., one
  ballpark estimate suggests a rate of $1/hour)
- Would you be willing to pay different rates for different levels of
  service?  (e.g., more for rush jobs)

Additional Questions
--------------------

In addition to the questions above, we would appreciate your answers
to the following:

- How would you rate the overall suitability of the outlined solution
  for your needs on a 1-5 scale, where 1=useless and 5=perfect?
- How much would you expect to use such a service?  Please estimate
  monthly usage in terms of number of jobs, expected time per job,
  and platform.

Again, we would like to encourage your comments, questions, and
suggestions concerning the solution outlined above.

longjobs-dev@mit.edu