/mit/longjobs/doc/support-issues
Mon Oct 15 11:08:41 EDT 2001

The following is a revision, based on details of the test implementation and a preliminary understanding of the intended scope of service. (Previous version is /mit/longjobs/doc/support-issues-20000518, prepared in conjunction with /mit/longjobs/doc/summary-dec-17.txt, a report back to the original "Dec. 17" group which reconvened in May, 2000.)

During the Spring 2001 and Fall 2001 tests, all user registration, documentation, and support were provided by longjobs-dev. Currently, we are granting access for these usage categories:

    course (grad or undergrad)
    UROP
    thesis (grad or undergrad)
    misc (other academic project, not sponsored research)

with no need thus far to distinguish priorities or ration quotas. Guaranteed access for courses during crunch periods is to be handled through reservations if needed.

Questions:

- Should admission and quota allocation be prioritized, or kept first-come/first-served, with any rationing decisions for new requests deferred until near capacity? If service implementation splits access control between FLs (for courses) and Athena Accounts for other uses, do we coordinate quota calculations on a case-by-case basis, have predetermined allotments, etc.?

- What should we tell people who are turned away, e.g.
    - if sponsored research, should someone follow up with their advisor about whether Athena cluster usage is appropriate?
    - if legitimate use, but we're at capacity limit?

- Future policy decisions, capacity planning, marketing through Owls?

Support overview
================

FLs would provide support (via faculty/TAs) for course use, as we do for other Athena services. We would also troubleshoot use of 3partysw on demand (as we generally do for courses) and document specific usage, bringing alexp or dev in as needed for vendor contact, diagnostic help, or modifications. Estimate is for 10% FTE, absorbable by existing staff.

Support for other approved uses would be done by Athena Consulting (with appropriate agreement on support level and possibly additional student staffing if high load). Once the service is established, the costs for training consultants and providing help on general usage should be manageable, but it's important to have expertise on hand for troubleshooting. The initial estimate is for 15% FTE, with an expectation that the load will not be high, but that support costs are not entirely absorbable; funds should be provided for an appropriate increase in the Athena Consulting budget.

Areas of user support
---------------------

- registration, quotas, reservations (see below)
- basic use (commands, scripting jobs, etc.)
- emergency requests for quota increase, queue manipulation, or extra disk space
- troubleshooting job failures or unexpected outcomes
    - 3partysw interactions
    - problems in job script
    - problems with access to lockers (ACLs, quota)
    - longjobs-service problems
    - other infrastructure problems (e.g., fileservers, license servers)
- depending on default tokens implementation, possibly account configuration (users may need help with ACLs in homedir to allow a server-based group write access, for a job to run without tickets)

Support should have access to logs for troubleshooting, bits to retrieve missing output from spool area if appropriate, appropriate access to requeue jobs, etc. Athena Accounts and Consulting will need clear guidelines and appropriate access for granting access, setting quotas, and problem remediation.

Registration
------------

For a course using the service, FLs would either set up a new group or use an existing one (perhaps from an automatically-generated Class Participant List) in Moira.

For non-course uses, registrations should go through Athena Accounts, who would add them to the appropriate group in Moira.
Non-course groups would have default quota, to be raised as appropriate in individual cases; Accounts should have clear guidelines on calculating acceptable ranges for doing this on their own, and at what point coordination with FLs is needed.

Registration is likely to require minimal effort, and be absorbable within the existing load of account/group requests, assuming appropriate policies are in place to keep procedures clear.

Reservations
------------

Reservations for courses would be a queue manipulation task, done by FLs on longjobs-service. The level of effort is estimated to be akin to reserving an eclassroom or NMC machine, likely to include negotiating with other requestors for fairness, and more complicated during peak usage periods.

Load will depend on number of courses and clustering of demand, presumably with peaks toward end-of-term. There will be logistical issues with transitioning queues; course staff will be expected to provide appropriate lead time for reservations.

Quota increases
---------------

For courses, likely to be a moderate level of effort akin to course locker quota requests or term-allocation requests, but perhaps more frequent (estimating job times may be harder than space needs) and requiring more negotiation to balance needs between courses.

Unlike disk quota (where users may be able to free up space or use other temporary workarounds as they near the limit), once allotted time on the system is exhausted, users have no recourse but a quota increase. Hence there are likely to be emergency requests of a more pressing nature, whether due to lack of sufficient planning or unforeseeable complications.

Since the service would primarily be used by students, it is likely that a good number of occasions will fall outside of business hours, and hence OLC will need clear guidelines and appropriate access to deal with these.

Again, load will depend on number of users and clustering of demand, presumably with peaks toward end-of-term.