Notes on items identified as potentially needing further development
work, including some of the issues involved and rough estimates of the
amount of work required.

* Don't allow group membership to be inferred
  - make owner name (and job name?) protected attributes (change qstat)
  - hack select code to prevent selecting by account or user name
  2-3 days dev

* Automatic temp disk space creation/clean-up
  - make separate AFS cell where master has admin access
  - master would track cell usage
  - add space resource that could be specified in qsub
    - job would wait in queue until space could be created
    - master creates volume on suitable partition, creates user, makes
      mount point, adds user to acl
    - AFS path must be passed to job - env. variable?
    - at end of job (EOJ), master must mark volume to be deleted somehow -
      file with EOJ timestamp in top level directory?
    - scheduler issues - resource not tied to node: race problem?
  - periodic master task would nuke volumes after some grace period
    following EOJ
  - what about input side? (much less likely case, and harder to solve)
  1-3 months dev, + ops maintenance (likely absorbable), + machine cost
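
The EOJ-marker and grace-period idea above can be sketched as follows. This
is a minimal illustration only, assuming the marker is a small file in the
volume's top-level directory. The file name `.eoj_timestamp` and all function
names are hypothetical, and the actual AFS volume removal is deliberately
left out.

```python
import os
import time

# Hypothetical marker file name; the notes only suggest "a file with an
# EOJ timestamp in the top level directory".
EOJ_MARKER = ".eoj_timestamp"


def mark_end_of_job(mount_point, now=None):
    """Record the end-of-job time in the volume's top-level directory."""
    now = time.time() if now is None else now
    with open(os.path.join(mount_point, EOJ_MARKER), "w") as f:
        f.write(f"{now:.0f}\n")


def read_eoj_time(mount_point):
    """Return the recorded EOJ time, or None if no valid marker exists."""
    try:
        with open(os.path.join(mount_point, EOJ_MARKER)) as f:
            return float(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def volumes_to_delete(mount_points, grace_seconds, now=None):
    """List mount points whose grace period following EOJ has elapsed."""
    now = time.time() if now is None else now
    expired = []
    for mount_point in mount_points:
        eoj = read_eoj_time(mount_point)
        # Volumes without a marker belong to jobs that have not ended yet.
        if eoj is not None and now - eoj >= grace_seconds:
            expired.append(mount_point)
    return expired
```

A periodic master task could call `volumes_to_delete()` over the mount points
it tracks and hand the result to whatever mechanism actually removes the
volumes.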

* Upgrade to OpenPBS 2.3 or PBS Pro
  - resource monitor and ping queries between master and MOM now UDP-based
    by default - configure option can be used for TCP
    - done mostly to keep a down node from hanging things; we already avoid
      this by timing out on the connect
    - these connections are not currently Kerberized, so need some work anyway
  - we already updated to support later Solaris revisions, and incorporated
    relevant bug fixes
  - PBS Pro has useful features:
    - advance reservations
    - requeue or delete jobs running on dead nodes
    - can associate node(s) with queue
    - new node attributes, resources
    - improved pbsnodes command
  ~2 weeks to upgrade to OpenPBS 2.3, more for PBS Pro
  (latter not including time to incorporate wanted scheduler features)
  + $1000/yr support for PBS Pro
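
For reference, the "time out on the connect" approach to a down node can be
illustrated generically. This is not the PBS code, only a sketch of the
technique with a TCP connection. The function name, host, port and timeout
value are all placeholders.

```python
import socket


def node_is_reachable(host, port, timeout=5.0):
    """Try a TCP connect, giving up after `timeout` seconds.

    Returns False on timeout, refusal, or any other socket error, so a
    down node cannot block the caller indefinitely.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, connection refused and unreachable hosts.
        return False
```

The bounded connect keeps the caller from hanging. A UDP-based ping gets the
same effect by never waiting on a connection at all.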

* Better scheduler
  - Reservations
    - Maui and PBS Pro have some sort of reservation support
  - node-based queue limits (e.g. aggregate)
    - PBS Pro may support this
    - maybe use resources_available queue attribute somehow
  - predictable execution time
    - might be feasible to change the current scheduler to transmit
      the job sort order to the server, and have the latter calculate
      the estimated time - this would be problematic if the "one job
      on one node" model were to change (scheduler backfill, etc.)
  - ability to query
    - Maui has this, though not Kerberized
  ~2-3 months to replace the scheduler - needs lots of investigation, testing
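
The estimated-time calculation mentioned under "predictable execution time"
could look roughly like this under the "one job on one node" model. It is a
sketch under stated assumptions: jobs start strictly in the transmitted sort
order, each job occupies exactly one node, and each job runs for its full
requested time. The function and parameter names are hypothetical.

```python
import heapq


def estimate_start_times(node_free_times, queued_run_times):
    """Estimate start times for queued jobs, in scheduler sort order.

    node_free_times:  time at which each node becomes free (0 if idle now)
    queued_run_times: requested run time of each queued job, in sort order
    """
    if not node_free_times:
        raise ValueError("at least one node is required")
    free = list(node_free_times)
    heapq.heapify(free)
    starts = []
    for run_time in queued_run_times:
        # The next job in sort order takes the node that frees up first.
        start = heapq.heappop(free)
        starts.append(start)
        heapq.heappush(free, start + run_time)
    return starts
```

For example, with nodes free at times 0 and 10 and queued jobs requesting 30,
5 and 5 time units, the estimated start times are 0, 10 and 15. Jobs that
finish early, or a scheduler that backfills, would invalidate these
estimates, which is the problem noted above.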

* Replace quota DB with Oracle
  - probably don't bother for anticipated scale of service
  1-2 months

* Quota system tweaks
  - Better definition of how to deal with a member's record when the
    containing group's record is modified
  1 week

* Default tokens
  - IP host-based group that all slaves would be in - < 1 week
  - User-based group for which master has admin rights (add slave to group
    for duration of job)
    - more overhead and support costs (esp. with respect to clean-up of
      groups) - groups either added to Moira, or directly to AFS protection
      database
    - more secure than first option
    guess 2 weeks dev, but need to sort out support issues first

* Encrypt all network requests
  - currently only encrypts data such as credentials, files
  - priority would be to encrypt all user/master communications
  2 weeks

* Clean up "RPP" messages (inter-server status, UDP-based)
  - server uses this to ping nodes, note which are down
  - options include removing this entirely, or Kerberizing
  1-3 weeks

* General code clean-up
  - inspect for shortcuts taken in prototype (important) - 1-2 weeks
  - make suitable for incorporation - nice long-term goal, not urgent
    - distinguish between Athena-specific (e.g. quota system, login system)
      and generally useful features (Kerberos, AFS support) - big effort,
      may not be worth the cost at this point - OpenPBS future is uncertain

* Command namespace
  - wrappers for all PBS commands?
  - rename commands?
  1 week

* Linux slaves
  - qualify and purchase machines
  - port ops server modifications
  - PBS (with local mods) already ported
  ~1 week to qualify machine, 1-3 weeks for ops port

* User documentation
  - review by other Support staff, reorganization and polishing
    by TPS for wider use
  guess 3 weeks, collective

* Internal support documentation
  - collect basic admin, diagnostic commands in one place
  - document policies for non-course use, particularly for:
    - determining eligibility of users, mapping to account types
    - initial quota, guidelines for increase requests
    - requests for additional disk space
  - quota and capacity calculations
  - what to do with requests beyond current resources or outside allowed
    uses (in particular, to avoid bouncing stopit cases back to
    clusters)
  2 weeks for doc, but many policy decisions need to be made first
  see /mit/longjobs/doc/support-issues for more details