MIT Information Systems    Longjobs -- Running Jobs

How to submit a job

(See the job script page for details on preparing your job for submission.)

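There are two ways to submit a job: interactively with longjob -submit, which prompts for each setting, or directly with qsub. Both are described below.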


Defaults

The following apply to each job by default, but can be changed with qsub options, directives in the job script, or the QSUB environment variable (with precedence in that order, qsub options being highest). For details, see the section on qsub options and the qsub man page.
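
The precedence order above can be illustrated with a small sketch. The helper name first_set and the sample values are hypothetical and exist only for illustration; qsub does this resolution internally. The snippet is written for a Bourne-style shell, not the csh syntax used elsewhere on this page.

```shell
# Hypothetical helper: print the first non-empty argument.
# Arguments are passed in precedence order, highest first:
#   1. value from a qsub command-line option
#   2. value from a directive in the job script
#   3. value from the QSUB environment variable
#   4. the service default
first_set() {
    for value in "$@"; do
        if [ -n "$value" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1
}

# Example: no command-line option, a script directive of 15 hours,
# 10 hours in QSUB, and a queue default of 27 hours.
# The script directive is the highest-precedence value that is set.
effective_walltime=$(first_set "" "15:00:00" "10:00:00" "27:00:00")
echo "$effective_walltime"
```

Passing a non-empty first argument (a command-line value) would override all of the others.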

Job notification
The service will send email (to submitter@mit.edu) and zephyr messages when the job begins and ends, or if it is aborted.

Standard error/output disposition
The service saves a copy of each stream to the submitting directory, using a filename constructed from the job script name and jobid (e.g. ~/my_jobs/jobname.e483 and ~/my_jobs/jobname.o483). If it is unable to save the files there, it will instead email them to you.

Rerunnability
The service will attempt to rerun your job in certain server-failure cases; this may cause problems with files your job writes to directly (as opposed to the standard output and error streams, which the service will simply recreate). If it is not feasible to construct your job to handle the possible existence of data from a previous run, you should submit the job as non-rerunnable (with the -rn option in qsub, a script directive, or the QSUB environment variable).

Authentication
The service will prompt for a password to create a renewable, forwardable Kerberos 5 ticket for the job, from which Kerberos 4 tickets and AFS tokens will be created. The service will manage the renewal of tickets and tokens as needed. Alternatively, there are options to forward your current TGT to the service, or to run the job without tickets or tokens, but both require special care. If you forward your current TGT, authentication may expire before the job completes (the default TGT has a lifetime of 10 hours and is not renewable). If you run without tickets or tokens, you must specify a world-writable directory for output files, and the job cannot run any Athena programs that require authentication.
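
As an illustration of the default file names described under "Standard error/output disposition", the following sketch builds both names from a submit directory, a job name, and a job identifier. The helper name default_paths is hypothetical, and the sketch assumes that the jobid used in the file name is the leading number of the full identifier printed at submission (as the qstat listing on this page suggests). It is written for a Bourne-style shell.

```shell
# Hypothetical helper: build the default stream file names for a job.
# Usage: default_paths SUBMIT_DIR JOB_NAME JOB_ID
# JOB_ID may be the full identifier (e.g. 478.longm.mit.edu);
# only the part before the first dot is used (an assumption).
default_paths() {
    submit_dir=$1
    job_name=$2
    job_num=${3%%.*}    # strip everything from the first dot onward
    printf '%s/%s.e%s\n' "$submit_dir" "$job_name" "$job_num"   # standard error
    printf '%s/%s.o%s\n' "$submit_dir" "$job_name" "$job_num"   # standard output
}

default_paths "$HOME/my_jobs" foo 478.longm.mit.edu
```

Knowing the names in advance makes it easier for a rerunnable job, or a wrapper script, to check for leftovers from an earlier run.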

Running longjob -submit

(See the job script page for details on paths, disposition of output, and other considerations when preparing the job script.)

In the example below, user input follows each prompt. A bracketed value such as [27:00] is the default; where nothing follows it, the user accepted the default by pressing <Enter>.

Before running longjob -submit:

athena% longjob -submit

You are registered in the following account(s), with available
quota as shown:

Account           Remain   Queued   Running   Expires
-------           ------   ------   -------   -------
29.123            97:23     0:00     0:00    01/01/02

The following queues are configured.  You may not submit a job
to any queue whose status contains 'Inaccessible'.

Queue             Limit  Run  Que  State
----------------  -----  ---  ---  -----
any-medium        06:00    0    0
any-long          27:00    0    0
linux-medium      06:00    0    0
linux-long        27:00    1    0
sun-medium        06:00    1    4
sun-long          27:00    0    1
reserved-1        27:00    1    3
reserved-2        27:00    0    0  Inaccessible
                         ---  ---
                           3    8

Please enter an account: [29.123]

Please enter a queue: sun-long

Jobs with shorter time limits will tend to have queue priority.
You may accept the default time limit for the queue, or specify
a lower limit, if you know that your job will complete in time.
In either case, your job will be terminated if it exceeds the
time limit.
Please enter a time limit in hours, or in hh:mm format: [27:00] 15

Please enter the name of your script file,
or <Enter> to read commands from standard input.
Script file: [<stdin>] foo

[Creating a renewable Kerberos ticket for use by the job]
Password for jqpublic@ATHENA.MIT.EDU:
[Forwarding renewable Kerberos 5 TGT...]
478.longm.mit.edu

Your job has been submitted.  Use the 'qstat' command to display
the status of your job, and the job queues.

The current queue status:

Job ID      Username  Queue         Jobname     Limit  State  Elapsed
------      --------  -----         -------     -----  -----  -------
478         jqpublic  sun-long      foo         15:00  Run    --
   Job started on Thu Feb 01 at 11:16

Note that your job will be listed with State = Hold if you have other jobs running which might put you over quota; see the quota information on the Checking Job Status page for details.

Running qsub

(See the job script page for details on paths, disposition of output, and other considerations when preparing the job script.)

Before running qsub:

Basic syntax for qsub:

It is generally recommended that you use a script file rather than STDIN, so that you have a copy of the job's exact contents in case of problems or questions.

For a list of available queues, use the qstat -q command (more details are on the Checking Job Status page).

Other common qsub options

(This is not an exhaustive list; see the qsub man page for information on additional options.)

-l resource_list
Defines required resources and limits for the job. In particular, to set a shorter time limit than the queue default, use
     -l walltime=hh:mm:ss
For example,
     -l walltime=15:00:00
will limit the job to 15 hours. Note that time should be specified in the format hh:mm:ss (otherwise a single number will be interpreted as seconds and xx:yy will be interpreted as minutes and seconds).
-r rerunnability
Defines whether the job is rerunnable. If this is set to y, the service will attempt to rerun the job after certain server-failure conditions; the standard output and error streams will be recreated, but if your job writes to output files directly, rerunning may produce undesirable behavior (e.g., adding to data from an earlier partial run, or failing to create a file that already exists). If set to n, the service will not attempt to rerun the job; instead, it will simply produce a message about the failure.
String consisting of n for no, y for yes. Default is y.
-z zephyr_options
Defines conditions under which server attempts to zephyr you about job status.
String consisting of n for none, or a combination of a (job aborted), b (job began), e (job ended). Default is abe.
-m mail_options
Defines conditions under which server sends you mail about job status.
String consisting of n for none, or a combination of a (job aborted), b (job began), e (job ended). Default is abe.
-E email_output
Defines which (if either) of the standard output or error streams the server will mail you upon completion of the job (instead of saving them as files in your submit directory). This should only be used if the files are likely to be reasonably small; if a file exceeds a certain size limit (currently 256 KB), it will be saved on the server rather than emailed, and you will need help from an administrator to retrieve it.
String consisting of e for error, o for output, oe for both, or n for neither. Default is n.
-e error_path
Defines the file where the standard error stream of the batch job will be copied upon job completion.
Default is submit_dir/job_name.ejob_id (for example, ~/my_jobs/foo.e483 for a script foo submitted from ~/my_jobs).
-o output_path
Defines the path where the standard output stream of the batch job will be copied upon job completion.
Default is submit_dir/job_name.ojob_id (for example, ~/my_jobs/foo.o483 for a script foo submitted from ~/my_jobs).
-j join
Defines whether to merge standard error with standard output.
String consisting of n for not merged, eo for intermixed as standard error, or oe for intermixed as standard output. Default is n.
-N name
Defines a name for the job (up to 15 characters long, must begin with an alphabetic character and consist of printable, non-whitespace characters). If not specified, the job name is taken from the job script name or is STDIN if the script is read from standard input.
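
The walltime format rule given under -l above is easy to get wrong, so here is a small sketch of it. The helper name walltime_seconds is hypothetical; it only mirrors the interpretation described on this page (a single number is seconds, xx:yy is minutes and seconds, hh:mm:ss is hours, minutes, and seconds) and is not part of qsub. It is written for a Bourne-style shell and uses awk for the arithmetic so that values with leading zeros are read as decimal.

```shell
# Hypothetical helper: convert a walltime value to seconds, following
# the interpretation described for -l walltime:
#   N        -> N seconds
#   xx:yy    -> xx minutes and yy seconds
#   hh:mm:ss -> hours, minutes, and seconds
walltime_seconds() {
    printf '%s\n' "$1" | awk -F: '
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; next }
        NF == 2 { print $1 * 60 + $2; next }
                { print $1 + 0 }'
}

walltime_seconds 15          # read as 15 seconds
walltime_seconds 15:00       # read as 15 minutes
walltime_seconds 15:00:00    # read as 15 hours
```

The three calls print 15, 900, and 54000, which shows why a 15-hour limit must be written as 15:00:00 rather than 15 or 15:00.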

QSUB environment variable

The environment variable QSUB may be used to set default options for qsub. For example:
     setenv QSUB "-a 29.123 -l walltime=10:00:00"
Options specified at the command line or in job script directives take precedence over those specified via the QSUB environment variable.
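
The setenv form above is csh syntax. For users of a Bourne-style shell, an equivalent would look like the sketch below; it assumes that qsub reads QSUB from the environment in the same way regardless of which shell set it.

```shell
# Bourne-style equivalent of the setenv example above (assumption:
# qsub treats the variable identically whichever shell exported it).
QSUB="-a 29.123 -l walltime=10:00:00"
export QSUB

# Show what a later qsub started from this shell would inherit.
echo "QSUB=$QSUB"

# Not executed here: a command-line option still takes precedence,
# so the following would request 15 hours rather than 10.
#   qsub -l walltime=15:00:00 foo
```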

Cancelling a job

[First, add longjobs if you haven't already.]

There are two ways to cancel a job:

Note that the deleted job may still appear in the qstat list for a few seconds until the process is finished.

Modifying a job

[First, add longjobs if you haven't already.]

Once a job has been submitted, you may use:

See the man pages for details.




Last modified: Mon Jul 21 12:16:51 EDT 2003