Athena Longjobs -- Using Job Scripts
On this page: Basic Template | MATLAB | MSI/Cerius2 | Script Details | Testing Scripts
Basic Template

########################################################################
# set job's working directory, if you want it to be other than ~/
# (this example uses the submit directory, as explained below)
cd $PBS_O_WORKDIR

# remove or clean up any output from a previous run
rm -f results

# add or attach any lockers you will need aside from your homedir
add foo bar

# run commands, invoke other scripts, etc.
tweedledee
tweedledum > results
########################################################################

Basic notes on output, lockers, and cwd:
- The system uses PBS_O_WORKDIR to store the directory from which you submit the job; the first command above is a convenient way to make this be the job's working directory.
- Unless you cd or otherwise provide a different path, the system will attempt to save any files created by your job to your home directory. In either case, the system does not provide a fallback if the write fails (and you will receive an error message only if the program attempting to create the file provides one).
- To make the job non-rerunnable, submit it with the -rn option (in qsub, a script directive, or the QSUB environment variable).

MATLAB

- print -djpeg and print -dtiff will not work (they require an X display); as a workaround you can either print to a different format and convert later (see imread/imwrite), or have your M-file do calculations only, and generate plots from the output data later.
- Include a quit command at the end of your M-file. Otherwise, MATLAB 5 on SGIs goes into an endless loop of "Missing variable or function" complaints, apparently misinterpreting end-of-file. The quit is not necessary on Suns, where it has no effect (unless you run the M-file interactively, in which case it will close MATLAB). Alternatively, you can use a conditional, e.g.:

    if strcmp(computer, 'SGI')
        quit;
    end

    matlab -tty < infile.m > outfile

If you would rather have the results as standard output (which longjobs will save to jobname.ojobid by default):

    matlab -tty < infile.m

#############################################################
cd $PBS_O_WORKDIR
rm -f results
add matlab
matlab -tty < myfile.m > results
#############################################################

MSI/Cerius2

    cerius2ms -n infile

or

    cerius2ms -n -o outfile infile

if you would rather have standard output saved to outfile (in the first case, longjobs will save it to jobname.ojobid).
Some application modules (for example, those on the QUANTUM cards) are actually interfaces to programs independent of Cerius2, which the main program may launch in a detached state. For such a process to complete successfully in the batch environment, you may need to launch it explicitly after the Cerius2 command; consult your instructor for help if the templates here don't cover your case.
#############################################################
cd $PBS_O_WORKDIR

#clear previous job status, if any
rm -f ~/.MSIcastepstat

add molsim
cerius2ms -n setup.log
castep-wrap
#############################################################

The castep-wrap command does the final processing on the intermediate files which are generated by the CASTEP/CREATE_FILES command in the cerius2 input file.

Script Details

- script directives (lines beginning with #PBS), as explained in the Script Directives section below
- comments (lines beginning with # which are not directives) and blank lines

The /mit/username path is available; the system does not attach or add other lockers unless you have such commands in your dotfiles (see below), or include them in your job script. If you have an appropriate binary directory set up under your homedir (i.e., following the Athena locker organization conventions described in the lockers man page), it will be added to the path automatically, assuming you are using the default Athena shell; see the dotfiles section below for more details.

If your job will write output files directly (rather than using stdout),
make sure that you save these files before the end of the job to an
attached locker with sufficient quota, and where you have write access.
The service will attempt to rerun your job in certain server-failure
cases; this may cause problems with files your job writes to directly
(as opposed to the standard output and error streams, which the service will
simply recreate). If it is not feasible to construct your job to handle
the possible existence of data from a previous run, you should make the
job non-rerunnable by submitting with the -rn option (in qsub, a script
directive, or the QSUB environment variable).
Note that you may use the local disk on the execution machine for intermediate processing (e.g., you may want to have your job generate large data files in /tmp, then compress them before saving the final output in a locker).
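
The /tmp approach described above can be sketched as follows. This is only an illustration, written for a POSIX sh rather than the Athena-default csh (an assumption; the job would have to run under such a shell), and it also assumes that mktemp and gzip are available on the execution machine. The function name stage_and_save and the file names data and data.gz are hypothetical.

```shell
#!/bin/sh
# Sketch only: stage_and_save and its file names are hypothetical.
# Usage: stage_and_save DEST_DIR COMMAND [ARGS...]
stage_and_save() {
    dest_dir=$1
    shift
    # Create a private scratch directory on the local disk (assumes mktemp exists).
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/longjob.XXXXXX") || return 1
    # Run the given command, capturing its standard output on local disk.
    if "$@" > "$scratch/data"; then
        # Save only the compressed result to the destination directory.
        gzip -c "$scratch/data" > "$dest_dir/data.gz"
        status=$?
    else
        status=1
    fi
    # Remove the intermediate files whether or not the command succeeded.
    rm -rf "$scratch"
    return "$status"
}
```

A job script might then call, for instance, stage_and_save "$PBS_O_WORKDIR" ./my_program, so that only the compressed file counts against locker quota.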

For example, if you submit the job from ~/my_jobs/29.123 and your script contains the lines:

    cd $PBS_O_WORKDIR
    ./my_program > my_data

it will look for an executable ~/my_jobs/29.123/my_program (rather than the default ~/my_program) and will create the output file ~/my_jobs/29.123/my_data (rather than the default ~/my_data).

If a job named foo is submitted from ~/my_jobs, it generates respective error and output files:

    ~/my_jobs/foo.e483
    ~/my_jobs/foo.o483

assuming you are using the Athena-default ~/.cshrc, which uses the system-wide startup files (for more information, see the Athena Dotfiles publication). If you are using a different shell (in particular bash), you may need to source its associated dotfiles in your script.
Note that the job may fail if you have included commands which attempt to set terminal characteristics in one of the sourced dotfiles. Any such command should be skipped by adding a test for the environment variable PBS_ENVIRONMENT, for example:

    setenv PRINTER kiwi
    setenv LPROPT "-h -z"
    if ( ! $?PBS_ENVIRONMENT ) then
        terminal stuff here
    endif

    if ( $?PBS_ENVIRONMENT ) then
        cd $PBS_O_WORKDIR
        other longjobs stuff here
    endif
    generic stuff here

PBS_JOBID: the JobID, e.g. 488.longm.mit.edu. Can be useful for constructing unique filenames. Note that the full JobID includes a hostname (for the master server), which may be omitted when used in most longjobs commands.

For more details, see the qsub man page, DESCRIPTION section.
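
As a rough illustration of building a unique filename from the JobID, the sketch below strips the hostname part and appends the job number to a prefix. It is written for a POSIX sh (an assumption; the Athena-default shell is csh), it assumes the JobID is exposed to the job as PBS_JOBID, and the function names job_number and unique_name are hypothetical.

```shell
#!/bin/sh
# Sketch only: job_number and unique_name are hypothetical helper names.

# Strip the server hostname from a full JobID, e.g. 488.longm.mit.edu -> 488
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Build "<prefix>.<job number>"; fall back to the shell's process ID
# when PBS_JOBID is not set (for example, outside the batch system).
unique_name() {
    prefix=$1
    id=${PBS_JOBID:-$$}
    printf '%s.%s\n' "$prefix" "$(job_number "$id")"
}
```

With PBS_JOBID set to 488.longm.mit.edu, unique_name results yields results.488, so repeated runs of the same script write to different files.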

Script Directives

In your script you can specify qsub options using this syntax:

    #PBS -flag option

For example:

    #PBS -a 29.123
    #PBS -l walltime=10:00:00

will specify 29.123 as the account and set a time limit of 10 hours.
(For details on qsub options, see the Running Jobs page.)
Notes:

Testing Scripts

The testjob program allows you to test your script without actually submitting it to the system (it simulates the longjobs environment on your local workstation so you can test your script without having to wait in a queue or use up quota).
To test your job script:

1. Make a quick-running version of your script (for use with the testjob utility, or on the actual system). There are two ways you can do this:

   - Replace long-running commands with quick checks, such as which to check that an executable is available on the path, and echo to write an output file, e.g.:

         cd $PBS_O_WORKDIR
         rm -f my_data
         add 29.123
         # crunch_data > my_data
         which crunch_data
         echo "test" > my_data

   - Run the real command on a much smaller input, e.g.:

         cd $PBS_O_WORKDIR
         add matlab
         # matlab < input.m
         matlab < tiny_input.m

2. Run the script with testjob:

       athena% testjob script_name

   example:

       athena% add longjobs
       athena% testjob foo
       [Ignoring -a script directive]
       Note that any locker dependencies will not be tested.
       You have attached the following lockers; please ensure that
       your script and/or dotfiles attach or add them as needed:
       matlab longjobs infoagents 29.123
       Executing foo...
       Process exited with status 0
       The standard output stream is in foo.out
       The standard error stream is in foo.err

Notes on testjob (for more information, see the testjob man page):

- testjob accepts the following qsub options from the command-line, script directives, and QSUB environment variable (in that order of precedence, command-line being highest):

      [-C directive_prefix] [-e error_file] [-j oe|eo|n] [-N jobname] [-o output_file] [-S shell] [-tn] [-v variable_list] [-V]

  If you specify other qsub options via QSUB or script directives, it will simply ignore them (with a message as in the above example).
- Use testjob -n script_name to suppress this output.
- The standard error and output streams are saved to script_name.err and script_name.out respectively, but without a JobID number. If you run the program again in the same directory with the same script file without deleting the error/output files, the streams will be appended to them.
testjob
command.
The environment variables PBS_O_WORKDIR, PBS_ENVIRONMENT, and PBS_JOBNAME will also be available via testjob; others such as PBS_JOBID will not. For complete details, see the testjob man page.

If the system is not busy, you may prefer to test your script by actually submitting a short version of it. Note that if you submit with the longjob -submit command, you will be prompted for a time limit. If you submit with qsub you can specify a short limit with the option:

    -l walltime=hh:mm:ss

For example, for a 10-minute limit use the following:

    athena% qsub -l walltime=00:10:00 ...

(The system will interpret a single value as seconds and xx:yy as minutes:seconds; to avoid confusion it's best to specify all three time fields.)
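
To avoid the ambiguity between seconds and minutes:seconds, a limit given in seconds can be expanded to all three fields before it is passed to qsub. The helper below is a small sketch in POSIX sh (an assumption; the Athena-default shell is csh), and the function name to_walltime is hypothetical.

```shell
#!/bin/sh
# Sketch only: to_walltime is a hypothetical helper name.
# Convert a whole number of seconds into the hh:mm:ss form used by -l walltime.
to_walltime() {
    total=$1
    printf '%02d:%02d:%02d\n' \
        $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

# Example (not executed here):
#   qsub -l walltime=$(to_walltime 600) ...
```

Here to_walltime 600 produces 00:10:00, the 10-minute limit shown above, and to_walltime 36000 produces 10:00:00.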
Last modified: Thu, Jan 2 2003