4 Running Jobs
4.2 Batch jobs
To run a batch job on the Oscar cluster, you first have to write a script that describes what resources you need and how your program will run. Example batch scripts are available in your home directory on Oscar, in the directory:
~/batch_scripts
To submit a batch job to the queue, use the sbatch command:
$ sbatch <jobscript>
This command will return a number, which is your job ID. You can view the output of your job in the file slurm-<jobid>.out in the directory where you ran the sbatch command. For instance, you can view the last 10 lines of output with:
$ tail -10 slurm-<jobid>.out
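In a wrapper script it can be convenient to capture the job ID so the output file name can be built automatically. The sketch below assumes sbatch prints its default confirmation message, "Submitted batch job <jobid>"; the helper name extract_job_id and the sample message are illustrative only and are not part of Oscar's tooling.

```shell
#!/bin/bash
# Extract the numeric job ID from sbatch's confirmation message.
# Assumption: the message has the default form "Submitted batch job <jobid>".
extract_job_id() {
    # Strip everything up to and including the last space,
    # leaving only the final field (the job ID).
    echo "${1##* }"
}

# Sample message for illustration. In practice you would capture the
# real message, e.g.: message=$(sbatch <jobscript>)
message="Submitted batch job 123456"

jobid=$(extract_job_id "$message")
outfile="slurm-${jobid}.out"
echo "Job output will be written to ${outfile}"

# Once the job has started writing output, view the last 10 lines with:
# tail -10 "${outfile}"
```

If your Slurm installation is configured to print a different message, the parsing step would need to be adjusted.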
A batch script starts by specifying the bash shell as its interpreter, with the line:
#!/bin/bash
Next, a series of lines starting with #SBATCH define the resources you need, for example:
#SBATCH -n 4
#SBATCH -t 1:00:00
#SBATCH --mem=16G
The above lines request 4 cores (-n), an hour of runtime (-t), and a total of 16GB memory for all cores (--mem). By default, a batch job will reserve 1 core and a proportional amount of memory on a single node.
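Putting these pieces together, a complete batch script might look like the following minimal sketch. The commands after the #SBATCH lines are placeholders for your own program. SLURM_JOB_ID and SLURM_NTASKS are environment variables that Slurm sets inside a running job; the fallback values are an assumption added here so the script can also be run directly with bash as a quick check.

```shell
#!/bin/bash
#SBATCH -n 4
#SBATCH -t 1:00:00
#SBATCH --mem=16G

# To bash, the #SBATCH lines above are ordinary comments;
# only sbatch reads them when the job is submitted.

# Slurm sets these variables inside a job. The fallbacks (assumed
# defaults for illustration) apply when running outside the scheduler.
job_id="${SLURM_JOB_ID:-none}"
ntasks="${SLURM_NTASKS:-1}"

# Placeholder workload: report where and how the job is running.
summary="job=${job_id} tasks=${ntasks} host=$(uname -n)"
echo "${summary}"
```

Anything printed by the script, such as the summary line here, ends up in the slurm-<jobid>.out file.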
Alternatively, you could set the resources as command-line options to sbatch:
$ sbatch -n 4 -t 1:00:00 --mem=16G <jobscript>
The command-line options will override the resources specified in the script, so this is a handy way to reuse an existing batch script when you just want to change a few of the resource values.
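As a sketch of this reuse pattern, the loop below builds sbatch commands that run one script with several core counts. It only prints the commands rather than submitting them, and my_job.sh is a hypothetical script name used for illustration.

```shell
#!/bin/bash
# Hypothetical batch script name; replace with your own.
jobscript="my_job.sh"

# Build one sbatch command per core count. The -n value on the
# command line overrides any "#SBATCH -n" line inside the script.
for cores in 2 4 8; do
    cmd="sbatch -n ${cores} -t 1:00:00 --mem=16G ${jobscript}"
    # Dry run: print the command instead of executing it.
    echo "${cmd}"
done
```

To actually submit the jobs, you would run each command instead of echoing it.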
Useful sbatch options:
-J | Specify the job name that will be displayed when listing the job.
-n | Number of cores.
-t | Runtime, as HH:MM:SS.
--mem= | Amount of memory to reserve (for example, --mem=16G).
-p | Request a specific partition.
-C | Add a feature constraint (a tag that describes a type of node). You can view the available features on Oscar with the nodes command.
--mail-type= | Specify the events that you should be notified of by email: BEGIN, END, FAIL, REQUEUE, and ALL.
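Some of these options can be combined in a script header, as in the sketch below. The job name example_job and the resource values are placeholders chosen for illustration. SLURM_JOB_NAME is set by Slurm inside a running job; the fallback value is an assumption so the script also runs outside the scheduler.

```shell
#!/bin/bash
#SBATCH -J example_job
#SBATCH -n 1
#SBATCH -t 0:10:00
#SBATCH --mail-type=END

# Inside a job, Slurm exports the name given with -J as SLURM_JOB_NAME.
# The fallback is used only when the script is run directly with bash.
job_name="${SLURM_JOB_NAME:-example_job}"
echo "Running job named ${job_name}"
```

With --mail-type=END, an email notification is requested when the job finishes.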
You can read the full list of options with:
$ man sbatch