User Manual

4.2 Batch jobs

To run a batch job on the Oscar cluster, you first have to write a script that describes what resources you need and how your program will run. Example batch scripts are available in your home directory on Oscar, in the directory:

~/batch_scripts

To submit a batch job to the queue, use the sbatch command:

$ sbatch <jobscript>

This command will return a number, which is your job ID. You can view the output of your job in the file slurm-<jobid>.out in the directory where you ran the sbatch command. For instance, you can view the last 10 lines of output with:

$ tail -10 slurm-<jobid>.out
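If you want to use the job ID in a later command, you can capture what sbatch prints. The following is a minimal sketch, not an official Oscar tool: the function name job_id_from_message and the script name myjob.sh are hypothetical, and it assumes the job ID is the last word that sbatch prints (standard Slurm prints "Submitted batch job <jobid>").

```shell
#!/bin/bash

# Hypothetical helper: pull the job ID out of the text printed by sbatch.
# Assumes the ID is the last whitespace-separated word of the message,
# which also covers the case where only the number is printed.
job_id_from_message() {
    printf '%s\n' "$1" | awk '{print $NF}'
}

# Only submit if sbatch is available, i.e. when run on the cluster.
if command -v sbatch >/dev/null 2>&1 && [ -f myjob.sh ]; then
    message=$(sbatch myjob.sh)          # myjob.sh is a placeholder name
    jobid=$(job_id_from_message "$message")
    echo "Output will be written to slurm-${jobid}.out"
fi
```

Once the job has started, the captured ID can be used with the tail command shown above.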

4.2.1 Batch scripts

A batch script starts by specifying the bash shell as its interpreter, with the line:

#!/bin/bash

Next, a series of lines starting with #SBATCH define the resources you need, for example:

#SBATCH -n 4
#SBATCH -t 1:00:00
#SBATCH --mem=16G

The above lines request 4 cores (-n), one hour of runtime (-t), and a total of 16 GB of memory shared by all cores (--mem). By default, a batch job will reserve 1 core and a proportional amount of memory on a single node.
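Put together, a complete batch script might look like the sketch below. The resource lines are the ones described above; the commands after them are placeholders for your own program. The script assumes Slurm sets the SLURM_NTASKS environment variable inside the job when -n is given, and falls back to 1 if it is not set.

```shell
#!/bin/bash

# Resource requests: 4 cores, 1 hour of runtime, 16 GB of memory in total.
#SBATCH -n 4
#SBATCH -t 1:00:00
#SBATCH --mem=16G

# Everything below runs on the compute node once the job starts.
# SLURM_NTASKS is assumed to be set by Slurm; default to 1 otherwise.
ntasks="${SLURM_NTASKS:-1}"
echo "Running with ${ntasks} task(s) on $(hostname)"

# Replace this line with the command that starts your program.
echo "Job finished"
```

Because the #SBATCH lines are ordinary comments to bash, the same file also runs unchanged as a normal shell script, which can be handy for a quick check before submitting.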

Alternatively, you could set the resources as command-line options to sbatch:

$ sbatch -n 4 -t 1:00:00 --mem=16G <jobscript>

The command-line options will override the resources specified in the script, so this is a handy way to reuse an existing batch script when you just want to change a few of the resource values.
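As an illustration of reusing one script with different resource values, the sketch below builds an sbatch command for several memory sizes. The script name myjob.sh and the chosen sizes are placeholders. It only prints the commands (a dry run) so you can check them before submitting anything.

```shell
#!/bin/bash

jobscript="myjob.sh"   # placeholder: your existing batch script

# Override only the memory request; all other resources come from the script.
for mem in 16G 32G; do
    cmd="sbatch --mem=${mem} ${jobscript}"
    echo "$cmd"        # dry run: prints the command instead of running it
done
```

To actually submit the jobs, run the printed commands, or replace the echo line with the sbatch call itself.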

Useful sbatch options:
-J Specify the job name that will be displayed when listing the job.
-n Number of cores.
-t Runtime, as HH:MM:SS.
--mem= Total memory for the job, for example 16G.
-p Request a specific partition.
-C Add a feature constraint (a tag that describes a type of node). You can view the available features on Oscar with the nodes command.
--mail-type= Specify the events for which you want to be notified by email: BEGIN, END, FAIL, REQUEUE, or ALL.

You can read the full list of options with:

$ man sbatch
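Several of the options listed above can be combined in a script header. The following is a sketch only: the job name my_analysis is made up, and it assumes that --mail-type accepts a comma-separated list of events and that Slurm sets the SLURM_JOB_NAME environment variable inside the job, both of which you can confirm in the sbatch man page.

```shell
#!/bin/bash

# Name the job, request 1 core for 30 minutes, and send email
# when the job ends or fails.
#SBATCH -J my_analysis
#SBATCH -n 1
#SBATCH -t 0:30:00
#SBATCH --mail-type=END,FAIL

# SLURM_JOB_NAME is assumed to be set by Slurm; fall back to the
# same name when the script is run outside of a job.
job_name="${SLURM_JOB_NAME:-my_analysis}"
echo "Job ${job_name} started at $(date)"
```

The -p and -C options are left out here because partition and feature names are specific to the cluster.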