More on Slurm Jobs & Migration

Job Runtime Environment in Slurm

When a job is submitted, Slurm records the environment variables in effect at submission time and replicates that environment on the first allocated node, where the batch script actually runs.
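As a minimal sketch of this behaviour, the script below reads a variable that it never sets itself. `INPUT_FILE` is a hypothetical name chosen for illustration, and the sketch assumes sbatch's default `--export=ALL` behaviour, under which the job sees whatever value was exported in the shell that ran sbatch.

```bash
#!/bin/bash
#SBATCH -p compute
#SBATCH -n 1
#SBATCH -t 0-00:10

# Report INPUT_FILE (a hypothetical variable) as seen by the job.
# It is not set in this script: with sbatch's default --export=ALL it is
# copied from the environment of the shell that submitted the job.
report_input() {
    if [[ -n "${INPUT_FILE:-}" ]]; then
        echo "INPUT_FILE=${INPUT_FILE}"
    else
        echo "INPUT_FILE is not set"
    fi
}

report_input
```

Submitted with `export INPUT_FILE=run1.dat` followed by `sbatch env_demo.sh` (the script name is also hypothetical), the job output should contain `INPUT_FILE=run1.dat`. Because the environment is captured at submission, changing the variable afterwards does not affect a job that is already queued.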

Additionally, at run time, Slurm sets a number of shell environment variables that describe the job itself and can be used within the job. The sbatch man page in the Slurm documentation gives the full list; some of the most useful are highlighted below.

#SBATCH Directives

As with most batch schedulers, Slurm uses directives in submission scripts, the #SBATCH lines, to specify a job's requirements and parameters. A typical MPI job might, for example, use:

#!/bin/bash
#SBATCH -p compute
#SBATCH -o runout.%J
#SBATCH -e runerr.%J
#SBATCH --job-name=mpijob
#SBATCH -n 80
#SBATCH --tasks-per-node=40
#SBATCH --exclusive
#SBATCH -t 0-12:00
#SBATCH --mem-per-cpu=4000

Walking through these:

Slurm #SBATCH directive | Description

#SBATCH --partition=compute
#SBATCH -p compute
The partition (queue) to submit the job to. Slurm calls queues 'partitions', but the concept is the same.

#SBATCH --output=runout.%J
#SBATCH -o runout.%J
File in which to store the job's STDOUT. Slurm replaces '%J' with the job ID.

#SBATCH --error=runerr.%J
#SBATCH -e runerr.%J
File in which to store the job's STDERR. Slurm replaces '%J' with the job ID.

#SBATCH --job-name=mpijob
Job name, useful for monitoring and for setting up inter-job dependencies.

#SBATCH --ntasks=80
#SBATCH -n 80
Total number of tasks (processes, e.g. MPI ranks) required by the job. By default each task is allocated one CPU.

#SBATCH --tasks-per-node=40
Number of tasks (processes) to run on each node. With 80 tasks at 40 per node, the job spans 2 nodes.

#SBATCH --exclusive
Exclusive allocation: the allocated nodes are not shared with any other jobs.

#SBATCH --time=0-12:00
#SBATCH -t 0-12:00
Maximum run time (wall-clock limit) of the job, in days-hours:minutes format, so 0-12:00 means 12 hours. Specifying a realistic limit, rather than leaving it at the partition maximum, improves the chance that the scheduler can 'back-fill' the job and start it earlier.

#SBATCH --mem-per-cpu=4000
Memory required per allocated CPU, in megabytes when no unit is given (here 4000 MB per CPU). Slurm's memory-based scheduling is more flexible than that of many other schedulers.
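To make the arithmetic behind these directives concrete, here is a small sketch that converts a Slurm time string to seconds and derives the node count and per-node memory implied by the example script. It assumes one CPU per task (--cpus-per-task left at its default of 1) and that --mem-per-cpu is in megabytes. The parser covers the time formats listed in the sbatch man page (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, days-hours:minutes:seconds) but does no range validation. The function and variable names are illustrative only.

```bash
#!/bin/bash
# Convert a Slurm --time string to seconds (sketch; no range validation).
slurm_time_to_seconds() {
    local t="$1" days=0 h=0 m=0 s=0 rest
    local -a f
    if [[ "$t" == *-* ]]; then
        # With a day count, the remaining fields are hours[:minutes[:seconds]].
        days="${t%%-*}"
        rest="${t#*-}"
        IFS=: read -r -a f <<< "$rest"
        h="${f[0]:-0}"; m="${f[1]:-0}"; s="${f[2]:-0}"
    else
        IFS=: read -r -a f <<< "$t"
        case "${#f[@]}" in
            1) m="${f[0]}" ;;                            # minutes
            2) m="${f[0]}"; s="${f[1]}" ;;               # minutes:seconds
            3) h="${f[0]}"; m="${f[1]}"; s="${f[2]}" ;;  # hours:minutes:seconds
            *) echo "unrecognised time: $t" >&2; return 1 ;;
        esac
    fi
    # 10# forces base 10 so fields such as "08" are not read as octal.
    echo $(( (10#$days * 24 + 10#$h) * 3600 + 10#$m * 60 + 10#$s ))
}

# Values taken from the example script.
ntasks=80
tasks_per_node=40
mem_per_cpu_mb=4000      # --mem-per-cpu defaults to megabytes
time_limit="0-12:00"

nodes=$(( (ntasks + tasks_per_node - 1) / tasks_per_node ))  # ceiling division
mem_per_node_mb=$(( tasks_per_node * mem_per_cpu_mb ))       # one CPU per task

echo "nodes=${nodes}"
echo "memory per node=${mem_per_node_mb} MB"
echo "time limit=$(slurm_time_to_seconds "$time_limit") s"
```

Running it prints nodes=2, memory per node=160000 MB and time limit=43200 s, which is a quick way to check that a node can actually hold the memory a job asks for.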


Environment Variables

Once an allocation has been scheduled and a job script is started (on the first node of the allocation), Slurm sets a number of shell environment variables that can be used in the script at runtime. Below is a summary of some of the most useful:

$SLURM_JOB_NODELIST
Nodes allocated to the job, i.e. those with at least one task on them.

$SLURM_ARRAY_TASK_ID
The task index, if the job is an array job.

$SLURM_JOB_PARTITION
Partition in which the job is running.

$SLURM_JOB_NUM_NODES
Number of nodes allocated to the job.

$SLURM_NTASKS
Number of tasks (processes) allocated to the job.

$SLURM_NTASKS_PER_NODE
Number of tasks (processes) per node. Only set if the --ntasks-per-node option is specified.

$SLURM_SUBMIT_DIR
Directory from which the job was submitted.

$SLURM_SUBMIT_HOST
Host from which the job was submitted.

$SLURM_PROCID
The rank (relative process ID) of each task launched by srun. It starts from zero and goes up to $SLURM_NTASKS-1.
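A minimal sketch of how a job script might use these variables follows. The `:-` fallbacks are an assumption added so that the script can also be run outside Slurm for testing; they are not Slurm behaviour. It also uses `$SLURM_JOB_ID`, the job's numeric ID. Each task that `srun` launches sees its own `$SLURM_PROCID`.

```bash
#!/bin/bash
#SBATCH -p compute
#SBATCH -n 4
#SBATCH -t 0-00:10

# Summarise the allocation using variables that Slurm sets at run time.
# The ":-" fallbacks apply only when this is run outside a Slurm job.
job_summary() {
    echo "Job ${SLURM_JOB_ID:-none} in partition ${SLURM_JOB_PARTITION:-unknown}"
    echo "Submitted from ${SLURM_SUBMIT_HOST:-unknown} in ${SLURM_SUBMIT_DIR:-$PWD}"
    echo "${SLURM_NTASKS:-1} task(s) on ${SLURM_JOB_NUM_NODES:-1} node(s): ${SLURM_JOB_NODELIST:-$HOSTNAME}"
}

job_summary

# Work from the submission directory so relative paths resolve as they did
# at sbatch time (harmless if this is already the working directory).
cd "${SLURM_SUBMIT_DIR:-$PWD}" || exit 1

# Inside a job, srun starts one copy of the command per task; each copy
# prints its own SLURM_PROCID (0 .. SLURM_NTASKS-1).
if command -v srun >/dev/null 2>&1 && [[ -n "${SLURM_JOB_ID:-}" ]]; then
    srun bash -c 'echo "task ${SLURM_PROCID} of ${SLURM_NTASKS} on $(hostname)"'
fi
```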


System Queues & Partitions in Slurm

Please use the sinfo command to see the names of the partitions (queues) to use in your job scripts. If not specified, the default partition will be used for job submissions. sinfo -s will give a more succinct partition list. Please see the Hawk and Sunbird pages for a list of partitions and their descriptions on each system.
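If a script needs the partition names programmatically, sinfo's output can be trimmed with its -h (no header) and -o (output format) options. The sketch below relies on the %P format field, which prints the partition name with an asterisk appended to the default partition. The function names are hypothetical.

```bash
#!/bin/bash
# List partition names, one per line, with the '*' default marker removed.
list_partitions() {
    sinfo -h -o '%P' | sed 's/\*$//' | sort -u
}

# Print the default partition, i.e. the one sinfo marks with a trailing '*'.
default_partition() {
    sinfo -h -o '%P' | sed -n 's/\*$//p' | head -n 1
}
```

For example, checking that a name appears in the output of `list_partitions` before using it with `#SBATCH -p` avoids a rejected submission.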