More on Slurm Jobs & Migration
Job Runtime Environment in Slurm
When a job is submitted, Slurm takes a snapshot of the environment variables in place at submission time and replicates that environment on the first allocated node, where the batch script actually runs.
Additionally, at run time, Slurm sets a number of shell environment variables that relate to the job itself and can be used within the job. The Slurm documentation's manpage on sbatch provides an exhaustive guide, but we highlight some useful ones here.
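For example, a variable exported in the submission shell is visible inside the batch job, because Slurm captures the environment at submission time. A minimal sketch, where the script name myjob.sh and the variable MY_SETTING are purely illustrative:

```bash
# Set a variable in the login shell before submitting.
export MY_SETTING=42          # hypothetical variable, for illustration only

# Submit the job; Slurm snapshots the current environment.
sbatch myjob.sh

# Inside myjob.sh, the variable is available unchanged:
#   echo "MY_SETTING is $MY_SETTING"   # prints "MY_SETTING is 42"
```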
#SBATCH Directives
In line with most batch schedulers, Slurm uses directives in submission scripts to specify requirements and parameters for a job – the #SBATCH directives. Thus, for an MPI job we might typically have:
```bash
#SBATCH -p compute
#SBATCH -o runout.%J
#SBATCH -e runerr.%J
#SBATCH --job-name=mpijob
#SBATCH -n 80
#SBATCH --tasks-per-node=40
#SBATCH --exclusive
#SBATCH -t 0-12:00
#SBATCH --mem-per-cpu=4000
```
Walking through these:
Slurm #SBATCH directive | Description |
---|---|
#SBATCH --partition=compute or #SBATCH -p compute | In Slurm, jobs are submitted to 'partitions' rather than 'queues'. Despite the naming difference, the concept is the same. |
#SBATCH --output=runout.%J or #SBATCH -o runout.%J | File for STDOUT from the job run to be stored in. The '%J' to Slurm is replaced with the job number. |
#SBATCH --error=runerr.%J or #SBATCH -e runerr.%J | File for STDERR from the job run to be stored in. The '%J' to Slurm is replaced with the job number. |
#SBATCH --job-name=mpijob | Job name, useful for monitoring and setting up inter-job dependency. |
#SBATCH --ntasks=80 or #SBATCH -n 80 | Number of tasks (MPI processes) required for the job. |
#SBATCH --tasks-per-node=40 | The number of tasks (processes) to run on each node. |
#SBATCH --exclusive | Exclusive job allocation - i.e. no other users on allocated nodes. |
#SBATCH --time=0-12:00 or #SBATCH -t 0-12:00 | Maximum runtime of the job (here 12 hours, in days-hours:minutes format). Specifying a realistic value rather than the maximum improves the chances of the scheduler 'back-filling' the job and running it earlier. |
#SBATCH --mem-per-cpu=4000 | Memory required per allocated CPU, in megabytes by default (here 4000 MB per CPU). Slurm's memory-based scheduling is more powerful than that of many other schedulers. |
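Putting these together, a complete submission script might look like the sketch below. The executable name ./my_mpi_app is a placeholder, not part of the original example; under Slurm, MPI programs are commonly launched with srun.

```bash
#!/bin/bash
# Minimal sketch of a full MPI submission script, assuming a
# hypothetical executable ./my_mpi_app and the 'compute' partition.
#SBATCH -p compute
#SBATCH -o runout.%J
#SBATCH -e runerr.%J
#SBATCH --job-name=mpijob
#SBATCH -n 80
#SBATCH --tasks-per-node=40
#SBATCH --exclusive
#SBATCH -t 0-12:00
#SBATCH --mem-per-cpu=4000

# srun launches one MPI process per allocated task (80 here).
srun ./my_mpi_app
```

Submitting this script with sbatch means the options do not need to be repeated on the command line.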
Environment Variables
Once an allocation has been scheduled and a job script is started (on the first node of the allocation), Slurm sets a number of shell environment variables that can be used in the script at runtime. Below is a summary of some of the most useful:
Slurm environment variable | Description |
---|---|
$SLURM_JOBID | Job ID. |
$SLURM_JOB_NODELIST | List of nodes allocated to the job, i.e. nodes with at least one task on them. |
$SLURM_ARRAY_TASK_ID | If an array job, then the task index. |
$SLURM_JOB_NAME | Job name. |
$SLURM_JOB_PARTITION | Partition that the job was submitted to. |
$SLURM_JOB_NUM_NODES | Number of nodes allocated to this job. |
$SLURM_NTASKS | Number of tasks (processes) allocated to this job. |
$SLURM_NTASKS_PER_NODE (Only set if the --ntasks-per-node option is specified) | Number of tasks (processes) per node. |
$SLURM_SUBMIT_DIR | Directory from which the job was submitted. |
$SLURM_SUBMIT_HOST | Host on which job was submitted. |
$SLURM_PROCID | The process (task) ID within the job, set for each task launched by srun. This starts from zero and goes up to $SLURM_NTASKS-1. |
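As a sketch of how these variables are typically used, the script below stages work into a per-job directory and reports the allocation. The scratch location /scratch/$USER and the file input.dat are assumptions for illustration and will vary by system.

```bash
#!/bin/bash
#SBATCH -p compute
#SBATCH -n 4
#SBATCH -t 0-01:00

# Build a unique working directory from the job ID
# (the /scratch/$USER path is an assumption, not a system guarantee).
WORKDIR=/scratch/$USER/job_$SLURM_JOBID
mkdir -p "$WORKDIR"

# Report the allocation using the runtime variables described above.
echo "Job $SLURM_JOBID ($SLURM_JOB_NAME) in partition $SLURM_JOB_PARTITION"
echo "Running $SLURM_NTASKS tasks on $SLURM_JOB_NUM_NODES node(s): $SLURM_JOB_NODELIST"

# Stage input from the directory the job was submitted from
# (input.dat is purely illustrative).
cp "$SLURM_SUBMIT_DIR/input.dat" "$WORKDIR/"
cd "$WORKDIR"
```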
System Queues & Partitions in Slurm
Please use the sinfo command to see the names of the partitions (queues) to use in your job scripts. If no partition is specified, the default partition will be used for job submissions. sinfo -s gives a more succinct partition list. Please see the Hawk and Sunbird pages for a list of partitions and their descriptions on each system.
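A short sketch of the commands involved; the partition name 'compute' and script name myjob.sh are illustrative only:

```bash
# List all partitions, their state, and their nodes.
sinfo

# More succinct: one line per partition.
sinfo -s

# Submit to a specific partition (use a name reported by sinfo on your system).
sbatch -p compute myjob.sh
```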