Job Arrays
Submission
Job arrays operate in Slurm much as they do in other batch systems. They allow a potentially huge number of similar jobs to be launched quickly and simply; each job iteration then uses its runtime-assigned array ID to vary slightly what it does. Array jobs are declared using the --array argument to sbatch, which (as with all arguments to sbatch) can be placed inside a job script as an #SBATCH declaration or passed directly on the sbatch command line. There are a number of ways to declare an array:
[test.user@cl1 hello_world]$ sbatch --array=0-64 sbatch_sub.sh
…declares an array with iteration indexes from 0 to 64.
[test.user@cl1 hello_world]$ sbatch --array=0,4,8,12 sbatch_sub.sh
…declares an array with iteration indexes specifically identified as 0, 4, 8 and 12.
[test.user@cl1 hello_world]$ sbatch --array=0-12:3 sbatch_sub.sh
…declares an array with iteration indexes from 0 to 12 with a stepping of 3, i.e. 0, 3, 6, 9 and 12.
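To make the three forms above concrete, the following is a minimal sketch of a helper that prints the indexes a given --array specification selects. The function name expand_array_spec is hypothetical and is not part of Slurm; it only covers the range, list and stepping forms shown above, and is meant for checking a specification before submitting it.

```shell
#!/bin/bash
# expand_array_spec: illustrative helper (NOT a Slurm command) that prints the
# indexes selected by an --array specification such as "0-64", "0,4,8,12"
# or "0-12:3". Only the forms described in this section are handled.
expand_array_spec() {
    local spec=$1 part range step first last i
    local -a parts out=()
    # Split the specification on commas into separate elements.
    IFS=',' read -ra parts <<< "$spec"
    for part in "${parts[@]}"; do
        step=1
        range=$part
        # An optional ":N" suffix sets the stepping.
        if [[ $part == *:* ]]; then
            step=${part##*:}
            range=${part%%:*}
        fi
        (( step > 0 )) || return 1
        # "A-B" is a range; a bare number is a single index.
        if [[ $range == *-* ]]; then
            first=${range%%-*}
            last=${range##*-}
        else
            first=$range
            last=$range
        fi
        for ((i = first; i <= last; i += step)); do
            out+=("$i")
        done
    done
    echo "${out[*]}"
}

expand_array_spec "0-12:3"     # prints: 0 3 6 9 12
expand_array_spec "0,4,8,12"   # prints: 0 4 8 12
```

Running the helper on 0-64 prints 65 indexes, which is a quick reminder that both ends of a range are included.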
Monitoring
When a job array is running, the output of squeue shows the parent job, with its still-pending index range, alongside the currently running iterations:
[test.user@cl1 hello_world]$ squeue
JOBID        PARTITION  NAME   USER      ST  TIME  NODES  NODELIST(REASON)
143_[6-64]   all        hello  test.use  PD  0:00  4      (Resources)
143_4        all        hello  test.use  R   0:00  4      ccs[005-008]
143_5        all        hello  test.use  R   0:00  4      ccs[005-008]
143_0        all        hello  test.use  R   0:03  4      ccs[001-004]
143_1        all        hello  test.use  R   0:03  4      ccs[001-004]
[test.user@cl1 hello_world]$ squeue
JOBID        PARTITION  NAME   USER      ST  TIME  NODES  NODELIST(REASON)
143_[15-64]  all        hello  test.use  PD  0:00  4      (Resources)
143_14       all        hello  test.use  R   0:00  4      ccs[001-004]
143_10       all        hello  test.use  R   0:02  4      ccs[005-008]
143_11       all        hello  test.use  R   0:02  4      ccs[005-008]
143_1        all        hello  test.use  R   0:07  4      ccs[001-004]
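For a large array, a per-state count can be easier to read than the full listing. The sketch below tallies the ST column of squeue-style output with awk. The function name summarise_states is hypothetical, and the sample text is simply the first listing above pasted into a variable so that the snippet runs without a cluster.

```shell
#!/bin/bash
# Sample squeue output copied from the first listing above, so the sketch can
# be tried without access to a Slurm cluster.
squeue_sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
143_[6-64] all hello test.use PD 0:00 4 (Resources)
143_4 all hello test.use R 0:00 4 ccs[005-008]
143_5 all hello test.use R 0:00 4 ccs[005-008]
143_0 all hello test.use R 0:03 4 ccs[001-004]
143_1 all hello test.use R 0:03 4 ccs[001-004]'

# summarise_states: illustrative helper (NOT a Slurm command). Reads
# squeue-style text on stdin, skips the header row, and counts rows per
# value of the fifth column (ST).
summarise_states() {
    awk 'NR > 1 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}

printf '%s\n' "$squeue_sample" | summarise_states
# prints:
# PD 1
# R 4
```

On a real system the same function could be fed from squeue itself, for example `squeue -u "$USER" | summarise_states`. Note that all pending iterations share a single row such as 143_[6-64], so the PD count is a number of rows, not a number of pending iterations.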
IDs and Variables
Each iteration in an array is assigned its own job ID in Slurm, stored in SLURM_JOB_ID as usual. In addition, Slurm sets two further environment variables that can be used in the script:
SLURM_ARRAY_JOB_ID stores the value of the parent job submission – i.e. the ID reported in the output from sbatch when submitted.
SLURM_ARRAY_TASK_ID stores the value of the array index.
Additionally, when specifying a job’s STDOUT and STDERR files using the -o and -e directives to sbatch, the reference %A will take on the parent job ID and the reference %a will take on the iteration index. In summary:
BASH Environment Variable | SBATCH Field Code | Description |
---|---|---|
$SLURM_JOB_ID | %J | Job identifier |
$SLURM_ARRAY_JOB_ID | %A | Array parent job identifier |
$SLURM_ARRAY_TASK_ID | %a | Array job iteration index |
$SLURM_ARRAY_TASK_COUNT | | Number of indexes (tasks) in the job array |
$SLURM_ARRAY_TASK_MAX | | Maximum array index |
$SLURM_ARRAY_TASK_MIN | | Minimum array index |
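The substitution of field codes into a file name can be pictured with a small sketch. The helper below, expand_output_pattern, is hypothetical and not part of Slurm; it only imitates the three codes listed in the table, using plain bash string replacement, so that a pattern can be previewed before a job is submitted.

```shell
#!/bin/bash
# expand_output_pattern: illustrative helper (NOT a Slurm command) that mimics
# how the field codes in the table are substituted into an -o/-e file name.
# Arguments: pattern, parent job ID, array index, iteration job ID.
expand_output_pattern() {
    local name=$1 parent=$2 index=$3 jobid=$4
    name=${name//%A/$parent}   # %A -> array parent job identifier
    name=${name//%a/$index}    # %a -> array job iteration index
    name=${name//%J/$jobid}    # %J -> job identifier of this iteration
    echo "$name"
}

expand_output_pattern 'output-%A_%a-%J.o' 231 0 232   # prints: output-231_0-232.o
```

The replacement is case-sensitive, so %A and %a are handled independently. Any other field codes that Slurm supports are left untouched by this sketch.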
And so, with this example script:
#!/bin/bash
#SBATCH -J arraytest
#SBATCH --array=0-4
#SBATCH -o output-%A_%a-%J.o
#SBATCH -n 1
echo SLURM_JOB_ID $SLURM_JOB_ID
echo SLURM_ARRAY_JOB_ID $SLURM_ARRAY_JOB_ID
echo SLURM_ARRAY_TASK_ID $SLURM_ARRAY_TASK_ID
We can submit the script:
[test.user@cl1 sbatch]$ sbatch array.sh
Submitted batch job 231
Resulting in the following output files:
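output-231_0-232.o
output-231_1-233.o
output-231_2-234.o
output-231_3-235.o
output-231_4-231.o
Each output file records the variables seen by its iteration. Note that in this run the final iteration (index 4) has the parent job ID, 231, as its own SLURM_JOB_ID: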
output-231_0-232.o: SLURM_JOB_ID 232 SLURM_ARRAY_JOB_ID 231 SLURM_ARRAY_TASK_ID 0
output-231_1-233.o: SLURM_JOB_ID 233 SLURM_ARRAY_JOB_ID 231 SLURM_ARRAY_TASK_ID 1
output-231_2-234.o: SLURM_JOB_ID 234 SLURM_ARRAY_JOB_ID 231 SLURM_ARRAY_TASK_ID 2
output-231_3-235.o: SLURM_JOB_ID 235 SLURM_ARRAY_JOB_ID 231 SLURM_ARRAY_TASK_ID 3
output-231_4-231.o: SLURM_JOB_ID 231 SLURM_ARRAY_JOB_ID 231 SLURM_ARRAY_TASK_ID 4
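In practice the array index is usually used to pick the piece of work each iteration handles. The following is a minimal sketch of that pattern, assuming a hypothetical list of four input files (the sample_*.dat names and the select_input function are invented for illustration). Outside Slurm the array variables are unset, so the script falls back to index 0 and can be dry-run on a login node.

```shell
#!/bin/bash
#SBATCH -J arrayinputs
#SBATCH --array=0-3
#SBATCH -o output-%A_%a.o
#SBATCH -n 1

# Hypothetical list of inputs; each array iteration handles exactly one entry.
inputs=(sample_a.dat sample_b.dat sample_c.dat sample_d.dat)

# SLURM_ARRAY_TASK_ID is set by Slurm for array jobs. Fall back to 0 when the
# script is run by hand outside the scheduler.
task_id=${SLURM_ARRAY_TASK_ID:-0}

# select_input: print the input file for a given array index, or fail if the
# index has no matching entry in the list.
select_input() {
    local idx=$1
    if (( idx < 0 || idx >= ${#inputs[@]} )); then
        echo "index $idx has no matching input" >&2
        return 1
    fi
    echo "${inputs[idx]}"
}

input=$(select_input "$task_id") || exit 1
echo "iteration $task_id of parent ${SLURM_ARRAY_JOB_ID:-none} would process $input"
```

Keeping the --array range in step with the length of the list matters here: an index beyond the end of the list makes that iteration exit with an error instead of silently processing nothing.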
More advanced job array information is available in the official Slurm job array documentation.