Job Arrays

Submission

Job arrays operate in Slurm much as they do in other batch systems. They allow a potentially huge number of similar jobs to be launched quickly and simply; each job iteration then uses its runtime-assigned array ID to vary slightly what it does. Array jobs are declared using the --array argument to sbatch, which (as with all sbatch arguments) can be placed inside a job script as an #SBATCH declaration or passed directly on the sbatch command line. There are a number of ways to declare an array:

[test.user@cl1 hello_world]$ sbatch --array=0-64 sbatch_sub.sh

…declares an array with iteration indexes from 0 to 64.

[test.user@cl1 hello_world]$ sbatch --array=0,4,8,12 sbatch_sub.sh

…declares an array with iteration indexes specifically identified as 0, 4, 8 and 12.

[test.user@cl1 hello_world]$ sbatch --array=0-12:3 sbatch_sub.sh

…declares an array with iteration indexes from 0 to 12 with a stepping of 3, i.e. 0, 3, 6, 9 and 12.
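
To preview which indexes a given --array value would produce before submitting, the three forms above can be expanded locally. The following is a minimal sketch, not part of Slurm: expand_array_spec is a hypothetical helper name, and it assumes only the range, list and range-with-step forms shown above.

```shell
#!/bin/bash

# expand_array_spec is a hypothetical helper, not a Slurm command.
# It handles only the forms shown above: A-B, A,B,C and A-B:STEP.
expand_array_spec() {
    local spec=$1 part range step start end
    local -a parts=() out=()

    # Split the comma-separated list into its elements.
    IFS=',' read -r -a parts <<< "$spec"

    for part in "${parts[@]}"; do
        range=${part%%:*}      # text before an optional ":STEP"
        step=1
        [[ $part == *:* ]] && step=${part##*:}
        start=${range%%-*}
        end=${range##*-}       # equals start when there is no "-"
        # seq FIRST INCREMENT LAST prints one index per line.
        out+=( $(seq "$start" "$step" "$end") )
    done

    echo "${out[*]}"
}

expand_array_spec "0-12:3"     # prints: 0 3 6 9 12
expand_array_spec "0,4,8,12"   # prints: 0 4 8 12
```

The sketch is only a local sanity check; Slurm itself performs the real expansion when the job is submitted.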

Monitoring

When a job array is running, the output of squeue shows the pending iterations collapsed into a single range under the parent job ID, and each currently running iteration on its own line:

[test.user@cl1 hello_world]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
        143_[6-64]       all    hello test.use PD       0:00      4 (Resources)
             143_4       all    hello test.use  R       0:00      4 ccs[005-008]
             143_5       all    hello test.use  R       0:00      4 ccs[005-008]
             143_0       all    hello test.use  R       0:03      4 ccs[001-004]
             143_1       all    hello test.use  R       0:03      4 ccs[001-004]
[test.user@cl1 hello_world]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
       143_[15-64]       all    hello test.use PD       0:00      4 (Resources)
            143_14       all    hello test.use  R       0:00      4 ccs[001-004]
            143_10       all    hello test.use  R       0:02      4 ccs[005-008]
            143_11       all    hello test.use  R       0:02      4 ccs[005-008]
             143_1       all    hello test.use  R       0:07      4 ccs[001-004]
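
For a large array it can be handy to reduce a listing like the ones above to a count per state. The following is a rough sketch under the assumption that the input uses the default squeue column layout shown above, with the state in the fifth column; count_states is a hypothetical helper name, not a Slurm command.

```shell
#!/bin/bash

# count_states is a hypothetical helper: it reads squeue's default
# output on stdin and prints how many lines are in each state (ST column).
# A pending range such as 143_[6-64] counts as a single line.
count_states() {
    awk 'NF >= 5 && $1 != "JOBID" { n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Sample taken from the first squeue listing above.
count_states <<'EOF'
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
        143_[6-64]       all    hello test.use PD       0:00      4 (Resources)
             143_4       all    hello test.use  R       0:00      4 ccs[005-008]
             143_5       all    hello test.use  R       0:00      4 ccs[005-008]
             143_0       all    hello test.use  R       0:03      4 ccs[001-004]
             143_1       all    hello test.use  R       0:03      4 ccs[001-004]
EOF
```

For the sample this prints one pending line (PD 1) and four running iterations (R 4). On a live system the same function could be fed directly, for example with squeue | count_states.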

IDs and Variables

Each iteration in an array is assigned its own job ID in Slurm. In addition to SLURM_JOB_ID, which stores the particular iteration’s job ID, Slurm sets two further environment variables that can be used in the script.

SLURM_ARRAY_JOB_ID stores the value of the parent job submission – i.e. the ID reported in the output from sbatch when submitted.

SLURM_ARRAY_TASK_ID stores the value of the array index.

Additionally, when specifying a job’s STDOUT and STDERR files using the -o and -e directives to sbatch, the reference %A will take on the parent job ID and the reference %a will take on the iteration index. In summary:

BASH Environment Variable   SBATCH Field Code   Description
$SLURM_JOB_ID               %J                  Job identifier
$SLURM_ARRAY_JOB_ID         %A                  Array parent job identifier
$SLURM_ARRAY_TASK_ID        %a                  Array job iteration index
$SLURM_ARRAY_TASK_COUNT     (none)              Number of indexes (tasks) in the job array
$SLURM_ARRAY_TASK_MAX       (none)              Maximum array index
$SLURM_ARRAY_TASK_MIN       (none)              Minimum array index

And so, with this example script:

#!/bin/bash

#SBATCH -J arraytest
#SBATCH --array=0-4
#SBATCH -o output-%A_%a-%J.o
#SBATCH -n 1

echo SLURM_JOB_ID $SLURM_JOB_ID
echo SLURM_ARRAY_JOB_ID $SLURM_ARRAY_JOB_ID
echo SLURM_ARRAY_TASK_ID $SLURM_ARRAY_TASK_ID

We can submit the script:

[test.user@cl1 sbatch]$ sbatch array.sh 
Submitted batch job 231

Resulting in the following output files:

output-231_0-232.o
output-231_1-233.o
output-231_2-234.o
output-231_3-235.o
output-231_4-231.o

Each output file contains the variable values for its iteration:

output-231_0-232.o:
SLURM_JOB_ID 232
SLURM_ARRAY_JOB_ID 231
SLURM_ARRAY_TASK_ID 0
output-231_1-233.o:
SLURM_JOB_ID 233
SLURM_ARRAY_JOB_ID 231
SLURM_ARRAY_TASK_ID 1
output-231_2-234.o:
SLURM_JOB_ID 234
SLURM_ARRAY_JOB_ID 231
SLURM_ARRAY_TASK_ID 2
output-231_3-235.o:
SLURM_JOB_ID 235
SLURM_ARRAY_JOB_ID 231
SLURM_ARRAY_TASK_ID 3
output-231_4-231.o:
SLURM_JOB_ID 231
SLURM_ARRAY_JOB_ID 231
SLURM_ARRAY_TASK_ID 4
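
A common way to make each iteration vary what it does is to use SLURM_ARRAY_TASK_ID to select one input from a list. The following is a minimal sketch, assuming a text file with one input name per line and array indexes that start at 0; pick_input and the sample file names are hypothetical and only for illustration.

```shell
#!/bin/bash

# pick_input is a hypothetical helper: given a file with one input per
# line, it prints the line belonging to this array task.
# Array indexes here start at 0, while sed counts lines from 1.
pick_input() {
    local list_file=$1
    local task_id=${SLURM_ARRAY_TASK_ID:?must run inside a job array}
    sed -n "$((task_id + 1))p" "$list_file"
}

# Stand-alone demonstration: build a small list and simulate task 2.
list=$(mktemp)
printf '%s\n' sample_a.dat sample_b.dat sample_c.dat > "$list"
SLURM_ARRAY_TASK_ID=2 pick_input "$list"    # prints sample_c.dat
rm -f "$list"
```

Inside a real array job, Slurm sets SLURM_ARRAY_TASK_ID itself, so the script would only need to call pick_input with the path to its own list file.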

More advanced job array information is available in the official Slurm job array documentation.