{"id":252,"date":"2016-01-29T09:49:15","date_gmt":"2016-01-29T09:49:15","guid":{"rendered":"https:\/\/portal.supercomputing.wales\/?page_id=252"},"modified":"2021-03-29T14:55:55","modified_gmt":"2021-03-29T13:55:55","slug":"job-arrays","status":"publish","type":"page","link":"https:\/\/portal.supercomputing.wales\/index.php\/index\/slurm\/interactive-use-job-arrays\/job-arrays\/","title":{"rendered":"Job Arrays"},"content":{"rendered":"<h4>Submission<\/h4>\n<p>Job arrays operate in Slurm much as they do in other batch systems. They enable a potentially huge number of similar jobs to be launched quickly and simply, with each iteration using the value of a runtime-assigned <em>array id<\/em> to vary what it does. Array jobs are declared using the <em>--array<\/em> argument to <em>sbatch<\/em>, which can (as with all arguments to <em>sbatch<\/em>) be included in a job script as an <em>#SBATCH<\/em> declaration or passed as a direct argument to <em>sbatch<\/em>. There are a number of ways to declare an array:<\/p>\n<pre>[test.user@cl1 hello_world]$ sbatch --array=0-64 sbatch_sub.sh<\/pre>\n<p>&#8230;declares an array with iteration indexes from 0 to 64.<\/p>\n<pre>[test.user@cl1 hello_world]$ sbatch --array=0,4,8,12 sbatch_sub.sh<\/pre>\n<p>&#8230;declares an array with the iteration indexes 0, 4, 8 and 12 specifically.<\/p>\n<pre>[test.user@cl1 hello_world]$ sbatch --array=0-12:3 sbatch_sub.sh<\/pre>\n<p>&#8230;declares an array with iteration indexes from 0 to 12 with a step of 3, i.e. 
0,3,6,9,12<\/p>\n<h4>Monitoring<\/h4>\n<p>When a job array is running, the output of <em>squeue<\/em> shows the parent task and the currently running iteration indexes:<\/p>\n<pre>[test.user@cl1 hello_world]$ squeue\nJOBID      PARTITION NAME  USER      ST TIME NODES NODELIST(REASON)\n143_[6-64] all       hello test.use  PD 0:00 4     (Resources)\n143_4      all       hello test.use  R  0:00 4     ccs[005-008]\n143_5      all       hello test.use  R  0:00 4     ccs[005-008]\n143_0      all       hello test.use  R  0:03 4     ccs[001-004]\n143_1      all       hello test.use  R  0:03 4     ccs[001-004]\n[test.user@cl1 hello_world]$ squeue\nJOBID       PARTITION NAME  USER      ST TIME NODES NODELIST(REASON)\n143_[15-64] all       hello test.use  PD 0:00 4     (Resources)\n143_14      all       hello test.use  R  0:00 4     ccs[001-004]\n143_10      all       hello test.use  R  0:02 4     ccs[005-008]\n143_11      all       hello test.use  R  0:02 4     ccs[005-008]\n143_1       all       hello test.use  R  0:07 4     ccs[001-004]<\/pre>\n<h4>IDs and Variables<\/h4>\n<p>Each iteration in an array is assigned its own job ID in Slurm. In addition to <em>SLURM_JOB_ID<\/em>, which stores the particular iteration&#8217;s job ID, Slurm creates two new environment variables that can be used in the script.<\/p>\n<p><em>SLURM_ARRAY_JOB_ID<\/em> stores the ID of the parent job submission &#8211; i.e. the ID reported in the output from <em>sbatch<\/em> at submission.<\/p>\n<p><em>SLURM_ARRAY_TASK_ID<\/em> stores the value of the array index.<\/p>\n<p>Additionally, when specifying a job&#8217;s STDOUT and STDERR files using the <em>-o<\/em> and <em>-e<\/em> directives to <em>sbatch<\/em>, the reference <em>%A<\/em> will take on the parent job ID and the reference <em>%a<\/em> will take on the iteration index. 
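<\/p>\n<p>As a quick sketch (not from the original page &#8211; the input file names and the <em>-J<\/em> name here are hypothetical), a job script might use <em>SLURM_ARRAY_TASK_ID<\/em> to select a per-task input file:<\/p>

```shell
#!/bin/bash
#SBATCH -J arraydemo
#SBATCH --array=0-9
#SBATCH -o output-%A_%a.o
# Hypothetical pattern: each array iteration derives its own input file
# from the runtime-assigned array index. The default of 0 simply keeps
# this sketch runnable outside Slurm, where the variable is unset.
TASK_ID="${SLURM_ARRAY_TASK_ID:-0}"
INPUT="input_${TASK_ID}.txt"
echo "task ${TASK_ID} processes ${INPUT}"
```

<p>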
In summary:<\/p>\n\n<table id=\"tablepress-1\" class=\"tablepress tablepress-id-1\">\n<thead>\n<tr class=\"row-1\">\n\t<th class=\"column-1\">BASH Environment Variable<\/th><th class=\"column-2\">SBATCH Field Code<\/th><th class=\"column-3\">Description<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping row-hover\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">$SLURM_JOB_ID<\/td><td class=\"column-2\">%J<\/td><td class=\"column-3\">Job identifier<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">$SLURM_ARRAY_JOB_ID<\/td><td class=\"column-2\">%A<\/td><td class=\"column-3\">Array parent job identifier<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">$SLURM_ARRAY_TASK_ID<\/td><td class=\"column-2\">%a<\/td><td class=\"column-3\">Array job iteration index<\/td>\n<\/tr>\n<tr class=\"row-5\">\n\t<td class=\"column-1\">$SLURM_ARRAY_TASK_COUNT<\/td><td class=\"column-2\"><\/td><td class=\"column-3\">Number of indexes (tasks) in the job array<\/td>\n<\/tr>\n<tr class=\"row-6\">\n\t<td class=\"column-1\">$SLURM_ARRAY_TASK_MAX<\/td><td class=\"column-2\"><\/td><td class=\"column-3\">Maximum array index<\/td>\n<\/tr>\n<tr class=\"row-7\">\n\t<td class=\"column-1\">$SLURM_ARRAY_TASK_MIN<\/td><td class=\"column-2\"><\/td><td class=\"column-3\">Minimum array index<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>And so, with this example script:<\/p>\n<pre>#!\/bin\/bash\n#SBATCH -J arraytest\n#SBATCH --array=0-4\n#SBATCH -o output-%A_%a-%J.o\n#SBATCH -n 1\necho SLURM_JOB_ID $SLURM_JOB_ID\necho SLURM_ARRAY_JOB_ID $SLURM_ARRAY_JOB_ID\necho SLURM_ARRAY_TASK_ID $SLURM_ARRAY_TASK_ID<\/pre>\n<p>We can submit the script:<\/p>\n<pre>[test.user@cl1 sbatch]$ sbatch array.sh\nSubmitted batch job 231<\/pre>\n<p>Resulting in the following output files:<\/p>\n<pre>output-231_0-232.o\noutput-231_1-233.o\noutput-231_2-234.o\noutput-231_3-235.o\noutput-231_4-231.o<\/pre>\n<p>Each iteration of which contained variables as 
follows:<\/p>\n<pre>output-231_0-232.o:\nSLURM_JOB_ID 232\nSLURM_ARRAY_JOB_ID 231\nSLURM_ARRAY_TASK_ID 0<\/pre>\n<pre>output-231_1-233.o:\nSLURM_JOB_ID 233\nSLURM_ARRAY_JOB_ID 231\nSLURM_ARRAY_TASK_ID 1<\/pre>\n<pre>output-231_2-234.o:\nSLURM_JOB_ID 234\nSLURM_ARRAY_JOB_ID 231\nSLURM_ARRAY_TASK_ID 2<\/pre>\n<pre>output-231_3-235.o:\nSLURM_JOB_ID 235\nSLURM_ARRAY_JOB_ID 231\nSLURM_ARRAY_TASK_ID 3<\/pre>\n<pre>output-231_4-231.o:\nSLURM_JOB_ID 231\nSLURM_ARRAY_JOB_ID 231\nSLURM_ARRAY_TASK_ID 4<\/pre>\n<p>More advanced job array information is available in the Slurm documentation <a href=\"https:\/\/slurm.schedmd.com\/job_array.html\">here<\/a>.<\/p> <p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Submission Job arrays operate in Slurm much as they do in other batch systems. They enable a potentially huge number of similar jobs to be launched very quickly and simply, with the value of a runtime-assigned array id then being used to cause each particular job iteration to vary slightly what it does. 
Array jobs [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":42,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"page-nosidebar.php","meta":{"_lmt_disableupdate":"no","_lmt_disable":"","footnotes":""},"class_list":["post-252","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/pages\/252","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/comments?post=252"}],"version-history":[{"count":3,"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/pages\/252\/revisions"}],"predecessor-version":[{"id":1267,"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/pages\/252\/revisions\/1267"}],"up":[{"embeddable":true,"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/pages\/42"}],"wp:attachment":[{"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/media?parent=252"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}