{"id":256,"date":"2016-01-29T09:50:44","date_gmt":"2016-01-29T09:50:44","guid":{"rendered":"https:\/\/portal.supercomputing.wales\/?page_id=256"},"modified":"2020-04-28T20:21:31","modified_gmt":"2020-04-28T19:21:31","slug":"batch-submission-of-serial-jobs-for-parallel-execution","status":"publish","type":"page","link":"https:\/\/portal.supercomputing.wales\/index.php\/index\/slurm\/interactive-use-job-arrays\/batch-submission-of-serial-jobs-for-parallel-execution\/","title":{"rendered":"Batch Submission of Serial Jobs for Parallel Execution"},"content":{"rendered":"<p>Large numbers of serial jobs can become incredibly inefficient and troublesome on mixed-mode HPC systems. The SCW Slurm deployment limits the number of running &amp; submitted jobs any single user may have.<\/p>\n<p>However, there are ways to submit multiple jobs:<\/p>\n<ol>\n<li>Background jobs using shell process control, with <em>wait<\/em> to block until all processes on a single node have finished.<\/li>\n<li>Combining <a href=\"http:\/\/www.gnu.org\/software\/parallel\/\" target=\"_blank\" rel=\"noopener noreferrer\"><em>GNU Parallel<\/em> <\/a>and Slurm&#8217;s<em> srun <\/em>command, which allows us to handle such situations in a controlled and efficient way. Using this method, a single job is submitted that requests an allocation of X cores, and the GNU <em>parallel<\/em> command enables us to utilise all of those cores by launching the serial tasks using the <em>srun<\/em> command.<\/li>\n<li>Using Job Arrays for very similar tasks. 
See <a href=\"https:\/\/portal.supercomputing.wales\/index.php\/index\/slurm\/interactive-use-job-arrays\/job-arrays\/\">here<\/a>.<\/li>\n<\/ol>\n<p><strong>Shell process control<\/strong><\/p>\n<p>Here is an example of submitting 2 processes on a single node:<br \/>\n<pre class=\"preserve-code-formatting\">#!\/bin\/bash\n#SBATCH --ntasks=32\n#SBATCH --ntasks-per-node=32\n#SBATCH -o example.log.%J\n#SBATCH -e example.err.%J\n#SBATCH -J example\n\n#set the partition, use compute if running in Swansea\n#SBATCH -p htc\n#SBATCH --time=1:00:00\n#SBATCH --exclusive\n\ntime my_exec &amp;lt; input1.csv &amp;gt; input1.log.$SLURM_JOBID &amp;amp;\ntime my_exec &amp;lt; input2.csv &amp;gt; input2.log.$SLURM_JOBID &amp;amp;\n# important to make sure the batch job won&#039;t exit before all the\n# simultaneous runs are completed.\nwait\n<\/pre><br \/>\nThe <em>my_exec<\/em> commands in this case would be multithreaded to use 32 cores between them.<\/p>\n<p><strong>GNU Parallel and Slurm&#8217;s srun command<\/strong><\/p>\n<p>Here is the example, commented, job submission file <em>serial_batch.sh<\/em>:<br \/>\n<pre class=\"preserve-code-formatting\">#!\/bin\/bash --login\n#SBATCH -n 40&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #Number of processors in our pool\n#SBATCH -o output.%J&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#Job output\n#SBATCH -t 12:00:00&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #Max wall time for entire job\n\n#change the partition to compute if running in Swansea\n#SBATCH -p htc&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#Use the High Throughput partition which is intended for serial jobs\n\nmodule purge\nmodule load hpcw\nmodule load parallel\n\n# Define srun arguments:\nsrun=&quot;srun -n1 -N1 --exclusive&quot;\n# 
--exclusive&nbsp;&nbsp;&nbsp;&nbsp; ensures srun uses distinct CPUs for each job step\n# -N1 -n1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; allocates a single core to each task\n\n# Define parallel arguments:\nparallel=&quot;parallel -N 1 --delay .2 -j $SLURM_NTASKS --joblog parallel_joblog --resume&quot;\n# -N 1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;is number of arguments to pass to each job\n# --delay .2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;prevents overloading the controlling node on short jobs\n# -j $SLURM_NTASKS&nbsp;&nbsp;is the number of concurrent tasks parallel runs, so number of CPUs allocated\n# --joblog name&nbsp;&nbsp;&nbsp;&nbsp; parallel&#039;s log file of tasks it has run\n# --resume&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;parallel can use a joblog and this to continue an interrupted run (job resubmitted)\n\n# Run the tasks:\n$parallel &quot;$srun .\/runtask arg1:{1}&quot; ::: {1..64}\n# in this case, we are running a script named runtask, and passing it a single argument\n# {1} is the first argument\n# parallel uses ::: to separate options. Here {1..64} is a shell expansion defining the values for\n#&nbsp;&nbsp;&nbsp;&nbsp;the first argument, but could be any shell command\n#\n# so parallel will run the runtask script for the numbers 1 through 64, with a max of 40 running \n#&nbsp;&nbsp;&nbsp;&nbsp;at any one time\n#\n# as an example, the first job will be run like this:\n#&nbsp;&nbsp;&nbsp;&nbsp;srun -N1 -n1 --exclusive .\/runtask arg1:1<\/pre><br \/>\nSo, in the above we are requesting an allocation from Slurm of 40 processors, but we have 64 tasks to run. Parallel will execute the jobs as soon as space on our allocation becomes available (i.e. tasks finish). 
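The mechanics of the task list can be illustrated in plain bash, with no Slurm or GNU parallel needed (a sketch only, reusing the runtask name from the example above):

```shell
# Illustration only: how the argument list in
#   $parallel "$srun ./runtask arg1:{1}" ::: {1..64}
# is built. Bash expands {1..64} to the literal words 1 2 3 ... 64
# *before* parallel runs; parallel just receives them as arguments.
args=( {1..64} )
echo "number of tasks: ${#args[@]}"

# parallel substitutes each word for the {1} placeholder, so the first
# few generated commands look like this:
for a in "${args[@]:0:3}"; do
    echo "srun -n1 -N1 --exclusive ./runtask arg1:$a"
done
```

With 40 slots (`-j $SLURM_NTASKS`) and 64 such commands, parallel keeps 40 srun job steps running and starts each of the remaining 24 as a slot frees up.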
As this does not have the overhead of setting up a new full job, it is more efficient.<\/p>\n<p>A simple &#8216;runtask&#8217; script that demonstrates the principle by logging helpful text is included here, courtesy of the <a href=\"https:\/\/rcc.uchicago.edu\/docs\/running-jobs\/srun-parallel\/index.html#parallel-batch\" target=\"_blank\" rel=\"noopener noreferrer\">University of Chicago Research Computing Centre<\/a>:<br \/>\n<pre class=\"preserve-code-formatting\">#!\/bin\/bash\n\n# this script echoes some useful output so we can see what parallel\n# and srun are doing\n\n# sleep time between 10 and 19 seconds (bash arithmetic with $RANDOM)\nsleepsecs=$(( (RANDOM % 10) + 10 ))s\n\n# $1 is arg1:{1} from parallel.\n# $PARALLEL_SEQ is a special variable from parallel. It is the actual sequence\n# number of the job regardless of the arguments given\n# We output the sleep time, hostname, and date for more info.\necho task $1 seq:$PARALLEL_SEQ sleep:$sleepsecs host:$(hostname) date:$(date)\n\n# sleep a random amount of time\nsleep $sleepsecs<\/pre><br \/>\nSo, one would simply submit the job script as per normal:<br \/>\n<pre class=\"preserve-code-formatting\">$ sbatch serial_batch.sh<\/pre><br \/>\nAnd we then see output in the Slurm job output file like this:<br \/>\n<pre class=\"preserve-code-formatting\">...\ntask arg1:34 seq:34 sleep:11s host:ccs0132 date:Fri 29 Jun 09:37:26 BST 2018\nsrun: Exclusive so allocate job details\ntask arg1:38 seq:38 sleep:12s host:ccs0132 date:Fri 29 Jun 09:37:27 BST 2018\nsrun: Exclusive so allocate job details\ntask arg1:45 seq:45 sleep:11s host:ccs0132 date:Fri 29 Jun 09:37:29 BST 2018\nsrun: Exclusive so allocate job details\ntask arg1:41 seq:41 sleep:12s host:ccs0132 date:Fri 29 Jun 09:37:28 BST 2018\nsrun: Exclusive so allocate job details\ntask arg1:47 seq:47 sleep:11s host:ccs0132 date:Fri 29 Jun 09:37:29 BST 2018\nsrun: Exclusive so allocate job details\n...<\/pre><br \/>\nAlso the parallel job log records completed tasks:<br \/>\n<pre class=\"preserve-code-formatting\">Seq Host Starttime 
JobRuntime Send Receive Exitval Signal Command\n8 : 1530261102.040 11.088 0 75 0 0 srun -n1 -N1 --exclusive .\/runtask arg1:8\n9 : 1530261102.248 11.088 0 75 0 0 srun -n1 -N1 --exclusive .\/runtask arg1:9\n5 : 1530261101.385 12.088 0 75 0 0 srun -n1 -N1 --exclusive .\/runtask arg1:5\n12 : 1530261102.897 12.105 0 77 0 0 srun -n1 -N1 --exclusive .\/runtask arg1:12\n1 : 1530261100.475 17.082 0 75 0 0 srun -n1 -N1 --exclusive .\/runtask arg1:1\n2 : 1530261100.695 17.091 0 75 0 0 srun -n1 -N1 --exclusive .\/runtask arg1:2\n3 : 1530261100.926 17.088 0 75 0 0 srun -n1 -N1 --exclusive .\/runtask arg1:3\n10 : 1530261102.450 16.088 0 77 0 0 srun -n1 -N1 --exclusive .\/runtask arg1:10\n6 : 1530261101.589 17.082 0 75 0 0 srun -n1 -N1 --exclusive .\/runtask arg1:6\n...<\/pre><br \/>\nSo, by tweaking a few simple commands in the job script and having a &#8216;runtask&#8217; script that does something useful, we can accomplish a neat, efficient serial batch system.<\/p>\n<h3>Multi-Threaded Tasks<\/h3>\n<p>It is trivially possible to use the above technique and scripts, with a very small modification, to run multi-threaded or otherwise intra-node parallel tasks.&nbsp; We achieve this by changing the SBATCH directive specifying the processor requirement (#SBATCH -n &#8230;) in the submission script to the following form:<br \/>\n<pre class=\"preserve-code-formatting\">#SBATCH --nodes=3\n#SBATCH --ntasks-per-node=3\n#SBATCH --cpus-per-task=4<\/pre><br \/>\nIn this case, parallel will launch across 3 nodes, and will run 3 tasks of 4 processors each per node, i.e. 9 concurrent tasks in total.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Large numbers of serial jobs can become incredibly inefficient and troublesome on mixed-mode HPC systems. The SCW Slurm deployment limits the number of running &amp; submitted jobs any single user may have. 
However, there are ways to submit multiple jobs: Background jobs using shell process control and wait for processes to finish on a single [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":42,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"page-nosidebar.php","meta":{"_lmt_disableupdate":"no","_lmt_disable":"","footnotes":""},"class_list":["post-256","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/pages\/256","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/comments?post=256"}],"version-history":[{"count":17,"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/pages\/256\/revisions"}],"predecessor-version":[{"id":1131,"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/pages\/256\/revisions\/1131"}],"up":[{"embeddable":true,"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/pages\/42"}],"wp:attachment":[{"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/media?parent=256"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}