Custom Parallel Task Geometry

If you need to run an MPI (or hybrid MPI/OpenMP) job that requires a custom task geometry, perhaps because one task needs a larger amount of memory than the others, this can easily be achieved with Slurm.

To do this, rather than specifying the number of processors required, specify the number of nodes (#SBATCH --nodes=X) plus the number of tasks per node (#SBATCH --ntasks-per-node=X). The geometry can then be defined via the SLURM_TASKS_PER_NODE environment variable at runtime. As long as the allocation contains enough nodes to match the geometry, Slurm will pass it to the MPI runtime, which will place the parallel tasks accordingly.

For example:

#!/bin/bash --login

#SBATCH --job-name geom_test
#SBATCH --nodes 4
#SBATCH --ntasks-per-node 16
#SBATCH --time 00:10:00
#SBATCH --output geom_test.%J.out

module purge
module load mpi/intel

export SLURM_TASKS_PER_NODE='1,16(x2),6'
mpirun ./mpi_test

In this case, we are requesting 4 nodes with 16 tasks on each, so a maximum job size of 64 parallel tasks (matching the number of allocated processors) would apply. However, we override the SLURM_TASKS_PER_NODE environment variable so that just a single task runs on the first node, the next two allocated nodes are filled, and only six parallel tasks are placed on the final allocated node, giving a total of 1+16+16+6 = 39 parallel processes. 'mpirun' will automatically pick this up from the Slurm-allocated runtime environment.
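In the geometry string, a plain number gives the task count for one node and the count(xN) form repeats that count over N consecutive nodes. As a rough illustration of how the string above maps to the 39-task total, the short bash sketch below expands and sums a geometry string; the parsing logic is purely illustrative and is not part of Slurm itself.

#!/bin/bash
# Illustrative only: expand a Slurm-style geometry string and report
# the total number of tasks it implies.
geometry='1,16(x2),6'      # same value as SLURM_TASKS_PER_NODE above

total=0
for field in ${geometry//,/ }; do
    if [[ $field =~ ^([0-9]+)\(x([0-9]+)\)$ ]]; then
        # "16(x2)" means 16 tasks on each of 2 consecutive nodes
        total=$(( total + BASH_REMATCH[1] * BASH_REMATCH[2] ))
    else
        total=$(( total + field ))
    fi
done
echo "Geometry $geometry places $total tasks"   # prints 39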