Using GPUs

GPU Usage

Slurm controls access to the GPUs on a node such that access is only granted when the resource is requested specifically.  Slurm models GPUs as a Generic Resource (GRES), which is requested at job submission time via the following additional directive:

#SBATCH --gres=gpu:2

This directive instructs Slurm to allocate two GPUs per allocated node, to avoid nodes that have no GPUs, and to grant the job access to them.  SCW GPU nodes have two GPUs each.

Jobs must also be submitted to the desired GPU-enabled nodes queue:

#SBATCH -p gpu  # to request P100 GPUs
Or
#SBATCH -p gpu_v100 # to request V100 GPUs

It is then possible to use CUDA-enabled applications or the CUDA toolkit modules themselves. Example module environment commands:

module load CUDA/9.1
module load gromacs/2018.2-single-gpu
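
Putting these pieces together, a complete GPU job script might look like the sketch below. The account code, resource amounts and the application command are placeholders to adapt to your own project and workflow; the module versions are those shown above.

#!/bin/bash
#SBATCH --job-name=gpu-example
#SBATCH -p gpu                  # P100 partition; use gpu_v100 for V100 nodes
#SBATCH --gres=gpu:2            # request both GPUs on the allocated node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4       # adjust to suit the application
#SBATCH --time=01:00:00
#SBATCH --account=scwXXXX       # replace with your own project code

module load CUDA/9.1
module load gromacs/2018.2-single-gpu

# Run the GPU-enabled application (GROMACS used here purely as an example)
srun gmx mdrun -deffnm benchmark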

CUDA Versions & Hardware Differences

Multiple versions of the CUDA libraries are installed on SCW systems; the current list can always be seen with:

[b.iss03c@cl1 ~]$ module avail CUDA

                ---- /apps/modules/libraries ----
CUDA/10.0  CUDA/10.1  CUDA/11.2  CUDA/11.3  CUDA/11.4  CUDA/8.0   CUDA/9.0   CUDA/9.1   CUDA/9.2

The GPU nodes always run the latest nVidia driver to support the latest installed version of CUDA, and also offer backwards compatibility with prior versions.
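
As a quick sanity check inside a job script (a sketch only; the CUDA version shown is just an example), you can load a CUDA module and confirm both the toolkit version it provides and the driver and card the node reports:

module load CUDA/11.4
nvcc --version                                              # toolkit version from the loaded module
nvidia-smi --query-gpu=driver_version,name --format=csv     # driver version and GPU model on the node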

However, while Pascal-generation nVidia Tesla cards (present in Hawk) are supported by all installed versions of CUDA, Volta-generation nVidia Tesla cards (present in Hawk and Sunbird) are only supported by CUDA 9 and later.  Codes that require CUDA 8, such as Amber 16, will not run on the Volta cards.

Some important differences between Pascal and Volta nVidia Tesla cards:

Characteristic    Volta    Pascal
Tensor Cores      640      0
CUDA Cores        5120     3584
Memory (GB)       16       16

Tensor cores are a new type of programmable core, exclusive to GPUs based on the Volta architecture, that run alongside standard CUDA cores. Tensor cores can accelerate mixed-precision matrix multiply and accumulate calculations in a single operation. This capability is especially significant for AI/DL/ML applications that rely on large matrix operations.

GPU Compute Modes

nVidia GPU cards can be operated in a number of Compute Modes.  In short, the difference is whether multiple processes (and, theoretically, users) can share a GPU or whether a GPU is exclusively bound to a single process.  Whether one or the other mode is needed is typically application-specific, so please pay particular attention to example job scripts.  GPUs on SCW systems default to ‘shared’ mode.

Users are able to set the Compute Mode of GPUs allocated to their job through a pair of helper scripts that should be called in a job script in the following manner:

To set exclusive mode:

clush -w $SLURM_NODELIST "sudo /apps/slurm/gpuset_3_exclusive"

And to set shared mode (although this is the default at the start of any job):

clush -w $SLURM_NODELIST "sudo /apps/slurm/gpuset_0_shared"

To query the Compute Mode:

clush -w $SLURM_NODELIST "nvidia-smi -q|grep Compute"

In all cases above, sensible output will appear in the job output file.
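
For example, a job-script fragment that switches the allocated GPUs to exclusive mode before launching the application might look like the following sketch (the application command is a placeholder):

# Switch the GPUs allocated to this job to exclusive mode
clush -w $SLURM_NODELIST "sudo /apps/slurm/gpuset_3_exclusive"

# Confirm the change in the job output file
clush -w $SLURM_NODELIST "nvidia-smi -q|grep Compute"

# Run the GPU application (placeholder command)
srun ./my_gpu_application

# Optionally return the GPUs to shared mode before the job ends
clush -w $SLURM_NODELIST "sudo /apps/slurm/gpuset_0_shared"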

Additionally, because Slurm models GPUs as a consumable resource that must be requested in its own right (i.e. not implicitly via processor/node count), the scheduler will not by default allocate the same GPU to multiple users or jobs; achieving that would take some manual work.