Using Hawk’s AMD EPYC Nodes

What are the AMD EPYC nodes?

As part of an extension to Hawk called Phase 2, 4096 compute cores have been added in the form of 64 nodes each containing two 32-core AMD EPYC (Rome) processor chips and 256GB memory (4 GB per core).

Comparing these processors to the Intel Xeon (Skylake) processors used in Hawk’s Phase 1 is not trivial. The clock rate is a fraction higher (2.5GHz vs 2.4GHz) but both the chip architecture and processing pipeline are considerably different. A further significant difference is that Xeon Skylake supports the AVX512 instruction set, but EPYC Rome does not. Codes that make good use of AVX512 should run better on Xeon Skylake. Codes that do not will generally see very similar performance on a core-to-core basis, but should benefit on a node-to-node basis, as a Rome node offers 64 cores where a Skylake node offers 40.

Using the AMD EPYC nodes

Job submission to the AMD nodes is only permitted from a dedicated AMD login node called cla1.

To get to it from outside, ssh to ‘hawkloginamd.cf.ac.uk’. If you’re already logged in to a Skylake login node – cl1 or cl2 – just ssh to cla1:

 [c.sismfg@cl1 ~]$ ssh cla1
 Last login: Wed Feb 19 01:50:17 2020 from cl2
 ======================== Supercomputing Wales - Hawk ==========================
    This system is for authorised users, if you do not have authorised access
        please disconnect immediately, and contact Technical Support.
  -----------------------------------------------------------------------------
             For user guides, documentation and technical support:
                    Web: http://portal.supercomputing.wales

 --------------------------- Message Of The Day ---------------------------
 ==============================================================================
 [c.sismfg@cla1 ~]$ 

Jobs should then be submitted to the ‘compute_amd’ queue/partition and updated to use the 64 cores per node where appropriate. In a job script:

#!/bin/bash --login
#SBATCH --nodes=2
#SBATCH --ntasks=128
#SBATCH --threads-per-core=1
#SBATCH --ntasks-per-node=64
#SBATCH -p compute_amd
#SBATCH -A scwXYZ
#SBATCH -J MyJob
#SBATCH --time=2:00:00
#SBATCH --exclusive
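
The relationship between the node count and the task count in the header above can be sketched as a small helper. This is only an illustration: the function name amd_job_header is hypothetical, and the figure of 64 cores per node is taken from the text above.

```shell
# Cores per AMD node, as stated in this guide.
CORES_PER_NODE=64

# Print a Slurm header that fills every core on the requested number of AMD nodes.
# amd_job_header is a hypothetical helper, not a command provided on Hawk.
amd_job_header() {
    nodes="$1"
    ntasks=$(( nodes * CORES_PER_NODE ))
    cat <<EOF
#!/bin/bash --login
#SBATCH --nodes=${nodes}
#SBATCH --ntasks=${ntasks}
#SBATCH --ntasks-per-node=${CORES_PER_NODE}
#SBATCH -p compute_amd
EOF
}

# Example: header for a two-node job (2 x 64 = 128 tasks).
amd_job_header 2
```

The account, job name and time limit lines still need adding by hand, as they are specific to each project.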

Codes compiled for AMD can be loaded in the same way as for Skylake. Here is a list of some popular codes that are currently available for AMD:

Program         Version                                Load command (e.g.)
Amber (*)       18                                     module load amber18/18-cpu
DFTB+           19.1                                   module load dftbplus/19.1
DLPOLY          4.07                                   module load dlpoly/4.07
DLPOLY-Classic  1.09                                   module load dlpoly-classic/1.9
Gamess UK       8.0                                    module load gamess-uk/8.0-MPI
Gamess US       20180930                               module load gamess/20180930
Gromacs         2018.2, 2019.6, 2020.1                 module load gromacs/2018.2-single
Lammps          22 Aug 2018, 12 Dec 2018, 5 June 2019  module load lammps/20180822-cpu
NWChem          6.8.1                                  module load nwchem/6.8.1-cpu
Orca (*)        4.2.1                                  module load orca/4.2.1
Plumed          2.4.3, 2.5.1, 2.5.2, 2.6.0             module load plumed/2.4.3
Vasp (*)        5.4.4                                  module load vasp/5.4.4
(*) requires a software licence to use, please contact SCW support for more information.

If you need a code optimised for EPYC Rome, please get in contact with us.

Optimisation Options in the Compiler

If you are compiling your own code, we recommend doing this on the AMD login node itself. This lets you use the compiler optimisation flags in the same way as on Skylake, i.e.:

With the Intel Compilers:

icc -xHOST -O3

This corresponds to ‘icc -march=core-avx2 -O3’ when run on the AMD login node.

With the GNU Compilers:

gcc -march=native -O3

This corresponds to ‘gcc -march=znver2 -O3’ when run on the AMD login node.
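
To check what ‘-march=native’ resolves to on the node you are logged in to, GCC can be asked to print its target settings. The sketch below assumes a GCC recent enough to report the resolved architecture; the function name native_march is hypothetical, and it prints ‘unknown’ if gcc is missing or gives no answer.

```shell
# Print the architecture that 'gcc -march=native' resolves to on this machine.
# native_march is a hypothetical helper name used for illustration.
native_march() {
    if ! command -v gcc >/dev/null 2>&1; then
        echo "unknown"
        return 0
    fi
    # 'gcc -Q --help=target' lists the target options in effect; take the -march= line.
    arch="$(gcc -march=native -Q --help=target 2>/dev/null \
        | awk '$1 == "-march=" {print $2; exit}' || true)"
    echo "${arch:-unknown}"
}

echo "gcc -march=native resolves to: $(native_march)"
```

Running this on both a Skylake login node and the AMD login node is a quick way to confirm that a build done on one will not silently target the other.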

Use of the Intel Math Kernel Library (MKL)

Many codes deployed on SCW systems use the Intel MKL to accelerate certain mathematical computations. MKL can be used on the AMD chips by setting two environment variables before task execution. You can either set them manually or load an environment module that sets them where appropriate. Centrally provided software will be modified to do this automatically where necessary for MKL use.

export MKL_DEBUG_CPU_TYPE=5
export MKL_CBWR=COMPATIBLE

Or, preferably, as it will pick up future changes if necessary:

module load mkl_env
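
If a job script may run on either Skylake or AMD nodes and you set the variables by hand, one option is to set them only when the CPU reports itself as AMD. The following is a minimal sketch, assuming a Linux node where /proc/cpuinfo is readable; the function names are hypothetical, and this is not a description of how the mkl_env module itself works.

```shell
# Succeed if the given CPU vendor string is the one AMD processors report.
is_amd_vendor() {
    [ "$1" = "AuthenticAMD" ]
}

# Print the vendor_id of the first CPU listed in /proc/cpuinfo (Linux only).
cpu_vendor() {
    awk -F': ' '/^vendor_id/ {print $2; exit}' /proc/cpuinfo 2>/dev/null
}

# Set the two MKL variables from this guide only on AMD hardware.
if is_amd_vendor "$(cpu_vendor)"; then
    export MKL_DEBUG_CPU_TYPE=5
    export MKL_CBWR=COMPATIBLE
fi
```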

OpenMP

We recommend specifying OpenMP options where the code supports them. OMP_NUM_THREADS should be set as appropriate, and OMP_PROC_BIND enables thread-to-core binding, which typically improves performance due to cache effects.

export OMP_NUM_THREADS=1
export OMP_PROC_BIND=true
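
For hybrid MPI/OpenMP jobs, the thread count can be taken from the Slurm allocation rather than hard-coded. The sketch below assumes the job requested threads with --cpus-per-task, in which case Slurm sets SLURM_CPUS_PER_TASK; otherwise it falls back to one thread. The 64 cores per node figure comes from this guide.

```shell
# Cores per AMD node, as stated in this guide.
CORES_PER_NODE=64

# Use the Slurm allocation if --cpus-per-task was requested, else 1 thread per task.
threads="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS="$threads"
export OMP_PROC_BIND=true

# Number of tasks that fit on one node with this many threads each.
tasks_per_node=$(( CORES_PER_NODE / threads ))
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}, up to ${tasks_per_node} tasks per node"
```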

Finding the Incompatibles

If you submit a code that has been compiled for Intel Skylake and it contains instructions that are illegal on the AMD Rome nodes, you will see an error in one of the following ways.

1. Meaningful message:

  $ /apps/materials/QuantumEspresso/6.1/el7/AVX2/intel-2018/intel-2018/bin/pw.x
 Please verify that both the operating system and the processor support Intel(R) X87, CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, MOVBE, POPCNT, F16C, AVX, FMA, BMI, LZCNT and AVX2 instructions. 

2. Immediate exit with instruction error:

$ /apps/chemistry/gromacs/2018.2-single/el7/AVX512/intel-2018/intel-2018/bin/mdrun_mpi
Illegal instruction (core dumped)

In any such case, the code needs recompiling following the advice above. If this occurs with a centrally provided software module, please contact the helpdesk.
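
Before launching a binary of unknown origin, it can help to check whether the current node advertises AVX512 at all. This is a rough sketch, assuming a Linux node where /proc/cpuinfo lists CPU feature flags (avx512f is the flag Linux reports for the AVX512 foundation instructions); the function names are hypothetical.

```shell
# Succeed if the given space-separated flags string contains the avx512f flag.
has_avx512f() {
    case " $1 " in
        *" avx512f "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the feature flags of the first CPU listed in /proc/cpuinfo (Linux only).
cpu_flags() {
    awk -F': ' '/^flags/ {print $2; exit}' /proc/cpuinfo 2>/dev/null
}

if has_avx512f "$(cpu_flags)"; then
    echo "AVX512 available: builds that use AVX512 can run on this node"
else
    echo "No AVX512 reported: use a build without AVX512 instructions on this node"
fi
```

A check like this only covers AVX512; a binary can still fail on other unsupported instructions, in which case the errors shown above remain the definitive signal.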