{"id":1057,"date":"2020-02-25T15:06:45","date_gmt":"2020-02-25T15:06:45","guid":{"rendered":"https:\/\/portal.supercomputing.wales\/?page_id=1057"},"modified":"2020-06-17T11:21:05","modified_gmt":"2020-06-17T10:21:05","slug":"using-hawks-amd-epyc-nodes","status":"publish","type":"page","link":"https:\/\/portal.supercomputing.wales\/index.php\/using-hawks-amd-epyc-nodes\/","title":{"rendered":"Using Hawk&#8217;s AMD EPYC Nodes"},"content":{"rendered":"\n<h4 class=\"wp-block-heading\">What are the AMD EPYC nodes?<\/h4> <p>As part of an extension to Hawk called Phase 2, 4096 compute cores have been added in the form of 64 nodes each containing two 32-core AMD EPYC (Rome) processor chips and 256GB memory (4 GB per core). <\/p> <p>Comparing these processors to the Intel Xeon (Skylake) processors used in Hawk&#8217;s Phase 1 is not trivial. The clock rate is a fraction higher (2.5GHz vs 2.4GHz) but both the chip architecture and processing pipeline are considerably different. A further significant difference is that Xeon Skylake supports the AVX512 instruction set, but EPYC Rome does not. Codes that make good use of AVX512 should run better on Xeon Skylake; codes that do not will generally see very similar performance on a core-to-core basis, but should benefit considerably on a node-to-node basis, as a Rome node offers 64 cores where a Skylake node offers 40.<\/p> <h4 class=\"wp-block-heading\">Using the AMD EPYC nodes<\/h4> <p>Job submission to the AMD nodes is only permitted from a dedicated AMD login node called cla1. <\/p> <p>To reach it from outside, ssh to &#8216;hawkloginamd.cf.ac.uk&#8217;. 
If you&#8217;re already logged in to a Skylake login node &#8211; cl1 or cl2 &#8211; just ssh to cla1:<\/p> <div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\n<pre class=\"wp-block-preformatted\">[c.sismfg@cl1 ~]$ ssh cla1\nLast login: Wed Feb 19 01:50:17 2020 from cl2\n================== Supercomputing Wales - Hawk ====================\nThis system is for authorised users, if you do not have authorised access please disconnect immediately, and contact Technical Support.\n-----------------------------------------------------------------\nFor user guides, documentation and technical support:\nWeb: http:\/\/portal.supercomputing.wales\n--------------------- Message Of The Day ---------------------\n==================================================================\n[c.sismfg@cla1 ~]$ <\/pre>\n<\/div><\/div> <p>Jobs should then be submitted to the &#8216;compute_amd&#8217; queue\/partition and updated to use the 64 cores per node where appropriate. In a job script:<\/p> <pre class=\"wp-block-preformatted\">#!\/bin\/bash --login\n#SBATCH --nodes=2\n#SBATCH --ntasks=128\n#SBATCH --threads-per-core=1\n#SBATCH --ntasks-per-node=64\n#SBATCH -p compute_amd\n#SBATCH -A scwXYZ\n#SBATCH -J MyJob\n#SBATCH --time=2:00:00\n#SBATCH --exclusive<\/pre> <p>Codes compiled for AMD can be loaded in the same way as for Skylake. 
Here is a list of some popular codes that are currently available for AMD:<\/p> \n<table id=\"tablepress-26\" class=\"tablepress tablepress-id-26\">\n<thead>\n<tr class=\"row-1\">\n\t<th class=\"column-1\">Program<\/th><th class=\"column-2\">Version<\/th><th class=\"column-3\">Load command (e.g.)<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-hover\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">Amber (*)<\/td><td class=\"column-2\">18<\/td><td class=\"column-3\">module load amber18\/18-cpu<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">DFTB+<\/td><td class=\"column-2\">19.1<\/td><td class=\"column-3\">module load dftbplus\/19.1<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">DLPOLY<\/td><td class=\"column-2\">4.07<\/td><td class=\"column-3\">module load dlpoly\/4.07<\/td>\n<\/tr>\n<tr class=\"row-5\">\n\t<td class=\"column-1\">DLPOLY-Classic<\/td><td class=\"column-2\">1.09<\/td><td class=\"column-3\">module load dlpoly-classic\/1.9<\/td>\n<\/tr>\n<tr class=\"row-6\">\n\t<td class=\"column-1\">Gamess UK<\/td><td class=\"column-2\">8.0<\/td><td class=\"column-3\">module load gamess-uk\/8.0-MPI<\/td>\n<\/tr>\n<tr class=\"row-7\">\n\t<td class=\"column-1\">Gamess US<\/td><td class=\"column-2\">20180930<\/td><td class=\"column-3\">module load gamess\/20180930<\/td>\n<\/tr>\n<tr class=\"row-8\">\n\t<td class=\"column-1\">Gromacs<\/td><td class=\"column-2\">2018.2, 2019.6, 2020.1<\/td><td class=\"column-3\">module load gromacs\/2018.2-single<\/td>\n<\/tr>\n<tr class=\"row-9\">\n\t<td class=\"column-1\">Lammps<\/td><td class=\"column-2\">22 Aug 2018, 12 Dec 2018, 5 June 2019<\/td><td class=\"column-3\">module load lammps\/20180822-cpu<\/td>\n<\/tr>\n<tr class=\"row-10\">\n\t<td class=\"column-1\">NWChem<\/td><td class=\"column-2\">6.8.1<\/td><td class=\"column-3\">module load nwchem\/6.8.1-cpu<\/td>\n<\/tr>\n<tr class=\"row-11\">\n\t<td class=\"column-1\">Orca (*)<\/td><td class=\"column-2\">4.2.1<\/td><td class=\"column-3\">module load 
orca\/4.2.1<\/td>\n<\/tr>\n<tr class=\"row-12\">\n\t<td class=\"column-1\">Plumed<\/td><td class=\"column-2\">2.4.3, 2.5.1, 2.5.2, 2.6.0<\/td><td class=\"column-3\">module load plumed\/2.4.3<\/td>\n<\/tr>\n<tr class=\"row-13\">\n\t<td class=\"column-1\">Vasp (*)<\/td><td class=\"column-2\">5.4.4<\/td><td class=\"column-3\">module load vasp\/5.4.4<\/td>\n<\/tr>\n<tr class=\"row-14\">\n\t<td class=\"column-1\"><\/td><td class=\"column-2\"><\/td><td class=\"column-3\"><\/td>\n<\/tr>\n<\/tbody>\n<tfoot>\n<tr class=\"row-15\">\n\t<th colspan=\"3\" class=\"column-1\">(*) Requires a software licence to use; please contact SCW support for more information.<\/th>\n<\/tr>\n<\/tfoot>\n<\/table>\n<!-- #tablepress-26 from cache --> <p>If you need a code optimised for EPYC Rome, please get in contact with us.<\/p> <h4 class=\"wp-block-heading\">Optimisation Options in the Compiler<\/h4> <p>If you are compiling your own code, we recommend doing so on the AMD login node itself. This allows the compiler optimisation flags to be used in the same way as on Skylake, i.e.:<\/p> <p>With the Intel Compilers:<\/p> <pre class=\"wp-block-preformatted\">icc -xHOST -O3<\/pre> <blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>This corresponds to &#8216;icc -march=core-avx2 -O3&#8217; when run on the AMD login node.<\/p><\/blockquote> <p>With the GNU Compilers:<\/p> <pre class=\"wp-block-preformatted\">gcc -march=native -O3<\/pre> <blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>This corresponds to &#8216;gcc -march=znver2 -O3&#8217; when run on the AMD login node.<\/p><\/blockquote> <p><\/p> <h4 class=\"wp-block-heading\">Use of the Intel Math Kernel Library (MKL)<\/h4> <p>Many codes deployed on SCW systems make use of the Intel MKL to accelerate certain mathematical computations. MKL can be used on the AMD chips by setting two environment variables prior to task execution. 
You can either set them manually or load an environment module that sets them where appropriate. Centrally provided software will be modified to do this automatically where MKL use requires it. <\/p> <pre class=\"wp-block-preformatted\">export MKL_DEBUG_CPU_TYPE=5\nexport MKL_CBWR=COMPATIBLE<\/pre> <p>Or, preferably, as it will pick up any future changes:<\/p> <pre class=\"wp-block-preformatted\">module load mkl_env<\/pre> <h4 class=\"wp-block-heading\">OpenMP<\/h4> <p>We recommend specifying OpenMP options where the code supports them. OMP_NUM_THREADS should be set as appropriate, and OMP_PROC_BIND binds threads to cores, which typically benefits performance due to cache effects.<\/p> <pre class=\"wp-block-preformatted\">export OMP_NUM_THREADS=1\nexport OMP_PROC_BIND=true<\/pre> <h4 class=\"wp-block-heading\">Finding the Incompatibles<\/h4> <p>If you submit a code that was compiled for Intel Skylake and contains instructions that are illegal on the AMD Rome nodes, the error will typically show up in one of the following ways.<\/p> <p>1. Meaningful message:<\/p> <pre class=\"wp-block-preformatted\">$ \/apps\/materials\/QuantumEspresso\/6.1\/el7\/AVX2\/intel-2018\/intel-2018\/bin\/pw.x\nPlease verify that both the operating system and the processor support Intel(R) X87, CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, MOVBE, POPCNT, F16C, AVX, FMA, BMI, LZCNT and AVX2 instructions. <\/pre> <p>2. Immediate exit with instruction error:<\/p> <pre class=\"wp-block-preformatted\">$ \/apps\/chemistry\/gromacs\/2018.2-single\/el7\/AVX512\/intel-2018\/intel-2018\/bin\/mdrun_mpi<br> Illegal instruction (core dumped)<\/pre> <p>In any such case, the code needs recompiling as per the advice above. If this occurs with a centrally provided software module, please contact the helpdesk. <\/p> <p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>What are the AMD EPYC nodes? 
As part of an extension to Hawk called Phase 2, 4096 compute cores have been added in the form of 64 nodes each containing two 32-core AMD EPYC (Rome) processor chips and 256GB memory (4 GB per core). Comparing these processors to the Intel Xeon (Skylake) processors used in [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_lmt_disableupdate":"no","_lmt_disable":"","footnotes":""},"class_list":["post-1057","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/pages\/1057","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/comments?post=1057"}],"version-history":[{"count":22,"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/pages\/1057\/revisions"}],"predecessor-version":[{"id":1175,"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/pages\/1057\/revisions\/1175"}],"wp:attachment":[{"href":"https:\/\/portal.supercomputing.wales\/index.php\/wp-json\/wp\/v2\/media?parent=1057"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}