Rapid Start for HPC Wales Users Migrating to SCW

If you are an HPC Wales user migrating to the SCW service, please read the following notes in order to get up and running as quickly & effectively as possible.


Hawk is the new supercomputer at Cardiff and is for the use of Cardiff- & Bangor-led projects.

Sunbird is the new supercomputer at Swansea and is for the use of Swansea- & Aberystwyth-led projects.


User Credentials

  • Your user credentials have changed and are now linked to your institutional account.
  • The method to access your new credentials and get started with Hawk and Sunbird is covered on the Getting Access page – please follow it closely.
    • Contact Support if you have any issues.
  • Existing project memberships have also been migrated.


Data

  • Old home directories from the three HPCW sites have been cloned to a read-only area on the new systems for quick access.
    • These will remain for approximately six months only and will then be removed.
    • These cloned home directories are available on the Hawk & Sunbird login nodes at the following locations (a copy example is given after this list):
      • Cardiff home directories
        • /migrate/hpcw-cf/HPCW_USERID
      • Swansea home directories
        • /migrate/hpcw-sw/HPCW_USERID
      • Bangor home directories
        • /migrate/hpcw-ba/HPCW_USERID
  • New home directory quotas for all are as follows:
    • 50GB per user and 100GB per project share on Hawk
    • 100GB per user and 200GB per project share on Sunbird
    • Extensions are available promptly on justified request.
  • Project Shared Areas are a way for users of a particular project to share data. They are separate from user home directories.
    • If a Project Shared Area does not exist for your project and one would be useful, please request one from Support.
  • We can also create Collaboration Shared Areas, separate from home directories, that cross project and institutional boundaries so that multiple users can share data.
    • Please contact Support to discuss if this would be useful.
  • As with HPC Wales, no backup of data in home directories is taken, so please keep any essential data backed up elsewhere.
  • Data on the scratch file system that has not been accessed for 60 days will be cleaned up automatically (see the find example after this list).
    • Exceptions are available by justified request to Support.
  • Access to the HPC Wales login nodes remains available for a short time so that you can retrieve data from the HPCW scratch directories.
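
To copy data out of a cloned area, a plain cp or rsync on a login node is sufficient. The following is a minimal sketch assuming a Cardiff user; HPCW_USERID and project_data are placeholders for your old username and the directory you want to keep:

    # Copy a directory from the read-only Cardiff clone into your new home directory
    cp -r /migrate/hpcw-cf/HPCW_USERID/project_data ~/project_data

    # Or use rsync, which preserves timestamps and can safely be re-run if interrupted
    rsync -av /migrate/hpcw-cf/HPCW_USERID/project_data/ ~/project_data/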
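
To check which of your files on scratch are approaching the 60-day limit, a find command along the following lines can be used. This is a sketch that assumes your scratch area lives at /scratch/$USER; check the actual location on Hawk or Sunbird:

    # List files under scratch that have not been accessed in the last 60 days
    find /scratch/$USER -type f -atime +60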


Jobs & Applications

  • Once you have successfully migrated, we request that you no longer submit jobs on the HPC Wales systems, as they are increasingly contended whilst we complete the migration to the new systems.
  • At this stage, a steadily increasing number of the most utilised application codes have been re-built in optimised form on the new systems.
    • These can be seen and accessed from the default output of module avail.
  • For other codes that have _not_ been re-built in optimal form at launch, the HPCW software stack is made available by first running module load hpcw (see the example session after this list).
    • Once the hpcw module is loaded, the HPCW software stack is displayed by module avail and modules are loaded in the usual way using module load <modulename>.
    • The large majority of the old HPCW software stack will work fine on Hawk & Sunbird.
      • A few things won’t. If you find one, please report it to Support and a new solution will be prioritised.
  • We will monitor usage of old codes in order to prioritise those to be re-built in optimal form for the new system, and also those which will at some point be decommissioned due to zero usage.
  • Old job submission scripts should be modified to make the most of the new systems’ greater processor count per node and to target the right partition (queue); an example script is given after this list.
    • Where previous systems had 12 or 16 processor cores per node, Hawk & Sunbird have 40.
      • It is therefore more important to use the nodes efficiently, as proportionally more resource can easily be wasted.
      • Remember to update the Slurm directive #SBATCH --ntasks-per-node in migrated submission scripts.
    • To better reflect the wide diversity of jobs and improve the user experience, partition naming and use have been re-worked. Please see the hardware descriptions for Hawk & Sunbird, the output of sinfo, and the list below.
      • Standard parallel jobs to the default compute partition (Hawk & Sunbird).
      • GPU accelerated tasks to the gpu partition (Hawk & Sunbird).
      • High-memory tasks to the highmem partition (Hawk only).
      • Serial / high-throughput tasks to the htc partition (Hawk only).
      • Small and short development tasks (up to 40 processor cores, 30 minutes runtime) to the dev partition (Hawk only).
  • A newly refreshed training tarball, full of example jobs across a variety of applications, is available. Please see here.
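
As an illustration of the module workflow described above, here is a minimal sketch of a login-node session; exampleapp is a placeholder for whichever HPCW application module you actually need:

    # Default view: only the re-built, optimised modules are listed
    module avail

    # Make the old HPCW software stack visible
    module load hpcw

    # HPCW modules now appear in module avail and load in the usual way
    module avail
    module load exampleapp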
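
The sketch below shows what a migrated submission script for a standard parallel job might look like on the new 40-core nodes. The job name, account code, module and executable names are placeholders; adjust the partition, node count and wall time to suit your own work:

    #!/bin/bash
    #SBATCH --job-name=example_job
    #SBATCH --partition=compute        # standard parallel work; use gpu, highmem, htc or dev where appropriate
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=40       # Hawk & Sunbird nodes have 40 cores, not 12 or 16
    #SBATCH --time=02:00:00
    #SBATCH --account=scwXXXX          # placeholder: your SCW project code

    module load exampleapp             # placeholder module name
    srun ./example_executable          # placeholder executable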