Rapid Start for Raven Users Migrating to SCW Hawk

For users of the ARCCA Raven service who are migrating to the SCW Hawk service, please be aware of the following important notes in order to get up and running as quickly and effectively as possible.

User Credentials

  • Your credentials have changed and are now linked to your institutional account.
  • The method to access your new credentials and get started with Hawk is covered on the Getting Access page – please follow it closely.
    • Contact Support if you have any issues.


Projects

  • Existing projects and their memberships from Raven have been migrated to Hawk.
  • You can apply for new projects through the MySCW system; see Getting Access.
  • It will soon become necessary to specify which of your project memberships a compute job should be accounted against; a minimal example is shown below.
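
For reference, Slurm jobs are normally accounted against a project via the standard --account directive. A minimal sketch, in which the project code is a hypothetical placeholder (use the code listed for your project in MySCW):

    #SBATCH --account=scwXXXX    # hypothetical placeholder - substitute your own project code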


Data

  • Home directories from Raven have been cloned to a read-only area on Hawk for quick access.
    • These will remain for approximately 6 months only and then be removed.
    • They are being re-synchronised from Raven every day.
    • These cloned home directories are available on the Hawk login nodes at /migrate/raven/RAVEN_USERID (see the copy sketch after this list).
  • New home directory quotas are 50GB per user and 100GB per project share; extensions are quickly available upon justified request.
  • Project Shared Areas are a way for users of a particular project to share data. They are separate from user home directories.
    • If a Project Shared area does not exist for your project and one would be useful, please request from Support.
  • We can also create Collaboration Shared Areas, separate from home directories, which cross project and institutional boundaries for data sharing among multiple users.
    • Please contact Support to discuss if this would be useful.
  • Data on the scratch file system that has not been accessed for 60 days will be cleaned up automatically.
    • Exceptions available by justified request to Support.
  • Raven is still an active system.
  • Cardiff user home directories on Hawk are currently being cloned to a backup system on a best-efforts basis; no historical archives are kept.
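
As a minimal sketch of retrieving migrated data, assuming a hypothetical Raven user ID and directory names, and assuming scratch is available at /scratch/$USER, data can be copied from the read-only cloned area with standard tools on a Hawk login node:

    # Copy a small results directory from the read-only Raven clone into the new Hawk home directory
    # (raven_userid and the directory names are hypothetical placeholders).
    cp -r /migrate/raven/raven_userid/results ~/results

    # Larger data sets are better placed on scratch; rsync can resume an interrupted copy.
    rsync -av /migrate/raven/raven_userid/big_dataset/ /scratch/$USER/big_dataset/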


Gluster Storage

  • Gluster storage will be available as it was on Raven via mountpoints on the login nodes.
  • Access to Gluster storage is via membership of the relevant groups on Hawk. These groups are mapped from the Cardiff University groups used for Gluster storage.
  • To simplify administration and keep things better organised, all mounts are under /gluster.
  • N.B. /gluster/neurocluster contains its own group of mountpoints.
  • Users are expected to copy data to scratch for processing, as Gluster mountpoints will not be accessible from the cluster compute nodes due to network routing (see the staging sketch after this list).
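
Because the Gluster mountpoints are visible only on the login nodes, input data must be staged to scratch before a job runs and results copied back afterwards. A minimal sketch, assuming a hypothetical share at /gluster/myshare and scratch at /scratch/$USER:

    # On a login node, before submitting the job: stage input data from the Gluster mount to scratch.
    rsync -av /gluster/myshare/input_data/ /scratch/$USER/input_data/

    # The job itself then reads and writes only under /scratch/$USER/ ...

    # On a login node, after the job completes: copy results back to the Gluster share.
    rsync -av /scratch/$USER/output_data/ /gluster/myshare/output_data/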


Jobs & Applications

      • Once you have migrated and validated the operation & correctness of the software you use, and all is working as you would expect, we request that you no longer submit jobs on Raven.
      • At this stage, a slowly increasing number of the most utilised application codes have been re-built in optimised form on the new system.
        • These can be seen and accessed from the default output of module avail.
      • For other codes that have _not_ been re-built in optimised form at launch, the Raven software stack is made available by first running module load raven (see the example at the end of this section).
        • Once the raven module is loaded, the Raven software stack is displayed by module avail and modules are loaded in the usual way using module load <modulename>.
        • The large majority of the Raven software stack will function on Hawk without issue.
          • A few things will not; if you find one, please highlight it to Support and a new solution will be prioritised.
      • We will monitor usage of old codes in order to prioritise those to be re-built in optimal form for the new system, and also those which will at some point be decommissioned due to zero usage.
      • Job submission scripts from Raven will need to be modified to use the Slurm scheduler deployed on Hawk.
        • The PBS Pro to Slurm Migration Reference highlights the normal changes needed as part of this.
          • For more complex job scripts, please see the other documentation on this site regarding interactive use, array jobs, etc.
        • Job scripts should also be updated to make the most of the new system’s greater processor count per node and to target the right partition (queue).
          • Where previous systems had 12 or 16 processor cores per node, Hawk has 40.
            • It is therefore more important to use Hawk’s nodes efficiently, as proportionally more resource can easily be wasted.
            • Take care to correctly populate the Slurm directive #SBATCH --ntasks-per-node in migrated submission scripts (see the example script at the end of this section).
          • To better reflect the wide diversity of jobs and improve the user experience, partition naming and use have been re-worked. Please see the output of sinfo and the list below.
            • Standard parallel jobs to the default compute partition.
            • High-memory tasks to the highmem partition.
            • GPU accelerated tasks to the gpu partition.
            • Serial / high-throughput tasks to the htc partition.
            • Small and short development tasks (up to 40 processor cores, 30 minutes runtime) to the dev partition.
      • A newly refreshed training tarball, full of example jobs across a variety of applications, is available. Please see here.
      • Please don’t hesitate to contact us with any questions, issues or comments.
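
For reference, making use of the Raven software stack on Hawk follows the sequence described above; the application module named here is a hypothetical example:

    module load raven            # expose the Raven software stack on Hawk
    module avail                 # now also lists the Raven-era modules
    module load somecode/1.2.3   # hypothetical module name - load your application in the usual way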
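
The following minimal example submission script draws the points above together (full 40-core node, default compute partition, project accounting); the job name, project code, module and executable are hypothetical placeholders:

    #!/bin/bash
    #SBATCH --job-name=example_job        # hypothetical job name
    #SBATCH --account=scwXXXX             # placeholder - your own project code
    #SBATCH --partition=compute           # or highmem / gpu / htc / dev as appropriate
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=40          # Hawk nodes have 40 cores - populate this correctly
    #SBATCH --time=02:00:00

    module load raven                     # only needed for codes still provided via the Raven stack
    module load somecode/1.2.3            # hypothetical application module

    mpirun ./my_program input.dat         # hypothetical executable and input; assumes an MPI launcher from the loaded modules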