Systems Upgrades & Outages – September 2015

Planning & preparation continue in the partner Higher Education Institutions for Stage 2 of HPC Wales. The current transition phase is likely to last into the new calendar year, and we hope to make further announcements soon.

As previously advertised, we are making a number of technical changes across the infrastructure to provide a better & more economical service. We have recently migrated to a new Portal platform and a new back-end Support Desk ticketing system. In addition, the Storage Refresh project is in the final stages of design.

Upgrades

Before that, however, we will begin upgrading our individual HPC clusters. We will upgrade one cluster at a time and allow sufficient time between upgrades to provide whatever user support is required.

The process for each cluster consists of three main steps: (i) upgrading the operating system to the latest release of CentOS 6; (ii) migrating the cluster management to a new solution; and (iii) migrating the Job Scheduler to a different package, Slurm. Unfortunately, the changes are significant enough to require an outage of a few days on each cluster.

The change of Job Scheduler will require users to change how they submit work. For most users this change is simple & quick, but a few will need to make more significant changes to existing job scripts. We have prepared (& continue to extend) a section of the HPC Wales Portal covering the new system & the migration process. It is particularly important to be prepared for the change, so we recommend you review this material in advance. Please do raise any questions so that we can address any concerns and extend the documentation.

We will begin the upgrade process with the Swansea system, accessed through the login nodes sw-sb-log-001, sw-sb-log-002 and sw-sb-log-003. We will decommission the current software environment at the end of Friday 4th September 2015, followed by a systems outage, running until Tuesday 8th September 2015, during which the upgrade will be performed. We then intend to open the upgraded system on Thursday 10th September, but please note this is subject to satisfactory testing and the date may move earlier or later. We will keep the login Message of the Day (MOTD) updated to advise on availability and will also post updates to this Technical Blog.

The remaining HPC Wales cluster upgrades will be scheduled after this. A draft timetable follows, although it is subject to change:

  • Swansea (HPCW Phase 2 Sandy Bridge system) – September 2015
  • Cardiff Capacity (HPCW Phase 2 Sandy Bridge system) – October 2015
  • Bangor (HPCW Phase 1 Westmere system) – November 2015
  • Cardiff HTC (HPCW Phase 1 Westmere system) – December 2015

As ever, should you have any issues or questions, please contact us by submitting a ticket to the Support Desk on the Portal or by emailing support@hpcwales.co.uk.