Cardiff HTC Cluster Upgrade Outage: 17-19 May 2016
The upgraded software stack deployed to HPCW systems over the last 8 months has proven reliable and capable. Stack-related issues have reduced substantially, and system availability and user experience have improved as a result.
It is therefore time to complete the transition to the new stack by upgrading the Cardiff HTC cluster to the CentOS6/xCAT/Slurm stack. This will be the final HPC Wales system to be upgraded and life-extended, and the second Westmere-based system to migrate (after Bangor earlier this year).
This will remove RHEL/CentOS 5 generation operating systems and application builds from the HPCW service. Over the last 8 months we have tackled a number of the issues this is likely to cause, but a few new user issues may still arise. Rest assured that we will work with you to resolve anything that comes up as part of the removal of the final old-stack-based system. There is plenty of information available here on how to use and migrate to the new Slurm environment.
We anticipate the system will be unavailable for approximately 2-3 days, subject to satisfactory completion of performance tests. We will announce resumption of service on the Portal and in the login Message of the Day as soon as possible.
Hello, I’m sorry for the disturbance, but I would like to ask you a question.
Are my data safe during the system upgrade?
I know that I’m asking you at the last moment but I am a little bit concerned.
Thank you very much.