Cluster Status

From ACENET
This page is maintained manually and is updated as soon as we learn new information.

Clusters

Click a cluster name in the table below to jump to the corresponding section of this page. The outage schedule section lists all scheduled outages in one place.

Cluster | Status | Planned Outage | Notes
Placentia | Offline | 2nd Power outage August 17-20 | Scheduled Outage in effect.
Glooscap | Online | None planned |
Arbutus | See status.computecanada.ca (west.cloud.computecanada.ca) | |
Cedar | See status.computecanada.ca | |
Graham | See status.computecanada.ca | |
Niagara | See status.computecanada.ca | |

Services

Service | Status | Planned Outage | Notes
WebMO | Online | No outages |
Account creation | Manual | No outages | Write support
PGI and Intel licenses | Online | No outages |
Videoconferencing (IOCOM Server) | Online | No outages |
Legend:
Online: the cluster is up and running
Offline: no users can log in or submit jobs, or the service is not working
Online: some users can log in, and/or there are problems affecting your work

Outage schedule

Grid Engine will not schedule any job whose requested run time (h_rt) would extend into a planned outage period. This prevents the job from being terminated prematurely when the system goes down.

  • Placentia will be offline from 12h00 NDT on Friday 17 August 2018 to sometime on Monday 20 August, while Memorial University carries out electrical power work. This is a rescheduling of work originally announced for July 6-9, then August 3-6.
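
To judge whether a job can still start before an outage, compare its requested run time with the time remaining until the outage begins. The following is a minimal Python sketch of that arithmetic, not the scheduler's actual logic. It uses the Placentia outage start listed above, and the function names `fits_before_outage` and `longest_h_rt` are hypothetical helpers introduced for illustration. It also assumes that a job ending exactly at the outage start is still acceptable, which may not match Grid Engine's own boundary handling.

```python
from datetime import datetime, timedelta, timezone

# Newfoundland Daylight Time is UTC-02:30.
NDT = timezone(timedelta(hours=-2, minutes=-30), "NDT")

# Start of the Placentia outage listed above: 12h00 NDT, Friday 17 August 2018.
OUTAGE_START = datetime(2018, 8, 17, 12, 0, tzinfo=NDT)


def fits_before_outage(start, h_rt, outage_start=OUTAGE_START):
    """Return True if a job starting at `start` with run time `h_rt`
    would finish no later than `outage_start`.

    Assumption: finishing exactly at the outage start counts as fitting.
    """
    return start + h_rt <= outage_start


def longest_h_rt(now, outage_start=OUTAGE_START):
    """Return the longest run time that still fits if the job started
    at `now`, formatted as H:MM:SS."""
    remaining = max(outage_start - now, timedelta(0))
    total = int(remaining.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Example: two days before the outage begins.
now = datetime(2018, 8, 15, 12, 0, tzinfo=NDT)
print(fits_before_outage(now, timedelta(hours=48)))  # True
print(fits_before_outage(now, timedelta(hours=72)))  # False
print(longest_h_rt(now))                             # 48:00:00
```

The run time itself is requested at submission, for example with `qsub -l h_rt=48:00:00`. A job that waits in the queue starts later than `now`, so in practice it is safer to request somewhat less than the value this sketch reports.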


Placentia

  • Placentia is offline from 12h00 NDT on Friday 17 August 2018 to sometime on Monday 20 August, while Memorial University carries out more electrical power work.
12:03, August 17, 2018 (NDT)
  • In the early hours of Thursday August 16th there was a power event in Placentia's data centre, due to thunderstorms above St. John's. This caused all compute nodes to crash and killed running jobs. The nodes are back up now. Please check on your jobs.
09:34, August 16, 2018 (ADT)
  • Placentia is back online after the scheduled power outage (July 27-30), during which Memorial University carried out electrical power work. Note that there will be a second outage on August 17-20, and no job with a run time extending into that outage will be able to start.
13:43, July 30, 2018 (ADT)
  • CFI funding for Placentia will end on 2019 March 31. Allocation holders and existing users are welcome to continue computing there. New account holders are urged to start using new national systems instead of Placentia.
09:34, May 1, 2018 (ADT)

Glooscap

  • Some repairs on the filesystem are still pending, but Glooscap is now operating at near-full capacity. 1678 cores are in service today. Jobs of up to 336 hours (two weeks) are accepted, although 48 hours is still the maximum run time on nodes served by nfs5.
09:20, June 19, 2018 (ADT)
  • Attempts to repair nfs5 at Glooscap continue. 480 cores that do not depend on nfs5 have been returned to service, but all are assigned to short.q and will only accept jobs of up to 48 hours.
12:17, May 29, 2018 (ADT)
  • Power work on Glooscap was completed on 2018 May 3, but the infrastructure component 'nfs5' did not power up properly. Compute nodes cl059-cl096, comprising 528 cores, have been returned to service. Only jobs of up to 48 hours run time (short.q) are currently being accepted while work on 'nfs5' continues.
10:29, May 8, 2018 (ADT)
  • CFI funding for Glooscap will end on 2019 March 31. Allocation holders and existing users are welcome to continue computing there. New account holders are urged to start using new national systems instead of Glooscap.
09:34, May 1, 2018 (ADT)

Fundy

  • Fundy has been retired from service.
10:01, April 5, 2018 (ADT)

Mahone

  • Mahone has been retired from service.
10:00, April 5, 2018 (ADT)