Cluster Status

This page is maintained manually. It gets updated as soon as we learn new information.

Clusters

Click the name of a cluster in the table below to jump to the corresponding section of this page. The outage schedule section lists all scheduled outages in one place.

Cluster Status Planned Outage Notes
Placentia Online End of service Mar 31
Glooscap Online End of service Mar 31
Arbutus See status.computecanada.ca (west.cloud.computecanada.ca)
Cedar See status.computecanada.ca
Graham See status.computecanada.ca
Niagara See status.computecanada.ca

Services

Service                 Status  Planned Outage  Notes
WebMO                   Online  No outages
Account creation        Manual  No outages      Write support
PGI and Intel licenses  Online  No outages
Legend:
Online: the cluster is up and running
Offline: no users can log in or submit jobs, or the service is not working
Online: some users can log in, and/or there are problems affecting your work

Outage schedule

Grid Engine will not schedule any job with a run time (h_rt) that extends into the beginning of a planned outage period. This is so the job will not be terminated prematurely when the system goes down.

  • No planned outages
  • Glooscap and Placentia will no longer be operated by ACENET after 2019 March 31.
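
The scheduling rule described above can be illustrated with a small check. This is only a sketch of the idea, not Grid Engine's own logic: it assumes h_rt is given in HH:MM:SS form, and the function names and dates are hypothetical, chosen for illustration.

```python
from datetime import datetime, timedelta


def parse_h_rt(h_rt: str) -> timedelta:
    """Convert an h_rt string in HH:MM:SS form to a timedelta."""
    hours, minutes, seconds = (int(part) for part in h_rt.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def fits_before_outage(start: datetime, h_rt: str, outage_start: datetime) -> bool:
    """Return True if a job starting at `start` would end by the time the outage begins."""
    return start + parse_h_rt(h_rt) <= outage_start


# Hypothetical outage start and current time, for illustration only.
outage = datetime(2019, 3, 31, 0, 0)
now = datetime(2019, 3, 29, 12, 0)

print(fits_before_outage(now, "24:00:00", outage))  # 24 h request ends Mar 30 12:00
print(fits_before_outage(now, "48:00:00", outage))  # 48 h request would end Mar 31 12:00
```

In this sketch the 24-hour request fits before the outage and the 48-hour request does not, so a job that is close to an outage window may only start if its requested run time is short enough.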


Placentia

  • The issues with the A/C unit that occurred over the holidays have been fixed, and the unit is working as intended. The compute nodes are available again. Placentia is still scheduled to be decommissioned at the end of March.
14:03, January 7, 2019 (AST)
  • Due to an issue with an A/C unit, some compute nodes (up to cl107) have been taken down. Repairs have been completed, but at the request of ITS we are leaving the nodes down until Jan 3.
12:00, December 24, 2018 (NST)
  • The WebMO web service was migrated to a different physical machine on the afternoon of Thursday, November 15, 2018. Further reboots were necessary on the morning of Friday, November 16. Running jobs were not affected.
08:59, November 16, 2018 (AST)

Glooscap

  • The partial outage for cooling system maintenance has ended. Compute nodes are back in service.
10:19, January 3, 2019 (AST)
  • Glooscap is back in service after a planned interruption this weekend (Sep 7-10). The metadata server component of the file system has been relocated and a full fsck has been run. We hope this will alleviate file system load problems.
10:54, September 10, 2018 (ADT)
  • Glooscap is back in service again. We believe we may have identified a source of unusual load that was causing the trouble. Please check on the status of any jobs you have in the system to ensure they are running properly.
16:25, August 23, 2018 (ADT)
  • Glooscap is not accepting logins again. The sysadmin is investigating the cause.
10:48, August 23, 2018 (ADT)

Fundy

  • Fundy has been retired from service.
10:01, April 5, 2018 (ADT)

Mahone

  • Mahone has been retired from service.
10:00, April 5, 2018 (ADT)