Cluster Status

From ACENET
This page is maintained manually. It gets updated as soon as we learn new information.

Clusters

Click a cluster name in the table below to jump to the corresponding section of this page. The Outage schedule section lists all scheduled outages in one place.

Cluster    Status   Planned Outage  Notes
Mahone     Offline                  Service ended March 31, 2018
Placentia  Online   No outages      Service extended to March 31, 2019
Fundy      Offline                  Service ended March 31, 2018
Glooscap   Online   No outages      Service extended to March 31, 2019
Arbutus    See status.computecanada.ca (west.cloud.computecanada.ca)
Cedar      See status.computecanada.ca
Graham     See status.computecanada.ca
Niagara    See status.computecanada.ca

Services

Service                           Status  Planned Outage  Notes
WebMO                             Online  No outages
Account creation                  Manual  No outages      Write support
PGI and Intel licenses            Online  No outages
Videoconferencing (IOCOM Server)  Online  No outages
Legend:
Online: the cluster is up and running
Offline: no users can log in or submit jobs, or the service is not working
Online (with problems): some users can log in, and/or there are problems affecting your work

Outage schedule

Grid Engine will not schedule any job whose requested run time (h_rt) would extend past the start of a planned outage period. This prevents the job from being terminated prematurely when the system goes down.

  • Fundy and Mahone have been retired from service.
  • No other outages are currently planned.
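
The scheduling rule described at the top of this section can be sketched as a simple time comparison. The Python below is only an illustration and not how Grid Engine is implemented: the function names fits_before_outage and max_h_rt are hypothetical, the dates are example values, and it assumes a job that would end exactly at the start of the outage is still allowed.

```python
from datetime import datetime, timedelta


def fits_before_outage(now, h_rt, outage_start):
    """Return True if a job started at `now` with run time limit `h_rt`
    would end no later than `outage_start` (assumed boundary rule)."""
    return now + h_rt <= outage_start


def max_h_rt(now, outage_start):
    """Longest run time, as an h:mm:ss string, that still ends by the
    start of the outage. Returns 0:00:00 if the outage has begun."""
    total = max(int((outage_start - now).total_seconds()), 0)
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Example values for illustration only (same time zone assumed for both).
now = datetime(2018, 2, 18, 7, 0)
outage_start = datetime(2018, 2, 19, 7, 0)

print(fits_before_outage(now, timedelta(hours=12), outage_start))  # short job
print(fits_before_outage(now, timedelta(hours=48), outage_start))  # long job
print(max_h_rt(now, outage_start))
```

In practice this means that a job waiting in the queue ahead of an outage may start sooner if it requests a shorter h_rt, for example with qsub -l h_rt=12:00:00.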


Mahone

  • Mahone has been retired from service.
10:00, April 5, 2018 (ADT)

Placentia

  • The compute nodes that had to be taken offline yesterday to facilitate repairs to one of the A/C units are now back online and ready to use.
09:59, February 20, 2018 (AST)
  • This morning at 08h00 NST, compute nodes cl001 to cl108 were shut down to facilitate repairs to one of the A/C units. If all goes as planned, we will be able to bring them back up by noon Tuesday.
09:33, February 19, 2018 (AST)
  • Due to important repairs to the A/C unit in Placentia's data centre, compute nodes cl001 to cl108 will be unavailable from 07h00 NST Monday, February 19th until noon Tuesday, February 20th. This range includes the Gaussian.q nodes, which have more RAM and local scratch than most other nodes.
14:55, February 15, 2018 (AST)

Fundy

  • Fundy has been retired from service.
10:01, April 5, 2018 (ADT)

Glooscap

  • Electrical power work originally scheduled for the week of Feb 20-23 has been indefinitely postponed.
11:46, February 16, 2018 (AST)
  • Interactive response of the head node is very slow for many operations. Technical staff are investigating.
16:36, December 6, 2017 (AST)
  • The metadata server hung overnight on March 7-8. It was rebooted this morning and Glooscap is operating once again, although technical staff remain cautious about its future behaviour. To try to alleviate the load on the metadata server, we are withdrawing compute nodes cl002 through cl058 from service. This represents a reduction of 188 cores in the capacity of the cluster.
11:24, March 8, 2017 (AST)