Cluster Status

This page is maintained manually. It gets updated as soon as we learn new information.

Clusters

Cluster	Status	Planned Outage	Notes
Siku	Online	No outages
Placentia	Online	-	Restricted since March 2019
Nefelibata	Online	-	No scheduler
Argo	Online	-	Still shaking out bugs

For national clusters (Arbutus, Beluga, Cedar, Graham, Narval, Niagara) see status.alliancecan.ca

Services

Service	Status	Planned Outage	Notes
Globus	Offline	Until June 2024	Awaiting upgrade to Globus v5
Account creation	Manual	No outages	Email support@ace-net.ca
PGI and Intel licenses	Online	No outages

Legend:

Online	cluster is up and running
Offline	no users can log in or submit jobs, or the service is not working
Online	some users can log in and/or there are problems affecting your work

Outage schedule

A job will not be scheduled if its requested run time (--time=) would extend into a planned outage period. This prevents the job from being terminated prematurely when the system goes down.
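The rule above can be checked by hand before submitting a job. The following is a minimal sketch, not the scheduler's actual logic: the function names and the dates are hypothetical, and it assumes that a job ending exactly when the outage begins is still allowed. The output string uses the days-hours:minutes:seconds form that --time= accepts.

```python
from datetime import datetime, timedelta, timezone


def fits_before_outage(start: datetime, run_time: timedelta, outage_start: datetime) -> bool:
    """True if a job starting at `start` with limit `run_time` ends by `outage_start`."""
    # Assumption: ending exactly at the outage start does not count as extending into it.
    return start + run_time <= outage_start


def latest_time_limit(now: datetime, outage_start: datetime) -> timedelta:
    """Longest run time that still finishes before the outage (zero if it has begun)."""
    return max(outage_start - now, timedelta(0))


def format_slurm_time(limit: timedelta) -> str:
    """Render a timedelta as days-hours:minutes:seconds for use with --time=."""
    total = int(limit.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


# Hypothetical values for illustration only; this is not a real outage.
now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
outage_start = datetime(2024, 6, 3, 12, 0, tzinfo=timezone.utc)

print(fits_before_outage(now, timedelta(hours=24), outage_start))
print(fits_before_outage(now, timedelta(hours=72), outage_start))
print(format_slurm_time(latest_time_limit(now, outage_start)))
```

In this example a 24-hour request fits before the outage and a 72-hour request does not. A job that would otherwise wait until after the outage may start sooner if its --time= value is reduced to fit in the remaining window.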

There are currently no planned outages.

Siku

2024

There was an unplanned power outage between 16h15 and 16h30 UTC (13h45 and 14h00 NDT), during which many but not all jobs were lost. Normal operation resumed at about 18h00 UTC (15h30 NDT).

15:38, March 26, 2024 (NDT)

The Slurm job scheduler was off-line on Monday March 25, 2024, from 11h00 UTC (08h30 NDT) until 12h45 UTC (10h15 NDT) for a second urgent maintenance on the machine running the Slurm controller. This has now been completed and normal operation has resumed.

10:23, March 25, 2024 (NDT)

The Siku scheduler is available again.
The emergency maintenance was completed and normal operation resumed at 11h50 NDT (14h20 UTC).

12:00, March 19, 2024 (NDT)

Slurm job scheduler will be off-line Tuesday March 19, 2024, beginning at 13h30 UTC for emergency maintenance on the machine running the Slurm controller. We anticipate an outage of approximately two hours. New jobs are being accepted but none will be launched until after the outage. Access to the cluster will still be permitted and storage will remain accessible.

For older outages see: Previous outages

Our newest cluster, Siku, is now in production. Access is currently restricted to invited users only. Access request form.

13:00, December 10, 2019 (NST)

Placentia

Placentia was retired from general service as of 2019 Mar 31. A reduced number of compute nodes remain in service, with access restricted to MUN users who have made suitable arrangements. Contact support@ace-net.ca if you believe you should have access.

Nefelibata

Nefelibata has had its shared storage replaced, but the Slurm scheduler service has not yet been restored. Restoration is waiting on staff becoming available from other work.

2024-03-18

Nefelibata will be unavailable on Tuesday, September 5, 2023, for operating system and driver updates. We expect return to service on Wednesday, September 6.

Update at 2023-09-07 12:00 NDT: Outage complete, Nefelibata back in service.
