Cluster Status

From ACENET
This page is maintained manually. It gets updated as soon as we learn new information.

Clusters

Click the name of a cluster in the table below to jump to the corresponding section of this page. The outage schedule section lists all scheduled outages in one place.

Cluster     Status   Planned Outage   Notes
Mahone      Online   No outages
Placentia   Online   No outages
Fundy       Online   No outages
Glooscap    Online   No outages

Services

Service                            Status   Planned Outage   Notes
WebMO                              Online   Date to come     Outage delayed
Account creation                   Online   No outages
PGI and Intel licenses             Online   No outages
Videoconferencing (IOCOM Server)   Online   No outages
Legend:
Online: cluster is up and running
Offline: no users can log in or submit jobs, or the service is not working
Online: some users can log in and/or there are problems affecting your work

Outage schedule

WebMO at ACENET will be unavailable from 8 am on Nov. 29 to noon on Dec. 1 while we upgrade to WebMO 17, which uses HTML5 instead of a Java application. Jobs that are already running will continue to run and will save their results in your home directory, but they may appear as failed in WebMO when it returns to service.

Grid Engine will not start any job whose requested run time (h_rt) would extend into a planned outage period. This prevents the job from being terminated prematurely when the system goes down.

  • No cluster outages currently scheduled.
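
The h_rt rule described above can be illustrated with a small sketch. This is not Grid Engine's actual scheduling code; the function names, the example times, and the assumption that a job ending exactly at the start of the outage is still acceptable are all hypothetical and only meant to show the arithmetic.

```python
from datetime import datetime, timedelta


def fits_before_outage(start: datetime, h_rt: timedelta,
                       outage_start: datetime) -> bool:
    """Return True if a job starting at `start` with run-time limit `h_rt`
    would finish no later than `outage_start`.

    Assumption: a job that ends exactly when the outage begins is allowed.
    """
    return start + h_rt <= outage_start


def max_h_rt(start: datetime, outage_start: datetime) -> timedelta:
    """Largest run-time limit that still fits before the outage.

    Returns a zero duration if the outage has already begun.
    """
    return max(outage_start - start, timedelta(0))


# Hypothetical example: an outage beginning at 08:00, a job that could
# start at 06:00.
outage = datetime(2017, 3, 1, 8, 0)
now = datetime(2017, 3, 1, 6, 0)

short_job_ok = fits_before_outage(now, timedelta(hours=1), outage)
long_job_ok = fits_before_outage(now, timedelta(hours=3), outage)
longest_allowed = max_h_rt(now, outage)
```

In practice this means that a job waiting in the queue shortly before an outage may only start if its h_rt request (for example `qsub -l h_rt=01:00:00`) is short enough; jobs with longer requests stay queued until the outage is over.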

Mahone

  • The cluster may be unreachable due to an upstream provider networking issue.
08:11, December 8, 2016 (AST)
  • The default storage quota for users at Mahone has been reduced to 150 gigabytes.
14:34, March 8, 2016 (AST)

Placentia

  • This morning around 10 am NST (9:30 AST) there was a power outage at Memorial University's St. John’s campus that caused all compute nodes (and the jobs running on them) to crash. Almost all nodes are now back online. Grid Engine has automatically restarted jobs that were running at the time of the crash (see also: FAQ#Rr:_Job_re-started).
14:25, January 12, 2017 (AST)
  • There was a planned internet outage for Memorial University's St. John’s campus on Thursday, Dec. 22, 2016, between midnight and 4 am NST to perform upgrades on core network infrastructure. This caused active connections to Placentia from outside MUN's network to be terminated.
09:14, December 22, 2016 (AST)

Fundy

  • Fundy is back online.
10:59, August 8, 2016 (ADT)
  • There was a power outage on Saturday around 4pm due to a thunderstorm that caused the cluster to become unavailable.
08:18, August 8, 2016 (ADT)
  • The cluster is unreachable. We are investigating.
07:22, August 8, 2016 (ADT)
  • Fundy is back online after a brief UNB network interruption.
10:07, July 5, 2016 (ADT)

Glooscap

  • Users report intermittent slowness in interactive use of Glooscap. Symptoms include pauses of several seconds to over a minute in response to shell commands involving files or file metadata (such as "ls"). This is believed to be due to load on the file system, and may therefore also be affecting the run times of jobs doing extensive I/O. Vendor support for the file system is no longer available, so deep troubleshooting is out of reach. We have no reports of data loss or other actual failures. All we can recommend is great patience.
12:18, February 9, 2017 (AST)