Siku

From ACENET


Siku is a high-performance computer cluster installed in 2019 at Memorial University in St. John's, Newfoundland.

It is funded in large part by the Atlantic Canada Opportunities Agency (ACOA) with the intention of generating regional economic benefits through industry engagement, while recognizing the important work that ACENET does for academic research in the region.

Siku is currently (November 2019) in beta-test phase and only accessible to selected clients.

Known issues

  • Siku is not yet accessible from the internet outside the MUN network.
    • The current workaround is to connect to Placentia first, and thence to siku.ace-net.ca. See Access workarounds for options and details.
    • Access from Siku to the Internet works, so direct file transfers can be done by running the transfer program (e.g. sftp, scp, rsync, wget, ...) on Siku.
  • No Globus endpoint is yet available.
  • The PGI compilers are not yet working (access to the license server is still pending). The Intel and GCC compilers, however, should work.
  • Multi-processing using libverbs is not working as expected. MPI implementations, however, should work.
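Until outside access is opened, the Placentia hop can be made automatic. A possible ~/.ssh/config entry is sketched below; "username" is a placeholder for your own ACENET username, and the "siku" alias is just an example name (see Access workarounds for the supported options):

```
Host siku
    HostName siku.ace-net.ca
    User username                              # replace with your ACENET username
    ProxyJump username@placentia.ace-net.ca    # hop through Placentia first
```

With an entry like this, "ssh siku" connects through Placentia in one step. Outbound transfers, as noted above, can simply be started from Siku itself (e.g. running rsync or scp on Siku).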

Similarities and differences with national GP clusters

Siku's design draws on experience gained with the Compute Canada systems Béluga, Cedar, Graham, and Niagara. Users familiar with those systems will find much that is familiar here.

Job scheduling

Tasks taking more than 10 CPU minutes or 4 GB of RAM should not be run directly on a login node, but submitted to the job scheduler, Slurm. Scheduling policies on Siku resemble those on Compute Canada systems but are considerably simplified:

  • Maximum run-time limit is 24 hours.
  • There is only one partition.
  • Paid clients have higher priority than academic (free) clients, but with usage limited by contract. See Tracking paid accounts.
  • GPUs should be requested like so:
#SBATCH --gres=gpu:v100:2
  • Your account name is not necessarily the same as on the Compute Canada clusters. If you see the message "Invalid account or account/partition combination specified", try submitting without the --account or -A parameter.
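Putting the points above together, a minimal batch script might look like the following sketch. The job name, module, and program names are placeholders, the GPU line is commented out, and the --account line is omitted as suggested above:

```shell
#!/bin/bash
#SBATCH --job-name=demo           # hypothetical job name
#SBATCH --time=0-12:00            # must fit within the 24-hour limit
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
##SBATCH --gres=gpu:v100:2        # uncomment to request two V100 GPUs
# No --partition option is needed: Siku has only one partition.
# Omit --account unless you know your Siku account name.

module load gcc                   # hypothetical module name
srun ./my_program                 # hypothetical executable
```

Submit with "sbatch job.sh" and check its state with "squeue -u $USER", as on the national systems.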

Storage quotas and filesystem characteristics

Filesystem      Default Quota                   Backed up?  Purged?              Mounted on Compute Nodes?
Home Space      52 GB and 512K files per user   Yes         No                   Yes
Scratch Space   20 TB and 1M files per user     No          Not yet implemented  Yes
Project Space   1 TB and 512K files per group   Yes         No                   Yes

Node Characteristics

Nodes  Cores  Available memory  CPU                                Storage  GPU
40     40     186G or 191000M   2 x Intel Xeon Gold 6248 @ 2.5GHz  ~720G    -
6      40     376G or 385024M   2 x Intel Xeon Gold 6248 @ 2.5GHz  ~720G    -
2      40     186G or 191000M   2 x Intel Xeon Gold 6148 @ 2.4GHz  ~720G    2 x NVIDIA Tesla V100 (32GB memory)
  • "Available memory" is the amount of memory configured for use by Slurm jobs. Actual memory is slightly larger to allow for operating system overhead.
  • "Storage" is node-local storage. Access it via the $SLURM_TMPDIR environment variable.
  • Hyperthreading is turned off.
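A common pattern with node-local storage is to stage files through $SLURM_TMPDIR during a job. The sketch below illustrates the idea with placeholder file names; inside a real job, Slurm sets $SLURM_TMPDIR to a directory on the node's local disk (the fallback to /tmp here is only so the sketch runs outside a job):

```shell
# Stage work through node-local storage (paths and file names are examples).
WORKDIR="${SLURM_TMPDIR:-/tmp}"           # Slurm sets SLURM_TMPDIR inside a job
mkdir -p "$WORKDIR/job"
echo "input" > "$WORKDIR/job/data.txt"    # stand-in for copying real input files
# ... the compute step would read and write files under $WORKDIR/job ...
tr a-z A-Z < "$WORKDIR/job/data.txt" > "$WORKDIR/job/result.txt"
# Copy results back to network storage before the job ends, e.g.:
# cp "$WORKDIR/job/result.txt" "$HOME/project/"
```

Local disk is fast but is cleaned up when the job ends, so results must be copied back to home, project, or scratch space before the script exits.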

Operating system: CentOS Linux release 7