User Guide

ACENET Overview

ACENET is a consortium of Atlantic Canadian universities providing high-performance computing (HPC) resources, visualization and collaboration tools to participating research institutions. The ACENET hardware resources are located at several universities and include the following clusters:

  • Fundy (fundy.ace-net.ca) at University of New Brunswick
  • Mahone (mahone.ace-net.ca) at Saint Mary's University
  • Placentia (placentia.ace-net.ca) at Memorial University
  • Glooscap (glooscap.ace-net.ca) at Dalhousie University

Each cluster consists of a number of computers or "nodes", and each node has several CPUs with multiple cores. You can think of each core as a single-core processor. These are primarily AMD Opteron-based machines running Red Hat Enterprise Linux (RHEL) 4. More details on the available hardware are given in Compute Resources.

Logging in

Your account grants you access to all of the ACENET clusters with the same username and password. When you log in to a particular cluster, you log in to the head node of this cluster, where you can edit, compile and test your code.

All communication must be performed over the SSH network protocol using an SSH client. If you are using a Unix-like machine, you can run ssh from the command prompt. On Windows systems, we suggest that you download a freely available client such as PuTTY or MobaXterm. The latter comes integrated with an X11 server and SFTP client.

For example, if you want to access the Placentia cluster using the command line from a Unix-like system, you would type

 $ ssh username@placentia.ace-net.ca

If you are running an X11 server on your machine and want to run interactive graphical X11 applications through the scheduler, then you need to enable X11 forwarding.
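
With the OpenSSH command-line client, X11 forwarding is usually requested with the -X option. The following is a minimal sketch that only assembles and prints the login command. It assumes an OpenSSH client and a local X11 server that is already running; username is a placeholder.

```shell
# Placeholders: replace with your own ACENET username and target cluster.
user="username"
host="placentia.ace-net.ca"

# -X asks the OpenSSH client to forward X11 connections to your local display.
x11_login="ssh -X ${user}@${host}"

# Print the command for review, then type it at your own prompt to connect.
echo "$x11_login"
```

Once you are logged in this way, echo $DISPLAY on the cluster should print a non-empty value if forwarding is active.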

After logging in, you will get a command line prompt with the name of the cluster:

 user@mahone: ~ $ ssh user@fundy.ace-net.ca
 Password:
 Last login: Tue Dec 11 13:35:25 2007 from 140.184.24.8
 user@fundy: ~ $

The first time you connect to an ACENET machine via SSH, you will see a message like the following:

 The authenticity of host 'fundy.ace-net.ca (131.202.246.6)' can't be established.
 RSA key fingerprint is ee:28:46:48:78:68:e3:28:ad:45:28:fe:c2:14:0c:d8.
 Are you sure you want to continue connecting (yes/no)?

This is expected and you are safe to answer yes. You will then see a message

 Warning: Permanently added 'fundy.ace-net.ca' (RSA) to the list of known hosts.

After connecting to the machine, you will be prompted for your credentials. Once you have logged in you should change your initial password. You must choose a secure password! If you need advice on this, read this link. Then to change your password, type

 $ passwd

You will be prompted for your current password and a new password. Within minutes, your password change will be replicated across ACENET.

File Transfer

The basic way to transfer files to and from the cluster is to use a program that supports SFTP (SSH File Transfer Protocol). You can also use Rsync to synchronize/transfer files and directories while minimizing network traffic. Finally, there is Globus, which is the most efficient and easy to use, but requires some setup on your part and is not yet available everywhere.

SFTP is similar to regular FTP, but instead of sending your data in readable plain text, SFTP encrypts the traffic. Most interactive SFTP commands are the same as their FTP counterparts. SFTP is available from the command line on most Unix-like systems. Mac OS X users: for a graphical SFTP client, check out Cyberduck. Windows users can use MobaXterm and its integrated SFTP client, WinSCP, PSFTP, or FileZilla.

Starting a command-line SFTP session is similar to connecting via SSH. You can initiate a file transfer session with the following syntax:

 $ sftp user@fundy.ace-net.ca

You will be prompted for your password and, upon successful authentication, will see an interactive SFTP prompt.

 $ sftp user@fundy.ace-net.ca
 Connecting to fundy.ace-net.ca...
 Password:
 sftp>

Type help at this prompt to see a list of available commands.
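
As an illustration of the commands you can type at the sftp> prompt, the sketch below writes a small batch file containing a few common ones (cd, put, get, bye). The directory and file names are hypothetical. The OpenSSH sftp client can read such a file with its -b option, but batch mode needs non-interactive authentication such as SSH keys, so with password login you would type the same commands by hand.

```shell
# Hypothetical batch file name, for illustration only.
batch_file="transfer.sftp"

# Each line is a command you could also type at the sftp> prompt.
cat > "$batch_file" <<'EOF'
cd project
put input.dat
get results.tar.gz
bye
EOF

# Show the batch file for review.
cat "$batch_file"

# With key-based login set up, the batch could be run as:
#   sftp -b transfer.sftp user@fundy.ace-net.ca
```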

Glooscap data transfer node

Glooscap, which has the largest storage capacity at ACENET, also has a dedicated data transfer node (DTN), dtn.glooscap.ace-net.ca. Globus Online connects to Glooscap only through the DTN. You are also encouraged to use the DTN for large transfers with SFTP or Rsync.
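
A large Rsync transfer through the DTN might look like the sketch below, which only builds and prints the command. The username and directory names are placeholders, and the option set is one reasonable choice rather than an ACENET recommendation.

```shell
# Placeholders: your username and a hypothetical directory to copy.
user="username"
dtn="dtn.glooscap.ace-net.ca"
src="results/"     # trailing slash: copy the contents of the directory
dest="results/"    # path relative to your home directory on Glooscap

# -a preserves permissions and timestamps, -v lists files as they are sent,
# --partial keeps partly transferred files so a rerun can reuse them.
rsync_cmd="rsync -av --partial ${src} ${user}@${dtn}:${dest}"

# Print the command for review before running it yourself.
echo "$rsync_cmd"
```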

Data Policies

Before transferring or creating data on ACENET systems, users are encouraged to familiarize themselves with our Data Policies.

Storage System

Main page: Storage System
Policy document: ACENET Data Policies

Each ACENET cluster has its own data storage facility or "disk array". This disk array contains

  • your home directory for that cluster, and
  • a large temporary workspace known as /nqs/username.

Files in /nqs are subject to automatic removal at periodic intervals. In order to place files in /nqs you must first read and understand the policies described under No-quota Scratch.

Files in these areas are visible from any node in that cluster. Files on one ACENET cluster are not visible on a different ACENET cluster. In order to get data from one cluster to another you must use one of the file transfer tools described above, or GridFTP.

For more details on disk and storage matters, including quotas and cleanup policies, please see Storage System.

Backup
ACENET does not provide backup services, but MUN users can take advantage of MUN's RDB system for data on Placentia. Our clusters provide reasonable (though not iron-clad) assurances against data loss from system failures, but not against accidental deletion. You are responsible for your own backup plan. We strongly encourage you to make off-site (or multi-site) copies of your critical data.
Archiving
Likewise, ACENET filesystems are not intended to provide archival, long-term storage of inactive data.
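
One simple ingredient of a personal backup plan is to bundle a directory into a single archive and record a checksum before copying it off-site. The following is a minimal sketch using standard tar and cksum; the directory name is hypothetical, and the example creates a tiny directory so that it can be run anywhere.

```shell
# Hypothetical directory name; a small example file is created so the
# sketch is self-contained.
data_dir="critical_data"
mkdir -p "$data_dir"
echo "example" > "$data_dir/run1.txt"

# Bundle the directory into one compressed archive.
archive="${data_dir}.tar.gz"
tar -czf "$archive" "$data_dir"

# List the archive contents to confirm the files were captured.
tar -tzf "$archive"

# Record a checksum so the off-site copy can be verified after transfer.
cksum "$archive" > "${archive}.cksum"
```

The archive and its checksum file can then be copied to a machine outside ACENET with one of the file transfer tools described above.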

Command Line Interface

The usual way to work with ACENET machines is via the Linux (or Unix) command line. If you have not used the command line interface before, you can learn the basics from any of a number of tutorials available on the Internet, such as Learn UNIX in 10 minutes, UNIX Tutorial for Beginners, or Learning the shell.

In order to modify a file you can transfer it from the cluster to your local machine, edit it there, and then transfer it back with the tools mentioned in File Transfer above, but most users quickly tire of this practice and learn to use an editor. A good choice for a beginner is

  • nano, a simple and nearly self-explanatory text editor.

Most veteran computational scientists use one or the other of

both of which have considerable power at the price of a significant early learning curve.

Unix Shell

If you are a beginner with the Linux command line, all you need to know about shells is that "the shell" is what we call the part of the operating system that interacts with you directly: You type a command, the shell parses it and then (probably) executes some other program to do what you want. Then the shell prompts you for another command.

If you're more experienced with the command line, you may need to know that shells come in different flavours.

  • If your ACENET account was created after Jan 30, 2009, then your default shell is bash.
  • bash is also the default shell for jobs.
  • Older accounts may still have tcsh as a login shell, unless you've asked to have it changed.

If you need to determine which shell you are using, type echo $0. If you want to change your login shell please contact support; the usual tool (chsh) will not work. See Job Control if you want to change the shell for running a job.
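
As a short sketch, the two lines below print the shell you are currently typing into and your login shell, which can differ if you started another shell by hand. The second line uses Bourne-shell syntax and assumes the SHELL variable is set by your login environment.

```shell
# $0 holds the name of the shell (or script) that is currently running.
echo "current shell: $0"

# $SHELL normally holds your login shell; print "unknown" if it is unset.
echo "login shell:   ${SHELL:-unknown}"
```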

Each time you log in, the shell reads a startup file to set certain environment variables. This lets you easily use the job scheduler, compilers, and other software, and normally you don't need to touch the startup files. To access software that's not available by default you should use Modules. However, if you know what you are doing and wish to modify your startup files — or if you wish to add module commands to your startup files — please read the appropriate section that follows:

If you lose or mangle startup files (dot profiles) you can get fresh copies from /usr/local/lib/profiles. They are .bashrc, .profile, .cshrc. Please remove any other dot shell profiles, such as .bash_profile.
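
Restoring the startup files can be sketched as follows. The helper restore_profiles is hypothetical (it is not an ACENET tool); it copies the three files named above and keeps a .bak copy of anything it overwrites.

```shell
# restore_profiles SRC DEST: copy fresh startup files from SRC into DEST,
# keeping a .bak copy of any file that is about to be overwritten.
restore_profiles() {
    src=$1
    dest=$2
    for f in .bashrc .profile .cshrc; do
        [ -f "$src/$f" ] || continue    # skip files missing in SRC
        [ -f "$dest/$f" ] && cp -p "$dest/$f" "$dest/$f.bak"
        cp "$src/$f" "$dest/$f"
    done
}

# On an ACENET cluster the call would be:
#   restore_profiles /usr/local/lib/profiles "$HOME"
```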

Bourne shells: bash or sh

The commands /bin/bash and /bin/sh reference the same executable, which behaves a bit differently depending on the name it's invoked with, in order to mimic the behavior of historical versions of sh.

There are two files for bash that you should have in your home directory: .bashrc and .profile. Please delete the file .bash_profile if you have one.

The content of the default user .bashrc file
  # Load default ACENET cluster profile
  if [ -f /usr/local/lib/bashrc ]; then
    source /usr/local/lib/bashrc
  fi
  #
  # Add your settings below
  #
The content of the default user .profile file
  # Do not delete or change this file
  [[ -f ~/.bashrc ]] && . ~/.bashrc
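
Personal settings belong below the "Add your settings below" marker in .bashrc. The lines that follow are a sketch of typical additions; the directory, editor choice and alias are illustrative assumptions, not ACENET defaults.

```shell
# Put a personal bin directory first on the search path (hypothetical location).
export PATH="$HOME/bin:$PATH"

# Choose a default editor for tools that consult $EDITOR.
export EDITOR=nano

# Shorthand for a long directory listing.
alias ll='ls -l'
```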

C shells: csh or tcsh

Please note that /bin/tcsh and /bin/csh reference the same shell. Tcsh is an enhanced but completely compatible version of csh.

The content of the default user .cshrc file.
 # Load default ACENET cluster profile
 if ( -f /usr/local/lib/cshrc ) then
   source /usr/local/lib/cshrc
 endif
 #
 # Add your settings below
 #

Note: When adding settings to your .cshrc file, always make sure that there is an empty line at the end of the file.

Passwordless SSH

Running Programs

Main page: Job Control

Head node policies

The login node or "head node" on each cluster is intended for managing jobs and files, and not for computing. You may not run on the head node any application that is computationally intensive, consumes a lot of memory (or other resources), or requires more than 15 minutes of CPU time. Note that this is not the same as 15 minutes of elapsed time. Login sessions, for example, may last arbitrarily long but consume little CPU.

Any longer or "heavier" job must be submitted to the compute hosts via the scheduler, which manages the available resources and assigns them to waiting jobs.
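
The difference between CPU time and elapsed time can be seen with a short sketch: sleep lets elapsed time pass while using almost no CPU. The shell builtin times prints the CPU time accumulated by the shell and its child processes.

```shell
# Record the wall-clock time before and after a command that mostly waits.
start=$(date +%s)
sleep 2
end=$(date +%s)

# Elapsed (wall-clock) time is at least two seconds.
elapsed=$((end - start))
echo "elapsed seconds: ${elapsed}"

# CPU time used by this shell and its children stays close to zero.
times
```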

Production jobs

Please refer to the Job Control page for detailed information on how to manage jobs, including example job scripts. The key commands discussed there are qsub, qstat, qdel, qsum and showq.
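
As a rough sketch of the workflow, the lines below write a minimal Grid Engine job script and show, in comments, how it would be submitted and monitored. The script name, its contents and the run-time value are assumptions for illustration; the Job Control page remains the authority on the options ACENET requires.

```shell
# Write a minimal job script (hypothetical name and contents).
# -cwd runs the job in the submission directory, -j y merges stderr into
# stdout, and -l h_rt requests a run-time limit (value chosen arbitrarily).
cat > hello_job.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -j y
#$ -l h_rt=00:10:00
echo "Running on $(hostname)"
EOF

# Show the script for review.
cat hello_job.sh

# Typical workflow on the cluster:
#   qsub hello_job.sh     # submit; the scheduler prints a job ID
#   qstat                 # list jobs known to the scheduler
#   qdel <job_id>         # remove a job you no longer need
```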

Test jobs

A small fraction of our resources is reserved as highly available test resources. These test nodes must be requested through Grid Engine, and can be used for either regular job submission or for interactive sessions.

Available software

Main pages: Software, Modules

ACENET supports a selection of research software, both open-source and closed-source, free and commercial. For a current list of supported packages see Software. This page will lead you to individual pages describing each program, where to find it and (in many cases) how to use it.

You should determine whether the application you intend to use can be run in parallel. If so, then you must become familiar with the appropriate material on scheduling and execution of parallel programs under Job Control.

Code development

Programming in Fortran, C and C++ is supported with compilers and libraries that provide shared-memory (OpenMP) and message-passing (MPI) parallel programming. If you plan on compiling code for use on ACENET, please consult the appropriate web pages first.

Other languages such as Java, Python, Perl, etc. are available as described on the Compilers page, but may not be supported as intensively as Fortran and C.

Utilities are also available for debugging and performance profiling of user-compiled code, including the TotalView parallel debugger.

Parallel programming

The principal benefit of high-performance computing arises from the ability to apply many CPUs to a single problem, which is known as parallel computing. Three models of parallel computing are supported at ACENET:

  • "Embarrassingly" or "perfectly" parallel problems can be treated with a collection of independent serial jobs. Such usage is supported via the job scheduler, particularly task arrays. Some sources refer to this as High-Throughput Computing as distinct from High-Performance Computing, but regardless of what it is called, it is supported at ACENET.
  • Message-passing parallel computing is supported with the MPI application programming interface (API). If you intend to either develop or run code which uses MPI, please read the MPI page.
  • Shared-memory parallel computing is supported with the OpenMP API. If you intend to either develop or run code built with it, please read the OpenMP page.
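
The task-array approach mentioned in the first item can be sketched as follows. Under Grid Engine, each task of an array job finds its own index in the SGE_TASK_ID environment variable; the file naming scheme here is hypothetical, and the fallback value only lets the sketch run outside the scheduler.

```shell
# In the job script, a task array is requested with a line such as
#   #$ -t 1-10
# and each task then sees its own index in SGE_TASK_ID.

# Fall back to 1 so the sketch also runs outside the scheduler.
task_id="${SGE_TASK_ID:-1}"

# Hypothetical naming scheme: one input and one output file per task.
input="input_${task_id}.dat"
output="output_${task_id}.dat"

echo "task ${task_id}: reading ${input}, writing ${output}"
```

Each task runs the same script independently, so the index is the only thing that needs to distinguish one serial run from another.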

Other APIs for parallel computing (e.g. pthreads, PVM) may be available but are not supported.

Further reading

User Support
Resources