User Guide
ACEnet Overview
The Atlantic Computational Excellence Network (ACEnet) is a consortium of Atlantic Canadian universities providing high-performance computing (HPC) resources, visualization and collaboration tools to participating research institutions. The ACEnet hardware resources are located at several universities and include the following clusters:
- Brasdor (brasdor.ace-net.ca) at St. Francis Xavier University
- Fundy (fundy.ace-net.ca) at University of New Brunswick
- Mahone (mahone.ace-net.ca) at Saint Mary's University
- Placentia (placentia.ace-net.ca) at Memorial University
- Glooscap (glooscap.ace-net.ca) at Dalhousie University
- Courtenay (courtenay.ace-net.ca) at University of New Brunswick (Saint John)
Each cluster consists of a number of computers, or "nodes", and each node has several CPUs with multiple cores. You can think of each core as a single-core processor. The nodes are AMD Opteron-based machines running Red Hat Enterprise Linux (RHEL) 4. More details on the hardware are given in Compute Resources.
Logging in
Your account grants you access to all of the ACEnet clusters with the same username and password. When you log in to a particular cluster, you log in to the head node of this cluster, where you can edit, compile and test your code.
All communication with the clusters must use the SSH network protocol, through an SSH client. If you are using a Unix-like machine, you can run ssh from the command prompt. On Windows systems, we suggest the freely available client PuTTY. If you are using PuTTY version 0.61 and get the message Access denied between the user ID prompt and the password prompt, uncheck the Attempt GSSAPI authentication option in Connection » SSH » Auth » GSSAPI.
For example, to access the Brasdor cluster from the command line of a Unix-like system, type:
$ ssh username@brasdor.ace-net.ca
If you are running an X11 server on your machine and want to launch X11 applications, add the optional -X flag to enable X11 forwarding:
$ ssh -X username@brasdor.ace-net.ca
After logging in, you will get a command line prompt with the name of the cluster:
user@mahone: ~ $ ssh user@fundy.ace-net.ca
Password:
Last login: Tue Dec 11 13:35:25 2007 from 140.184.24.8
user@fundy: ~ $
The first time you connect to an ACEnet machine via SSH, you will see a message like the following:
The authenticity of host 'fundy.ace-net.ca (131.202.246.6)' can't be established.
RSA key fingerprint is ee:28:46:48:78:68:e3:28:ad:45:28:fe:c2:14:0c:d8.
Are you sure you want to continue connecting (yes/no)?
This is expected, and you can safely answer yes. You will then see the message:
Warning: Permanently added 'fundy.ace-net.ca' (RSA) to the list of known hosts.
After connecting to the machine, you will be prompted for your credentials. Once you have logged in, you should change your initial password. You must choose a secure password; if you need advice on this, read this link. To change your password, type:
$ passwd
You will be prompted for your current password and a new password. Within minutes, your password change will be replicated across ACEnet.
File Transfer
The best way to transfer files to and from the clusters is with a program that supports SFTP (SSH File Transfer Protocol). SFTP is similar to regular FTP, but instead of sending your data as readable plain text, it encrypts the traffic. Its commands are largely the same as those of FTP. A command-line sftp client is available on most Unix-like systems. Mac OS X users looking for a graphical SFTP client can try Cyberduck. Windows users can use PSFTP, a command-line program from the PuTTY suite, or FileZilla or WinSCP for a graphical interface similar to Windows Explorer. Using a command-line SFTP program, including PSFTP, is similar to connecting via SSH. You can start a file transfer session with the following syntax:
$ sftp user@fundy.ace-net.ca
You will be prompted for your password and, upon successful authentication, will see an interactive SFTP prompt.
$ sftp user@fundy.ace-net.ca
Connecting to fundy.ace-net.ca...
Password:
sftp>
Type help at this prompt to see a list of available commands.
Storage System
- Main page: Storage System
Each ACEnet cluster has its own data storage facility or "disk array". This disk array contains
- your home directory for that cluster, /home/username,
- a large-file permanent storage area known as /globalscratch/username, and
- a temporary workspace known as /nqs/username.
Files in /nqs are subject to automatic removal at periodic intervals. In order to place files in /nqs you must first read and understand the policies described under No-quota Scratch.
Files in these areas are visible from any node in that cluster. Files on one ACEnet cluster are not visible on a different ACEnet cluster. In order to get data from one cluster to another you must use one of the file transfer tools described above.
For more details on disk and storage matters, including quotas and cleanup policies, please see Storage System.
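To see at a glance how much space you occupy in each of the three areas listed above, a small shell function can walk them in turn. This is a minimal sketch; `storage_usage` is a hypothetical helper rather than an ACEnet command, and it only reports current usage, not the quotas described on the Storage System page.

```bash
#!/bin/bash
# Hypothetical helper: report disk usage in the three storage areas
# (/home, /globalscratch, /nqs) for a user on the current cluster.
# du walks every file, so this can take a while on a large directory tree.
storage_usage() {
    local user="${1:-$(id -un)}"
    local dir
    for dir in "/home/$user" "/globalscratch/$user" "/nqs/$user"; do
        if [ -d "$dir" ]; then
            # -s: one summary line per directory, -h: human-readable sizes
            du -sh "$dir" 2>/dev/null
        else
            echo "$dir: not present"
        fi
    done
}
```

Calling `storage_usage` with no argument reports on your own account. Because each cluster has its own disk array, the figures only describe the cluster you run it on.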
Command Line Interface
The usual way to work with ACEnet machines is via the Linux (or Unix) command line. If you have not used the command line interface before, you can learn the basics from any of a number of tutorials available on the Internet, such as Learn UNIX in 10 minutes, UNIX Tutorial for Beginners or An introduction to the Linux command line.
Unix Shell
The recommended and default shell, both for logging in and for job scripts run by the scheduler, is bash. If your account was created before Jan 30, 2009, your login shell was set to tcsh. You can determine which shell you are using by typing echo $0. If you want to change your shell from tcsh to bash or vice versa, please contact support (you cannot change it yourself, because the chsh command is not LDAP-aware).
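`echo $0` names the shell you are running right now. The login shell recorded for your account, which is the setting support would change for you, can also be read from the account database. A minimal sketch, assuming the standard `getent` utility is available on the head node; `login_shell` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: print the login shell recorded for a user.
# getent queries every configured account source (local files and LDAP),
# so it also works for directory-managed accounts.
login_shell() {
    local user="${1:-$(id -un)}"
    # Field 7 of a passwd entry is the login shell
    getent passwd "$user" | cut -d: -f7
}

echo "Login shell: $(login_shell)"
```

If this prints /bin/tcsh and you would prefer bash (or the reverse), contact support as described above.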
The global ACEnet shell profiles set the cluster-specific parameters and environment variables needed to use the scheduler, compilers, MPI wrappers/libraries and other software. They give ACEnet staff a central place to manage and support the user environment.
For the global profiles to work, they must be loaded (i.e. sourced) by the local dot profiles in your home directory. The proper dot profiles can be copied to your home directory from /usr/local/lib/profiles at any site, and are explained below.
Bourne shells: bash or sh
The commands /bin/bash and /bin/sh reference the same executable, which behaves a bit differently depending on the name it's invoked with, in order to mimic the behavior of historical versions of sh. Bash is a Unix shell written for the GNU Project.
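There are two files for bash or sh that you should have in your home directory: .bashrc and .profile. Please delete the file .bash_profile if you have one (a bash login shell reads .bash_profile in preference to .profile, so .profile would otherwise be ignored).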
- The content of the default user .bashrc file:
# Load default ACEnet cluster profile
if [ -f /usr/local/lib/bashrc ]; then
    source /usr/local/lib/bashrc
fi
#
# Add your settings below
#
- The content of the default user .profile file:
# Do not delete or change this file
[[ -f ~/.bashrc ]] && . ~/.bashrc
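Personal settings belong below the "Add your settings below" marker in .bashrc, so they run after the global ACEnet profile has been loaded and build on it instead of replacing it. A minimal sketch of what that section might contain; the ~/bin directory and the ll alias are illustrative choices, not ACEnet requirements.

```bash
# Load default ACEnet cluster profile
if [ -f /usr/local/lib/bashrc ]; then
    source /usr/local/lib/bashrc
fi
#
# Add your settings below
#
# Illustrative: put a personal bin directory in front of the search path,
# extending the PATH set by the global profile rather than overwriting it,
# and skipping the change if the directory is already listed
case ":$PATH:" in
    *":$HOME/bin:"*) ;;
    *) export PATH="$HOME/bin:$PATH" ;;
esac
# Illustrative: a convenience alias for interactive sessions
alias ll='ls -l'
```

Prepending to PATH, rather than assigning it outright, keeps the compiler and MPI paths that the global profile sets up.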
C shells: csh or tcsh
Please note that /bin/tcsh and /bin/csh reference the same shell. Tcsh is an enhanced but fully compatible version of csh.
- The content of the default user .cshrc file:
# Load default ACEnet cluster profile
if ( -f /usr/local/lib/cshrc ) then
    source /usr/local/lib/cshrc
endif
#
# Add your settings below
#
Note: When adding settings to your .cshrc file, always make sure that there is an empty line at the end of the file; otherwise the last line may not be executed.
Passwordless SSH access
Note: Passwordless SSH access within the cluster is already configured in your account, because Grid Engine relies on SSH to start job processes. If you want to configure passwordless SSH access yourself, generate an SSH key with the following commands:
$ ssh-keygen -t rsa
(hit Enter three times or answer 'y')
$ cd ~/.ssh
$ cat id_rsa.pub >> authorized_keys
$ chmod 600 authorized_keys
If you want passwordless SSH between different sites, copy the three files id_rsa, id_rsa.pub and authorized_keys to the ~/.ssh directory on the other clusters. For example, to copy these files to the Fundy cluster, type:
$ cd ~/.ssh
$ scp id_rsa id_rsa.pub authorized_keys fundy.ace-net.ca:.ssh/
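If authorized_keys already lists other keys, they should be kept rather than overwritten, and the same key should not be appended twice. A minimal sketch of a helper that appends a public key only when it is not yet present and sets the permissions sshd expects; `add_authorized_key` is a hypothetical function, not an ACEnet-provided command.

```bash
#!/bin/bash
# Hypothetical helper: add a public key to authorized_keys exactly once.
#   $1 = public key file (e.g. ~/.ssh/id_rsa.pub)
#   $2 = .ssh directory (defaults to ~/.ssh)
add_authorized_key() {
    local pubkey="$1"
    local sshdir="${2:-$HOME/.ssh}"
    local auth="$sshdir/authorized_keys"

    mkdir -p "$sshdir"
    chmod 700 "$sshdir"
    touch "$auth"
    # -q: quiet, -x: match whole lines only, -F: treat the key as a fixed string
    if ! grep -qxF "$(cat "$pubkey")" "$auth"; then
        cat "$pubkey" >> "$auth"
    fi
    chmod 600 "$auth"
}
```

After running ssh-keygen, `add_authorized_key ~/.ssh/id_rsa.pub` would register the new key; running it again leaves the file unchanged.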
Running Programs
- Main page: Job Control
The login node or "head node" on each cluster is intended for managing jobs and files, not for significant computing. Do not run any application on the head node that is computationally intensive, consumes a lot of memory (or other resources), or requires more than 15 minutes of CPU time. Note that this is not the same as 15 minutes of elapsed time: login sessions, for example, may last arbitrarily long but consume little CPU.
Any longer or "heavier" job must be submitted to the compute hosts via the scheduler, which manages the available resources and assigns them to waiting jobs. The scheduler used at ACEnet is the Sun Grid Engine (SGE).
A small fraction of our resources is reserved as highly available test resources. These test nodes must be requested through Grid Engine, and can be used for either regular job submission or for interactive sessions.
Please refer to the Job Control page for detailed information on how to manage jobs, including example job scripts. The key commands discussed there are qsub, qstat, qdel, qsum and showq.
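For orientation, a batch job is an ordinary shell script whose "#$" lines carry submission options for Grid Engine. A minimal sketch, assuming standard Sun Grid Engine directives; the 10-minute wall-clock request, the output file name job_output.txt and the placeholder command are illustrative, and the Job Control page remains the authority on the options ACEnet expects.

```bash
#!/bin/bash
# Minimal Grid Engine job script (sketch). Lines starting with "#$" are read
# by qsub as submission options; bash treats them as ordinary comments.
#
# Run the job in the directory from which it was submitted:
#$ -cwd
# Merge standard error into the standard output file:
#$ -j y
# Request 10 minutes of wall-clock time (illustrative value):
#$ -l h_rt=00:10:00

# JOB_ID is set by Grid Engine; the fallback applies when run by hand.
echo "Job ${JOB_ID:-not-under-SGE} running on $(uname -n)" > job_output.txt

# Replace the line below with the command that runs your own program
date >> job_output.txt
```

Saved under a name such as job.sh, the script would be submitted with qsub job.sh; qstat then shows its state, and qdel followed by the job ID removes it from the queue.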
Available software
- Main page: Software
ACEnet supports a selection of research software, both open-source and closed-source, free and commercial. For a current list of supported packages see Software. This page will lead you to individual pages describing each program, where to find it and (in many cases) how to use it.
You should determine whether the application you intend to use can be run in parallel. If so, then you must become familiar with the appropriate material on scheduling and execution of parallel programs under Job Control.
Code development
Programming in Fortran, C and C++ is supported with compilers and libraries for shared-memory (OpenMP) and message-passing (MPI) parallel programming. If you plan to compile code for use on ACEnet, please consult the appropriate pages first.
Other languages such as Java, Python, Perl, etc. are available as described on the Compilers page, but may not be supported as intensively as Fortran and C.
Utilities are also available for debugging and performance profiling of user-compiled code, including the TotalView parallel debugger.
Parallel programming
The principal benefit of high-performance computing arises from the ability to apply many CPUs to a single problem, which is known as parallel computing. Three models of parallel computing are supported at ACEnet:
- "Embarrassingly" or "perfectly" parallel problems can be treated with a collection of independent serial jobs. Such usage is supported via the job scheduler, particularly through task arrays. Some sources refer to this as High-Throughput Computing as distinct from High-Performance Computing, but regardless of what it is called, it is supported at ACEnet. The Brasdor cluster is the preferred platform for such work.
- Message-passing parallel computing is supported with the MPI application programming interface (API). If you intend to either develop or run code which uses MPI, please read the MPI page.
- Shared-memory parallel computing is supported with the OpenMP API. If you intend to either develop or run code built with it, please read the OpenMP page.
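A task array lets a single submission stand for many independent serial runs, which is how the "embarrassingly parallel" model above is typically handled. A minimal sketch, assuming standard Grid Engine array syntax (-t and the SGE_TASK_ID variable); the input.N/output.N file naming and the line-counting "work" are hypothetical stand-ins for your own program and data.

```bash
#!/bin/bash
# Sketch of a Grid Engine task array: one script, ten independent tasks.
#$ -cwd
#$ -j y
# Run tasks numbered 1 to 10; each task receives its own SGE_TASK_ID
#$ -t 1-10

# Fall back to task 1 when the script is run by hand outside Grid Engine
TASK=${SGE_TASK_ID:-1}

# Hypothetical naming scheme: task N reads input.N and writes output.N
INPUT="input.${TASK}"
OUTPUT="output.${TASK}"

if [ -f "$INPUT" ]; then
    # Replace with your serial program; here each input is just line-counted
    wc -l < "$INPUT" > "$OUTPUT"
else
    echo "task ${TASK}: ${INPUT} not found" > "$OUTPUT"
fi
```

Submitting this once with qsub queues all ten tasks, and the scheduler runs them independently as slots become free, so no task waits on another.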
Other APIs for parallel computing (e.g. pthreads, PVM) may be available but are not supported.