SGI user manual

Overview

The MUN SGI legacy cluster is a set of 3 separate NUMA-Link SMP machines with Intel Itanium2 processors. They are used primarily for quad-precision calculations and also house Gaussian, a popular computational chemistry package.

More information on the individual machines and other ACENET resources can be found here.

Logging In

When your account is created on the SGI cluster, you are given access to the 3 machines urdur, skuld and verdandi. Access to these machines is only available through an SSH client. On a Linux or Unix machine, the SSH client can be run from the command prompt. For Windows machines, it is recommended that you get PuTTY, a freely available SSH client.

For example, if you want to access urdur from the command line (Linux/Unix/OS X), then you would type

ssh -X username@urdur.physics.mun.ca

where the optional -X flag enables X11 forwarding.

The first time you connect to an SGI machine, you will see the message

The authenticity of host 'urdur.physics.mun.ca (134.153.141.40)' can't be established.
RSA key fingerprint is 71:9e:b5:d5:00:94:1e:5a:cd:3e:2f:c0:82:5d:b0:14.
Are you sure you want to continue connecting (yes/no)?

This is expected, so answer 'yes', at which point you will see

Warning: Permanently added 'urdur.physics.mun.ca,134.153.141.40' (RSA) to the list of known hosts.

After you see this message, you will be prompted for your password.

The first time you log in, you will need to change your default password. This can be done by typing

passwd

and following the prompts to give your current password, type a new password, and confirm it by typing it again.

NOTE After you log in for the first time, change your password! This need only be done on urdur, as the password change is then propagated to the other 2 machines. Also, the SGI cluster is not currently authenticating via our ACENET LDAP system, so the passwords you use for the SGI cluster must be managed separately from your ACENET password.

Special Network Access Considerations: If you are on the university network at MUN, you can ssh directly to all 3 machines. However, if you are accessing the SGI cluster from outside the MUN network, you will need to go through the MUN firewall. Currently, only urdur is reachable from outside, so to access skuld and verdandi you need to first ssh to urdur. Alternatively, they can also be accessed via ssh from placentia.
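
For example, to reach skuld from outside the MUN network, you can log in to urdur first and then ssh onward from there (replace username with your own username):

ssh username@urdur.physics.mun.ca
ssh username@skuld.physics.mun.ca

where the second command is typed at the urdur prompt.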

File Transfer

to and from the SGI machines

The preferred method of file transfer to and from the SGI machines is to use an sftp client, which is similar to ftp for file transfer. The difference is that sftp encrypts the traffic that goes over the network instead of sending it as plain text.

Use of sftp is very similar to ftp. From a Unix/Linux/OS X command line, you can type

sftp username@urdur.physics.mun.ca

and type your password when prompted. From the sftp> prompt, you can type help to see a list of commands you can use.
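
For example, a short session to upload and download files might look like this, where localfile.dat and results.out are placeholder file names:

sftp> put localfile.dat
sftp> get results.out
sftp> bye

Here put uploads a file, get downloads one, and bye closes the connection.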

Windows users can use an SFTP or SCP program like PSFTP or WinSCP. Both are freely available.

between SGI machines

Within the SGI cluster, there are partitions on the disks that are NFS-mounted between the machines and are available for file transfer. The standard Unix/Linux commands (e.g. cp, mv, rm, etc.) can then be used to move files between the machines.
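
For example, while logged in to urdur you could copy a file from your urdur home space into your skuld home space using the NFS mounts described in the Storage section below (myfile.dat is a placeholder name; replace username with your own):

cp /data1/hpc/username/myfile.dat /mnt/skuld/data1/hpc/username/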

Storage

There are several disk spaces available to users on the SGI clusters. They are outlined in the following table:

Name                  Location                Size    Purpose
/data1                urdur, skuld, verdandi  68 GB   local scratch (/data1/scratch) and home directories (/data1/hpc)
/mnt/<machine>/data1  urdur, skuld, verdandi  N/A     NFS mounts of the local /data1 directories
/scratch02            urdur                   147 GB  scratch space

A few notes regarding the storage:

  1. /data1/hpc/<username> is the home space on each machine; the data in these directories is separate on each machine and is backed up.
  2. /mnt/<machine>/data1 is the NFS mount of another machine's /data1 directory and includes both its local scratch and home spaces. Each machine has an entry here for the other 2 machines. For example, on urdur there are subdirectories named skuld and verdandi, so the home space of skuld or verdandi can be reached at /mnt/skuld/data1/hpc or /mnt/verdandi/data1/hpc, respectively.
  3. /disk3 is a repository for old files from herzberg; new user accounts will have nothing here.
  4. /scratch02 is the scratch space for temporary files and is NFS-mounted. It is not backed up.
  5. There are no imposed quotas on the SGI machines, so please be mindful of your disk use.
  6. Currently, user subdirectories are not created automatically on /scratch02, so you will need to create your own by typing:
cd /scratch02
mkdir <username>

where you replace <username> with your username.

Compilers

Intel Compiler Suite

Description

The Intel compiler suite, version 8.0, is located in /opt/intel_cc_80 and /opt/intel_fc_80 for the C/C++ and Fortran compilers, respectively.

Commands

icc
icpc
ifort

Help for commands

man <command>
<command> -help
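
As a brief illustration, a C or Fortran program could be compiled with optimization as follows (hello.c and hello.f90 are placeholder file names):

icc -O2 -o hello hello.c
ifort -O2 -o hello hello.f90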

GNU Compiler Suite

Description

The installed GNU compiler versions are not consistent across the machines, and more than one version is available on some of them. The following table gives the versions and where they are available:

Version  Location  Machine(s)
2.96     /usr/bin  urdur, verdandi
3.0.4    /usr/bin  urdur, skuld, verdandi
3.4.4    /usr/bin  skuld

Commands

gcc
gcc3
g++
g++3
g77
g77-3
gij
gij3
gcj
gcj3

Help for commands

man <command>
<command> -help
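
Because several versions coexist under different command names, it can be worth checking which version a given command invokes before building anything (hello.c is a placeholder file name):

gcc --version
gcc3 --version
gcc3 -O2 -o hello hello.c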

MPI

MPI is a library for interprocess communication that is widely used in parallel codes. On the SGI machines, MPICH version 1.2.7 is installed in /usr/local/mpich-1.2.7 and has been built with the Intel compiler suite. The commands for compiling and running in parallel with MPI are

mpicc
mpiCC
mpif77
mpif90
mpirun

Help with commands can be obtained using

man mpicc
man mpif77
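
As a quick sketch (hello_mpi.c is a placeholder file name), an MPI program could be compiled and given a short interactive test run, keeping in mind the 15-minute interactive limit described under Job Management below:

mpicc -o hello_mpi hello_mpi.c
mpirun -np 4 ./hello_mpi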

Job Management

Jobs that run for less than 15 minutes can be run interactively, but all other jobs must be submitted through the job scheduler. As at most other ACENET sites, the job scheduler is Sun Grid Engine (SGE).

See Job Control for general documentation on using SGE at ACENET. There are a small number of differences between SGE on the SGI machines and elsewhere:

  • Each machine (urdur, skuld, verdandi) is a separate SGE domain, so a job will only run on the machine from which it was submitted.
  • The time limit, the maximum h_rt, is 720 hours.
  • Specification of h_vmem is not required.
  • Only the gaussian parallel environment is supported at the present time. It may be possible to run OpenMP or MPI jobs using -pe gaussian, but this option has not yet been tested (see the sketch after this list). Other parallel environments can be added if there is demand; please contact support if this concerns you.
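
As an untested sketch consistent with the note above, a parallel job requesting 4 slots might be submitted on the machine where it should run with something like (job_script.sh is a placeholder script name):

qsub -pe gaussian 4 job_script.sh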


SGE replaced PBS Pro as the job scheduler on 10 August 2009. Notable changes from job submission practices with PBS Pro include:

  • No separate queues for parallel and serial jobs. A queue need not be specified explicitly; identify parallel jobs using -pe pe_name slot_count.
  • pbsnodes, tracejob, xpbs are no longer available, but qsum, showq, qhost are.
  • Scripts may be adapted by replacing #PBS prefixes with #$, but certain other flags and variables may also need to be changed, as in the sketch after this list. See Job Control for example SGE scripts.
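
For example, a PBS Pro header such as the following (myjob is a placeholder job name)

#PBS -N myjob
#PBS -l walltime=1:00:00

would become, in an SGE script,

#$ -N myjob
#$ -l h_rt=1:00:00

Other directives and environment variables (such as $PBS_O_WORKDIR) have their own SGE equivalents; see the example scripts in Job Control.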


You should ensure that /usr/local/lib/cshrc or /usr/local/lib/bashrc is referenced in your personal .cshrc or .bashrc as described in the User Guide.
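
One way to do this, as a minimal sketch, is to add the line

source /usr/local/lib/cshrc

to your .cshrc, or

source /usr/local/lib/bashrc

to your .bashrc; consult the User Guide for the exact form used at ACENET.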

Gaussian

Gaussian is a program for performing computational chemistry calculations. It is an extensive program with a variety of functions. For more details, please see http://www.gaussian.com.

Setting Up the Environment

Gaussian is not set up by default on the SGI machines. To access Gaussian, you will need to set some environment variables. Scripts for csh and bash are available in /usr/local/lib to automate this process. To set up your environment for Gaussian, add the following line

source /usr/local/lib/gaussenv.csh

to your .cshrc or

source /usr/local/lib/gaussenv.sh

to your .bashrc. Please note that this sets the environment variable GAUSS_SCRDIR to /scratch02/$USER, where $USER expands to your username. Make sure that this is where you want the scratch files to go and that you have created a directory within /scratch02 named with your username (see the Storage section above); otherwise you will need to reset it with

setenv GAUSS_SCRDIR /location/to/your/scratch/space

in csh or

export GAUSS_SCRDIR=/location/to/your/scratch/space

in bash. Make sure you change /location/to/your/scratch/space to an actual location!

License Agreement

Please note that our Gaussian license has restrictions and that by using Gaussian on the SGI machines you agree to its terms and conditions. Please see the Gaussian license on the Gaussian page for more details.

Running Gaussian

Test jobs with Gaussian can be run from the command line using

g03 < input > output

where "input" and "output" should be replaced with the name of your input and output files, respectively. Please note that interactive jobs are to be kept to a 15 minute maximum and if this time is not sufficient, they should be run through the job scheduler.

Production jobs need to be run through the SGE job scheduler. A sample SGE script for a Gaussian job is given below.

#$ -S /bin/csh
#$ -N gauss_job_name
#$ -cwd
#$ -l h_rt=0:30:0
g03 < testjob.com > testjob.log

This script will run the input file testjob.com and redirect the output to the file testjob.log.
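
Assuming the script above is saved as, say, gauss_job.sh (a placeholder name), it can be submitted and its status checked with the standard SGE commands:

qsub gauss_job.sh
qstat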

Support

If you have any additional questions regarding the usage of the SGI machines, please contact us by following instructions on the Ask Support page.