| Legacy documentation
This page describes a service provided by a retired ACENET system. Most ACENET services are currently provided by national systems, for which please visit https://docs.computecanada.ca.
- 1 Description
- 2 Getting Access / License Agreement
- 3 Available Hardware
- 4 Setting Up Your Environment
- 5 Running Gaussian
- 6 Gaussian Error Messages
- 7 Visualizing structures and results
- 8 Accessing Gaussian-03
Description
Gaussian is an electronic structure program that is commonly used by chemists, chemical engineers, biochemists, physicists and others for research in established and emerging areas of molecular science. More information about Gaussian can be found on the Gaussian homepage.
Getting Access / License Agreement
Please follow the instructions at https://docs.computecanada.ca/wiki/Gaussian#License_limitations to request permission to access Gaussian on Compute Canada clusters.
Gaussian is no longer available at any ACENET resources.
Gaussian is also available on the new Compute Canada Graham cluster. Please see Gaussian on the Compute Canada technical documentation site for more guidance. You will be asked there to re-register your agreement with the Gaussian license terms, even if you are already registered for Placentia.
Setting Up Your Environment
Before running Gaussian you need to choose which version of Gaussian you want to run, and have certain environment variables set properly. You can use environment modules for this purpose.
$ module load gaussian
will select the current default version of Gaussian for you to use.
$ module avail gaussian
will show you all the versions that are currently available.
The module load command can also be added to a job script, or to your ~/.cshrc file. In the same place you may also wish to set the environment variable GAUSS_SCRDIR to tell Gaussian where to write intermediate files. See "Scratch directory" below for more on this.
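As an illustration, a bash job script might combine these two steps like so. The module name matches the default shown above, but the scratch path is only an example, not a recommended location:

```shell
# Select the default Gaussian version and choose a scratch location
module load gaussian
export GAUSS_SCRDIR=$HOME/gauss_scratch   # bash syntax; in ~/.cshrc use setenv instead
```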
Running Gaussian
Gaussian can be run both interactively on placentia.ace-net.ca and through the Sun Grid Engine (SGE). Users are asked to run all jobs through SGE unless they are testing a configuration and want to make sure there are no errors before submitting a job to the scheduler.
You can run small, short Gaussian calculations at the terminal prompt. Remember to load a Gaussian module first. Gaussian can be executed by typing
$ g09 < input > output
Keep an eye on any interactive job. No interactive job should be left to run more than 15 minutes. Jobs longer than this are considered Production Jobs and must be submitted via SGE.
The following minimal script could be used to run a serial (single-CPU) Gaussian job.
#$ -cwd
#$ -j y
#$ -l h_rt=0:30:0
module load gaussian
g09 < test001.com > test001.log
Save the above file as
Gaussian_submit.sh and then submit the job by typing:
$ qsub Gaussian_submit.sh
This will read input from the file test001.com and put the results in the file test001.log. Scratch files will be created (and destroyed) in whatever directory you were working in, and the job will be run with one process (one CPU). The job is limited to 30 minutes of run time by the line containing h_rt=0:30:0.
Gaussian writes intermediate files during a run, which can be very large: sizes in the hundreds of gigabytes are often seen at ACENET. These files look like this:
$ ls Gau*
Gau-24596.inp  Gau-24598.chk  Gau-24598.d2e  Gau-24598.int  Gau-24598.rwf  Gau-24598.scr
If a Gaussian job finishes normally these files are automatically deleted at the end of the run. However, if Gaussian doesn't have enough disk quota for these files the job will crash, probably (but not always) near the beginning. Furthermore, a typical Gaussian job does an enormous number of small input/output transactions to those files, which can have a negative effect on filesystem performance. Therefore the location of these intermediate files is very important.
In the interactive and script examples shown above, intermediate files are written to the current working directory, i.e. the directory where the job was submitted. By setting an environment variable
GAUSS_SCRDIR, you can direct Gaussian to put these intermediate files in other places. There are effectively three options at ACENET:
- /home filesystem
- Local scratch
- No-quota scratch (NQS)
- /home filesystem
- This default location has a small quota. This limits the size of jobs you can run, but also limits the amount of disk Gaussian can consume by accident. It is the default for the beginning user, and requires no knowledge of environment variables or changes to the example job script shown above. This is suitable if you only expect to run one or two Gaussian jobs at a time, and if the size of their intermediate files does not exceed your quota.
- Local scratch
- Using node-local disk for scratch files reduces load on the network filesystem and relieves space pressure on
/home. For these reasons it is the recommended best practice. Request the
localscratch resource from Grid Engine and set
GAUSS_SCRDIR to $TMPDIR. Here is a fragment of a job script as an example:
#$ -pe gaussian 4
#$ -l localscratch=230G
export GAUSS_SCRDIR=$TMPDIR
- 231 GB is the largest volume of scratch disk you can access with this method at Placentia. Other nodes are available with 122G and 94G of local scratch. The machines with 231G also have a second local disk, but this cannot be accessed through $TMPDIR. Write to firstname.lastname@example.org for help if you want to try using this second local disk.
- See the next section for more about MAXDISK, and the page on Local Scratch.
- No-quota scratch (NQS)
- If your jobs generate intermediate files that may exceed your quota in
/home or the capacity of local scratch, you can use "no-quota scratch",
/nqs/$USER. It is simpler to manage than local scratch, but you must read about it and accept its terms of usage on the Storage System page before using it. Once you have done that and followed the instructions there, set
GAUSS_SCRDIR in your script:
export GAUSS_SCRDIR=/nqs/$USER   # bash syntax
setenv GAUSS_SCRDIR /nqs/$USER   # tcsh syntax
NOTE WELL: If too many jobs run at the same time using /home or NQS for
GAUSS_SCRDIR, the responsiveness of the entire cluster can suffer badly via the Lustre file system. We do not yet know what an appropriate maximum is; until we do, please limit the number of Gaussian jobs writing to NQS to fewer than 10 per user, or 20 per research group.
A MAXDISK of 200GB is set in the global Default.Route file to reduce the risk of accidentally running out of disk space. If your job requires more than 200GB of scratch disk then you can either specify MAXDISK explicitly in the route section of the Gaussian input file, or create your own Default.Route file in the job's working directory.
Note that MAXDISK only reduces the risk of running out of disk:
- Two or more jobs running simultaneously can each write up to MAXDISK, and their sum may exceed the quota or fill up the disk.
- MAXDISK has different effects for different methods as described on the Gaussian web-site.
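For instance, MAXDISK can be given as a keyword in the route section of the input file. The fragment below is only a sketch: the method shown and the 300GB value are placeholders, not recommendations.

```
%Mem=2GB
# B3LYP/6-31G(d) Opt MaxDisk=300GB
...rest of input file...
```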
The default memory reserved for your job by SGE is 2GB. Because of the way Gaussian handles memory, this provides only about 1GB of usable capacity for the calculation itself. This is often insufficient, so you may need to know how to modify this setting.
The %MEM setting in the Gaussian input file is used to specify additional memory. The value for %MEM should be at least 1GB less than the value specified in the SGE job submission script. Conversely, the value specified for h_vmem in your job script should be at least 1GB greater than the amount specified in the %MEM directive in your Gaussian input file. The exact increment needed seems to depend on the job type and input details; 1GB is a conservative value determined empirically.
For example, if your input file
testmem.com begins like this:
%mem=5GB
# MP2/GenECP Pseudo=Read
...etc...
...then your job script,
Gaussian_memory.sh should look like this:
#$ -cwd
#$ -j y
#$ -l h_rt=12:00:00
#$ -l h_vmem=6G
module load gaussian
g09 < testmem.com > testmem.out
Note the h_vmem specification, which is 1GB greater than the %MEM specification.
Also note that decimal fractions are not understood in the
h_vmem directive. If you want to try 6.5GB per process, for example, you must specify 6500M.
- Note added Feb 2010
- There is some evidence that 1GB may be too small an increment for G09 jobs. If you get either
- a message in the G09 output, galloc: could not allocate memory, or
- no message at all, but the job dies mysteriously early in the run,
- then try boosting h_vmem higher still and re-submitting. And please consider reporting your experiences to us.
The submission of parallel jobs to SGE is similar to the submission of serial jobs, with three exceptions:
- The Gaussian input file must include an %NProcS directive
- The job script must include a parallel environment directive,
#$ -pe gaussian N, where N is the value for %NProcS specified in the input file
- Memory specification is slightly more complicated.
Gaussian at ACENET uses "shared memory" rather than "Linda" parallelism. For some links Gaussian spawns one more process than the number requested with %NProcS and -pe gaussian N. This extra process does no computing, but does occupy memory. To get a reasonable estimate of the amount of memory to specify in your job script (-l h_vmem), use this formula:
h_vmem = (%NProcS + 1) * (%MEM + 1GB) / %NProcS
In other words: add 1GB to the %MEM request, then multiply by one more than the number of processes you want. This gives a pseudo-total memory requirement for the job. But Grid Engine applies the memory limit on a per-slot basis, so divide that total by the number of Grid Engine slots requested (-pe gaussian N), which should be identical to the number of processes requested in the input file (%NProcS). Supply that quotient as the argument to
-l h_vmem= in your job script.
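As a quick sanity check, the arithmetic above can be scripted. This is just a sketch, using decimal megabytes (1 GB = 1000 MB) to match the convention of the worked example on this page:

```shell
#!/bin/sh
# Per-slot h_vmem for a parallel Gaussian job:
#   (%MEM + 1GB) * (%NProcS + 1) / %NProcS
mem_gb=5     # %MEM from the input file, in GB
nprocs=4     # %NProcS from the input file
total_mb=$(( (mem_gb + 1) * 1000 * (nprocs + 1) ))   # pseudo-total memory, in MB
per_slot_mb=$(( total_mb / nprocs ))                 # divided across SGE slots
echo "h_vmem=${per_slot_mb}M"                        # prints h_vmem=7500M
```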
Consider an example input file,
testpar.com which begins with the directives
%MEM=5GB
%NPROCS=4
...rest of input file...
The corresponding job script should look like this:
#$ -cwd
#$ -j y
#$ -l h_rt=2:00:00
#$ -l h_vmem=7500M
#$ -pe gaussian 4
module load gaussian
g09 < testpar.com > testpar.out
Add 1GB to the %MEM request, giving 6GB per process. Multiply that by 5 (four processes requested, plus one "idle" process) to get 30GB of total memory. Divide that by 4 slots requested from SGE, yielding h_vmem of 7.5G per slot, which must be specified as 7500M.
(Explanatory note: Our sources suggest that %MEM actually specifies the amount of memory the Gaussian processes will share, which means that the above calculations over-count the total memory required. However, SGE does not account for shared memory and the operating system understands each process's virtual memory use as shared+private. Thus the over-counting is necessary to avoid jobs being killed.)
"Your mileage may vary" when it comes to the performance of parallel Gaussian work. Typically Gaussian does not scale well beyond 8 processors, and for some methods much less: running a job on 8 processors will not give you your results in anything like one-eighth the time it would take on one. If you are going to run Gaussian as a parallel application, please do some preliminary experiments to determine how well it scales for your methods.
The specialized Gaussian hosts (gaussian.q) are all 4-core machines, so in normal operation you will not request more than 4 parallel processes.
Gaussian with NBO5
We recently acquired a license for NBO5 and have made available a copy of Gaussian with
l607.exe rebuilt. To access this version, use a job script like this:
#$ -cwd
#$ -j y
#$ -l h_rt=12:00:00
#$ -l h_vmem=2G
module load gaussian/g09.nbo5
g09 < test.com > test.out
Felix Kanneman has supplied a job script which analyzes a Gaussian input file and automatically submits the job with the appropriate memory request. ACENET technical staff have not had the opportunity to test it extensively, but you are invited to try it at your own risk. The script and documentation can be found on Felix's page.
This was tested with g03, but should still work with g09. Still, use caution with this utility.
Gaussian Error Messages
Please visit the Gaussian Error Messages page for a list of common errors and how to diagnose and solve them.
Visualizing structures and results
Using Open Babel and VMD, it is possible to visualize results from Gaussian. Visualization of Gaussian results can also be done using Jmol and Molekel, which are both freely available and can read Gaussian output files directly without the need for conversion.
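As a sketch, a conversion with Open Babel's command-line tool might look like this. The file names are illustrative, and on systems with only the older interface the command is babel rather than obabel:

```shell
# Convert a Gaussian log file to XYZ format so VMD can load it
obabel test001.log -O test001.xyz
```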
Accessing Gaussian-03
Even though ACENET now supports Gaussian-09 as the default, Gaussian-03 is still available in
/usr/local/gaussian/g03, so those who need to finish work started with that version can do so. You can access it by loading the appropriate module:
#$ -cwd
#$ -j y
#$ -l h_rt=12:00:00
#$ -l h_vmem=2G
module purge
module load pgi/8 gaussian/g03.e01
g03 < test.com > test.out
Gaussian 03 with NBO5
To access Gaussian 03 with NBO5:
#$ -cwd
#$ -j y
#$ -l h_rt=12:00:00
#$ -l h_vmem=2G
module load gaussian/g03.nbo5
g03 < test.com > test.out
Known issue - ntrext1 error
Some users are seeing a buffer allocation error with Gaussian
g03.E01 when using an NFS-mounted directory as the Gaussian scratch directory. If you are seeing this problem, you can:
- migrate to Gaussian-09, or
- switch to the D.02 revision with module load gaussian/g03.d02, or
- use local scratch.