Parallel Jobs

From ACEnet
Main page: Job Control

Parallel environments

A parallel environment is essentially a collection of CPU slots, along with suitable set-up and tear-down code, for running a certain class of parallel program. The parallel environment you request for a job dictates how Grid Engine will start and stop the job and how it will distribute the processes to nodes.

The two most important parallel environments offered at ACEnet are:

Parallel Environment    Description
ompi*                   For any parallel MPI job
openmp                  For OpenMP (shared-memory) jobs

Other parallel environments may be available and can be listed with qconf -spl, but we recommend against using them unless you have consulted a Computational Research Consultant.
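For reference, the listing mentioned above can be wrapped in a small helper. This is only a sketch: qconf -spl and qconf -sp are standard Grid Engine commands, but the function name show_pes and the guard message are made up for illustration.

```shell
# Hypothetical helper: list parallel environments, then show one in detail.
# The guard lets the function run harmlessly where Grid Engine is not installed.
show_pes() {
    if ! command -v qconf >/dev/null 2>&1; then
        echo "qconf not found; run this on a cluster login node"
        return 0
    fi
    qconf -spl                  # names of all parallel environments
    qconf -sp "${1:-openmp}"    # settings (slots, allocation_rule, ...) of one environment
}

show_pes openmp
```

Inspecting an environment this way does not change the recommendation above; it only shows what is configured.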

MPI jobs

Here's an example script that runs a 16-process MPI application mpi_app from the current directory:

#$ -cwd
#$ -l h_rt=01:00:00
#$ -pe ompi* 16

mpirun mpi_app

The * in ompi* is a wildcard and must be quoted or escaped if used on the command line, e.g.:

$ qsub -pe 'ompi*' 16 jobscript

The wildcard is needed because at some sites an MPI job cannot span all nodes, so there is a separate parallel environment for each block of nodes. For example, at Mahone there are ompi_2000 and ompi, corresponding to the Myrinet-2000 switch (hosts cl001-cl063) and the Myrinet-10G switch (hosts cl064-cl141), respectively. The wildcard notation -pe ompi* allows your job to go to either switch while ensuring that it is not split across the two. At other sites, a similar approach may be used with InfiniBand.
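The reason for quoting ompi* on the command line can be seen without Grid Engine at all. The sketch below assumes a bash-like shell and a made-up file name, ompi_notes.txt, that happens to match the pattern.

```shell
# Demonstrate shell globbing on an unquoted ompi* in a scratch directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch ompi_notes.txt                 # made-up file that happens to match ompi*

unquoted=$(echo ompi*)               # the shell expands this to ompi_notes.txt
quoted=$(echo 'ompi*')               # the quotes keep the literal ompi*

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

cd - >/dev/null                      # return to the previous directory
rm -rf "$demo_dir"
```

By default bash passes an unmatched pattern through unchanged, so an unquoted ompi* may appear to work until a matching file shows up in the current directory. Quoting avoids that surprise.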

You can use the openmp environment if you want all the MPI processes to run on a single host.
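Whichever environment is requested, a job script can report what it was actually given. The following is a minimal sketch, assuming the standard Grid Engine job variables PE, NSLOTS and PE_HOSTFILE are set and that each hostfile line begins with a host name followed by a slot count; the function name summarize_pe is hypothetical.

```shell
# Hypothetical helper: report the granted parallel environment and slots per host.
# Falls back to placeholders so it can also be run outside a Grid Engine job.
summarize_pe() {
    echo "PE: ${PE:-none}  slots: ${NSLOTS:-1}"
    if [ -r "${PE_HOSTFILE:-}" ]; then
        # Each hostfile line starts with: hostname slot-count ...
        awk '{ printf "  %s: %s slot(s)\n", $1, $2 }' "$PE_HOSTFILE"
    fi
}

summarize_pe
```

Calling summarize_pe near the top of a job script leaves a record in the job's output of where the processes were placed, which can help when a wildcard request could have landed on more than one block of nodes.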

If you submit an MPI job that requests a lot of slots (e.g. more than 16), you are encouraged to turn on the Reservation option:

#$ -R yes

For more on how Reservation works please see the page on Scheduling Policies and Mechanics.

OpenMP or threaded jobs

Since OpenMP is for shared-memory programming, an OpenMP program must run on a single node; it cannot span nodes as an MPI program can. It therefore needs a distinct parallel environment under Grid Engine, requested with -pe openmp np, where np is the number of slots. The maximum number of slots you can request with this parallel environment at ACEnet is 16.

#$ -cwd
#$ -l h_rt=01:00:00
#$ -pe openmp 4

export OMP_NUM_THREADS=$NSLOTS
./openmp_job
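NSLOTS is only set inside a Grid Engine job, so the script above would export an empty OMP_NUM_THREADS if it were run by hand. A minimal sketch of one way to guard against that, assuming a default of one thread is acceptable for testing (this default is an assumption, not ACEnet policy):

```shell
# NSLOTS is set by Grid Engine inside a parallel job; default to 1 thread
# when the script is run by hand for testing.
export OMP_NUM_THREADS="${NSLOTS:-1}"
echo "Running with $OMP_NUM_THREADS OpenMP thread(s)"
# ./openmp_job   # the program from the example above would be started here
```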

Variable process counts

You can ask Grid Engine for "as many slots as are available" (within reason) by using a range descriptor instead of a simple number for the slot count.

#$ -pe ompi* 8,16

This job will be scheduled when the system can provide either 16 or 8 slots. Grid Engine tries the largest number first, but if it can schedule the job sooner with the smaller number of slots it will do that.

The range descriptor can also be a range with a hyphen rather than a list:

#$ -pe openmp 4-16

This example requests anywhere from 4 to 16 slots on a single host. Grid Engine grants as many slots as it can when the job is scheduled, rather than waiting for a larger number to become free.
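Because the slot count is only known once the job starts, a script submitted with a range should read NSLOTS rather than hard-code a number. A minimal sketch, assuming a hypothetical application whose work is split into a fixed number of chunks; the function name chunks_per_slot and the figure of 64 chunks are made up for illustration.

```shell
# Hypothetical helper: chunks each process must handle, rounded up.
# Usage: chunks_per_slot TOTAL_CHUNKS SLOTS
chunks_per_slot() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

slots="${NSLOTS:-1}"                 # set by Grid Engine; 1 when run by hand
echo "Granted $slots slot(s); up to $(chunks_per_slot 64 "$slots") chunk(s) per process"
```

With -pe openmp 4-16 the same script then adapts to whatever it is given, whether that turns out to be 4 slots, 16, or something in between.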
