Sun Grid Engine

From ACENET
Legacy documentation

This page describes a service provided by a retired ACENET system. Most ACENET services are now provided by national systems; for those, please visit https://docs.computecanada.ca.


Main page: Job Control
Short Description
Sun Grid Engine (also known as N1 Grid Engine) provides policy-based workload management and dynamic provisioning of application workloads. This is the job scheduling system in the ACENET environment.
Version
6.1u6
Help
$ man qsub

Job parameters

Option              Description
-l h_rt=time        Hard run time limit, either in seconds or in hh:mm:ss format.
-l h_vmem=mem       Hard virtual memory limit; the mem specifier may include k, K, m, M, g, G; for details see man queue_conf.
-l h_stack=mem      Stack size limit; the mem specifier may include k, K, m, M, g, G; default is 10M.
-l test=true        Request access to the high-availability test nodes. h_rt must also be 1 hour or less. (*)
-l lscratch=mem     Ensure each assigned host has at least mem space in $TMPDIR. (*)
-l chip=name        Request hosts equipped with a certain CPU type. Allowed values are xeon and opteron. (*)
-cwd                Start the job script in the directory it was submitted from, the "current working directory". If absent, the job will start in your home directory, /home/$USER.
-j y                Join the stderr output stream to the stdout stream. Error messages will be mixed in with the job script's standard output. With -j n, standard error will go into job_name.ejob_id.
-N name             Assign a name to the job other than the name of the job script.
-o file             Redirect the standard output to the named file.
-S shell            Shell to interpret the job script: /bin/bash (default) or /bin/csh.
-m e                Send mail at job end; -m eas sends mail on end, abort, and suspend. Do not use -m unless -M is also set.
-M user@mail.host   Where to send mail if -m is set.
-pe pename nslots   Parallel environment and number of slots (CPU cores) for the job; nslots can be a range: -pe ompi* 4-8.
-r y                Rerunnable. Jobs which fail in certain ways (e.g. a host crashes) may be restarted automatically; if you do not want this to happen, set -r n.
-R y                Reserve slots for this job; useful if the job requires a large number of slots.
-V                  Give the job the same environment variables as the submission shell. This option is on by default, but it will not work with csh as the execution shell.
-hold_jid           Define or redefine the job dependency list of the submitted job.
-t n[-m[:s]]        Submit an array job with task indices running from n to m in steps of s.

(*) indicates a custom resource defined at ACENET.

For a comprehensive list of the available options see man qsub.
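
As an illustration of how several of these options combine, here is a minimal sketch of a serial job script. The script name, job name, log file name, and resource values are hypothetical and chosen only for the example. Lines beginning with #$ are read by qsub as options; bash treats them as ordinary comments, so the script can also be run directly for a quick check.

```shell
#!/bin/bash
# Hypothetical job script, e.g. saved as example_job.sh and submitted with:
#   qsub example_job.sh
#$ -cwd
#$ -l h_rt=01:00:00
#$ -l h_vmem=2G
#$ -j y
#$ -N example_job
#$ -o example_job.log

# JOB_NAME and JOB_ID are set by Grid Engine at run time.
# The fallbacks after ":-" only apply when the script runs outside Grid Engine.
job_summary="Job ${JOB_NAME:-example_job} (ID ${JOB_ID:-none}) started in $(pwd)"
echo "$job_summary"
```

The same options could instead be given on the qsub command line; keeping them in the script makes the submission reproducible.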

Environment variables

When a job runs, several variables are preset in the job's environment. They can be used in job submission scripts. For example, the job ID can be printed in the job's output like this:

echo $JOB_ID

The following can also be used as pseudo-variables in the directives section (#$):

$HOME       home directory on execution machine
$USER       user ID of job owner
$JOB_ID     current job ID
$JOB_NAME   current job name (see -N option)
$HOSTNAME   name of the execution host
$TASK_ID    array job task index number
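
The pseudo-variables above can be used, for example, to give each task of an array job its own output file. The following is a sketch only: the job name and the task range 1-4 are hypothetical. Note that pseudo-variables are meant for the #$ directive lines; in the body of the script, the ordinary environment variables (such as SGE_TASK_ID) are used instead.

```shell
#!/bin/bash
#$ -cwd
#$ -N pseudo_demo
#$ -t 1-4
#$ -j y
#$ -o $JOB_NAME.$JOB_ID.$TASK_ID.out

# In the script body, read the real environment variables.
# The fallbacks after ":-" only apply when running outside Grid Engine.
task="${SGE_TASK_ID:-1}"
log_name="${JOB_NAME:-pseudo_demo}.${JOB_ID:-0}.${task}.out"
echo "Task $task writes its output to $log_name"
```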

Below is a nearly comprehensive list of the variables set in a job's environment:

  • ARC – The architecture name of the node on which the job is running. The name is compiled into the sge_execd binary.
  • ENVIRONMENT – Always set to BATCH. This variable indicates that the script is run in batch mode.
  • HOME – The user's home directory path as taken from the passwd file.
  • HOSTNAME – The host name of the node on which the job is running.
  • JOB_ID – A unique identifier assigned by the sge_qmaster daemon when the job was submitted. The job ID is a decimal integer from 1 through 9,999,999.
  • JOB_NAME – The job name, which is built from the file name provided with the qsub command, a period, and the digits of the job ID. You can override this default with qsub -N.
  • LOGNAME – The user's login name as taken from the passwd file.
  • NHOSTS – The number of hosts in use by a parallel job.
  • NQUEUES – The number of queues that are allocated for the job. This number is always 1 for serial jobs.
  • NSLOTS – The number of queue slots in use by a parallel job.
  • PATH – A default shell search path of: /usr/local/bin:/usr/ucb:/bin:/usr/bin.
  • PE – The parallel environment under which the job runs. This variable is for parallel jobs only.
  • PE_HOSTFILE – The path of a file that contains the definition of the virtual parallel machine that is assigned to a parallel job by the grid engine system. This variable is used for parallel jobs only. See the description of the $pe_hostfile parameter in sge_pe for details on the format of this file.
  • QUEUE – The name of the queue in which the job is running.
  • REQUEST – The request name of the job. The name is either the job script file name or is explicitly assigned to the job by the qsub -N command.
  • RESTARTED – Indicates whether a job has been restarted. If set to value 1, the job was interrupted at least once. See qsub -r documentation.
  • SGE_ROOT – The root directory of the grid engine system as set for sge_execd before startup, or the default /usr/SGE directory.
  • SGE_BINARY_PATH – The directory in which the grid engine system binaries are installed.
  • SGE_CELL – The cell in which the job runs.
  • SGE_JOB_SPOOL_DIR – The directory used by sge_shepherd to store job-related data while the job runs.
  • SGE_O_HOME – The path to the home directory of the job owner on the host from which the job was submitted.
  • SGE_O_HOST – The host from which the job was submitted.
  • SGE_O_LOGNAME – The login name of the job owner on the host from which the job was submitted.
  • SGE_O_MAIL – The content of the MAIL environment variable in the context of the job submission command.
  • SGE_O_PATH – The content of the PATH environment variable in the context of the job submission command.
  • SGE_O_SHELL – The content of the SHELL environment variable in the context of the job submission command.
  • SGE_O_TZ – The content of the TZ environment variable in the context of the job submission command.
  • SGE_O_WORKDIR – The working directory of the job submission command.
  • SGE_CKPT_ENV – The checkpointing environment under which a checkpointing job runs. The checkpointing environment is selected with the qsub -ckpt command.
  • SGE_CKPT_DIR – The path ckpt_dir of the checkpoint interface. Set only for checkpointing jobs. For more information, see the checkpoint(5) man page.
  • SGE_STDERR_PATH – The path name of the file to which the standard error stream of the job is diverted. This file is commonly used for enhancing the output with error messages from prolog, epilog, parallel environment start and stop scripts, or checkpointing scripts.
  • SGE_STDOUT_PATH – The path name of the file to which the standard output stream of the job is diverted. This file is commonly used for enhancing the output with messages from prolog, epilog, parallel environment start and stop scripts, or checkpointing scripts.
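  • SGE_TASK_FIRST – The index of the first task of the array job (the first parameter of qsub -t).
  • SGE_TASK_ID – The task identifier in the array job represented by this task.
  • SGE_TASK_LAST – The index of the last task of the array job (the second parameter of qsub -t).
  • SGE_TASK_STEPSIZE – The step size between task indices of the array job (the third parameter of qsub -t).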
  • SHELL – The user's login shell as taken from the passwd file. Note – SHELL is not necessarily the shell that is used for the job.
  • TMP – The same as TMPDIR.
  • TMPDIR – The absolute path to the job's temporary working directory.
  • TZ – The time zone variable imported from sge_execd, if set.
  • USER – The user's login name as taken from the passwd file.
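
As a sketch of how the parallel-job variables fit together, the script below sums the slot counts listed in the file named by PE_HOSTFILE and compares the result with NSLOTS and NHOSTS. It assumes the line format described in sge_pe (host name, slot count, queue, processor range). So that it can be tried outside Grid Engine, it builds a small stand-in host file when PE_HOSTFILE is unset; the node and queue names in that stand-in are hypothetical.

```shell
#!/bin/bash
#$ -cwd
#$ -pe ompi* 4-8

# Outside Grid Engine there is no PE_HOSTFILE, so create a stand-in file.
# Assumed line format: host name, slot count, queue, processor range.
if [ -z "${PE_HOSTFILE:-}" ]; then
    PE_HOSTFILE="$(mktemp)"
    printf '%s\n' 'node01 4 short.q@node01 UNDEFINED' \
                  'node02 2 short.q@node02 UNDEFINED' > "$PE_HOSTFILE"
    NSLOTS=6
    NHOSTS=2
fi

# Sum the second column to get the total number of slots across all hosts.
total_slots=$(awk '{ sum += $2 } END { print sum + 0 }' "$PE_HOSTFILE")
# Count the lines to get the number of hosts.
host_count=$(awk 'END { print NR }' "$PE_HOSTFILE")

echo "PE ${PE:-none}: $total_slots slots on $host_count hosts (NSLOTS=$NSLOTS, NHOSTS=$NHOSTS)"
```

Within a real parallel job, the totals computed from the host file would be expected to agree with NSLOTS and NHOSTS; NSLOTS is typically what gets passed to the parallel launcher as the process count.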