Qstat

From ACENET
Main page: Job Control

The fundamental command for monitoring job status in Grid Engine is qstat. By default (that is, with no arguments) it shows information about your own jobs in a one-line-per-job format, which is described on the Job Control page.

qstat supports many options. Some control which jobs are displayed, e.g.:

$ qstat -u \*                 Jobs belonging to all users
$ qstat -q test.q -f          Jobs running in test.q
$ qstat -q \*@cl001 -f -u \*  Jobs running on node cl001

Some control what sort of information is displayed, e.g.:

$ qstat -g t                  One line per parallel process
$ qstat -j jobid              Details on one job including error causes and resource usage
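
A quick per-state tally can also be built on top of the default one-line-per-job listing. The sketch below is only an illustration: count_states is a hypothetical helper, and it assumes that the listing starts with two header lines and carries the job state in the fifth column, which is worth checking against the output on your own cluster. The sample listing is made up so that the example runs without Grid Engine.

```shell
#!/bin/sh
# count_states: hypothetical helper that tallies jobs per state.
# Assumes qstat's default listing on stdin: two header lines, then one
# line per job with the state in column 5.
count_states() {
    awk 'NR > 2 && NF >= 5 { n[$5]++ }
         END { for (s in n) print s, n[s] }' | LC_ALL=C sort
}

# Made-up sample listing, so the sketch runs without Grid Engine.
count_states <<'EOF'
job-ID  prior   name   user   state submit/start at     queue          slots
-----------------------------------------------------------------------------
    101 0.50000 sim1   alice  r     01/02/2024 10:00:00 test.q@cl001       1
    102 0.50000 sim2   alice  qw    01/02/2024 10:05:00                    1
    103 0.50000 sim3   bob    qw    01/02/2024 10:06:00                    1
EOF

# On a cluster the same function would be fed by qstat itself, e.g.:
#   qstat -u \* | count_states
```

For the sample above this prints one line per state with its job count (qw 2, then r 1).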

The definitive and complete reference is man qstat.

When a job has ended it no longer appears in qstat, but some information about it is available through qacct.
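
For a finished job, qacct -j jobid prints one accounting field per line. As a rough sketch, the few fields that are usually of most interest can be picked out with awk. The helper name acct_summary and the choice of fields are assumptions made for this example, and the sample record is invented and abbreviated so that the snippet runs without Grid Engine.

```shell
#!/bin/sh
# acct_summary: hypothetical helper that keeps a handful of fields from
# "qacct -j" style output (one "name value" pair per line) on stdin.
acct_summary() {
    awk '$1 == "jobname" || $1 == "failed" || $1 == "exit_status" ||
         $1 == "ru_wallclock" || $1 == "maxvmem" { print $1 "=" $2 }'
}

# Invented, abbreviated accounting record used as stand-in input.
acct_summary <<'EOF'
==============================================================
qname        test.q
hostname     cl001
jobname      sim1
jobnumber    101
failed       0
exit_status  0
ru_wallclock 3600
maxvmem      1.234G
EOF

# On a cluster:
#   qacct -j jobid | acct_summary
```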

Job statuses

The common job status identifiers output by qstat are listed below. They often appear in combinations like Eqw, hqw, or dr:

qw job is waiting
r job is currently running
t job is being transferred to the compute nodes
s or S job is suspended; you should only see this in Subordinate Queues
h job is being held due to a job dependency or due to sysadmin action
E job is in an error state; use qstat -j jobid to find out why
R job has been restarted (Rr) or is waiting to be restarted (Rq); typically follows a node crash
d job has been registered for deletion, usually seen if a node has crashed
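
Since the identifiers combine, a state string such as Eqw can be read letter by letter. The following is a minimal sketch of that idea, not an official tool: describe_state is a hypothetical helper, its wording is a paraphrase of the list above, and treating q and w as separate letters ("queued" and "waiting") is an assumption made here so that combinations like Rq can be decoded.

```shell
#!/bin/sh
# describe_state: hypothetical helper that spells out a combined qstat
# state string one letter at a time.
describe_state() {
    rest=$1
    out=""
    while [ -n "$rest" ]; do
        c=${rest%"${rest#?}"}   # first character of what is left
        rest=${rest#?}          # drop that character
        case $c in
            q)   word="queued" ;;        # assumption: q read on its own
            w)   word="waiting" ;;       # assumption: w read on its own
            r)   word="running" ;;
            t)   word="transferring" ;;
            s|S) word="suspended" ;;
            h)   word="held" ;;
            E)   word="error" ;;
            R)   word="restarted" ;;
            d)   word="deleting" ;;
            *)   word="unknown($c)" ;;   # letters not covered by the list
        esac
        out="${out:+$out, }$word"
    done
    printf '%s\n' "$out"
}

describe_state Eqw   # prints: error, queued, waiting
describe_state dr    # prints: deleting, running
```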