Once you have submitted your jobs to the queue, you can check on them in several ways, including e-mail notification of job completion and command-line access to both your jobs' status and the current state of the pool.
Managing Job Status
You can monitor the progress of your batch processing using the condor_q command. This section describes how to check the status of your processes at any time and how to remove a process from the Condor queue.
After you submit a job for processing, you can check the status of the Condor machine pool and verify that machines are available on which your jobs can execute.
To check the status of the Condor pool, type the command condor_status. This command returns information about the pool resources. Output lists the number of slots available in the pool and whether they are in use. If there are no idle slots, your batch processing is queued when it is submitted.
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
firstname.lastname@example.org LINUX X86_64 Claimed Busy 1.060 1975 0+17:43:50
email@example.com LINUX X86_64 Claimed Busy 1.060 1975 0+17:43:48
firstname.lastname@example.org LINUX X86_64 Claimed Busy 1.000 1975 0+17:44:43
email@example.com LINUX X86_64 Claimed Busy 1.000 1975 0+17:44:36
firstname.lastname@example.org LINUX X86_64 Unclaimed Idle 0.010 1975 0+00:03:57
email@example.com LINUX X86_64 Unclaimed Idle 0.000 1975 0+00:00:04
firstname.lastname@example.org LINUX X86_64 Unclaimed Idle 0.000 1975 0+00:00:04
Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX 7 0 4 3 0 0 0
Total 7 0 4 3 0 0 0
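If you want to check for free slots from a script, the summary block at the end of the condor_status output is the easiest part to read. The following is a minimal sketch, assuming the summary has the layout shown above (a header row starting with "Total" followed by a final "Total" row of counts). The function name count_slots is hypothetical and not part of Condor.

```python
def count_slots(status_output: str) -> dict:
    """Return the counts from the final 'Total' row of condor_status output.

    Assumes the summary layout shown in the example above: a header row
    that begins with 'Total' and names the slot states, followed later by
    a 'Total' row holding one number per header column.
    """
    header = None
    for line in status_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Summary header: starts with "Total" and names the slot states.
        if fields[0] == "Total" and "Unclaimed" in fields:
            header = fields
        # Summary totals: starts with "Total" followed only by numbers.
        elif (header and fields[0] == "Total" and fields[1:]
              and all(f.isdigit() for f in fields[1:])):
            return dict(zip(header, (int(f) for f in fields[1:])))
    raise ValueError("no summary block found in condor_status output")


# Summary block copied from the example output above.
sample = """\
Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX 7 0 4 3 0 0 0
Total 7 0 4 3 0 0 0
"""

summary = count_slots(sample)
print("Unclaimed slots:", summary["Unclaimed"])
```

In practice the text would come from running condor_status yourself and passing its output to the function; the sample string here only stands in for that output.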
To check the cumulative use of resources within the Condor pool, include the option -submitter with the command condor_status. This command returns information about each user in the Condor queue. The output lists each user's name, the machine in use, and the current number of jobs per machine. Use this command to help determine how many resources Condor has available to run your jobs. An example is shown here:
> condor_status -submitter
Name Machine Running IdleJobs HeldJobs
email@example.com w4.hmdc.ha 2 0 0
firstname.lastname@example.org x1.hmdc.ha 9 0 0
email@example.com x3.hmdc.ha 40 0 0
firstname.lastname@example.org. x5.hmdc.ha 32 0 0
RunningJobs IdleJobs HeldJobs
email@example.com 49 0 0
firstname.lastname@example.org. 32 0 0
email@example.com 2 0 0
Total 83 0 0
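The per-user summary at the end of this output can also be read by a script. The sketch below assumes the layout shown above: a header row starting with "RunningJobs", then one row per user with three counts. The function name submitter_summary and the user names user_a, user_b, and user_c are hypothetical placeholders; only the counts are taken from the example.

```python
def submitter_summary(output: str) -> dict:
    """Map each name in the summary block to (running, idle, held) counts.

    Assumes the layout shown in the example above; rows before the
    'RunningJobs' header are ignored.
    """
    rows = {}
    in_summary = False
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "RunningJobs":
            in_summary = True
            continue
        # A summary row is a name followed by exactly three integer counts.
        if in_summary and len(fields) == 4 and all(f.isdigit() for f in fields[1:]):
            rows[fields[0]] = tuple(int(f) for f in fields[1:])
    return rows


# Counts from the example above; the user names are placeholders.
sample = """\
RunningJobs IdleJobs HeldJobs
user_a 49 0 0
user_b 32 0 0
user_c 2 0 0
Total 83 0 0
"""

summary = submitter_summary(sample)
running = sum(counts[0] for name, counts in summary.items() if name != "Total")
print("Running jobs across all users:", running)
```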
Cluster Status Summary
To view a summary of the Condor cluster's available resources, run:
rce-info.sh
To view a summary of the resources currently in use on the Condor cluster, run:
rce-info.sh -t used
Removing your job
To remove a process from the queue, type the command condor_rm <cluster ID>.<process ID>. For example:
> condor_rm 9.9
Job 9.9 marked for removal
To find a list of your jobs type:
To remove all jobs affiliated with a cluster, type the command condor_rm <cluster ID>. For example, the command condor_rm 4 removes all jobs assigned to cluster 4.
To remove all of your clusters' jobs from the Condor queue, type condor_rm -a. For example:
> condor_rm -a
All jobs marked for removal.
Jobs must be deleted from the host they were submitted from.
Each of these sections represents a different RCE Login server.
When you submit a job, the server you are logged in to is responsible for "scheduling" that job and keeping track of its status.
Each RCE Login server maintains this status separately, so when you want to remove a job, you must also specify the server where you started it.
The full syntax to remove a job is thus:
condor_rm <cluster ID>[.<process ID>] -name <schedd_string>
condor_rm 4806 -name "HMDC.firstname.lastname@example.org"
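If you remove jobs from a script, the same syntax can be assembled programmatically. This is a minimal sketch, assuming only the condor_rm forms described above. The helper name condor_rm_args is hypothetical, and the schedd name in the usage lines is a placeholder rather than a real server.

```python
from typing import List, Optional


def condor_rm_args(cluster_id: int,
                   process_id: Optional[int] = None,
                   schedd_name: Optional[str] = None) -> List[str]:
    """Build the argument list for condor_rm <cluster>[.<process>] [-name <schedd>]."""
    if cluster_id < 0 or (process_id is not None and process_id < 0):
        raise ValueError("cluster and process IDs must be non-negative")
    # A bare cluster ID removes every job in the cluster;
    # <cluster>.<process> removes a single job.
    job = str(cluster_id) if process_id is None else f"{cluster_id}.{process_id}"
    args = ["condor_rm", job]
    if schedd_name is not None:
        # -name selects the login server (schedd) the job was submitted from.
        args += ["-name", schedd_name]
    return args


# Placeholder schedd name; substitute the string for your own login server.
single_job = condor_rm_args(9, 9)
whole_cluster = condor_rm_args(4806, schedd_name="SCHEDD_NAME_PLACEHOLDER")
print(" ".join(single_job))
print(" ".join(whole_cluster))
```

Returning a list rather than a single string means the result can be handed to subprocess.run without shell quoting problems, for example when the schedd name contains special characters.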