RCE Documentation

We designed our RCE cluster (a large, powerful pool of computers) around open standards for reliability, scalability, extensibility, and interoperability. We use hardware from major vendors and a standard, enterprise-grade Linux distribution customized to address the specific needs of our users. Our infrastructure is designed to provide the greatest possible range of options to you, rather than obliging you to restrict yourself to a narrow range of tools and methodologies. We provide a stable platform on which a wide range of technologies can be deployed.

Our computing clusters consist of two main pools of resources:

Batch processing is intended for long-running processes that are CPU intensive and able to run in parallel. Batch servers enable users to perform multiple commands and functions without waiting for results from one set of instructions before beginning another, and to execute these processes without being present.

Interactive servers are intended for large processes that are memory intensive. Our interactive cluster allows users to view and engage with their jobs in real time.

Both batch and interactive servers at HMDC run on a high throughput cluster, based on HTCondor, on which users can perform extensive, time-consuming calculations without the technical limitations imposed by a typical workstation. Our computing clusters use parallel processing to enable faster execution of computation-intensive tasks. Many computing tasks can benefit from implementation in a parallel processing form. The cluster is extremely useful for the following applications:

Jobs that run for a long time: You can submit a batch processing job that executes for days or weeks and does not tie up your RCE session during that time.

Jobs that are too big to run on your desktop: You can submit jobs that require more infrastructure than your workstation provides. For example, you could use a dataset that is larger than the memory on your workstation.

Groups of dozens or hundreds of jobs that are similar: You can submit batch processing that entails multiple uses of the same program with different parameters or input data. Examples of these types of submission are simulations, sensitivity analysis, or parameterization studies.

Access to our computing clusters is available to all RCE users.

Accessing the RCE

This guide assumes you already have an RCE account. If you do not, please contact us at help@iq.harvard.edu to request one. RCE accounts are available to researchers at Harvard and MIT; see http://projects.iq.harvard.edu/user-services/research-computing-environment-sla for details.

You can access the RCE login nodes from any modern workstation or laptop with a high-speed connection. Please note that you'll need to connect to Harvard's VPN in order to use the RCE, whether you are connecting via NoMachine (NX), SFTP (e.g., FileZilla), or SSH. In order to connect to the VPN, you'll need to claim your HarvardKey and set up two-factor authentication.

The RCE provides a familiar, consistent user experience to all researchers; our desktop environment is built on NoMachine's NX, CentOS Linux, and GNOME. This allows sessions to be suspended and resumed at will: you can begin a session from one workstation, suspend it, move to another system, and resume the previous session, all with no disruption to your environment. If you work in a command-line environment and want the highest throughput without a graphical interface, the RCE is also accessible via SSH.

Remote access provides three categories of services:

  • Research Environment (graphical desktop)
  • Secure Shell (command-line tools)
  • File Access (home directory and non-confidential project space)

Graphical Desktop (NX)

To connect to the RCE remote desktop, please follow the instructions on our NX4 documentation page.

Command-line (SSH)

To connect to the RCE from the command line, simply ssh to rce.hmdc.harvard.edu using a terminal on macOS or Linux, or using a utility such as PuTTY on Windows.
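
For example, using the sample username from this guide:

    ssh jsmith@rce.hmdc.harvard.edu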

File Transfer (FileZilla)

To transfer files to or from the RCE, please follow the instructions on our FileZilla documentation page.

Working in the RCE

This guide provides information about basic features and functions that might be useful when you begin working within the RCE. Ready to get started? Follow the menu items on the left to learn the RCE basics.

RCE Basics

Desktop

When you open the RCE, the desktop is displayed in a window. The open area in the RCE window, the icons on it, the toolbar and toolbar icons, and the workspaces together comprise the RCE. The term desktop sometimes refers to the open space in the RCE window, but this guide refers to that area as the workspace.

Mouse

RCE documentation often assumes use of a three-button mouse. The buttons sometimes are named left-mouse, middle-mouse, and right-mouse. If you use a two-button mouse, you can emulate middle-mouse by pressing left-mouse and right-mouse simultaneously. If you use a wheel mouse, the wheel functions as middle-mouse. Functions available for each mouse button are as follows:

  • Left-mouse - Select text, select items, drag items, activate items
  • Middle-mouse - Paste text, move items, move windows to the back
  • Right-mouse - Open a context menu for an item (if a menu applies)

Note: You can switch button assignments if you are left-handed. In the RCE, choose System > Preferences > Mouse > General.

Workspaces

A workspace is a distinct and separate area in the RCE, which provides a convenient tool for organizing work in progress. You can open more than one application in the default workspace, or you can open applications in different workspaces and move from one workspace to the other. Click the gray tabs in the lower-right corner of the RCE to move among your workspaces. To move open applications between workspaces, drag the application's icon from one workspace tab to another.

Directories

- jsmith
|- bin (directory for user built packages)
|- cvswork (the CVS working directory)
|- Desktop
|- lib (directory for user built packages)
|- man (directory for user built packages)
|- printjobs (documents printed to a PDF file)
|- public_html (old web hosting folder; service is no longer available)
|- pylib (custom Python libraries)
|- shared_space (symbolic links to project and web spaces)

All home directories are kept on storage separate from the RCE and cluster, so that you can access files no matter which server you are logged in to.

  • Your home directory is located at ~/ or /nfs/home/J/jsmith
  • Project space is located at /nfs/projects{_ci3,_nobackup,_nobackup_ci3}/j/jsmith
  • Shortcuts to your project space are located at ~/shared_space/my_project_name/
  • Backups are located in every directory, and kept in hidden directories named .snapshot
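
For example, to restore a file from a snapshot of your home directory (the snapshot directory names shown here are illustrative; run ls ~/.snapshot to see the ones actually available):

    ls ~/.snapshot
    cp ~/.snapshot/hourly.0/myfile.txt ~/myfile.txt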

Projects & Shared Space

Project space can be used privately, or shared with collaborators (hence the name, "shared space"). Because our researchers bring confidential data to the RCE, we keep all project space separate from your home directory. There are four types of project space:

  • Project space with long-term backups
  • Project space without long-term backups
  • Confidential project space with long-term backups
  • Confidential project space without long-term backups

When you apply for an RCE account, you are asked which category would best suit your needs. Therefore, you should know ahead of time if your data is rated as confidential information by your IRB.

HMDC offers two backup plans for your data. If you have large data sets, there is a nominal fee involved; please contact us for more information.

Project Backups

  • Monthly full backups and daily incremental backups to tape, retained for up to three (3) months. Recovery from tape backup can take anywhere from a few hours to over a day.
  • Multiple hourly snapshots, multiple daily snapshots, plus a single weekly snapshot. Recovery from snapshots takes only a few minutes and is available directly to researchers: look for a hidden .snapshot folder (note the dot).

No Project Backups

  • No tape backups.
  • Weekly snapshot only. Data must reside on disk for at least one week before it is captured in a snapshot; after that, you can recover data up to one week old.

Locating Project Space

Home directories are always named with your username and can be found at /nfs/home/J/jsmith/, or by using the tilde shortcut: ~/. Personal project space is also named after your username (jsmith in this example), but shared space can take any name. Its location depends on the type of project space you requested:

  • Project space with backups: /nfs/projects/p/projectname
  • Project space without backups: /nfs/projects_nobackup/p/projectname
  • Confidential project space with backups: /nfs/projects_ci3/p/projectname
  • Confidential project space without backups: /nfs/projects_nobackup_ci3/p/projectname

(In this example, ci3 denotes level 3 confidential data. For more information on security ratings, see Harvard's data classification table.)

To make accessing your project space easier, a shortcut (called a symbolic link, or symlink) is created in your home directory in ~/shared_space/. In this directory, you will find symlinks to all the projects of which you are a member. For example, as a researcher, I may store my raw data in ~/shared_space/ci3_politicalfunds/data/ and my output in ~/shared_space/ci3_politicalfunds/output/. I could then additionally access my output from /nfs/projects_ci3/p/ci3_politicalfunds/output/....

You are welcome to create your own symlinks! In this example, I want a shortcut to my project space on the desktop, for easier access in my NX session. From a terminal, I would execute this command: ln -s /nfs/projects_ci3/p/ci3_politicalfunds ~/Desktop/politicalfunds (note that the target comes first and the link name second, and that there is no trailing slash).
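
To confirm that the link points where you expect, you can list it:

    ls -l ~/Desktop/politicalfunds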

NOTICE: Do not use your home directory to store IRB-rated confidential (i.e. level 3) data, as home directories do not have the same security applied as project spaces. You should also avoid saving large files to your home directory: there is a 2GB quota, and exceeding it will prevent new NX sessions. All project directories have their own individual quotas.

Project Space Collaboration

Making files group writeable in your project space

If you create files in a project space, the default is to create files that your whole project group can read, but only you can modify. We run an automatic nightly process to ensure that whatever permissions you have on files in your project space, they are also granted to your project group. Any project files you can modify will become group-modifiable, and any project files you can execute will become group-executable.

In order to grant group writeable permissions immediately (to other members of your shared_space project), please do the following:

  1. Open a terminal window from ApplicationsAccessoriesTerminal
  2. Determine which project directory you wish to modify: ls ~/shared_space/
  3. Run the script: fixGroupPerm.sh
  4. You will be prompted for the project directory name. This is a directory located under ~/shared_space/, which you obtained in step 2.

Running this script will grant group-writeable permissions to all files under this location. Please give this script only one argument (i.e. one project space at a time).

You will need to run this command each time you create files in your shared project space in order to grant your collaborators the same level of access you have. If you are running a script to create the files (e.g. R code, or a Stata .do file), it may be simplest to add a call to the fixGroupPerm.sh script at the end of your code, as in the example below.
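
For example, at the end of an R script you might add (the project name here is hypothetical):

    system("fixGroupPerm.sh my_project_name")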

Configure your default file sharing preferences

If most or all of your work in the RCE is done in collaborative project spaces, you may want to change the default file creation mode (i.e. the file access permissions) for your RCE account so that all files you create can be modified by members of the group which owns them. If you decide to pursue this option, two important caveats apply:

  1. This change only affects the default access mode assigned to newly-created files. The application creating the file can override the default to set a more restrictive access mode on the files.
  2. The access mode on the files/directories you create applies to whichever group owns the file/directory, which may or may not be your collaborative group.

Other notes regarding file permissions:

  • You can run the command ls -l on a file to view its ownership. Group ownership will be displayed in the fourth column.
  • If you are creating files under ~/shared_space, then they should automatically be created with ownership by your collaborative group.
  • If you create files in your home directory, however, they will be owned by the "users" group and the access permissions (e.g. allow read/write/access by the group) will apply to all RCE users.
  • Restricting access to your home directory itself can help limit the impact of such an exposure.

If you understand and accept these caveats, you can proceed to change your default file creation mode for your RCE account by doing the following:

  1. Navigate to Applications > RCE Utilities > File Sharing Config Helper. (This can also be run from a terminal window with the command fileSharing.)
  2. Choose the option for 002.
  3. Terminate your current RCE session and start a new RCE session.
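
To verify the new default in your next session, you can create a test file in a project space and check its permissions (this assumes the helper sets your umask to 002; the project name is hypothetical):

    umask                                           # should report 0002
    touch ~/shared_space/my_project_name/testfile
    ls -l ~/shared_space/my_project_name/testfile   # should show rw-rw-r--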

Working With Your Jobs

Overview

When you run an RCE Powered Application, you must request a certain number of processors and amount of memory. If what you requested is available, a job is created on the cluster. It is not tied to your desktop session; if you terminate your session, or your NoMachine desktop becomes unusable, your job is still retrievable. Below are some tips on managing your RCE jobs.

Checking Resources

To see a summary of available cluster resources before you request a job, run rce-info.sh from a terminal (see the Cluster Status Summary section for details).

Submitting Interactive Jobs

RCE cluster interactive jobs are persistent and come without lease periods, so you never have to request job extensions. However, after an idle period of three days, your job becomes preemptible and may be terminated to ensure that resources are shared fairly. See our documentation for more information.

You will receive email notices if your job becomes idle; an example is below:

Dear <username>,

Your RCE powered job xstata-mp 14.0 has been idle for 2 days. If your job
remains idle for three or more days, your job will become preemptible.

Under conditions of RCE cluster saturation, RCE powered jobs, idle for
three or more days, can be pre-empted in order to satisfy resource
requirements of newly submitted RCE powered jobs. If your RCE powered
job is pre-empted, you will lose all currently unsaved work within that
job. If you don't plan on actively utilizing xstata-mp 14.0 within the next
day, please make sure to save your work or terminate your job if you've
successfully accomplished your tasks. Otherwise, using xstata-mp within the
next 24 hours will stave off pre-emptability for another two days.

Monitoring Your Job

  • From a desktop NX session: Applications > RCE Utilities > SSH to RCE job server
  • From a terminal: condor_ssh_to_job_util

Finding and Retrieving Your Job

  • From a desktop NX session: Applications > RCE Utilities > Attach all jobs
  • From a terminal: condor_q username

Running Batch Jobs

This section describes the batch processing environment in our facilities.

What is Batch Processing?

Batch processing is a procedure by which you submit a program for delayed execution. Batch processing enables you to perform multiple commands and functions without waiting for results from one command to begin another, and to execute these processes without your attendance. The terms process and job are interchangeable.

The batch processing system at HMDC runs on a high throughput cluster on which you can perform extensive, time-consuming calculations without the technical limitations imposed by a typical workstation.

Why Use Batch Processing?

HMDC provides a large, powerful pool of computers that are available for you to use to conduct research. This pool is extremely useful for the following applications:

  • Jobs that run for a long time - You can submit a batch processing job that executes for days or weeks and does not tie up your RCE session during that time. In fact, you do not need to run an RCE desktop session to submit a batch process; batch jobs can be submitted from the command line via SSH.

  • Jobs that are too big to run on your desktop - You can submit batch processing that requires more infrastructure than your workstation provides. For example, you could use a dataset that is larger in size than the memory on your workstation.

  • Groups of dozens or hundreds of jobs that are similar - You can submit batch processing that entails multiple uses of the same program with different parameters or input data. Examples of these types of submission are simulations, sensitivity analysis, or parameterization studies.

If you are interested in learning more about our batch cluster resource manager, continue reading below. If you want to move on and learn how to submit a batch job, please click the Batch Basics link in the left side menu.

Condor System for Batch Processing

The Condor system enables you to submit a program for execution as batch processing, which then does not require your attention until processing is complete. The Condor project website is located at the following URL:

http://www.cs.wisc.edu/condor/

To view the user manual for this software, go to the following URL and choose a viewing option:

http://www.cs.wisc.edu/condor/manual/

Condor System Components and Terminology

A Condor system comprises a central manager and a pool. The Condor central manager machine manages the execution of all jobs that you submit as batch processing. A pool of Condor machines associated with that central manager executes individual processes based on policies defined for each pool member. If a computing installation has multiple Condor pools or additional machine clusters dedicated to Condor system use, these pools and clusters can be associated as a flock.

Listed below are some common terms that are specific to the Condor system:

  • Cluster - A group of jobs or processes submitted together to Condor for batch processing is known as a cluster. Each job has a unique job identifier in a cluster, but shares a common cluster identifier.

  • Pool - A Condor pool comprises a single machine serving as a central manager, and an arbitrary number of machines that have joined the pool. Simply put, the pool is a collection of resources (machines) and resource requests (jobs).

  • Jobs - In a Condor system, jobs are unique processes submitted to a pool for execution and are tracked with a unique process ID number.

  • Flock - A Condor flock is a collection of Condor pools and clusters associated for managing jobs and clusters with varying priorities. A Condor flock functions in the same manner as a pool, but provides greater processing power.

When you submit batch processing to the Condor system, you use a submit description file (or submit file) to describe your jobs. This file results in a ClassAd for each job, which defines requirements and preferences for running that job. Each pool machine has a description of what job requirements and preferences that machine can run, called the machine ClassAd. The central manager matches job ClassAds with machine ClassAds to select the machine on which to execute a job.

Process Identification Numbers

For Condor batch processing, there are two identification numbers that are important to you:

  • Cluster number - The cluster number represents each set of executable jobs submitted to the Condor system. It is a cluster of jobs, or processes. A cluster can consist of a single job.

  • Process number - The process number represents each individual job (process) within a single cluster. Process numbers for a cluster always start at zero.

Each single job in a cluster is assigned a process identification number, called the process ID or job ID. This ID consists of both cluster and process number in the form <cluster>.<process>.

For example, if you submit a batch that consists of a single job, and your batch submission to the Condor queue is assigned cluster number 20, then your process ID is 20.0. If you submit a batch that consists of fifteen jobs that all use the same executable, and your batch submission to the Condor queue is assigned cluster number 8, then your process IDs range from 8.0 to 8.14.

Batch Basics

Batch Processing Terminology

These terms are specific to HTCondor, our batch processing scheduler:

  • Node - A processor (or set of processors) capable of running a job

  • Pool - A collection of nodes

  • Job - A single process with an executable, arguments, input, output and error files

  • Cluster - A collection of jobs that share common executables and/or input files

  • Queue - The list of jobs that have been submitted to run on the pool

  • Scheduler - A process responsible for determining which jobs in the queue are run next

Batch Processing Utilities

  • condor_status - shows the status of all of the nodes in the pool

  • condor_q - shows the status of all jobs in the queue

  • condor_submit - submits a cluster of jobs to the queue

  • condor_submit_util - RCE helper application that automates the submission process (use this in place of condor_submit in most cases)

  • condor_userprio - shows usage statistics and priorities for users who are actively using pool resources

What do I need to get started?

To get started setting up and submitting a job, follow the Batch Workflow link in the left menu.

Batch workflow

The workflow to submit batch processing to the Condor system is as follows:

  1. Create a directory in which to submit jobs to the Condor system.

    Make sure that the directory and files with which you plan to work are readable and writable by other users, including the Condor processes.

    For example, type the following:

    mkdir condor
    cd condor
    

    You can request that a project directory be set up for you to use for batch processing. If you perform your batch processing within your home directory, the space used for your data and program files can consume much of your allotted resources, and this can cause problems with logging in to the system, so working in a project space is recommended. For more information on project spaces, see our Projects and Shared Space page.

  2. Choose an execution environment, called a universe, for your jobs.

    At HMDC, you always use the vanilla universe. This execution environment supports processing of individual serial jobs, but has few other restrictions on the types of jobs that you can execute.

  3. Make your jobs batch ready.

    Batch processing runs in the background, meaning that you cannot provide input to your executable interactively. You must create a program or script that reads its inputs from a file and writes its outputs to another file (a minimal sketch appears after this list).

    You also must identify the full path of the executable to use for your Condor cluster. The default executable for the condor_submit_util script is the R language; in the RCE, the path and executable for this language is /usr/bin/R. Any command-line application or program can be submitted as a batch job (Matlab, Stata, Python, etc.).

  4. If you choose to use the condor_submit_util script to create the submit description file (or submit file) and submit your jobs to the Condor system for batch processing automatically, skip to the next step.

    If you choose to submit your batch processing to the Condor system manually, create a submit file.

    A submit file is a plain-text file that describes a batch of jobs for the Condor software. This file contains the following descriptors:

    • Environment (vanilla)

    • Executable program path and file name

    • Program arguments (properly quoted -- see manual)

    • Input and output file names

    • Log and error file names

    Here is an example of a basic submit file:

    Universe        = vanilla
    Executable      = /usr/bin/R
    Arguments       = --no-save --no-restore
    should_transfer_files = NO
    Requirements = Memory >= 32
    output  = $HOME/mybatchjob/output.txt
    error   = $HOME/mybatchjob/error.txt
    Log     = $HOME/mybatchjob/log.txt
    Queue   1
    
  5. Execute the condor_submit_util command to write the submit file and submit your program automatically to the Condor job queue.

    If you chose to write your own submit file, execute the condor_submit <submit file>.submit command to submit your jobs to the queue.

    Condor then checks the submit file for errors, creates a ClassAd object and the object attributes for that cluster, and then places this object in the queue for processing.
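
As promised in step 3, here is a minimal sketch of a batch-ready R script. The file names are hypothetical; the point is that all input comes from a file and all output goes to a file, so the job never waits for interactive input:

    # read inputs from a file rather than the console (hypothetical file name)
    dat <- read.csv("input_data.csv")
    # ... your analysis here ...
    result <- summary(dat)
    # write outputs to a file rather than the screen
    capture.output(result, file = "results.txt")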

Determining batch parameters

Before you submit your program for batch processing, you need to determine the parameters for this submission. To use the Condor system for batch processing, you must define these parameters by assigning values to submit file arguments, which describe the jobs that you choose to submit for processing.

In the RCE, you always use the vanilla environment.

To determine the remaining submit file arguments, answer the following questions:

  • What is the executable path and file name?

    For any shell script or statistical application installed in the RCE, the condor_submit_util script can determine the full path for the executable. At the script prompt, you type in the name of your script, program, or application. The default executable in the RCE is the R language, and the path and executable name are /usr/bin/R. Any command-line application or program can be used for batch processing, including Matlab, Stata, Python, Perl, etc.

  • Do you have any arguments to supply to the executable?

    Arguments are parameters that you specify for your executable. For example, the default arguments in the condor_submit_util script are --no-save and --vanilla, which specify how to launch and exit the R program. The argument --no-save specifies not to save the R workspace at exit. The argument --vanilla instructs R to not read any user or site profiles or restored data at start up and to not save data files at exit.

  • What are the input file names?

    If you are using the R program, your input file(s) will be whatever R script you want to execute.

  • What do you plan to name the output files?

    A general rule for batch processing is that you have one output file for each input file. Therefore, if you have seven input files, you expect to have seven output files after processing is complete. A useful practice is to correlate the names of input and output files.

  • How many times do you need to execute this script or program?

    A general rule for batch processing is that you execute your job one time for each input file that you use.

Download Batch Example

To set up our batch processing example for use, you first download the source material, and then determine your batch processing parameters.

Downloading the Source Files

To download the source files for use in this example:

  1. Log in to your RCE session.

  2. Open this page in a web browser in your RCE session.

  3. Click the file condor_example.tar.gz (see below) to download it.

  4. Click the Save to Disk option, and then click OK to save the tar file to your desktop.

  5. Open a terminal window, and unzip the tar file in the Desktop directory. Type:

    tar zxvf Desktop/condor_example.tar.gz

    The contents of the uncompressed example file will look like this:

    condor_example/
    condor_example/condor_submit_util/
    condor_example/bootstrap.R

You now have a directory named condor_example in your home directory, which contains the files necessary to run our example.

Attachment: condor_example.tar.gz (667 bytes)

Using condor_submit_util (recommended for new users)

After you set up your working directory and define your batch processing parameters, you can submit your script and input files for processing. You can use condor_submit_util to set up your submit file and submit your batch, or you can write your own submit file and submit it to the cluster with the condor_submit utility. If you are new to batch processing with HTCondor, we recommend using condor_submit_util.

To build a submit file automatically and submit your program for batch processing, you can use the Automated Condor Submission script (aka condor_submit_util) in two modes: interactive or command line.

Note: If you do not specify any options when you use condor_submit_util, it enters interactive mode automatically. If you omit required options in command-line mode, the script either enters interactive mode automatically or reports an error and returns you to the command-line prompt.

For examples of using condor_submit_util, see Other Batch Examples.

Working interactively with condor_submit_util

When you use the script in interactive mode, you can press the Return key to accept default values. Default values appear inside square brackets at the end of each prompt.

To use the condor_submit_util script in interactive mode:

  1. Execute the condor_submit_util command.

    Type the following at the command prompt in your Condor working directory:

    > condor_submit_util
    *** No arguments specified, defaulting to interactive mode...
    *** Entering interactive mode.
    *** Press return to use default value.
    *** Some options allow the use of '--' to unset the value.
  2. The script first prompts you to define the executable program that you choose to submit for batch processing, and then requests the list of arguments to provide to that executable:

    Enter executable to submit [/usr/bin/R]: <executable name>
    Enter arguments to /usr/bin/R [--no-save --vanilla]: <arguments>

    The default argument --no-save specifies not to save the R workspace at exit. The default argument --vanilla instructs R to not read any user or site profiles or restored data at start up and to not save data files at exit.

    If you do not have any arguments to apply to your executable, then type -- to supply no arguments.

  3. Next, the script prompts you to provide a name or pattern for the input, output, log, and error files for this Condor cluster submission. You can include a relative path in these entries, if you choose:

    Enter input file base [in]: <input path and file name or pattern>
    Enter output file base [out]: <output path and file name or pattern>
    Enter log file base [log]: <log path and file name or pattern>
    Enter error file base [error]: <error path and file name or pattern>

    Note: if you are using the batch example, the input file is bootstrap.R.

  4. After specifying the files, the script prompts you to define the number of iterations that you choose to execute your program for processing:

    Enter number of iterations [10]: <integer>
  5. The system creates the submit file for this batch process using your responses to script prompts.

    An example submit file is shown here. To view the contents of your submit file, include the option -v (verbose) when you launch the condor_submit_util script:

    *** creating submit file '<login account name>-<date-time>.submit'

    Universe = vanilla
    Executable = /usr/bin/R
    Arguments = --no-save --vanilla
    when_to_transfer_output = ON_EXIT_OR_EVICT
    transfer_output_files = <output file>

    input = <input file>
    output = <output file>
    error = <error file>
    Log = <log file>
    Queue <integer>
  6. If you use the verbose option, the script prompts you to confirm that the submit file is correct. To continue, press Return or type y.

    Condor checks the submit file for errors, creates the ClassAd object for your submission, and adds that object to the end of the queue for processing. The script lists messages that report this progress in your terminal window, and includes the cluster number assigned to the batch process. For example:

    Is this correct? (Enter y or n) [yes]: y
    ] submitting job to condor...
    ] removing submit file '<login account name>-<date-time>'
    *** Job successfully submitted to cluster <cluster ID>.
  7. Finally, the script prompts whether you choose to receive email when execution of your batch processing is complete. Press Return or type y to receive email, or type n to not send email and exit the script.

    If you choose to receive email, before exiting, the script prompts you to enter the email address to which you choose to send the notification. The default email address for notification is your email account on the server on which you launched the script. For example:

    Would you like to be notified when your jobs complete? (Enter y or n)
    [yes]: y
    Please enter your email address [<your email account on this server>]:
    *** creating watch file '/nfs/fs1/projects/condor_watch/<Condor machine>.<batch cluster>.<your email>'
  8. View your job queue to ensure that your batch processing begins execution successfully.

    See Managing your batch job for complete details about checking the queue. An example is:

    > condor_q

    -- Submitter: vnc.hmdc.harvard.edu : <10.0.0.47:60603> : vnc.hmdc.harvard.edu
     ID   OWNER  SUBMITTED    RUN_TIME   ST PRI SIZE CMD
     9.0  arose  10/4  11:02  0+00:00:00 R  0   9.8  dwarves.pl
     9.1  arose  10/4  11:02  0+00:00:00 R  0   9.8  dwarves.pl
     9.2  arose  10/4  11:02  0+00:00:00 I  0   9.8  dwarves.pl
     9.3  arose  10/4  11:02  0+00:00:00 R  0   9.8  dwarves.pl

    4 jobs; 1 idle, 3 running, 0 held

Working with command arguments to condor_submit_util

When you use the script in command-line mode, you must specify all required options or the script does not execute. For example, the default number of iterations for the script is 10. If you do not have 10 input files in your working directory and you do not enter the option to specify the correct number of iterations that you plan to perform, the script does not execute and returns a message similar to the following:

> condor_submit_util -v
*** Fatal error; exiting script
*** Reason: could not find input file 'in.7'.

To use the condor_submit_util script in command-line mode:

  1. Execute the condor_submit_util command with the appropriate arguments. See Script options for detailed information about script options.

    At a minimum, you must include the following options on the command line:

    • Executable program file name

    • Executable file arguments, or --noargs option

    • Input file, or --noinput option

    • Number of iterations, if you do not have 10 input files

    At a minimum, type the following at the command prompt from within your Condor working directory:

    > condor_submit_util -x <program> -a <arguments> -i <input files> 
  2. Condor creates a submit file and checks it for errors, creates the ClassAd object, and adds that object to the end of the queue for processing. The script supplies messages that report this progress, and includes the cluster number assigned to your Condor cluster. For example:

    > condor_submit_util -x <program> --noargs

    Submitting job(s)..........
    Logging submit event(s)..........
    10 job(s) submitted to cluster 24.

    If the script encounters a problem when creating the submit file, it enters interactive mode automatically and prompts you for the correct inputs.

  3. View your job queue to ensure that your batch processing begins execution.

    See Managing your batch job for complete details about checking the queue.

Passing Arguments to the Program

You can pass arguments to the batch program using the --args flag in your submit file. For example, if you change the arguments line in your submit file to something like the following:

Arguments = --no-save --vanilla --args <arguments>

Then the contents of <arguments> will be passed in to the program as command-line arguments. The syntax for passing and handling these arguments differs depending on the statistics program in use.

Passing Arguments to R

To parse command-line arguments in R, use the following command in your R script:

args <- commandArgs(TRUE)

This puts the command-line arguments (the contents of <arguments>) into the variable args.
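
For example, a minimal sketch of an R script that expects an input file name and an iteration count as its arguments (both names are hypothetical):

args <- commandArgs(TRUE)
infile <- args[1]               # first argument, e.g. a data file name
niter  <- as.numeric(args[2])   # second argument, converted from character
dat <- read.csv(infile)

With Arguments = --no-save --vanilla --args mydata.csv 100 in the submit file, infile becomes "mydata.csv" and niter becomes 100.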

Script options

The condor_submit_util script makes the task of running jobs on the batch servers easier and more intuitive. condor_submit_util negotiates all job scheduling; it constructs the appropriate submit file for your job and calls the condor_submit command. To use this utility, you need a program to run. The format for using this script is:

condor_submit_util [OPTIONS]

In addition, the script can notify you when your job is done via email so you do not have to check the queue constantly using condor_q. In future releases, the script also will be able to keep usage data so administrators can track overall performance.

The script can be run in two ways, interactively or from the command line. When running interactively, the script prompts you for the values required to run the batch job. If you supply arguments on the command line, these arguments are used in addition to default values for any values you do not supply.

Options

  • -h, --help
    Print help page and exit.
  • -V, --version
    Print version information and exit.
  • -v, --verbose
    Show information about what goes on during script execution.
  • -I, --Interactive
    Enter interactive mode, in which the script prompts you for the required values.
  • -s, --submitfile FILE
    Specify the name of the created submit file (default is <user-name-datetime>.submit).
  • -k, --keep
    Do not delete the created submit file.
  • -N, --Notify
    Receive notification by email when jobs are complete.
  • -x, --executable FILE
    The executable for condor to run (default is /usr/bin/R).
  • -a, --arguments ARGS
    Any arguments you want to pass to the executable (should be quoted, default is "--no-save --vanilla").
  • -i, --input [FILE|PATT]
    Either an explicit file name or base name of input files to the executable (default is in).
  • -o, --output [PATT]
    Base name of output files for the executable (default is out).
  • -e, --error [PATT]
    Base name of error files for the executable (default is error).
  • -l, --log [PATT]
    Base name of log files for the executable (default is log).
  • -n, --iterations NUM
    Number of iterations to submit (default is 10).
  • -f, --force
    Overwrite any existing files.
  • --noinput
    Use no input file for executable.
  • --noargs
    Send no arguments to executable.

Examples

  1. You have a compiled executable (named foo) that takes a data set and does some analysis. You have five different data sets to run against (named data.0, data.1 ... data.4). You want to save the submit file and be notified when the job is done.

    condor_submit_util -x foo -i "data" -k -N
  2. You have an R program that has some random output. You want to run it 10 times to see the results.

    condor_submit_util -i random.R -n 10
  3. You have an R program that will take a long time to complete. You only need to run it once, but you want to be notified when it is done.

    condor_submit_util -i long.R -n 1 -N

Notes: The -o, -e, and -l options are considered base names for the implied files. The actual file names are created with a numerical extension tied to each job's Condor process number (0-indexed). This means that if you execute condor_submit_util -o "out" -n 3, three output files named out.0, out.1, and out.2 are created.
Also, for -i, the script first checks to see if the name supplied is an actual file on disk; if not, it uses the argument as a base name, similar to -o, -e, and -l.

    Option conventions

    For most condor_submit_util options, there are two conventions that you can use to specify that option on the command line:

    • The -<letter> convention - Use this simple convention as a short cut.

      For example, the simple option to receive email notification when your batch processing is complete is -N.

    • The --<term> convention - Use this lengthy convention to make it easy to determine what option you use.

      For example, the lengthy option to receive email notification when your batch processing is complete is --Notify.

    Both conventions for specifying an option perform the same function. For example, to receive email notification when your batch processing is complete, the options -N and --Notify perform the same function.

    Pattern Arguments

    For file-related options, such as the output file name or the error file name, you can use a pattern-matching argument. For example, if you specify the option -i "run", Condor looks for an input file with the name run. If there is no file named run, Condor looks for a file name that begins with run., such as run.14.

    If there are multiple files with names that begin with the pattern that you specify, then for the first execution within a cluster, Condor uses the file with the name that matches first in alphanumeric order. For successive executions within a cluster, Condor uses the files with names that match successively in alphanumeric order.

    Saving and Reusing a Submit File

    When you use condor_submit_util in command-line mode to submit a program for batch processing, include the option -k (keep) to save the submit file created by the utility.

    You can edit and reuse that submit file to submit similar programs to the Condor queue for batch processing. You also can include Condor macros to further improve the usability of the file. See the HTCondor documentation for detailed information about how to use Condor macros.

    For example, if you plan to submit several iterations of a program for batch processing, you can use a single submit file for all iterations. In that submit file, you use the $(PROCESS) macro to specify unique input, output, error, and log files for each iteration.

    Use of the $(PROCESS) macro requires that you develop a naming convention for files or subdirectories that includes the full range of process IDs for your iterations.
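
    For example, here is a sketch of a reusable submit file that uses the $(PROCESS) macro to give each of ten iterations its own files (it assumes input files named in.0 through in.9 already exist):

    Universe = vanilla
    Executable = /usr/bin/R
    Arguments = --no-save --vanilla

    input = in.$(PROCESS)
    output = out.$(PROCESS)
    error = error.$(PROCESS)
    Log = log.$(PROCESS)
    Queue 10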

    To use an existing submit file when you submit a batch process, you cannot use the script and must execute the condor_submit command instead. Type the following:

    condor_submit my.submit

    Manual batch submit (only recommended for experienced users)

    You use the command condor_submit to submit batch processing manually to the Condor system.

    If you are new to batch processing, please see the previous section, Using condor_submit_util.

    In the RCE, you must include the attribute Universe = vanilla in every submit file. If you do not include this statement, Condor attempts to enable job checkpointing, which consumes central manager resources.

    Perform the following to submit batch processing manually:

    1. Before you submit your program for batch processing, create a directory in which to run your submission, and then change to that directory. Make sure that you set permissions to enable the Condor software to read from and write to the directory and its contents.

      Also make sure that your program is batch ready.

    2. Create a submit file for your program.

      For information about how to create a submit file, see Submit file basics.

      Note: You can use the HMDC Automated Condor Submission script and include the -k option to create a submit file, and then edit and reuse that submit file for other submissions.

    3. Submit your program for batch processing.

      Type the following at the command prompt:

      condor_submit <submit file>

      Condor then checks the submit file for errors, creates the ClassAd object, and places that object in the queue for processing. New jobs are added to the end of the queue. For example:

      condor_submit myjob.submit

      Submitting job(s)..........
      Logging submit event(s)..........
      10 job(s) submitted to cluster 24.
    4. View your job queue to ensure that execution begins. 

      condor_q <username>
      For example:
      condor_q wharrell

    Submit file basics

    You send input to the Condor system using a submit file, which is a text file of <attribute> = <value> pairs. The naming convention for a submit file is <file name>.submit. Before you submit any batch processing, you first set up a directory in which to work, and create the executable script or program that you choose to submit for processing.

    Basic attributes used in the submit file include the following:

      • Universe - At HMDC you specify the vanilla universe, which supports serial job processing. HMDC does not support use of other Condor environments.

      • Executable - Type the name of your program. In the job ClassAd, this becomes the Cmd value. The default value in the RCE for this attribute is the R program.

      • Arguments - Include any arguments required as parameters for your program. When your program is executed, the Condor software issues the string assigned to this attribute as a command-line argument. In the RCE, the default arguments for the R program are --no-save and --vanilla.

      • Input - Type the name of the file or the base name of multiple files that contain inputs for your executable program.

      • Output - Type the name of the file or the base name of multiple files in which Condor can place the output from your batch job.

      • Log - Type the name of the file or the base name of multiple files in which Condor can record information about your job's execution.

      • Error - Type the name of the file or the base name of multiple files in which Condor can record errors from your job.

      • Queue - The command queue instructs the Condor system to submit one set of program, attributes, and input file for processing. You use this command one time for each input file that you choose to submit. (Note: this keyword should be specified without an equals sign, e.g. "Queue 10".)

      • Request_Cpus - The number of physical CPU cores required to run each instance of the job, as an integer.
      • Request_Memory - The amount of physical memory (RAM) required to run each instance of the job, in MB. Your job may also have access to additional system swap memory if available, but this value guarantees a minimum amount of available system main memory for your job.

    The full documentation of all available submit file options can be found at http://research.cs.wisc.edu/htcondor/manual/current/condor_submit.html

    An example submit file with the minimum required arguments is as follows:

    cat myjob.submit
    
    Universe = vanilla
    Executable = /usr/bin/R
    Arguments = --no-save --vanilla
    
    input = <program>.R
    output = out.$(Process)
    error = error.$(Process)
    Log = log.$(Process)
    
    Request_Cpus = 2
    Request_Memory = 4096
    Queue 10

    This file instructs Condor to execute ten R jobs using one input program (<program>.R) and to write unique output files to the current directory. Each job process will run with access to 2 CPU cores and 4GB of RAM.

    When you specify file-related attributes (executable, input, output, log, and error), either place those files within the directory from which you execute the Condor submission or include the relative path name of the files. 

    Managing your batch job

    Once you have submitted your job(s) to the queue, you have various ways of checking on their status, including e-mail notification of job completion and command-line access to both your jobs' status and the current state of the pool.

    Managing Job Status

    You can monitor progress of your batch processing using the condor_status and condor_q commands. This section describes how to check the status of your processes at any time, and how to remove a process from the Condor queue.

    After you submit a job for processing, you can check the status of the Condor machine pool and verify that machines are available on which your jobs can execute.

    To check the status of the Condor pool, type the command condor_status. This command returns information about the pool resources. Output lists the number of slots available in the pool and whether they are in use. If there are no idle slots, your batch processing is queued when it is submitted.

    For example:

    > condor_status

    Name          OpSys  Arch    State      Activity  LoadAv  Mem   ActvtyTime

    vm1@mc-1-1.hm LINUX  X86_64  Claimed    Busy      1.060   1975  0+17:43:50
    vm2@mc-1-1.hm LINUX  X86_64  Claimed    Busy      1.060   1975  0+17:43:48
    vm1@mc-1-2.hm LINUX  X86_64  Claimed    Busy      1.000   1975  0+17:44:43
    vm2@mc-1-2.hm LINUX  X86_64  Claimed    Busy      1.000   1975  0+17:44:36
    vm1@mc-1-3.hm LINUX  X86_64  Unclaimed  Idle      0.010   1975  0+00:03:57
    vm2@mc-1-3.hm LINUX  X86_64  Unclaimed  Idle      0.000   1975  0+00:00:04
    vm1@mc-1-4.hm LINUX  X86_64  Unclaimed  Idle      0.000   1975  0+00:00:04

                  Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill

     X86_64/LINUX     7      0        4          3        0           0         0
            Total     7      0        4          3        0           0         0

    To check the cumulative use of resources within the Condor pool, include the option -submitter with the condor_status command. This command returns information about each user in the Condor queue. Output lists the user's name, machine in use, and current number of jobs per machine. Use this command to help determine how many resources Condor has available to run your jobs. An example is shown here:

    > condor_status -submitter

    Name                  Machine     RunningJobs  IdleJobs  HeldJobs

    mkellerm@hmdc.harvar  w4.hmdc.ha            2         0         0
    jgreiner@hmdc.harvar  x1.hmdc.ha            9         0         0
    jgreiner@hmdc.harvar  x3.hmdc.ha           40         0         0
    kquinn@hmdc.harvard.  x5.hmdc.ha           32         0         0

                          RunningJobs  IdleJobs  HeldJobs

    jgreiner@hmdc.harvar           49         0         0
    kquinn@hmdc.harvard.           32         0         0
    mkellerm@hmdc.harvar            2         0         0

                   Total           83         0         0

    Cluster Status Summary

    To view a summary of the Condor cluster's available resources, run: rce-info.sh

    To view a summary of the resources currently in use on the Condor cluster, run: rce-info.sh -t used

    Removing your job

    To remove a process from the queue, type the command condor_rm <cluster ID>.<process ID>. For example:

    > condor_rm 9.9
    Job 9.9 marked for removal

    To find a list of your jobs type:

    > condor_q  $USER


    To remove all jobs affiliated with a cluster, type the command condor_rm <cluster ID>. For example, the command condor_rm 4 removes all jobs assigned to cluster 4.

    To remove all of your clusters' jobs from the Condor queue, type condor_rm -a. For example:

    > condor_rm -a
    All jobs marked for removal.

    Jobs must be deleted from the host they were submitted from.

    When you run condor_q you may see multiple "Schedd" sections:

    -- Schedd: HMDC.rce@rce6-1.hmdc.harvard.edu
    -- Schedd: HMDC.rce@rce6-2.hmdc.harvard.edu
    -- Schedd: HMDC.rce@rce6-3.hmdc.harvard.edu

    Each of these sections represents a different RCE Login server.

    When you submit a job, the server you are logged in to is responsible for "scheduling" that job and keeping track of its status.

    Each RCE Login server maintains this status separately, so when you want to remove a job, you must also specify the server where you started it.

    The full syntax to remove a job is thus:

      condor_rm <cluster ID>[.<process ID>] -name <schedd_string>

    e.g.
      condor_rm 4806 -name "HMDC.rce@rce6-4.hmdc.harvard.edu"

    Other Batch Examples

    We created the condor_submit_util script to automate the process of writing a submit file and submitting a cluster of jobs to the Condor queue. When you execute this script, you can include all arguments on the command line. Or, you can execute the script in interactive mode and be prompted for your submit file attributes.

    The default settings for the Automated Condor Submission script support creation of submit files for programs that are written in the R language. To submit another type of program to the Condor queue, such as an Octave program, specify the full path and program for the executable (in this example, Octave). You then define your program file as the input to the executable.

    Note: To use the condor_submit_util script, you must have an RCE account. See Accessing the RCE for more information.

    The following are example uses of the condor_submit_util script and options to submit batch processing in the RCE. A complete description of options is provided in Script options.

    Example Using Multiple Input Files

    Start with an executable program (named foo) that uses a set of input data files (named data.0 - data.4) and does some analysis.

    To save the submit file and receive notification when processing is done, type the following command:

    > condor_submit_util -x foo -i "data" -k -N

    The submit file for this batch looks like this:

    Universe = vanilla
    Executable = /usr/bin/foo
    Arguments = --no-save --vanilla
    when_to_transfer_output = ON_EXIT_OR_EVICT
    transfer_output_files = out.$(Process)
    Notification = Complete

    input = data.$(Process)
    output = out.$(Process)
    error = err.$(Process)
    Log = log.$(Process)
    Queue 5

    Example Using Multiple Iterations of One Executable Program

    An R program (named random.R) produces random output.

    To execute this program eight times and place the output of each execution in separate files in your default working directory, type the following command:

    > condor_submit_util -i random.R -n 8 -o "outrun"

    Following is the submit file for this batch:

    Universe = vanilla
    Executable = /usr/bin/R
    Arguments = --no-save --vanilla
    when_to_transfer_output = ON_EXIT_OR_EVICT
    transfer_output_files = outrun.$(Process)

    input = random.R
    output = outrun.$(Process)
    error = error.$(Process)
    Log = log.$(Process)
    Queue 8

    Example Checking Process Status

    To check the status of the Condor queue after submitting your program for processing, type:

    > condor_q

    -- Submitter: x1.hmdc.harvard.edu : <10.0.0.47:60603> : x1.hmdc.harvard.edu
     ID    OWNER  SUBMITTED    RUN_TIME   ST PRI SIZE CMD
     24.4  mcox   8/18 16:35   0+00:00:01 R  0   0.0  R --no-save --vani
     24.5  mcox   8/18 16:35   0+00:00:00 R  0   0.0  R --no-save --vani
     24.6  mcox   8/18 16:35   0+00:00:00 R  0   0.0  R --no-save --vani
     24.7  mcox   8/18 16:35   0+00:00:00 I  0   0.0  R --no-save --vani
     24.8  mcox   8/18 16:35   0+00:00:00 I  0   0.0  R --no-save --vani
     24.9  mcox   8/18 16:35   0+00:00:00 I  0   0.0  R --no-save --vani

    6 jobs; 3 idle, 3 running, 0 held

    The column ID lists the process IDs for your jobs. The column ST lists the status of each job in the Condor queue. A value of R indicates that the job is running, and a value of I indicates that it is idle. The full list of valid status values appears in the HTCondor manual.

    Submit a batch job from a RCE Powered (Interactive) job

    If you try to submit a batch job from within an RCE Powered (Interactive) job, you will encounter this error:

    ERROR: Can't find address of local schedd

    To work around this, you must include the name of the batch scheduler as an argument to condor_submit.

    To submit a batch job from RCE Powered shell (or from within any RCE Powered application) you need to run this command to get the batch scheduler IP address and port number:

    COLLECTOR=`condor_status -collector -autoformat Machine` ; condor_status -schedd -constraint Machine==\"$COLLECTOR\" -autoformat ScheddIpAddr

    The output will look something like this: <10.0.0.32:12345>. Take this value and add it as an argument to condor_submit's "-name" parameter:

    condor_submit -name '<10.0.0.32:12345>' my-job.submit

     For more information on using condor_submit, please see our documentation on Running Batch Jobs.

    Troubleshooting Problems

    The Condor central manager stops (evicts or preempts) a process for several reasons, including the following:

    • Another job or another user's job in the queue has a higher priority and preempts or evicts your job.

    • The pool machine on which your process is executed encounters an issue with the machine state or the machine policy.

    • You specified attributes in your submit file that cannot be processed without error.

    Refer to the Condor manual for detailed information about submission, job status, and processing errors:

    http://research.cs.wisc.edu/htcondor/manual/latest/2_Users_Manual.html

    Note: A simple action can help you to diagnose problems if you submit multiple jobs to Condor. Be sure to specify unique file names for each job's output, history, error, and log files. If you do not specify unique file names for each submission, Condor overwrites existing files that have the same names. This can prevent you from locating information about problems that might occur.
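
    One common way to guarantee unique names is to include both the cluster and process IDs in the file names, using Condor's $(Cluster) and $(Process) submit-file macros. For example:

    output = out.$(Cluster).$(Process)
    error = err.$(Cluster).$(Process)
    Log = log.$(Cluster).$(Process)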

    Priorities and Preemption

    Job priorities enable you to assign a priority level to each submitted Condor job. Job priorities, however, do not impact user priorities.
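
    For example, to change the priority of one of your own queued jobs, use the condor_prio command (the job ID 24.4 here is only illustrative):

    > condor_prio -p 10 24.4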

    User priorities govern how Condor allocates resources among users. A lower numerical value means higher priority, so a user with priority 5 is allocated more resources than a user with priority 50. You can view user priorities by using the condor_userprio command. For example:

    > condor_userprio -allusers

    Condor continuously calculates each user's share of the available machines. For example, a user with a priority of 10 is allocated twice as many machines as a user with a priority of 20. New users begin with a priority of 0.5; as their usage increases, their priority value rises proportionately relative to other users. Condor enforces this so that each user gets a fair share of machines according to user priority and historical usage. For example, if a low-priority user is using all available machines and a higher-priority user submits a job, Condor immediately checkpoints and vacates the jobs that belong to the lower-priority user, except for that user's last job.

    A user's priority value also decreases over time, returning to the baseline of 0.5, as jobs complete and idle time accumulates relative to other users.

    Process Tracking

    To track progress of your processes:

    • Type condor_q to view the status of your process IDs.

    • Check your output directory for the time stamps of your output, log, and error files.

      If the output and log files for a submitted process are more recent than the error file, your process is probably running without error.
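
    For example, to list those files with the most recently modified first (using the file names from the earlier example):

    > ls -lt outrun.* error.* log.*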

    Process Queue

    To view detailed information about your processes, including the ClassAd requirements for your jobs, type the command condor_q -analyze.

    Refer to the Condor Version 6.8.0 Manual for a description of the value that represents why a process was placed on hold or evicted. Go to the following URL for section 2.5, "Submitting a Job," and search for the text JobStatus under the heading "ClassAd Job Attributes":

    http://www.cs.wisc.edu/condor/manual/v6.8.0/2_5Submitting_Job.html

    For example:

    > condor_q -analyze
    Run analysis summary. Of 43 machines, 
    43 are rejected by your job's requirements
    0 are available to run your job
    WARNING: Be advised:
    No resources matched request's constraints
    Check the Requirements expression below:
    Requirements = ((Memory > 8192)) && (Disk >= DiskUsage)

    Error Log

    An error file includes information about any errors that occurred while your batch processing executed.

    To view the error file for a process and determine where an error occurred, use the cat command. For example:

    > cat errorfile
    Error in readChar(con, 5) : cannot open the connection
    In addition: Warning message:
    cannot open compressed file 'Utilization1.RData'
    Execution halted

    History File

    When batch processing completes, Condor removes the cluster from the queue and records information about the processes in the history file. History is displayed for each process on a single line. Information provided includes the following:

    • ID - The cluster and process IDs of the job

    • OWNER - The owner of the job

    • SUBMITTED - The month, day, hour, and minute at which the job was submitted to the queue

    • RUN_TIME - Remote wall-clock time accumulated by the job to date, in days, hours, minutes, and seconds

    • ST - Completion status of the job, where C is completed and X is removed

    • COMPLETED - Time at which the job was completed

    • CMD - Name of the executable

    To view information about processes that you executed on the Condor system, type the command condor_history. For example:

    > condor_history
    ID OWNER SUBMITTED RUN_TIME ST COMPLETED CMD
    1.0 arose 9/26 11:45 0+00:00:00 C 9/26 11:45 /usr/bin/R --no
    2.0 arose 9/26 11:48 0+00:00:01 C 9/26 11:48 /usr/bin/R --no
    3.0 arose 9/26 11:49 0+00:00:00 C 9/26 11:50 /usr/bin/R --no
    3.1 arose 9/26 11:49 0+00:00:01 C 9/26 11:50 /usr/bin/R --no
    6.0 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
    6.1 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
    6.2 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
    6.5 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
    6.3 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
    6.4 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
    6.6 arose 10/3 15:52 0+00:00:01 C 10/3 15:52 /nfs/fs1/home/A
    9.0 arose 10/4 11:02 0+00:00:00 C 10/4 11:02 /nfs/fs1/home/A
    9.1 arose 10/4 11:02 0+00:00:01 C 10/4 11:02 /nfs/fs1/home/A
    9.2 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.3 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.5 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.6 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.4 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A

    Search through the history file for your process and cluster IDs to locate information about your jobs.

    To view information about all completed processes in a cluster, type the command condor_history <cluster ID>. To view information about one process, type the command condor_history <cluster ID>.<process ID>. For example:

    > condor_history 9
    ID OWNER SUBMITTED RUN_TIME ST COMPLETED CMD
    9.0 arose 10/4 11:02 0+00:00:00 C 10/4 11:02 /nfs/fs1/home/A
    9.1 arose 10/4 11:02 0+00:00:01 C 10/4 11:02 /nfs/fs1/home/A
    9.2 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.3 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.5 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.6 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.4 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A

    Process Log File

    A log file includes information about everything that occurred during your cluster's processing: when it was submitted, when execution began and ended, when a process was restarted, and whether there were any issues. When processing finishes, the exit conditions for that process are noted in the log file.

    Refer to the Condor Manual for a description of the entries in the process log file. Go to the following URL for section 2.6, "Managing a Job," and go to subsection 2.6.6, "In the log file":

    http://research.cs.wisc.edu/htcondor/manual/latest/2_6Managing_Job.html

    To view the log file for a process and determine where an error occurred, use the cat command. For example, the following log file indicates that the process completed normally:

    > cat log.1
    000 (012.001.000) 10/04 12:14:51 Job submitted from host: <10.0.0.47:60603>
    ...
    001 (012.001.000) 10/04 12:15:00 Job executing on host: <10.0.0.61:37097>
    ...
    005 (012.001.000) 10/04 12:15:00 Job terminated.
    (1) Normal termination (return value 0)
    Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
    Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
    Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
    Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
    7 - Run Bytes Sent By Job
    163 - Run Bytes Received By Job
    7 - Total Bytes Sent By Job
    163 - Total Bytes Received By Job
    ...

    Following is an example log file for a process that did not complete execution:

    > cat log.4
    000 (09.000.000) 09/20 14:47:31 Job submitted from host:
    <x1.hmdc.harvard.edu>
    ...
    007 (09.000.000) 09/20 15:02:10 Shadow exception!
    Error from starter on x1.hmdc.harvard.edu: Failed
    to open 'scratch.1/frieda/workspace/v67/condor-
    test/test3/run_0/b.input' as standard input: No such
    file or directory (errno 2)
    0 - Run Bytes Sent By Job
    0 - Run Bytes Received By Job
    ...

    Held Process

    To view information about processes that Condor placed on hold, type condor_q -hold. For example:

    > condor_q -hold
    
    -- Submitter: vnc.hmdc.harvard.edu : <10.0.0.47:60603> : vnc.hmdc.harvard.edu
     ID OWNER HELD_SINCE HOLD_REASON
     17.0 arose 10/5 12:53 via condor_hold (by user arose)
     17.1 arose 10/5 12:53 via condor_hold (by user arose)
     17.2 arose 10/5 12:53 via condor_hold (by user arose)
     17.3 arose 10/5 12:53 via condor_hold (by user arose)
     17.4 arose 10/5 12:53 via condor_hold (by user arose)
     17.5 arose 10/5 12:53 via condor_hold (by user arose)
     17.6 arose 10/5 12:53 via condor_hold (by user arose)
     17.7 arose 10/5 12:53 via condor_hold (by user arose)
     17.9 arose 10/5 12:53 via condor_hold (by user arose)
    
    9 jobs; 0 idle, 0 running, 9 held

    Refer to the Condor Manual for a description of the value that represents why a process was placed on hold. Go to the following URL for section 2.5, "Submitting a Job," and look for subsection 2.5.2.2, "ClassAd Job Attributes." Look for the entry HoldReasonCode:

    http://research.cs.wisc.edu/htcondor/manual/latest/2_5Submitting_Job.html

    To place a process on hold, type the command condor_hold <cluster ID>.<process ID>. For example:

    > condor_hold 8.33
    Job 8.33 held

    To place on hold all uncompleted processes in a cluster, type condor_hold <cluster ID>. For example:

    > condor_hold 8
    Cluster 8 held.

    The status of those uncompleted processes in cluster 8 is now H (on hold):

    > condor_q
    
    -- Submitter: vnc.hmdc.harvard.edu : <10.0.0.47:60603> : vnc.hmdc.harvard.edu
     ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
     8.2 sspade 10/4 11:19 0+00:00:00 H 0 9.8 dwarves.pl
     8.5 sspade 10/4 11:19 0+00:00:00 H 0 9.8 dwarves.pl
     8.6 sspade 10/4 11:19 0+00:00:00 H 0 9.8 dwarves.pl
    
    3 jobs; 0 idle, 0 running, 3 held

    To release a process from hold, type the command condor_release <cluster ID>.<process ID>. For example:

    > condor_release 8.33
    Job 8.33 released.

    To release the full cluster from hold, type the command condor_release <cluster ID>. For example:

    > condor_release 8
    Cluster 8 released.

    You can instruct the Condor system to place your batch processing on hold if it spends a specified amount of time suspended (that is, not processing). For example, include the following attribute in your submit file to place your jobs on hold if they spend more than 50 percent of their time suspended:

    Periodic_hold = CumulativeSuspensionTime > (RemoteWallClockTime / 2.0)

    Running Interactive Jobs

    Interactive Servers are intended for large processes that are memory intensive. If you have a group of dozens or hundreds of jobs, or jobs that will run for hours, days, or longer, please use the Batch Servers.

    Learn how to submit an interactive job

    Read http://hmdc.github.io/rce-interactive-tools/gui/using_the_rce_gui_client.html#submitting-an-rce-powered-job

    Working with RCE Powered Statistical Applications

    The Applications menu in the RCE includes statistical applications in the RCE Powered Applications folder. Apps in the RCE Powered Applications menu are executed on the RCE cluster servers dedicated to handling interactive applications, also known as execute (EXEC) servers. 


    Running Custom Interactive Jobs

    Running Custom Interactive CLI Jobs

    If you are using a command-line SSH connection to the RCE, you can still launch and use RCE Interactive Applications.

    Launch your application in a disconnectible terminal emulator, and you will be able to access it after the job scheduler starts it asynchronously.

    1. Create a condor submit file to specify the application you want to run.
      •  In your submit file, wrap your command in a disconnectible terminal. We recommend either tmux or screen.
      • You can create this file manually, or with condor_submit_util.
      • Refer to Running Batch Jobs for more details.
      • For Example:
        • condor_submit_util -I --noinput --nosubmit --keep --submitfile myapp.submit -x /usr/bin/screen -a '-S myterminal /usr/bin/R' -m 2048 -c 1 -d myapp_outdir -n 1
           

          cat myapp.submit

          Universe        = vanilla
          Executable      = /usr/bin/screen
          Arguments       = -S myterminal /usr/bin/R
          request_memory  = 2048
          request_cpus    = 1
          transfer_executable = false
          should_transfer_files = NO
          output  = /nfs/home/W/whorka/myapp_outdir/out
          error   = /nfs/home/W/whorka/myapp_outdir/error
          Log     = /nfs/home/W/whorka/myapp_outdir/log
          +UsingCondorSubmitUtil = true
          Queue   1

    2. Submit the job
      condor_submit myapp.submit
      Submitting job(s).
      1 job(s) submitted to cluster 38965.

    3. Connect to the server where your job is running.
      condor_ssh_to_job_util

    4. Connect to the disconnected terminal.
      screen -d -r myterminal

    5. Interact with your command-line application as normal.
      • The hotkey for your disconnectible terminal emulator will be interpreted by the terminal emulator and not the application. For screen, this is ^A (Ctrl-A) by default, and for tmux this is ^B (Ctrl-B) by default.

    6. If you disconnect, follow steps 3-4 to reconnect.

    7. When you are finished running your application, exit the application, and then exit from the connection to the server where your job is running.
      • Once you exit the "control" connection, your job will terminate.

    If you find this feature useful, or this process too cumbersome, please contact us regarding future process improvements.

    Statistical Applications

    The following statistical applications are available on the RCE:

    Unavailable software:

    For in-depth statistical and application-related questions, including workshop schedules and links to statistical research resources, please contact us.

    For help using the Harvard Dataverse network, please see the Dataverse Network Guide

    Installing Modules

    Many programming languages and applications have extensions (also known as add-ons, modules, or packages) that can be installed in a user's home directory. The sections below cover the applications that allow users to install add-ons.

    Python Modules & Environments

    Using Conda

    Conda is a package manager application that quickly installs, runs, and updates packages and their dependencies. It can query and search the package index and current installation, and install and update packages into existing conda environments. Conda is only aware of a subset of Python packages, and is not meant as a replacement for pip.

    A virtual environment is a named, isolated working copy of Python that maintains its own files, directories, and paths, so that you can work with specific versions of libraries, or of Python itself, without affecting other Python projects. Conda is also an environment manager application. For example, you may have one environment with NumPy 1.7 and its dependencies, and another environment with NumPy 1.6 for legacy testing. If you change one environment, your other environments are not affected, and you can easily activate or deactivate (switch between) these environments.

    For example, to create a Python 3.6 environment with pandas 0.19 and its dependencies, open the Anaconda shell from the RCE applications menu, then issue these commands:

    conda create --name mypandas019 python=3.6 pandas=0.19
    source activate mypandas019
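
    To list your environments and switch back to the default Python when you are done, you can use the following (a sketch; mypandas019 is the environment created above):

    conda env list
    source deactivate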

    You can find full documentation for using Conda at https://conda.io/docs/using/envs.html.

    We also have additional documentation for using Anaconda Python in the RCE.

    If you install a module using conda, you can make it available to only a limited set of conda environments if you choose. If you install a module with pip, it automatically becomes available in all conda environments for that version of Python.
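
    For example, to install a package into a single named environment only (a sketch reusing the environment created above):

    conda install --name mypandas019 numpy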

    Using pip

    As an RCE user, you have the ability to install Python modules locally to your home directory and use them in your projects.

    1. Determine the required Python version: Decide which version of Python you'd like to develop with. Currently, Python 2.7 is available via Anaconda 2, and Python 3.6 is available via Anaconda 3.
    2. Load the appropriate Anaconda environment: Open an Anaconda Shell via the RCE Powered Applications menu. Also see Working With Anaconda Python.
    3. Search for the Python module: Each version of Python installed on the RCE maintains its own module path. Use pip to list the packages installed for the version you chose and determine whether your module is already available.
      pip list | grep $MODULE
      Example: pip list | grep simplejson
    4. Install your module: If your module is already installed for the Python version you need, you're done. If not, install it locally to your home directory.
      pip install $MODULE --user
      Example: pip install simplejson --user
    5. Can't find your module? If you're unable to locate your module using pip, you may be searching for the wrong name. If you decided you needed to install a module because, for example, import simplejson failed in a Python interactive console, you may have the wrong name: Python class names often differ from Python module names. Try using the pip search feature.
    6. Still not found? Try searching the PyPI repository, the official Python module repository, or use Google. Very rarely, a module requires that you manually compile the Python package using setup.py.
    7. Need help? Open a ticket by sending an email to support@help.hmdc.harvard.edu.
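
    Once installed, a quick way to confirm that a module is importable is a one-line import test (a sketch using the simplejson example above):

    python -c "import simplejson; print(simplejson.__version__)"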

    Installing modules from source code

    Here's an example of installing the Python module 'rtree' by compiling it from source.

    The library libspatialindex 1.7 is required for the 'rtree' package; unfortunately, only version 1.6 is available for CentOS 6. You can try building the library in your home directory instead:

    1. wget http://download.osgeo.org/libspatialindex/spatialindex-src-1.8.5.tar.gz
    2. tar -xzf spatialindex-src-1.8.5.tar.gz
    3. cd spatialindex-src-1.8.5
    4. ./configure --prefix=$HOME
    5. make
    6. make install
    7. Set the environment variables:
      1. export SPATIALINDEX_LIBRARY=~/lib/libspatialindex.so
      2. export SPATIALINDEX_C_LIBRARY=~/lib/libspatialindex_c.so
    8. pip install git+https://github.com/Toblerity/rtree.git --user
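
    With the environment variables from step 7 still set, you can confirm the build worked with a quick import test; if the command exits without an error, the module loaded successfully:

    python -c "import rtree"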

    Python has trouble finding the libspatialindex library; this is a known bug for which a specific fix was implemented. You can find more of the discussion here: https://github.com/Toblerity/rtree/issues/56.

    Unfortunately, the details of how to use the fix are poorly documented. The name of the environment variable appears to have changed since the discussion, where it's referred to as SPATIALINDEX_LIBRARY_PATH; however, the sequence of commands above seems to work.

    R

    Installing an R Package

     

    The RCE provides almost all stable libraries maintained in the Comprehensive R Archive Network (CRAN), and others. For a full list refer to "Which R packages are available?"

    If you would like to install a library separately for your own personal use, follow these instructions:

    1. In R, type library(<package_name>).

      For example, to install R Commander, type the following:

      > library(Rcmdr)

      R prompts you with a warning if the package that you chose to install uses other packages that are not installed already.

    2. To install missing packages on which your target package depends:

      1. Click Yes to continue. The Install Missing Packages window is displayed.

      2. Click OK to continue. R prompts you to select a mirror site from which to download the packages' sources.

    3. Select a site from which to download the sources, and then click OK.

      The dependent packages and your target package are now installed. If the package launches an application (as Rcmdr does), it starts automatically.

    Alternatively, you can install packages from within R like this:

    install.packages("package_name")
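
    You can also run the installation non-interactively from a shell. A minimal sketch (it assumes the cloud CRAN mirror; the dir.create call ensures your personal library folder exists):

    Rscript -e 'dir.create(Sys.getenv("R_LIBS_USER"), recursive = TRUE, showWarnings = FALSE); install.packages("Rcmdr", lib = Sys.getenv("R_LIBS_USER"), repos = "https://cloud.r-project.org")'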

    If the R package fails to compile, you may need a newer version of GCC. See this page for using updated versions of developer tools.

    Stata

    Start Stata on the RCE by going to Applications->RCE Powered Applications->RCE Powered Stata.

    Once Stata is running, you can do one of the following:

    • Install a command via ssc, by submitting the following command:

    ssc install <package-name>

    e.g., ssc install outreg

    For more information on ssc, see Stata's help for the command (type help ssc within Stata).

    • You can also install from a third party:

    net install <package-name>, from(SOME.SITE.EDU/package-name) replace

    For example, to install a package named rdrobust, you can submit:

    net install rdrobust, from(http://www-personal.umich.edu/~cattaneo/rdrobust/stata) replace

     

    Storing Anaconda files in a project space

    By default, Anaconda environments and their packages reside in ~/.conda in your home folder, and thus count against the limited 2.5GB quota on users' home folders.

    Before you go down the path of setting up a virtual environment and populating it with packages, you may want to create a .conda folder in a project space and redirect ~/.conda to it instead:

    Make a new .conda folder inside a project space:
    cd ~/shared_space/nameoftheprojectspace
    mkdir .conda

    Then, create a symlink from your home folder to the new location (if a ~/.conda folder already exists in your home directory, move its contents into the new folder and delete it first):
    cd ~
    ln -s ~/shared_space/nameoftheprojectspace/.conda ~/.conda

    This should let you create an environment as described in https://rce-docs.hmdc.harvard.edu/book/anaconda-python and populate it with the packages you need. If you run out of room to hold your packages, send us a request to scale up the size of the project space.

    Developing Software

     The RCE provides many ways to develop and test your own code, using common languages, editors and source code utilities.

    Anaconda Python

    Overview

    We offer Python 2.7 and 3.6 through the Anaconda environment manager. The benefit of using the Anaconda environment is that it is built with data science in mind: popular Python modules are already included in the environment, module versions are maintained by Anaconda for compatibility, and researchers can install additional modules to their home directory at any time. You can also create your own Python environments using Conda.

    You can read about Anaconda and Conda at https://docs.continuum.io/anaconda/#anaconda-navigator-or-conda.

    Anaconda Terminology

    • Using the "Shell" opens a new command-line environment (CLI) where the selected version of Anaconda (2 or 3) becomes the default Python environment. For example, if Anaconda3 is selected, running "python" invokes Python 3.6, and using "pip" installs modules for Python 3.6.
    • The "Navigator" is a desktop graphical user interface (GUI) that allows you to launch applications and easily manage conda packages, environments and channels without using command-line commands.

    Running Anaconda

    Note: The Anaconda GUI and CLI are available only on RCE exec nodes, and cannot be run on the login node.

    There are several ways of invoking Anaconda:

    1. Launch an RCE Powered Shell
    2. Run "anaconda3-shell"
    3. You will get a bash shell configured so that when you run "python" it executes Anaconda Python 3.6 with the Anaconda libraries.

    -or-

    1. Submit a batch job with Executable and Arguments as follows:
      - Executable = /usr/local/bin/python3
      - Arguments = script.py
    2. Have your script executed by Anaconda Python 3.6 with the Anaconda libraries.
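
    A complete submit file for this approach might look like the following sketch (the file names and resource requests are placeholders; adjust them for your job):

    Universe = vanilla
    Executable = /usr/local/bin/python3
    Arguments = script.py
    request_memory = 1024
    request_cpus = 1
    output = script.out
    error = script.err
    Log = script.log
    Queue 1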

    -or-

    Some combination of the above, like opening an RCE Powered Shell, and running a python script as:

    python3 ~/script.py

    The Anaconda 2 versions are:

    • anaconda2-shell
    • python2

    Tip: Using Anaconda via SSH

    1. Log in to an RCE desktop session
    2. Select Applications > RCE Powered Applications > Anaconda Shell
      1. Enter your desired CPU and RAM
    3. Get the Condor job ID
      1. The job ID is listed in the top left-hand side of the window toolbar
        -or-
      2. Select Applications > System Tools > Terminal
        1. Run condor_q <username>
    4. Close your RCE desktop session (e.g. close your browser)
    5. SSH into the RCE from your local computer
      1. Execute condor_ssh_to_job <job_id>
      2. You should be within your chosen Anaconda environment

    Creating R Modules

    Building R modules in the RCE

    The IQSS Data Science team is putting the finishing touches on a new R package build system that uses the Jenkins CI platform and GitHub. Check back for more information on the new Rbuild platform.

    Programming Languages

    Common programming languages available in the RCE include:

    • C, C++
    • Java
    • Perl (5.10, 5.16)
    • Python (2.6, 2.7, 3.6)
    • R
    • Ruby
    • Shell

    Programming Tools and Utilities

    Code/text editors available:

    • Emacs
    • Eclipse
    • Gedit
    • Bluefish
    • Kwrite
    • Vim

    Tools to interact with a number of well-known source code repositories:

    • git (to interface with GitHub, Gitorious, or private git repositories)
    • Subversion (svn)
    • CVS

    Using Current Development Tools

    The RCE is built with stability in mind. If you need a newer version of GCC or similar development tools, we offer Devtoolset via Software Collections. You can enable the tools from a Terminal:

    scl enable devtoolset-4 bash

    If you need these tools available on the cluster (e.g. to compile an R package) start an RCE Shell from the Applications > RCE Powered Applications menu. From there, enable the devtoolset as above, then call the appropriate statistical application (e.g. R, xstata-mp, etc.). When you've finished, type exit.

    For a full list of updated packages provided by devtoolset-*, please see http://mirror.centos.org/centos/6/sclo/x86_64/rh/devtoolset-4/.

    Below are a couple of examples of using the Software Collections Developer Toolset:

     

    Install R package xgboost

    Installing "xgboost" requires compilation using a newer version of GCC than is supported by default on the RCE. However, you can enable the software collections developer tools to use a newer version of GCC.

    - In ~/.R/, create a file named Makevars with these contents:
    CXX14 = g++ -std=c++1y
    CXX14FLAGS += -fPIC

    - Start an RCE Powered Shell and enter the following:
    scl enable devtoolset-4 bash
    R # or rstudio - these both work
    chooseCRANmirror(ind = 81)
    # mirror 81 is chosen here, but any mirror should work
    install.packages("xgboost")

    -----
    Install R package lme4

    Installing "lme4" requires compilation using a newer version of GCC than is supported by default on the RCE. However, you can enable the software collections developer tools to use a newer version of GCC.

    - Start an RCE Powered Shell and enter the following:
    scl enable devtoolset-4 bash
    R

    chooseCRANmirror(ind = 81)
    # mirror 81 is chosen here, but any mirror should work

    install.packages("minqa")

    packageurl <- "https://cran.r-project.org/src/contrib/Archive/nloptr/nloptr_1.2.1.tar.gz"
    install.packages(packageurl, repos = NULL, type = "source")

    install.packages("lme4")
    library("lme4")

     

    Outage Notification

    Outages

    We strive to provide advance notice whenever we must schedule an interruption of service; in addition, we post announcements on our web site in the event of an unplanned service outage. We recommend that you use at least one of the channels of communication described in this section so that you are not caught unaware by a service outage.

    When you receive notification that an interruption of service will take place, please be sure to save all work and disconnect from all of our login servers at least 30 minutes before the start of the scheduled outage window. The outage notification specifies which managed services will be affected.

    If you have questions concerning an interruption in service, please contact us.

    Mailing List

    You can sign up to receive outage announcements by email.

    Calendar

    Notifications are also posted on our website calendar. If you have an RSS reader, you can import the feed, or add the iCal feed to your calendar (e.g. Google Calendar).

    RCE Outage Notifier

    A Gnome panel widget displays outages for researchers logged in via NX (NoMachine). The icon can be left-clicked to display all pop-up notifications at any time, or right-clicked to open the outages calendar on this website.


    If you are logged in via SSH, you can display all outages by using the command outages.


    Research Database Hosting

    Requesting

    If your research project requires database hosting, please contact HMDC Support at support@help.hmdc.harvard.edu.

    Hosting

    HMDC runs a MariaDB database for our researchers. MariaDB is compatible with MySQL. For technical differences, please see: MariaDB versus MySQL.

    Connecting

    Once your database and account have been set up, you can connect via the RCE:

    [user@rce ~] mysql -h mariadb-1.priv.hmdc.harvard.edu -u username -p database

    Make sure to replace username and database with your own values. Your database will be assigned a host, so yours may be different from the one shown here. You will be prompted to enter your password before you're connected.

    If you don't want to type out the entire command each time you connect, you can create a config file with your credentials. If you do the following, do not skip changing the file permissions, which keep your credentials secure.

    1. Create a file named .my.cnf in your home directory; note that it's a hidden file. You can do this from a terminal or using an application such as gedit.

    2. Enter the following, replacing appropriate values:
      [mysql]
      host=mariadb-1.priv.hmdc.harvard.edu
      user=your_username
      password=your_password
      database=your_database

    3. Save and close the file.

    4. Make the permissions '400'; from a terminal, execute: chmod 400 ~/.my.cnf

    Once you have created this file with the proper permissions, you can connect to your database by using the command: mysql

    Data Manipulation

    Importing

    If you have a .sql file, upload it to your project directory on the RCE (or your home directory). The following instructions assume you have a .my.cnf file; otherwise, you'll need to use the full command (see above) to connect to your database.

    [user@rce ~] mysql < /path/to/my_data.sql

    Exporting

    Saving Query Results

    We do not grant FILE permissions to our users, so if you're attempting to use SELECT INTO OUTFILE, this won't work. There are however multiple ways to export data or collect query results. (Note: these instructions assume you have a .my.cnf file.)

    Replace the query with your own in this example:

    [user@rce ~] mysql -Be 'SELECT * FROM table_name;' > /path/to/results.tsv

    This creates a tab-delimited file. If you want a CSV, you can pipe the output through sed:

    [user@rce ~] mysql -Be 'SELECT * FROM table_name;' | sed 's/\t/,/g' > /path/to/results.csv

    This has some caveats, especially if your data already contains commas. In that case, export it as TSV and convert the delimiters in a way appropriate to the system processing your data.
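
    For a conversion that quotes fields containing commas properly, you can pipe the TSV through Python's csv module instead (a sketch; assumes a results.tsv produced as above):

    [user@rce ~] python -c "import csv,sys; csv.writer(sys.stdout).writerows(csv.reader(sys.stdin, delimiter='\t'))" < results.tsv > results.csv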

    Full Database Dump

    You can use mysqldump to export your entire database at once.

    [user@rce ~] mysqldump -h mariadb-1.priv.hmdc.harvard.edu -u username -p database > /path/to/dump.sql

    Make sure to replace username and database, as well as the host, if applicable.

    You can use your .my.cnf for this functionality as well. Add these lines after the [mysql] section; note that database is not defined in this section.

    [mysqldump]
    host=mariadb-1.priv.hmdc.harvard.edu
    user=your_username
    password=your_password

    Now your command will look like this:

    [user@rce ~] mysqldump your_database > /path/to/dump.sql