Analyses on the University of Iowa Argon HPC must be submitted using a job script file. Technical details for the contents of the job script file can be found at: https://wiki.uiowa.edu/display/hpcdocs/Basic+Job+Submission. This page provides some basic details on the options for these files for high throughput jobs.
A high throughput job runs a fairly short script many times. In my line of work, this often means replications of a simulation study.
The example on this page is a simple item response theory simulation study to examine the accuracy of model parameter recovery with a sample size of 100. For each simulation replication, we will generate data from, and estimate the parameters of, a unidimensional two-parameter normal ogive item response theory model. The estimation method will be Markov Chain Monte Carlo using the JAGS program. Each replication of this analysis won’t take long on most computers, but we will want to run multiple replications, necessitating the use of the HPC. For each replication, we will run four separate Markov chains in parallel, one on each of four cores.
The files for this example can be found at:
- Simulation R Script for Single Replication: https://jonathantemplin.com/wp-content/uploads/2020/04/IRTsimulationRep.r
- Simulation Job Script: https://jonathantemplin.com/wp-content/uploads/2020/04/Simulation.job
- Simulation Results Aggregation R Script: https://jonathantemplin.com/wp-content/uploads/2020/04/IRTsimulationResults.r
- Simulation Results Aggregation Job Script: https://jonathantemplin.com/wp-content/uploads/2020/04/Results.job
This analysis requires two phases:
- Running a high throughput array job (a job that runs multiple times) that conducts the simulation replications. Each replication outputs replication-specific results.
- Running a high performance job that aggregates all replication-specific results.
The simulation job script is slightly different from those used in high performance analyses as it uses an array. The script text is:
#!/bin/bash

#####Set Scheduler Configuration Directives#####
#Name the job:
#$ -N IRT-Simulation

#Send e-mail at beginning/end/suspension of job
#$ -m bes

#E-mail address to send to
#$ -M PUT-YOUR-EMAIL-ADDRESS-HERE@uiowa.edu

#Start script in current working directory
#$ -cwd
#####End Set Scheduler Configuration Directives#####

#####Resource Selection Directives#####
#See the HPC wiki for complete resource information: https://wiki.uiowa.edu/display/hpcdocs/Argon+Cluster
#Select the queue to run in
#$ -q all.q

#Request four cores
#$ -pe smp 4
#####End Resource Selection Directives#####

module load R
module load jags
Rscript IRTsimulationRep.R $SGE_TASK_ID
Job Script Syntax Formatting
In the job script file, the following distinctions are made:
- Lines beginning with # alone (not #$ or #!) are comments and are not interpreted
- Lines beginning with #$ are called directives and tell the scheduler program the specs under which to run the analysis
- Lines beginning with #! tell the scheduler which Linux shell to use
- Lines beginning with none of the above are Linux commands that will be run to execute the analysis
Details about the commands in the job script file are provided below:
Linux Shell Selection
The first line of the script, #!/bin/bash, sets the Linux shell. If you are unfamiliar with shells, using Bash is probably best, so leave this as-is. More information on shells can be found at: https://en.wikipedia.org/wiki/Unix_shell.
Scheduler Configuration Directives
Scheduler configuration directives give details about the analysis and specifics about how the scheduler will provide information once the job is running or has finished running.
#Name the job:
#$ -N IRT-Simulation
Sets the name of the job. Useful for high performance jobs that take a long time as any notification or listing of the job will include the name. This can be used to differentiate which jobs are running and which are stopped.
#Send e-mail at beginning/end/suspension of job
#$ -m bes
Tells the scheduler to send an email notification at the beginning (b), end (e), and suspension (s) of a job run. To receive fewer notifications, delete letters from “bes” (e.g., “s” alone sends email only when a job has been suspended, which typically means canceled).
#E-mail address to send to
#$ -M PUT-YOUR-EMAIL-ADDRESS-HERE@uiowa.edu
Defines the email address where notifications are to be sent. Be sure to put your email address here.
#Start script in current working directory
#$ -cwd
Tells the scheduler to start the script in the directory from which it was submitted. This avoids having to specify the full path of analysis files in the script (see the final commands below).
Resource Selection Directives
Resource selection directives instruct the scheduler to request specific types of computational resources to run the job. These resources are the types of machines that are used for the analysis.
#Select the queue to run in
#$ -q all.q
Selects the queue where the job will be run. A queue is a set of jobs that are run under specific policies. Users will only have access to a few queues.
The list of queues at the University of Iowa is shown here: https://wiki.uiowa.edu/display/hpcdocs/Queues+and+Policies.
The all.q is the queue that will enable the quickest access for running high throughput jobs. As noted by the Argon HPC Documentation:
This queue encompasses all of the nodes and contains all of the available job slots. It is available to everyone with an account and there are no running job limits. However, it is a low priority queue instance on the same nodes as the higher priority investor and UI queue instances. The all.q queue is subordinate to these other queues and jobs running in it will give up the nodes they are running on when jobs in the high priority queues need them. The term we use for this is “job eviction”. Jobs running in the all.q queue are the only ones subject to this.
Source: https://wiki.uiowa.edu/display/hpcdocs/Queues+and+Policies (The all.q queue section)
Queue selection can be made simple:
- Use the UI queue for high performance jobs that may take some time to run. The UI queue ensures the job will be able to finish.
- Use the all.q queue for high throughput jobs (such as simulation replications) that can be “evicted” and re-run later. The all.q queue ensures more jobs can be run at once.
#Request four cores
#$ -pe smp 4
This directs the scheduler to run the analysis on a machine with four cores. A core is a processing unit. Here, the use of four cores ensures we can run each of our four Markov chains for our analysis in parallel.
For high throughput jobs, this line is often omitted.
module load R
module load jags
Rscript IRTsimulationRep.R $SGE_TASK_ID
The analysis syntax portion includes commands that can be run in the terminal window separately. The commands listed here are the following:
module load R
Loads R so that it can be run in Argon.
module load jags
Loads JAGS so it can be run in Argon.
Modules are programs installed on Argon that can be used for analyses. To see which modules are available, in a terminal window on Argon, run the command:
module avail
Finally, we run our analysis file using the command:
Rscript IRTsimulationRep.R $SGE_TASK_ID
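This command assumes the IRTsimulationRep.R file is in the same folder as the job script file (and that the job script file has the directive #$ -cwd). File names on Linux are case-sensitive, so the name in the command must match the file on disk exactly; the file linked above is named IRTsimulationRep.r.
The $SGE_TASK_ID argument passes the task number, which serves as the replication number, to R. The scheduler sets this environment variable for each task in the array. In the R script file, you will see the use of the commandArgs() function, which captures the replication number; the script then uses it to set the replication-specific random number seed and to name the replication-specific output files.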
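Because the analysis depends on $SGE_TASK_ID holding a task number, a guard placed before the Rscript line can catch a job that was submitted as a plain job rather than as an array. The following is a minimal sketch and is not part of the original Simulation.job: the function name is_valid_task_id is hypothetical, and the sketch assumes Grid Engine's documented behavior of setting SGE_TASK_ID to the string undefined for non-array jobs. It only prints what it would do.

```shell
#!/bin/bash
# Sketch only: is_valid_task_id is a hypothetical helper, not part of Simulation.job.
# It succeeds only when its argument is a positive whole number.
is_valid_task_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit (e.g. "undefined")
        *) [ "$1" -gt 0 ] ;;       # digits only: require a value above zero
    esac
}

if is_valid_task_id "${SGE_TASK_ID:-}"; then
    echo "Would run: Rscript IRTsimulationRep.R ${SGE_TASK_ID}"
else
    echo "SGE_TASK_ID is not a task number; submit as an array job" >&2
fi
```

In a real job script, the echo in the first branch would be replaced by the Rscript command itself.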
Submitting the Array Job
To submit this job, the following command is used:
qsub -t 1-10 Simulation.job
The -t 1-10 option tells the scheduler that the job is an array with 10 “tasks”. Each task is numbered from 1 through 10, and that number is the value the scheduler places in $SGE_TASK_ID for the task.
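To see what the array expands to before submitting anything, the tasks can be previewed locally. This is a rough sketch under stated assumptions: preview_array_tasks is a hypothetical helper name, and SGE_TASK_ID is set by hand in the loop, whereas on the cluster the scheduler sets it for each task. Nothing is sent to the scheduler.

```shell
#!/bin/bash
# Dry run of an array job: print the command each task would execute.
# preview_array_tasks is a hypothetical helper; it submits nothing.
preview_array_tasks() {
    local first="$1" last="$2"
    local SGE_TASK_ID
    for SGE_TASK_ID in $(seq "$first" "$last"); do
        echo "Rscript IRTsimulationRep.R ${SGE_TASK_ID}"
    done
}

# Mirrors the range used in: qsub -t 1-10 Simulation.job
preview_array_tasks 1 10
```

Each printed line corresponds to one task, so ten lines here means ten replications will be run.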
Simulation Job Completion
When all jobs have completed, you should see a series of .RData files in the job script folder. All simulation replications that successfully completed will have a file named simResultsRep[#].RData (with [#] replaced by the task number in the job array).
If this job was run in the all.q, it is possible some of the replications were “evicted” prior to completion. This happens when the owner of the HPC resources being used for a job submits their own job script.
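One way to find evicted replications is to check which result files are missing. The sketch below assumes the simResultsRep[#].RData naming described above and a task range of 1 through 10; missing_reps is a hypothetical helper name. It prints one resubmission command per missing task rather than running it, so the list can be reviewed first.

```shell
#!/bin/bash
# List task numbers whose result file is missing from a directory.
# missing_reps is a hypothetical helper; file naming follows simResultsRep[#].RData.
missing_reps() {
    local dir="$1" first="$2" last="$3"
    local rep
    for rep in $(seq "$first" "$last"); do
        if [ ! -f "${dir}/simResultsRep${rep}.RData" ]; then
            echo "$rep"
        fi
    done
}

# Print (do not run) a resubmission command for each missing task.
for rep in $(missing_reps . 1 10); do
    echo "qsub -t ${rep} Simulation.job"
done
```

Run this from the job script folder once no tasks from the array are still queued or running; otherwise tasks that simply have not finished yet will be listed as missing.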
Results Aggregation Script
Once all replications have finished, use the simulation results aggregation job script (https://jonathantemplin.com/wp-content/uploads/2020/04/Results.job) to aggregate the results across all simulation replications. This is a high performance job script. Details on these scripts are available here: https://jonathantemplin.com/university-of-iowa-argon-hpc-system-job-script-file-example-for-high-performance-jobs/.
Results Aggregation Job Completion
Upon successful completion of the results aggregation job script, you will find a file named finalResults.RData. This file will contain all aggregated results from the simulation replications.