Analyses on the University of Iowa Argon HPC system require submission using a job script file. Technical details about the contents of the job script file can be found at: https://wiki.uiowa.edu/display/hpcdocs/Basic+Job+Submission. This page provides some basic details on the options in these files for high performance jobs.
High performance jobs are jobs that require long run times or large amounts of computing resources (e.g., memory or number of cores).
The example on this page is an item response theory analysis in which we estimate a unidimensional two-parameter logistic item response theory model using Tatsuoka’s fraction subtraction data. The estimation method will be marginal maximum likelihood using the mirt package in R. This analysis won’t take long on most computers, but we will use it to demonstrate a high performance analysis on the Argon cluster. The analysis specifies the estimation of four separate Markov chains, which we will run in parallel on four separate cores.
The files for this example can be found at:
- Analysis R Script: https://jonathantemplin.com/wp-content/uploads/2023/08/IRTanalysis.r
- Job Script: https://jonathantemplin.com/wp-content/uploads/2023/08/DataAnalysis.job
The job script is shown here, too:
#!/bin/bash
#####Set Scheduler Configuration Directives#####
#Name the job:
#$ -N FS-Data-Analysis
#Send e-mail at beginning/end/suspension of job
#$ -m bes
#E-mail address to send to
#$ -M PUT-YOUR-EMAIL-HERE@uiowa.edu
#Start script in current working directory
#$ -cwd
#####End Set Scheduler Configuration Directives#####
#####Resource Selection Directives#####
#Select the queue to run in
#$ -q UI
#Request four cores
#$ -pe smp 4
#####End Resource Selection Directives#####
##### Analysis Syntax #####
module load stack/2022.2
module load stack/2022.2-base_arch
module load r/4.2.2_gcc-9.5.0
R CMD BATCH IRTanalysis.R
Job Script Syntax Formatting
In the job script file, the following distinctions are made:
- Lines beginning with # (but not with #$ or #!) are comments and are not interpreted
- Lines beginning with #$ are called directives and tell the scheduler program the specs under which to run the analysis
- Lines beginning with #! tell the scheduler which Linux shell to use
- Lines beginning with none of the above are Linux commands that will be run to execute the analysis
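For example, the short excerpt below (using the placeholder job name Example-Job) shows each of these line types in turn: the shell selection, a plain comment, a scheduler directive, and a Linux command:
#!/bin/bash
#This line is a plain comment and is not interpreted
#$ -N Example-Job
module load r/4.2.2_gcc-9.5.0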
Details about the commands in the job script file are provided below:
Linux Shell Selection
#!/bin/bash
Sets the Linux shell. If you are unfamiliar with shells, using Bash is probably best, so leave this as-is. More information on shells can be found at: https://en.wikipedia.org/wiki/Unix_shell.
Scheduler Configuration Directives
Scheduler configuration directives give details about the analysis and specifics about how the scheduler will provide information once the job is running or has finished running.
#Name the job:
#$ -N FS-Data-Analysis
Sets the name of the job. This is useful for high performance jobs that take a long time, as any notification or listing of the job will include the name, which can be used to differentiate which jobs are running and which have stopped.
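For example, a job running one replication of a simulation might be given a name (the one below is only a placeholder) that makes it easy to pick out in notifications and job listings:
#$ -N Simulation-Rep-01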
#Send e-mail at beginning/end/suspension of job
#$ -m bes
Tells the scheduler to send an email notification at the beginning, end, and suspension of a job run. Change “bes” to the notifications you prefer by deleting letters (e.g., “s” alone sends an email only if the job is suspended, which typically means it was canceled).
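For example, to receive an email only when the job ends, the directive would keep just the “e”:
#$ -m e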
#E-mail address to send to
#$ -M PUT-YOUR-EMAIL-HERE@uiowa.edu
Defines the email address where notifications are to be sent. Be sure to put your email address here.
#Start script in current working directory
#$ -cwd
Tells the scheduler to start the script in the directory from which it was submitted. This makes it unnecessary to specify the full path of analysis files in the script (see the final commands below).
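Without this directive, the analysis command at the end of the script would need the full path to the analysis file; the path below is only a hypothetical illustration:
R CMD BATCH /Users/PUT-YOUR-HAWKID-HERE/analysis/IRTanalysis.R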
Resource Selection Directives
Resource selection directives instruct the scheduler to request specific computational resources for the job, such as the queue in which it runs and the number of cores it uses.
#Select the queue to run in
#$ -q UI
Selects the queue where the job will be run. A queue is a set of jobs that are run under specific policies. Users will only have access to a few queues.
The list of queues at the University of Iowa is shown here: https://wiki.uiowa.edu/display/hpcdocs/Queues+and+Policies.
The UI queue noted in the directive above is used for jobs that typically take a long time to run. Here, each user can have up to five jobs running at one time, and there is no time limit on how long a job may run.
The all.q queue is another option for running analyses. As noted by the Argon HPC documentation:
This queue encompasses all of the nodes and contains all of the available job slots. It is available to everyone with an account and there are no running job limits. However, it is a low priority queue instance on the same nodes as the higher priority investor and UI queue instances. The all.q queue is subordinate to these other queues and jobs running in it will give up the nodes they are running on when jobs in the high priority queues need them. The term we use for this is “job eviction”. Jobs running in the all.q queue are the only ones subject to this.
https://wiki.uiowa.edu/display/hpcdocs/Queues+and+Policies (The all.q queue section)
Queue selection can be made simple:
- Use the UI queue for high performance jobs that may take some time to run. The UI queue ensures the job will be able to finish.
- Use the all.q queue for high throughput jobs (such as simulation replications) that can be “evicted” and re-run later. The all.q queue ensures more jobs can be run at once.
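For example, switching the job to the all.q queue only requires changing the queue directive in the job script:
#$ -q all.q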
#Request four cores
#$ -pe smp 4
This directs the scheduler to reserve four cores on a single machine for the analysis. A core is a processing unit. Here, requesting four cores ensures we can run each of our four Markov chains in parallel.
For high throughput jobs, this line is often omitted.
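The number of cores requested should match what the analysis will actually use. For example, an analysis written to run eight processes in parallel would request:
#$ -pe smp 8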
Analysis Syntax
module load stack/2022.2
module load stack/2022.2-base_arch
module load r/4.2.2_gcc-9.5.0
R CMD BATCH IRTanalysis.R
The analysis syntax portion of the job script consists of Linux commands, each of which could also be run separately in a terminal window. The commands listed here are the following:
module load stack/2022.2
module load stack/2022.2-base_arch
module load r/4.2.2_gcc-9.5.0
These three commands load R so that it can be run on Argon.
Modules are programs installed on Argon that can be used for analyses. To see which modules are available, run the following command in an Argon terminal window:
module avail
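Two related commands can also be helpful: module list shows which modules are currently loaded, and adding a search term to module avail narrows the listing to matching module names (exactly how the term is matched depends on the module system):
module list
module avail r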
Finally, we run our analysis file using the command:
R CMD BATCH IRTanalysis.R
This command assumes the IRTanalysis.R file is in the same folder as the job syntax file (and the job syntax file has the directive #$ -cwd).
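By default, R CMD BATCH writes the console output of the script to IRTanalysis.Rout in the same folder, which can be inspected after the job finishes, for example with:
cat IRTanalysis.Rout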
Submitting the Job to Argon
To submit the job, put the job script file and the analysis file in the same folder in your Argon file space. Then, in an Argon terminal window, submit the job using the following command:
qsub DataAnalysis.job
Once submitted, you can check on the status of your job using the command:
qstat -u [username]
Be sure to replace [username] with your HawkID (the username you used to access Argon).
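Putting these commands together, a typical session might look like the following (the HawkID shown and the job ID passed to qdel, which cancels a job you no longer need, are hypothetical):
qsub DataAnalysis.job
qstat -u myhawkid
qdel 1234567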