
Overview

After an extensive evaluation period, we have chosen SLURM to be the resource manager and job scheduler for the new Pearcey cluster. We believe Slurm is better suited to managing the compute resources on our clusters effectively.

This reference guide provides information on migrating from Torque to Slurm.

See also the SLURM cheat sheet.

Ruby

As Ruby is a NUMA system, it is critical to co-locate cores and local memory. The batch system therefore has a default memory per core enabled (12.8 GB), and you should not request --mem. See the Requesting resources in Slurm guide for information about how to request resources for jobs on Ruby.
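
As a minimal sketch only (the job name, core count, wall time and program are placeholders, not Ruby-specific recommendations), a Ruby job request might look like the following; note that no --mem line appears, so the per-core memory default applies:

Ruby job request sketch
#!/bin/bash
#SBATCH --job-name=ruby-example      # placeholder job name
#SBATCH --time=2:00:00               # placeholder wall time
#SBATCH --ntasks=1                   # one process
#SBATCH --cpus-per-task=4            # request cores only; memory follows the per-core default

./my_program                         # placeholder executable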

Common job commands

Command | Torque | Slurm
Submit a job | qsub <job script> | sbatch <job script>
Delete a job | qdel <job ID> | scancel <job ID>
Job status (all) | qstat or showq | squeue
Job status (by job) | qstat <job ID> | squeue -j <job ID>
Job status (by user) | qstat -u <user> | squeue -u <user>
Job status (detailed) | qstat -f <job ID> | scontrol show job <job ID>
Show expected start time | showstart <job ID> | squeue -j <job ID> --start
Queue list / info | qstat -q [queue] | scontrol show partition [queue]
Node list | pbsnodes -a | scontrol show nodes
Node details | pbsnodes <node> | scontrol show node <node>
Hold a job | qhold <job ID> | scontrol hold <job ID>
Release a job | qrls <job ID> | scontrol release <job ID>
Cluster status | qstat -B | sinfo
Start an interactive job | qsub -I <args> | sinteractive <args>
X forwarding | qsub -I -X <args> | salloc <args> or srun --pty <args>
Read stdout messages at runtime | qpeek <job ID> | No equivalent command / not needed; use the --output option instead
Monitor cluster and jobs | pbstop | slurmtop
Monitor or review a job's resource usage | - | sacct -j <job_num> --format JobID,jobname,NTasks,nodelist,CPUTime,ReqMem,MaxVMSize,Elapsed
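
In day-to-day use these commands translate almost one-for-one. The following shell snippet is a sketch of a typical Slurm session; the job script name and the job ID 123456 are placeholders:

Typical Slurm session
sbatch myjob.slurm                     # submit the job script; prints "Submitted batch job 123456"
squeue -u $USER                        # list your queued and running jobs
squeue -j 123456 --start               # show the expected start time
scontrol show job 123456               # detailed job information
scancel 123456                         # delete the job if it is no longer needed
sacct -j 123456 --format JobID,jobname,Elapsed,MaxRSS,ReqMem   # review resource usage after completion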

Job submission options

 
Option | Torque (qsub) | Slurm (sbatch)
Script directive | #PBS | #SBATCH
Job name | -N <name> | --job-name=<name>
Queue | -q <queue> | --partition=<queue>
Wall time limit | -l walltime=<hh:mm:ss> | --time=<hh:mm:ss>
Node count | -l nodes=<count> | --nodes=<count>
Process count per node | -l ppn=<count> | --ntasks-per-node=<count>
Core count (per process) | no equivalent | --cpus-per-task=<cores>
Memory limit | -l vmem=<limit> | --mem=<limit> (in megabytes, MB)
Minimum memory per processor | no equivalent | --mem-per-cpu=<memory>
Request generic resource (on the bragg-l accelerator cluster) | -l gpus=<count> or -l mics=<count> | --gres=gpu:<count> or --gres=mic:<count>
Request specific nodes | -l nodes=<node>[,node2[,...]] | -w, --nodelist=<node>[,node2[,...]] or -F, --nodefile=<node file>
Job array | -t <array indices> | -a, --array=<array indices>
Standard output file | -o <file path> | --output=<file path>
Standard error file | -e <file path> | --error=<file path>
Combine stdout/stderr to stdout | -j oe | --output=<combined out and err file path> (and do not set --error)
Copy environment | -V | --export=ALL (default)
Copy environment variable | -v <variable[=value][,variable2=value2[,...]]> | --export=<variable[=value][,variable2=value2[,...]]>
Job dependency | -W depend=<type>:jobID[:jobID...] | --dependency=<type>:jobID[:jobID...] where <type> is after, afterok, afternotok or afterany
Request event notification | -m <events> | --mail-type=<events>
Email address | -M <email address> | --mail-user=<email address>
Defer job until the specified time | -a <date/time> | --begin=<date/time>

Note: multiple events require multiple --mail-type requests, for example:

#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
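
To illustrate how these options combine, here is a sketch of an sbatch header only; the partition name, resource figures and email address are placeholders rather than site defaults:

Example sbatch header
#!/bin/bash
#SBATCH --job-name=example            # job name (-N in Torque)
#SBATCH --partition=workq             # placeholder partition (-q in Torque)
#SBATCH --time=04:00:00               # wall time limit (-l walltime= in Torque)
#SBATCH --nodes=2                     # node count (-l nodes= in Torque)
#SBATCH --ntasks-per-node=16          # processes per node (-l ppn= in Torque)
#SBATCH --mem-per-cpu=2G              # memory per core
#SBATCH --output=example_%j.out       # stdout and stderr go here, since no --error is set
#SBATCH --mail-type=END               # email on job end
#SBATCH --mail-user=user@example.com  # placeholder address

srun ./my_mpi_program                 # placeholder executable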

Job environment

The SLURM system will propagate the module environment of a user's current environment (the environment of the shell from which the user calls sbatch) through to the worker nodes, with some exceptions noted in the following text. By default, SLURM does not source the files ~/.bashrc or ~/.profile when requesting resources via sbatch (although it does when running sinteractive/salloc). So, if you have a standard environment set in either of these files or in your current shell, you can do one of the following:

  1. Add the command #SBATCH --get-user-env to your job script (i.e. the module environment is propagated).

  2. Source the configuration file in your job script:

    Sourcing your .bashrc file
    < #SBATCH statements >
    source ~/.bashrc

  3. You may want to remove the influence of any other current environment variables by adding #SBATCH --export=NONE to the script. This removes all set/exported variables and then acts as if #SBATCH --get-user-env had been added (the module environment is propagated). A combined sketch follows this list.
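
As a minimal sketch combining options 2 and 3 above (the job name, module and output file are placeholders):

Clean job environment sketch
#!/bin/bash
#SBATCH --job-name=env-example       # placeholder job name
#SBATCH --time=0:10:00
#SBATCH --export=NONE                # start from a clean environment (option 3)

source ~/.bashrc                     # re-create your standard environment (option 2)
# module load <module>               # load whatever modules the job needs
env | sort > job_environment.txt     # record the environment the job actually sees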

OpenMP can require a variable named OMP_NUM_THREADS to be set to specify the number of threads to create when code encounters an OpenMP block, either in your own code or in a library function (e.g. in MKL BLAS). An appropriate value for OMP_NUM_THREADS can be obtained from the SLURM environment variable $SLURM_CPUS_PER_TASK, which is set when --cpus-per-task is specified in an sbatch script.

Setting OMP_NUM_THREADS
# Set the number of cores available per process
# if $SLURM_CPUS_PER_TASK is set and non-empty
if [ -n "$SLURM_CPUS_PER_TASK" ] ; then
	export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
else
	# $SLURM_CPUS_PER_TASK is unset or empty;
	# default to one core per task
	export OMP_NUM_THREADS=1
fi
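
For context, here is a minimal sketch of a job script that makes $SLURM_CPUS_PER_TASK available and uses it as above; the thread count and program name are placeholders:

OpenMP job sketch
#!/bin/bash
#SBATCH --job-name=omp-example       # placeholder job name
#SBATCH --time=1:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8            # sets $SLURM_CPUS_PER_TASK for the job

# Same effect as the if/else above: fall back to 1 if the variable is unset or empty
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
./my_openmp_program                  # placeholder executable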
The following table maps commonly used Torque environment variables to their Slurm equivalents inside a job.

Info | Torque | Slurm | Notes
Version | $PBS_VERSION | - | Can be obtained from "sbatch --version".
Job name | $PBS_JOBNAME | $SLURM_JOB_NAME |
Job ID | $PBS_JOBID | $SLURM_JOB_ID |
Batch or interactive | $PBS_ENVIRONMENT | - |
Batch server | $PBS_SERVER | - |
Submit directory | $PBS_O_WORKDIR | $SLURM_SUBMIT_DIR | Slurm jobs start from the submit directory by default.
Submit host | $PBS_O_HOST | $SLURM_SUBMIT_HOST |
Node file | $PBS_NODEFILE | - | A file name and path that lists the nodes a job has been allocated.
Node list | cat $PBS_NODEFILE | $SLURM_JOB_NODELIST | The SLURM variable has a different format to the PBS one. To get a list of nodes use: "scontrol show hostnames $SLURM_JOB_NODELIST".
Job array index | $PBS_ARRAY_INDEX | $SLURM_ARRAY_TASK_ID |
Walltime | $PBS_WALLTIME | - |
Queue name | $PBS_QUEUE | $SLURM_JOB_PARTITION |
Number of nodes allocated | $PBS_NUM_NODES | $SLURM_JOB_NUM_NODES, $SLURM_NNODES |
Number of processes | $PBS_NP | $SLURM_NTASKS |
Number of processes per node | $PBS_NUM_PPN | $SLURM_TASKS_PER_NODE |
List of allocated GPUs | $PBS_GPUFILE | - |
Requested tasks per node | - | $SLURM_NTASKS_PER_NODE |
Requested CPUs per task | - | $SLURM_CPUS_PER_TASK |
Scheduling priority | - | $SLURM_PRIO_PROCESS |
Job user | - | $SLURM_JOB_USER |
Hostname | $HOSTNAME | $HOSTNAME == $SLURM_SUBMIT_HOST | Unless a shell is invoked on an allocated resource, the HOSTNAME variable is propagated (copied) from the submit machine's environment, so it will be the same on all allocated nodes.
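
Where an existing workflow expects a PBS-style node file, a sketch along these lines can recreate one inside a job script; the output file name is arbitrary:

Recreating a node file
# Write one allocated node name per line, similar to the contents of $PBS_NODEFILE
scontrol show hostnames "$SLURM_JOB_NODELIST" > nodefile.$SLURM_JOB_ID
cat nodefile.$SLURM_JOB_ID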

Sample job script

Torque

#!/bin/bash
#PBS -N testjob
#PBS -l walltime=2:00:00
 
echo "running job"
sleep 120
echo "bye"

SLURM

#!/bin/bash
#SBATCH --job-name="testjob"
#SBATCH --time=2:00:00
 
echo "running job"
sleep 120
echo "bye"