UBELIX


From Grid Engine to Slurm


Description

This page provides information to make the transition from Grid Engine to Slurm as smooth as possible.


Slurm vs. Grid Engine Terminology

Grid Engine | Slurm
queues | partitions

Important Commands

Grid Engine | Slurm | Description | Remarks
qsub | sbatch | Submit a script (batch mode) to the scheduler for later execution
(none) | srun | Run parallel tasks within an allocation | Use srun within a batch script to execute parallel tasks.
qstat | squeue | High-level overview of your active jobs
qstat -j <job_id> | scontrol show jobid <job_id> | Detailed information about active (e.g. pending, running) jobs
qacct | sacct | Accounting information for active and completed jobs | By default only jobs from the current day are shown. Use the --starttime and/or --endtime options to extend or restrict the accounting period.
qacct -j <job_id> | sacct --jobs=<job_id> | Accounting information for a single job | By default only jobs from the current day are shown. Use the --starttime and/or --endtime options to extend or restrict the accounting period.
qdel <job_id> | scancel <job_id> | Delete jobs
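
A typical sequence of these commands for submitting and monitoring a job might look like the following sketch (the job id 11908 and the date passed to --starttime are placeholders):

bash$ sbatch job.sh
Submitted batch job 11908
bash$ squeue --user=<username>
bash$ scontrol show jobid 11908
bash$ sacct --jobs=11908 --starttime=2016-01-01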

sbatch Command Options

Most command options support both a short form and a long form (e.g. -u <username> and --user=<username>). Because a few options support only the long form, we consistently use the long form throughout this documentation.
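
For example, the following two invocations are equivalent; -t and -p are the short forms of --time and --partition, and the partition name long is taken from the examples below:

bash$ sbatch --time=01:00:00 --partition=long job.sh
bash$ sbatch -t 01:00:00 -p long job.sh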

Prefix for Embedded Options

Grid Engine | Slurm
#$ <option> | #SBATCH <option>

General Options

While Grid Engine requires h_vmem and h_cpu to be requested, Slurm provides default values for --mem-per-cpu and --time if they are not requested explicitly.

Grid Engine | Slurm | Description | Example (Slurm)
-M <mail> | --mail-user=<mail> | Valid email address | --mail-user=foo@id.unibe.ch
-m <mail type> | --mail-type=<mail type> | When to send a mail | --mail-type=end,fail or --mail-type=none
-N <name> | --job-name=<name> | Job name | --job-name="Matlab Job"
-l h_cpu=hh:mm:ss | --time=hh:mm:ss | Job runtime | --time=06:00:00
-l h_vmem=<memory> | --mem-per-cpu=<memory> | Memory required per slot/CPU. Suffix [K|M|G|T]. | --mem-per-cpu=2G
-l scratch=1|0, -l scratch_size=<space>, -l scratch_files=<#files> | --tmp=<MB> | Specify a minimum amount of disk space that must be available on the compute node(s). The local scratch space for the job is referenced by the variable TMPDIR. Default units are megabytes; different units can be specified using the suffix [K|M|G|T]. | --tmp=8G or --tmp=2048
-cwd | --workdir=<dir> | Set the current working directory. In Slurm, the default working directory is the directory from which the sbatch command was executed. All relative paths in the job script are relative to the current working directory.
-e <path_to_err_file> | --error=<path_to_err_file> | Connect standard error to a file. By default stderr and stdout are connected to the same file slurm-%j.out, where %j is replaced by the job allocation number. For more replacement symbols see below. | --error=err/%j.err
-o <path_to_out_file> | --output=<path_to_out_file> | Connect standard output to a file. By default stderr and stdout are connected to the same file slurm-%j.out, where %j is replaced by the job allocation number. For more replacement symbols see below. | --output=out/%j.out
-q <queue> | --partition=<partition> | Explicitly request a partition. | --partition=long
-hold_jid <job_list> | --dependency=<type:job_id[:job_id][,type:job_id[:job_id]]> or --dependency=<type:job_id[:job_id][?type:job_id[:job_id]]> | Defer the start of this job until the specified dependencies have been satisfied. See the sbatch manpage (man sbatch) for a description of all valid types. | --dependency=afterany:11908
-h | --hold | Submit the job in a held state. The job is not allowed to run until it is released.
-r y[es]|n[o] | --requeue or --no-requeue | Specifies whether the job should be requeued after a node failure. By default, a job is requeued unless explicitly disabled by the user (--no-requeue).
-now y[es] | --immediate | Only submit the batch script to the controller if all specified resources are immediately available.
(none) | --exclusive | Use the compute node(s) exclusively, i.e. do not share nodes with other jobs. CAUTION: Only use this option if you are an experienced user and really understand the implications of this feature. If used improperly, this option can lead to a massive waste of computational resources.
(none) | --constraint=<features> | Request nodes with certain features. This option does not make sense on the current testing system. | --constraint=broadwell
(none) | --parsable | Print only the job id number.
(none) | --test-only | Validate the batch script and return the estimated start time considering the current cluster state.
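
The --parsable and --dependency options can be combined to chain jobs from the shell. The following is a sketch using two placeholder scripts job1.sh and job2.sh; the second job only starts once the first one has terminated:

bash$ sbatch --test-only job1.sh
bash$ jobid=$(sbatch --parsable job1.sh)
bash$ sbatch --dependency=afterany:${jobid} job2.sh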

Replacement Symbols

Supported replacement symbols that can be used for --output, --error, and --input options:

\\   Do not process any of the replacement symbols.
%%   The character %.
%A   Master job allocation number of an array job.
%a   Job array task number.
%j   Job allocation number.
%N   Name of the node that runs the batch script.
%u   User name.
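
For example, the following directives (a sketch) write standard output and standard error of a job to separate, per-user files; the out/ and err/ directories must already exist, as Slurm does not create them:

#SBATCH --output=out/%u-%j.out
#SBATCH --error=err/%u-%j.err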

Array Job Options

Grid Engine | Slurm | Description | Example (Slurm)
-t n[-m[:s]] | --array=n[,k[,...]][-m[:s]] | Submit an array job | --array=1,4,16-32:4
-tc max_con_task | % (appended to the --array specification) | Max number of tasks allowed to run concurrently | --array=16-32%4
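
A minimal array job script might look like the following sketch; the input files data.1, data.2, ... and the program process_data are placeholders:

job_array.sh

#!/bin/bash
# Slurm options
#SBATCH --mail-user=foo@bar.unibe.ch
#SBATCH --mail-type=fail,end
#SBATCH --job-name="Example Array Job"
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=2G
#SBATCH --array=1-16%4

# Put your code below this line
# Each task processes the input file selected by its task id
./process_data data.${SLURM_ARRAY_TASK_ID}

bash$ sbatch job_array.sh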

Parallel Job Options

Grid Engine | Slurm | Description | Example (Slurm)
-pe <mpi env> <#slots> | --nodes=<minnodes[-maxnodes]>, --ntasks=<#tasks>, --ntasks-per-node=<#tasks> | See section "Parallel Environments" below. | --nodes=4, --ntasks=8, --ntasks-per-node=2
-pe <smp env> <#slots> | --cpus-per-task=<#cpus> | See section "Parallel Environments" below. | --cpus-per-task=8

Environment Variables

Among others, Slurm sets the following variables in the environment of the batch script:

Grid Engine | Slurm
JOB_ID | SLURM_JOB_ID
JOB_NAME | SLURM_JOB_NAME
(none) | SLURM_ARRAY_JOB_ID
SGE_TASK_ID | SLURM_ARRAY_TASK_ID
SGE_TASK_LAST | SLURM_ARRAY_TASK_MAX
SGE_TASK_FIRST | SLURM_ARRAY_TASK_MIN
SGE_TASK_STEPSIZE | SLURM_ARRAY_TASK_STEP
NSLOTS (mpi) | SLURM_NTASKS
(none) | SLURM_NTASKS_PER_NODE
NSLOTS (smp) | SLURM_CPUS_PER_TASK
TMPDIR or TMP | TMPDIR
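
These variables can be used directly in the batch script. The following sketch stages input data to the node-local scratch space and labels the result with the job id; data.tar and the program process are placeholders:

job.sh

#!/bin/bash
# Slurm options
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=2G
#SBATCH --tmp=8G

# Put your code below this line
# Unpack the input into the node-local scratch directory
tar -xf data.tar -C "$TMPDIR"
# Name the result file after the job allocation number
./process "$TMPDIR"/data > result_${SLURM_JOB_ID}.txt

bash$ sbatch job.sh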

Parallel Environments

There is no such concept as a parallel environment in Slurm. To run parallel jobs under Slurm, simply request a certain number of nodes (--nodes), the number of processes (--ntasks), and/or the number of threads (--cpus-per-task) required by your job. For MPI jobs one would request --ntasks, and for shared memory jobs (SMP) one would request --cpus-per-task. The number of requested nodes defaults to as many as are required to fulfill the other resource requests (--ntasks and/or --cpus-per-task). You can guide the distribution of the requested tasks using the --ntasks-per-node option.

Job Examples

Single Core Serial Job

Grid Engine:

job.sh

#!/bin/bash
# Grid Engine options
#$ -M foo@bar.unibe.ch
#$ -m ae
#$ -N "Example Job"
#$ -cwd
#$ -l h_cpu=01:00:00
#$ -l h_vmem=2G

# Put your code below this line
./simple

bash$ qsub job.sh

Slurm:

job.sh

#!/bin/bash
# Slurm options
#SBATCH --mail-user=foo@bar.unibe.ch
#SBATCH --mail-type=fail,end
#SBATCH --job-name="Example Job"
#SBATCH --workdir=.
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=2G

# Put your code below this line
./simple

bash$ sbatch job.sh

Parallel Shared Memory Job (SMP)

Grid Engine:

job.sh

#!/bin/bash
# Grid Engine options
#$ -M foo@bar.unibe.ch
#$ -m ae
#$ -N "Example Job SMP"
#$ -cwd
#$ -l h_cpu=01:00:00
#$ -l h_vmem=2G
#$ -pe smp 8

# Put your code below this line
export OMP_NUM_THREADS=$NSLOTS
./simple_omp

bash$ qsub job.sh

Slurm:

job.sh

#!/bin/bash
# Slurm options
#SBATCH --mail-user=foo@bar.unibe.ch
#SBATCH --mail-type=fail,end
#SBATCH --job-name="Example Job SMP"
#SBATCH --workdir=.
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=2G
#SBATCH --cpus-per-task=8

# Put your code below this line
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./simple_omp

bash$ sbatch job.sh

Parallel MPI Job

Grid Engine:

job.sh

#!/bin/bash
# Grid Engine options
#$ -M foo@bar.unibe.ch
#$ -m ae
#$ -N "Example Job MPI"
#$ -cwd
#$ -l h_cpu=01:00:00
#$ -l h_vmem=2G
#$ -pe orte 8

# Put your code below this line
module load openmpi/1.10.2-gcc
mpirun simple_mpi

bash$ qsub job.sh

Slurm:

job.sh

#!/bin/bash
# Slurm options
#SBATCH --mail-user=foo@bar.unibe.ch
#SBATCH --mail-type=fail,end
#SBATCH --job-name="Example Job MPI"
#SBATCH --workdir=.
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=2G
#SBATCH --ntasks=8

# Put your code below this line
module load openmpi/1.10.2-gcc
srun --mpi=pmi2 simple_mpi

bash$ sbatch job.sh

Whether all parallel MPI processes run on a single node or are distributed across several nodes depends on the resources (memory, cores) available on the nodes. However, with Slurm you can specify how the parallel processes should be distributed among the allocated resources. This is discussed in detail here.
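
For example, the following directives (a sketch, reusing the values from the Parallel Job Options table above) distribute the 8 MPI processes evenly over 4 nodes, with 2 processes per node:

#SBATCH --nodes=4
#SBATCH --ntasks=8
#SBATCH --ntasks-per-node=2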
