Slurm priorities
Slurm computes job priorities regularly and updates them to reflect the continuously changing situation. For instance, if the priority is configured to take into account each user's past usage of the cluster, the running jobs of a user lower the priority of that user's pending jobs.
How the priority is updated depends on many configuration details. This document explains how to discover them and find the appropriate documentation, so that you can understand how priorities are computed on a particular cluster.
Two parameters in Slurm's configuration determine how priorities are computed.
They are named SchedulerType and PriorityType.
Internal or external scheduling
The first parameter, SchedulerType, determines how jobs are scheduled based on available resources, requested resources, and job priorities. Scheduling can be taken care of by an external program such as Moab or Maui, or by Slurm itself.
In the latter case, the scheduling type can be builtin, in which case all jobs run in strict priority order, or backfill. Backfill is a mechanism by which lower-priority jobs can start earlier to fill idle slots, provided they are expected to finish before the next higher-priority job is due to start based on resource availability.
To find out which solution is implemented on a cluster, you can issue the following command:
scontrol show config | grep SchedulerType
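For example, on a cluster where Slurm itself does the scheduling with backfill, the output would look like this (formatting may vary slightly):
SchedulerType           = sched/backfill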
If the answer is sched/wiki, this means that scheduling is handled by Maui, while sched/wiki2 means scheduling is done by Moab. If scheduling is actually configured to be managed by Slurm, the above command should return sched/builtin or sched/backfill.
See the slurm.conf manpage and search for 'SchedulerType' for more information.
If the scheduling is performed externally to Slurm (by Maui or Moab), you will need to look for the corresponding documentation. If, as is most likely, the scheduling is handled internally by Slurm, the following section explains how priorities are computed.
Priority computation
The way the priority is computed for a job depends on another parameter called PriorityType. It can take the following values:
- priority/basic: jobs are given a strict first-come, first-served priority (mostly used in case of an external scheduler);
- priority/multifactor: jobs are prioritized according to several criteria such as past cluster usage, job size, queue time, etc.;
- priority/multifactor2: a variation of the previous.
To find out which solution is implemented on a cluster, you can issue the following command:
scontrol show config | grep PriorityType
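For example, with the multifactor plugin enabled, the command returns something like:
PriorityType            = priority/multifactor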
The most commonly used configuration is most probably priority/multifactor.
The priority then depends on five elements:
- Job age: how long the job has been waiting in the queue;
- User fairshare: a measure of the past usage of the cluster by the user;
- Job size: the number of CPUs a job requests;
- Partition: the partition to which the job is submitted, specified with the --partition submission parameter;
- QOS: the quality of service associated with the job, specified with the --qos submission parameter (see the example after this list).
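As an illustration, a job could request a specific partition and QOS at submission time; the partition and QOS names used here are purely hypothetical and must exist on your cluster:
sbatch --partition=batch --qos=normal job.sh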
Note that the job age factor is bounded, so the priority stops increasing once the bound is reached. The job size factor can be configured to favor either small or large jobs, although it is most often used to favor large jobs. The fairshare factor has a 'forgetting' mechanism so that only the recent history of the user is taken into account, rather than their total usage over the lifetime of the cluster.
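In slurm.conf, these behaviors correspond to parameters such as the following (values purely illustrative):
# the age factor stops growing once the job has waited this long
PriorityMaxAge=7-0
# NO means the job size factor favors large jobs
PriorityFavorSmall=NO
# half-life of the decay applied to past usage in the fairshare computation
PriorityDecayHalfLife=14-0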
All these are combined in a weighted sum to form the priority. The weights can be found by running
sprio -w
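Schematically, following the Slurm multifactor documentation (and ignoring factors not discussed here), each factor is normalized to a value between 0 and 1 and multiplied by the corresponding weight:
job_priority = PriorityWeightAge       * age_factor
             + PriorityWeightFairshare * fairshare_factor
             + PriorityWeightJobSize   * jobsize_factor
             + PriorityWeightPartition * partition_factor
             + PriorityWeightQOS       * qos_factor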
A detailed description of how these factors are computed (including the fairshare) is given in the Slurm documentation for multifactor and for multifactor2.
The precise configuration for a cluster can be found by running the following command:
scontrol show config | grep ^Priority
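The output typically lists the priority plugin, the decay and age settings, and the factor weights; an abridged and purely illustrative example:
PriorityType            = priority/multifactor
PriorityDecayHalfLife   = 7-00:00:00
PriorityMaxAge          = 7-00:00:00
PriorityWeightAge       = 1000
PriorityWeightFairShare = 10000
PriorityWeightJobSize   = 1000
PriorityWeightPartition = 1000
PriorityWeightQOS       = 10000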
Finding a user's current fairshare situation is done with the sshare command.
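For instance, to restrict the report to your own user (the exact columns depend on the Slurm version and configuration):
sshare -u $USER
The column of interest is typically the last one, FairShare, which is the normalized fairshare factor entering the priority.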
Getting the priority given to a job can be done either with squeue:
squeue -o %Q -j jobid
or with the sprio command, which gives the details of the computation.
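For example, for a hypothetical pending job with ID 1234:
sprio -j 1234
This displays one line for the job, with the weighted contribution of each factor (age, fairshare, job size, partition, QOS) and the resulting priority.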