Overview
There is a desire for jobs to be placed on nodes of a single type. The type groupings can have many values. Job submitters do not care which value of the group their job is placed on as long as all the nodes for the job have the same value for the group. In PBS today, this grouping is called a placement set. Placement sets can be defined complex wide, queue wide, or just for the job. The more specific placement sets override the less specific (e.g. queue placement sets override the complex-wide placement sets). The new desire is more finely grained grouping. Instead of per-job, grouping should be able to apply to part of a resource request, but not all of it.
Technical Details
Glossary
chunk complex - Part of a select statement which is between the pluses. One or more identical chunks specified in the form of N:chunk.
New PBS resource: group
There is a new PBS resource named group. This can only be requested in the select statement. The value of the group resource will be a placement set resource (just like the resources in node_group_key). When a chunk complex requests group, placement sets will be created based on the specified resource and the chunks from that chunk complex will be placed on nodes where the resource is set to the same value. A select statement can contain multiple group requests with the same or different values as long as there aren't two group requests in the same chunk complex. If two chunk complexes contain the same group resource value, each chunk complex will be evaluated individually. This means the different chunk complexes can be placed on different placement sets in the same placement pool.
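As an illustration of the rule above, here is a minimal Python sketch that splits a select statement into chunk complexes and rejects a chunk complex that requests group twice. The function name `parse_select` and the returned dictionary layout are hypothetical, chosen for this example only. The parsing is deliberately simplified (no quoted values) and is not how PBS itself parses select statements.

```python
def parse_select(select):
    """Split a select string into chunk complexes (the parts between the pluses)."""
    complexes = []
    for part in select.split("+"):
        fields = part.split(":")
        # A leading integer is the chunk count N; assume 1 when it is omitted.
        count = 1
        if fields and fields[0].isdigit():
            count = int(fields[0])
            fields = fields[1:]
        resources = {}
        for field in fields:
            name, _, value = field.partition("=")
            # Two group requests in one chunk complex are not allowed.
            if name == "group" and "group" in resources:
                raise ValueError("multiple group requests in chunk complex: " + part)
            resources[name] = value
        group = resources.pop("group", None)
        complexes.append({"count": count, "group": group, "resources": resources})
    return complexes


if __name__ == "__main__":
    for cc in parse_select("2:ncpus=1:group=shape+2:ncpus=1"):
        print(cc)
```

Each chunk complex carries its own group value (or none), which is what allows the two halves of the request to be evaluated individually.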
Interaction with other placement sets
Currently a job can be run within one pool of placement sets. These will come from the server, queue, or job. The job's place=group overrides the queue's node_group_key, which overrides the server's node_group_key. Per-chunk grouping will work similarly. If a chunk complex requests a group, it will override any other placement sets for the job. If a job has multiple chunk complexes where some request a group and others do not, the chunk complexes that do not request a group will be placed over all nodes available to the job (e.g. if the job is in a queue with nodes associated with it, only those nodes). It is invalid to request both place=group and per-chunk grouping.
Example:
Nodes | Color | Shape |
---|---|---|
1-2 | blue | square |
3-4 | blue | triangle |
5-6 | red | square |
7-8 | red | triangle |
Current:
If the server has node_group_key=color and a job requests place=group=shape, the job will be placed on one of:
shape=square, which consists of nodes 1, 2, 5, and 6
shape=triangle, which consists of nodes 3, 4, 7, and 8
If the server has node_group_key=color and a job requests select=group=shape, the job will be placed on:
shape=square - nodes 1-2, 5-6
shape=triangle - nodes 3-4, 7-8
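The placement sets listed for the example table can be reproduced with a short sketch. The `NODES` dictionary encodes the table above, and `placement_sets` is a hypothetical helper name. It simply groups nodes by the value of one resource and is not the scheduler's actual implementation.

```python
from collections import defaultdict

# The example table: node number -> resource values.
NODES = {
    1: {"color": "blue", "shape": "square"},
    2: {"color": "blue", "shape": "square"},
    3: {"color": "blue", "shape": "triangle"},
    4: {"color": "blue", "shape": "triangle"},
    5: {"color": "red", "shape": "square"},
    6: {"color": "red", "shape": "square"},
    7: {"color": "red", "shape": "triangle"},
    8: {"color": "red", "shape": "triangle"},
}


def placement_sets(nodes, resource):
    """Group node numbers by the value they have for the given resource."""
    sets = defaultdict(list)
    for node_id in sorted(nodes):
        value = nodes[node_id].get(resource)
        # Nodes where the resource is unset belong to no set in this sketch.
        if value is not None:
            sets[value].append(node_id)
    return dict(sets)


if __name__ == "__main__":
    print(placement_sets(NODES, "shape"))
    print(placement_sets(NODES, "color"))
```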
Example 1: Interaction between the server's node_group_key and multiple chunk complexes. One chunk complex has per-chunk grouping and the other does not.
node_group_key=color
select=2:ncpus=1:group=shape+2:ncpus=1
Chunk complex 2:ncpus=1:group=shape will be run on one of its group=shape placement sets:
shape=square - nodes 1-2, 5-6
shape=triangle - nodes 3-4, 7-8
Chunk complex 2:ncpus=1 will be run on any node:
all nodes - 1-8
Example 2: Per-chunk grouping with two chunk complexes with different groups.
no per-job placement
select=2:ncpus=1:group=color+2:ncpus=1:group=shape
Chunk complex 2:ncpus=1:group=color will be placed on
color=blue - nodes 1-4, or
color=red - nodes 5-8
Chunk complex 2:ncpus=1:group=shape will be placed on
shape=square - nodes 1-2, 5-6, or
shape=triangle - nodes 3-4, 7-8
Example 3: Per-chunk grouping where two chunk complexes request the same group.
no per-job placement
select=2:ncpus=1:group=color+2:ncpus=1:group=color
Both chunk complexes will be placed similarly:
color=blue - nodes 1-4, or
color=red - nodes 5-8
The difference between per-job place=group=color and this request is that the two chunk complexes can be placed on different placement sets. It is possible for the first chunk complex to be placed on color=blue and the second chunk complex to be placed on color=red.
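To make the difference concrete, the sketch below enumerates which placement set each chunk complex could land in under the two kinds of request. `COLOR_SETS` encodes the color placement sets from the example table. The helper names `per_job_choices` and `per_chunk_choices` are hypothetical, and the enumeration ignores node capacity entirely.

```python
from itertools import product

# Color placement sets from the example table.
COLOR_SETS = {"blue": [1, 2, 3, 4], "red": [5, 6, 7, 8]}


def per_job_choices(group_sets, num_complexes):
    """place=group: every chunk complex shares one placement set."""
    return [(name,) * num_complexes for name in sorted(group_sets)]


def per_chunk_choices(group_sets, num_complexes):
    """Per-chunk group: each chunk complex picks its placement set independently."""
    return list(product(sorted(group_sets), repeat=num_complexes))


if __name__ == "__main__":
    print(per_job_choices(COLOR_SETS, 2))
    print(per_chunk_choices(COLOR_SETS, 2))
```

With two chunk complexes, the per-job request allows two assignments, while the per-chunk request allows four, including the mixed blue/red case described above.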
Interaction with placement set spanning
Currently, if no placement set is large enough (when empty) to fit a job, the job will span across all nodes available to the job. This can be controlled with the scheduler's do_not_span_psets attribute. If do_not_span_psets is true and a job can not fit within any placement set, the job will not span and will never run.
Spanning will not change. The decision to span will be made at the job level.
If the job requests per-job grouping and not per-chunk grouping, spanning will happen like it does today.
If the job requests per-chunk grouping and any chunk can not fit, the entire job will span. This is regardless of whether other chunks can fit in their placement sets.
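The job-level spanning decision for per-chunk grouping can be sketched as follows. This is a simplified model under stated assumptions, not the scheduler's logic: a chunk complex is said to "fit" a placement set if the set has at least as many nodes as the complex has chunks (one chunk per node), and do_not_span_psets is assumed to apply to per-chunk grouping the same way it applies to per-job grouping. The function name `spanning_decision` and its return strings are hypothetical.

```python
def spanning_decision(complexes, sets_by_resource, do_not_span_psets=False):
    """Decide at the job level whether a per-chunk grouped job spans.

    complexes: list of (chunk_count, group_resource or None)
    sets_by_resource: resource name -> {value: [node numbers]}
    """
    for count, group in complexes:
        if group is None:
            continue  # ungrouped chunk complexes never force a span
        sets = sets_by_resource.get(group, {})
        # Assumption: one chunk per node, measured against empty sets.
        fits = any(len(nodes) >= count for nodes in sets.values())
        if not fits:
            # One chunk complex that can not fit makes the entire job span.
            return "never run" if do_not_span_psets else "span entire job"
    return "place in placement sets"


if __name__ == "__main__":
    sets = {
        "color": {"blue": [1, 2, 3, 4], "red": [5, 6, 7, 8]},
        "shape": {"square": [1, 2, 5, 6], "triangle": [3, 4, 7, 8]},
    }
    print(spanning_decision([(2, "color"), (2, "shape")], sets))
    print(spanning_decision([(2, "color"), (5, "shape")], sets))
```

In the second call the color chunk complex would fit, but the job still spans because the shape chunk complex does not.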
Clarifications
The nodes used to create placement sets:
- If the job is in a queue with nodes associated with it (node's queue attribute), only the nodes associated with the queue are available.
- If the job is not in a queue with nodes associated with it, but there are other queues with nodes associated with them, then all nodes that are not associated with queues are available.
- If there are no nodes associated with any queue, then all nodes are available.
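The three rules above reduce to a small function. This is a sketch only: `available_nodes` is a hypothetical name, the `node_queues` mapping (node number to associated queue name, or None) stands in for each node's queue attribute, and the queue names in the usage lines are made up for illustration.

```python
def available_nodes(node_queues, job_queue):
    """Return the nodes from which placement sets are built for a job.

    node_queues: node number -> associated queue name, or None if unassociated
    job_queue: name of the queue the job is in
    """
    # Rule 1: the job's queue has nodes associated with it.
    queue_nodes = [n for n, q in node_queues.items()
                   if job_queue is not None and q == job_queue]
    if queue_nodes:
        return sorted(queue_nodes)
    # Rules 2 and 3: use every node not associated with a queue. When no
    # node is associated with any queue, this is simply all nodes.
    return sorted(n for n, q in node_queues.items() if q is None)


if __name__ == "__main__":
    nodes = {1: "fastq", 2: "fastq", 3: None, 4: None}
    print(available_nodes(nodes, "fastq"))
    print(available_nodes(nodes, "workq"))
```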
Restrictions
- Per-chunk grouping is incompatible with place=pack. With place=pack, all chunks must be placed on one host.
- Per-chunk grouping is incompatible with place=group. You can either have per-job placement or per-chunk placement, but not both.
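Both restrictions can be checked at submission time. The following is a minimal sketch, assuming simplified select and place strings without quoting. The function name `check_request` and the error messages are hypothetical and do not reflect actual PBS output.

```python
def check_request(select, place=""):
    """Return a list of restriction violations for a select/place pair."""
    # Does any chunk complex in the select statement request group?
    has_chunk_group = any(
        field.split("=", 1)[0] == "group"
        for part in select.split("+")
        for field in part.split(":")
    )
    # Names of the place arguments, e.g. "scatter:excl" -> ["scatter", "excl"].
    place_fields = [f.split("=", 1)[0] for f in place.split(":") if f]

    errors = []
    if has_chunk_group and "pack" in place_fields:
        errors.append("per-chunk grouping is incompatible with place=pack")
    if has_chunk_group and "group" in place_fields:
        errors.append("per-chunk grouping is incompatible with place=group")
    return errors


if __name__ == "__main__":
    print(check_request("2:ncpus=1:group=shape", "pack"))
    print(check_request("2:ncpus=1:group=shape", "group=color"))
    print(check_request("2:ncpus=1", "group=color"))
```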