Enhancing pbs_ralter for -lselect
Links
- Link to discussion on Developer Forum: http://community.pbspro.org/t/enhancing-pbs-ralter-with-lselect/2074
- Link to pull request (phase 1, non-running reservations): https://github.com/openpbs/openpbs/pull/1842
Overview
Currently, the only way pbs_ralter can change the shape of a reservation is through its start time, end time, or duration. The resources requested by the reservation cannot be altered. This RFE will allow the select resource request to be modified. The change will be limited at first, but it can be extended in the future.
Interface: pbs_ralter -l select=<select spec>
- This will alter the select spec for a reservation.
- As with modification of start/end times, the request will be sent to the scheduler. The scheduler will attempt to confirm the reservation with the new select spec. The request can be confirmed or denied.
- The requested select spec must request fewer of the same chunks originally requested by the reservation.
- If the reservation has not started yet:
  - As with pbs_ralter -R/-E/-D, this may result in the reservation being moved to a completely different set of nodes.
  - If a reservation is in state IN CONFLICT and the reconfirmation is successful, then the reservation will return to state CONFIRMED.
- If the reservation has already started running:
  - Attempting to ralter a degraded running reservation will likely fail. This is the current behavior of ralter.
    - If you attempt to ralter just -R/-E/-D, the alter will fail.
    - If you attempt to alter the select, all of the unavailable nodes will need to be removed for the ralter to succeed.
      - This will require the admin to map the resv_nodes chunks to the schedselect chunks and remove the chunks which are unavailable.
  - If a reservation is running and in substate IN CONFLICT, the pbs_ralter -l select will be rejected.
  - Nodes with running jobs on them cannot be released. If the ralter asks to release more nodes than are free of running jobs, the ralter will fail.
- If the reservation alter is successful, a 'Y' (reservation confirmation) record will be written to the accounting logs.
- The current behavior of pbs_ralter and standing reservations will not change. The command will only modify the current occurrence.
- This interface can be used along with the -R/-E/-D options.
- Hooks:
  - There is no ralter hook. It is not possible for the admin to intercept the incoming select and modify it.
  - There are submission hooks (rsub/qsub), so the select associated with the reservation may not be the same as the one the user originally submitted.
  - To successfully ralter a select, the user should look at the current Resource_List.select for the reservation in pbs_rstat -f before submitting their pbs_ralter -lselect.
- If the requested select is either more of the same chunks or different chunks, the following message will be returned:
  - "New select must be made up of a subset of the original chunks"
Examples
% pbs_rsub -l select=2:ncpus=2:color=red+2:ncpus=2:color=yellow
R1 CONFIRMED
- Less of the first chunk
% pbs_ralter -l select=1:ncpus=2:color=red+2:ncpus=2:color=yellow
- Less of both chunks
% pbs_ralter -l select=1:ncpus=2:color=red+1:ncpus=2:color=yellow
- Remove the first chunk
% pbs_ralter -l select=2:ncpus=2:color=yellow
- NOT ALLOWED: increase the first chunk
% pbs_ralter -lselect=4:ncpus=2:color=red+2:ncpus=2:color=yellow
- NOT ALLOWED: change the first chunk
% pbs_ralter -lselect=2:ncpus=2:color=green+2:ncpus=2:color=yellow
OR
% pbs_ralter -lselect=2:ncpus=1:color=red+2:ncpus=2:color=yellow
- NOT ALLOWED: Adding a chunk
% pbs_ralter -lselect=2:ncpus=2:color=red+2:ncpus=2:color=yellow+2:ncpus=2:color=green
- NOT ALLOWED: spreading out the chunks
% pbs_ralter -l select=1:ncpus=2:color=red+1:ncpus=2:color=red+1:ncpus=2:color=yellow+1:ncpus=2:color=yellow
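The accept/reject pattern in the examples above can be sketched as a small check. This is only an illustration, not the server's code: the function names are hypothetical, and it assumes that chunks are matched in order, that each original chunk can be matched at most once, that resources are compared as exact strings, and that an unchanged select is accepted.

```python
def parse_select(select):
    """Split a select spec into a list of (count, resources) pairs."""
    chunks = []
    for chunk in select.split("+"):
        parts = chunk.split(":")
        if parts[0].isdigit():
            count, parts = int(parts[0]), parts[1:]
        else:
            count = 1  # a chunk without a leading count is one chunk
        resources = dict(p.split("=", 1) for p in parts)
        chunks.append((count, resources))
    return chunks


def is_subset_select(original, new):
    """Return True if 'new' only shrinks or drops chunks of 'original'.

    Hypothetical helper: each new chunk must match a later, not yet used
    original chunk with identical resources and a count that is not larger.
    """
    orig_chunks = parse_select(original)
    pos = 0
    for count, resources in parse_select(new):
        while pos < len(orig_chunks):
            orig_count, orig_resources = orig_chunks[pos]
            pos += 1
            if orig_resources == resources and count <= orig_count:
                break  # matched this new chunk, move to the next one
        else:
            return False  # ran out of original chunks to match against
    return True


ORIGINAL = "2:ncpus=2:color=red+2:ncpus=2:color=yellow"
print(is_subset_select(ORIGINAL, "1:ncpus=2:color=red+2:ncpus=2:color=yellow"))
print(is_subset_select(ORIGINAL, "4:ncpus=2:color=red+2:ncpus=2:color=yellow"))
```

Under these assumptions the first call corresponds to the "less of the first chunk" example and the second to the "increase the first chunk" example.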
Technical Details
We will extend the existing functionality of altering a reservation.
The following is how a reservation alter works:
- We store all the attributes we are changing on the reservation structure itself.
- We change the reservation attributes to the new values
- We contact the scheduler and tell it to confirm the altered reservation
- If the reservation is confirmed, the new values are kept.
- If the reservation fails to confirm, we set the reservation attributes back to their existing values.
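The save, apply, confirm, and roll back sequence above can be sketched as follows. This is a simplified model rather than the server implementation: the reservation is a plain dictionary, and `alter_reservation` and the `confirm` callback (standing in for the round trip to the scheduler) are hypothetical names.

```python
_MISSING = object()  # marks attributes that were not set before the alter


def alter_reservation(resv, changes, confirm):
    """Apply 'changes' to 'resv'; restore the old values if confirm() fails."""
    # Remember the current value of every attribute we are about to change.
    saved = {name: resv.get(name, _MISSING) for name in changes}
    # Switch the reservation to the new values.
    resv.update(changes)
    # Ask the (hypothetical) scheduler callback to confirm the altered reservation.
    if confirm(resv):
        return True
    # Confirmation failed: put the previous values back.
    for name, value in saved.items():
        if value is _MISSING:
            resv.pop(name, None)
        else:
            resv[name] = value
    return False


resv = {"select": "2:ncpus=2:color=red", "duration": 3600}
ok = alter_reservation(resv, {"select": "1:ncpus=2:color=red"}, lambda r: False)
print(ok, resv["select"])
```

In this usage the callback always denies the request, so the reservation keeps its original select.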
In the scheduler, altering a reservation is very similar to reconfirming a degraded reservation.
1. Disable the reservation's events in the calendar, so that the reservation does not conflict with itself.
2. If the reservation is not running, ignore the existing nodes and confirm it as if it were unconfirmed.
3. If the reservation is running, create an internal select based on the exec_vnode to force the reservation back onto the same nodes.
4. Call pbs_confirmresv() with success or failure.
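As an illustration of the internal select used for a running reservation, the sketch below turns an exec_vnode string into a select that pins each piece to its vnode. The function name and the exact output format (one `1:vnode=<name>` chunk per vnode piece) are assumptions made for this example, not a description of the scheduler's actual code.

```python
import re


def select_from_exec_vnode(exec_vnode):
    """Build a select that requests the same vnodes listed in 'exec_vnode'.

    Hypothetical helper: every parenthesised group is one chunk, and every
    vnode piece inside it becomes its own single chunk pinned with vnode=.
    """
    chunks = []
    for group in re.findall(r"\(([^)]*)\)", exec_vnode):
        for piece in group.split("+"):
            name, _, resources = piece.partition(":")
            chunk = "1:vnode=" + name
            if resources:
                chunk += ":" + resources
            chunks.append(chunk)
    return "+".join(chunks)


print(select_from_exec_vnode("(n1:ncpus=2)+(n2:ncpus=2)"))
```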
In the server we will save both the original select and schedselect.
If the reservation is not running yet, #2 remains the same, except that we might be searching for different resources.
If the reservation has started running, #3 needs to be modified in the following ways:
- We will map chunks in the original schedselect to the chunks in the exec_vnode
- When we create our internal select, we will shrink it based on the new schedselect
- The nodes released first will be the ones which are unavailable (for a degraded reservation), followed by those without running jobs on them. Nodes with running jobs cannot be released.
- If there are not enough available nodes with no running jobs on them, the alter will fail.
- Once we have our new internal select, we call node search to find our new node solution. This may still fail, for example because the alter also requests an extended end time.
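The choice of which nodes to release when shrinking a running reservation can be sketched as below. It is a simplified model under stated assumptions: nodes are dictionaries with hypothetical `name`, `available`, and `jobs` fields, per-chunk matching is ignored, and nodes with running jobs are never released, in line with the interface rules given earlier.

```python
def pick_nodes_to_release(nodes, release_count):
    """Return the names of nodes to release, or None if the alter must fail."""
    # Only nodes without running jobs are candidates for release.
    candidates = [n for n in nodes if n["jobs"] == 0]
    # Unavailable nodes go first; the sort is stable, so the original
    # order is kept within each group (False sorts before True).
    candidates.sort(key=lambda n: n["available"])
    if len(candidates) < release_count:
        return None  # not enough releasable nodes
    return [n["name"] for n in candidates[:release_count]]


nodes = [
    {"name": "A", "available": True, "jobs": 0},
    {"name": "B", "available": False, "jobs": 0},
    {"name": "C", "available": True, "jobs": 1},
    {"name": "D", "available": True, "jobs": 0},
]
print(pick_nodes_to_release(nodes, 2))
```

Here node B is unavailable and is released first, node C is kept because it has a running job, and a request to release all four nodes would be refused.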
The check that the new select requests a valid (smaller) set of chunks will be done in the server, and an immediate response will be given to the user.