Overview
This design focuses on creating a reservation out of a job after the job has started running.
When a job encounters a problem with its application, data, script, etc., the job exits and the resources allocated to it are released back to the server. To run the job again after correcting the problem, the user must resubmit it, and the resubmitted job may wait a long time before it runs, depending on factors such as the number of jobs in the queue, priority, and scheduling policy. If the user needs resources to be held for a time period so that the job can be resubmitted and run without delay, they can use reservations. Currently, a reservation is defined by a start and end time, so it is not yet possible to have a reservation that is schedulable.
Technical Details
- Interface 1: New '--job' option to the pbs_rsub command.
- Visibility: public
- Change Control: Stable
- Synopsis: Allow users to create a reservation out of a running job.
- Details:
- This command will create a reservation using the exec_vnodes of the job provided.
- Example -
[root@d_server /]# qstat -f 1.d_server | grep exec_vnode
exec_vnode = (vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)
[root@d_server /]# pbs_rsub --job 1
R2.d_server CONFIRMED
[root@d_server /]# pbs_rstat -f | grep resv_nodes
resv_nodes = (vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)
[root@d_server /]#
- This option can only be used for a job in state 'R' and substate 42.
- "request invalid for job state" will be displayed if the job is not in state R/42.
- The newly created reservation will be immediately confirmed as shown above.
- The walltime of the newly created reservation will be the same as that of the job.
- The start time of the newly created reservation will be copied from the job.
- The end time of the newly created reservation will be calculated from the start time and walltime.
- Other attributes that will be copied from the job are:
Job          Reservation
Job_Owner    Reserve_Owner
schedselect  schedselect
exec_vnode   resv_nodes
- The reservation ID will be prefixed with 'R', like that of advance reservations.
- The reservation will be named R<next_available_id>.
- The job from which the reservation is created will be moved to the newly created reservation queue.
- An array job ID cannot be used with this new option.
- If the job is peer scheduled, the reservation will be created in the pulling complex.
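The timing and attribute-copy rules above can be illustrated with a short Python sketch. This is not PBS code: the `Job` and `Reservation` classes and the `reservation_from_job` function are hypothetical names introduced here, and the owner, schedselect, start time, and walltime values are made up. Only the relationships come from this design: the start time is copied, the duration equals the job's walltime, the end time is start plus walltime, and Job_Owner, schedselect, and exec_vnode map to Reserve_Owner, schedselect, and resv_nodes.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical stand-in for the job attributes this design reads.
    job_owner: str
    schedselect: str
    exec_vnode: str
    start_time: int  # epoch seconds at which the job started running
    walltime: int    # seconds


@dataclass
class Reservation:
    # Hypothetical stand-in for the reservation attributes this design sets.
    reserve_owner: str
    schedselect: str
    resv_nodes: str
    start_time: int
    duration: int
    end_time: int


def reservation_from_job(job: Job) -> Reservation:
    """Derive reservation fields from a running job, per the rules above."""
    return Reservation(
        reserve_owner=job.job_owner,
        schedselect=job.schedselect,
        resv_nodes=job.exec_vnode,
        start_time=job.start_time,               # copied from the job
        duration=job.walltime,                   # same as the job's walltime
        end_time=job.start_time + job.walltime,  # start time + walltime
    )


# Illustrative values only; exec_vnode is taken from the example above.
job = Job(
    job_owner="root@d_server",
    schedselect="3:ncpus=1",
    exec_vnode="(vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)",
    start_time=1_000_000,
    walltime=3600,
)
resv = reservation_from_job(job)
print(resv.resv_nodes)
print(resv.end_time - resv.start_time)
```

The sketch keeps the mapping in one function so that each rule in the list corresponds to one line of the constructor call.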
- Interface 2: A new job attribute "create_resv_from_job"
- Visibility: public
- Change Control: Stable
- Synopsis: Allow users to mark a job for creating a reservation out of it at the time of submission (qsub) or through a runjob hook.
- Details:
- This attribute marks the job so that a reservation is created out of it.
- Type: boolean
- Example:
[root@d_server /]# qsub -Wcreate_resv_from_job=1 -- /bin/sleep 1111
3016.d_server
[root@d_server /]# qstat -s
d_server:
                                                                                  Req'd       Req'd    Elap
Job ID              Username       Queue    Jobname      SessID     NDS    TSK    Memory      Time   S Time
------------------- -------------- -------- ------------ ---------- ------ ------ ----------- ------ - -------
3016.d_server       root           R3017    STDIN        10824      1      1      --          --     R 00:00
   Job run at Fri Jan 10 at 22:58 on (d_server:ncpus=1)
[root@d_server /]# pbs_rstat
Resv ID Queue User State Start / Duration / End
---------------------------------------------------------------------
R3017.d_se R3017 root@d_s RN Today 22:58 / 157680000 / Wed Jan 08 2025 2
[root@d_server /]#
- An example showing how to create a reservation out of a job is in the file hook demo.txt.
- Points 1.d.iii - 1.d.xii apply here as well.
- Interface 3: A new reservation attribute "reserve_job"
- Visibility: public
- Change Control: Stable
- Synopsis: Allow users to identify if the reservation is created out of a job.
- Details:
- Example:
[root@d_server /]# pbs_rstat -f | grep job
reserve_job = 3016.d_server
[root@d_server /]#
- Interface 4: pbs_rsub error message when creating a reservation out of a reservation job.
- Visibility: public
- Change Control: Stable
- Synopsis: A new error message indicating that creating a reservation out of a reservation job is not allowed.
- Details:
- Example:
[root@d_server /]# pbs_rsub --job 3016
pbs_rsub: Reservation may not be created from a job already within a reservation.
[root@d_server /]#
- Interface 5: pbs_rsub error message when creating a reservation out of an array job.
- Visibility: public
- Change Control: Stable
- Synopsis: A new error message indicating that creating a reservation out of an array job is not allowed.
- Details:
- Example:
[root@d_server /]# pbs_rsub --job 3[]
pbs_rsub: Reservation may not be created from an array job
[root@d_server /]#
[root@d_server /]# pbs_rsub --job 3[1]
pbs_rsub: Reservation may not be created from an array job
[root@d_server /]#
- Interface 6: qsub error message when submitting job to a reservation and using -Wcreate_resv_from_job=1.
- Visibility: public
- Change Control: Stable
- Synopsis: A new error message indicating that creating a reservation out of a reservation job is not allowed.
- Details:
- Example:
[root@d_server pbspro_dev_oss]# pbs_rsub -R 2118 -E 2120
R1.d_server UNCONFIRMED
[root@d_server pbspro_dev_oss]# qr
Resv ID Queue User State Start / Duration / End
---------------------------------------------------------------------
R1.d_serve R1 root@d_s CO Today 21:18 / 120 / Today 21:20
[root@d_server pbspro_dev_oss]# qsub -q R1 -Wcreate_resv_from_job=1 -- /bin/sleep 1111
qsub: Reservation may not be created from a reservation job
[root@d_server pbspro_dev_oss]#
- Interface 7: pbs_rsub error message when one user tries to create a reservation out of another user's job.
- Visibility: public
- Change Control: Stable
- Synopsis: A new error message indicating that creating a reservation out of another user's job is not allowed.
- Details:
- Example:
[pbsuser1@d_server pbspro_dev_oss]$ qsub -- /bin/sleep 1111
5.d_server
[pbsuser1@d_server pbspro_dev_oss]$ qstat -s
d_server:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
5.d_server      pbsuser1 workq    STDIN      2874   1   1   --     --    R 00:00
   Job run at Fri Jan 17 at 21:28 on (d_server:ncpus=1)
[pbsuser1@d_server pbspro_dev_oss]$ exit
[root@d_server pbspro_dev_oss]# su pbsuser
[pbsuser@d_server pbspro_dev_oss]$ pbs_rsub --job 5
qsub: Unauthorized Request
[pbsuser@d_server pbspro_dev_oss]$
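The rejection cases described in Interfaces 1, 4, 5, and 7 can be summarized in a small Python sketch. The function name, the `JobInfo` fields, and the order of the checks are assumptions made for illustration and are not specified by this design. Only the R/42 state condition and the message texts come from the examples in this document. The sketch also treats every non-owner as unauthorized and does not model privileged users.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobInfo:
    # Hypothetical view of the job attributes the checks need.
    job_id: str
    owner: str
    state: str
    substate: int
    in_reservation: bool  # True if the job already sits in a reservation queue


def rejection_reason(job: JobInfo, requester: str) -> Optional[str]:
    """Return the error text for a rejected request, or None if it is allowed.

    The check order is an assumption; the design does not specify it.
    """
    if requester != job.owner:
        return "Unauthorized Request"
    if "[" in job.job_id:  # array jobs and subjobs, e.g. 3[] or 3[1]
        return "Reservation may not be created from an array job"
    if job.in_reservation:
        return ("Reservation may not be created from a job "
                "already within a reservation.")
    if job.state != "R" or job.substate != 42:
        return "request invalid for job state"
    return None


running = JobInfo("5.d_server", "pbsuser1", "R", 42, False)
print(rejection_reason(running, "pbsuser"))   # another user's job
print(rejection_reason(running, "pbsuser1"))  # owner, job in state R/42
```

Returning the message text rather than raising keeps the sketch close to the transcripts above, where each rejected request prints exactly one message.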