forum discussion

Pull Request

Overview

This design focuses on creating a reservation out of a job after the job has started running.

When a job fails because of a problem with its application, data, script, or similar, the job exits and its resources are released back to the server. To run the job again after fixing the problem, the user must resubmit it. The resubmitted job may wait a long time before it runs, depending on factors such as the number of jobs in the queue, job priority, and the scheduling policy. A user who needs resources held for a period of time, so that a resubmitted job can run without delay, can use a reservation. However, reservations currently require a fixed start and end time, so a schedule-able reservation is not yet possible.

Technical Details

  1. Interface 1: New '--job' option to the pbs_rsub command.
    1. Visibility: public
    2. Change Control: Stable
    3. Synopsis: Allow users to create a reservation out of a running job.
    4. Details
      1. This option will create a reservation using the exec_vnode of the specified job.
        1. Example - 
          1. [root@d_server /]# qstat -f 1.d_server | grep exec_vnode
            exec_vnode = (vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)
            [root@d_server /]#

          2. [root@d_server /]# pbs_rsub --job 1
            R2.d_server CONFIRMED

            [root@d_server /]#

          3. [root@d_server /]# pbs_rstat -f | grep resv_nodes
            resv_nodes = (vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)
            [root@d_server /]#

      2. This option can only be used for a job in state 'R' and substate 42.
        1. "request invalid for job state" will be displayed if the job is not in state R/42.
      3. The newly created reservation will be immediately confirmed as shown above.
      4. The walltime of the newly created reservation will be the same as that of the job.
      5. The start time of the newly created reservation will be copied from the job.
      6. The end time of the newly created reservation will be calculated as the start time plus the walltime.
      7. Other attributes that will be copied from the job are:

           Job attribute    Reservation attribute
           Job_Owner        Reserve_Owner
           schedselect      schedselect
           exec_vnode       resv_nodes

      8. The reservation ID will be prefixed with 'R', as with advance reservations.
      9. The reservation will be named R<next_available_id>.
      10. The job from which the reservation is created will be moved to the newly created reservation queue.
      11. An array job ID cannot be used with this new option.
      12. If the job is peer scheduled, the reservation will be created in the pulling complex.
  2. Interface 2: A new job attribute "create_resv_from"
    1. Visibility: public
    2. Change Control: Stable
    3. Synopsis: Allow users to mark a job for creating a reservation out of it at the time of submission (qsub) or through a runjob hook.
    4. Details: 
      1. Setting this attribute will mark the job so that a reservation is created from it once the job starts running.
        1. Example:
          1. [root@d_server /]# qsub -Wcreate_resv_from=1 -- /bin/sleep 1111
            3016.d_server
            [root@d_server /]# qstat -s

            d_server:
                                                                        Req'd  Req'd   Elap
            Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
            --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
            3016.d_server   root     R3017    STDIN       10824   1   1     --    -- R 00:00
            Job run at Fri Jan 10 at 22:58 on (d_server:ncpus=1)
            [root@d_server /]# pbs_rstat
            Resv ID Queue User State Start / Duration / End
            ---------------------------------------------------------------------
            R3017.d_se R3017 root@d_s RN Today 22:58 / 157680000 / Wed Jan 08 2025 2
            [root@d_server /]#

          2. An example of creating a reservation from a job through a runjob hook is in the attached file hook demo.txt.
      2. Items 3 through 12 under Interface 1 Details apply here as well.
  3. Interface 3: A new reservation attribute "reserve_job"
    1. Visibility: public
    2. Change Control: Stable
    3. Synopsis: Allow users to identify whether a reservation was created from a job, and from which job.
    4. Details:
      1. Example:
        1. [root@d_server /]# pbs_rstat -f | grep job
          reserve_job = 3016.d_server
          [root@d_server /]#
  4. Interface 4: pbs_rsub error message when creating a reservation out of a reservation job.
    1. Visibility: public
    2. Change Control: Stable
    3. Synopsis: A new error message indicating that creating a reservation out of a reservation job is not allowed.
    4. Details:
      1. Example:
        1. [root@d_server /]# pbs_rsub --job 3016
          pbs_rsub: Reservation cannot be created from a reservation job
          [root@d_server /]#
  5. Interface 5: pbs_rsub error message when creating a reservation out of an array job.
    1. Visibility: public
    2. Change Control: Stable
    3. Synopsis: A new error message indicating that creating a reservation out of an array job is not allowed.
    4. Details:
      1. Example:
        1. [root@d_server /]# pbs_rsub --job 3[]
          pbs_rsub: Reservation cannot be created from an array job
          [root@d_server /]#
        2. [root@d_server /]# pbs_rsub --job 3[1]
          pbs_rsub: Reservation cannot be created from an array job
          [root@d_server /]#
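The rules in Interfaces 1, 4 and 5 (eligibility checks, attributes copied from the job, start and end time, reservation naming, and moving the job into the reservation queue) can be summarized as a small Python model. This is a hypothetical sketch, not the PBS server implementation. The Job and Reservation classes, the in_resv_queue flag, the helper names check_eligible, create_resv_from_job and pbs_rsub_job, and the order of the checks are all assumptions made for illustration. Only the attribute names, the state R / substate 42 rule, and the error texts come from this design. The check order was chosen so that the examples above print the messages shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# From the design: only jobs in state 'R' with substate 42 are eligible.
RUNNING_SUBSTATE = 42


@dataclass
class Job:
    """Simplified, hypothetical view of the job attributes the design uses."""
    job_id: str
    state: str
    substate: int
    job_owner: str
    schedselect: str
    exec_vnode: str
    start_time: datetime
    walltime: timedelta
    queue: str
    in_resv_queue: bool = False  # hypothetical flag: job already runs in a reservation queue


@dataclass
class Reservation:
    resv_id: str
    reserve_owner: str
    schedselect: str
    resv_nodes: str
    start: datetime
    end: datetime
    reserve_job: str
    state: str = "CONFIRMED"  # confirmed immediately


class RsubError(Exception):
    """Carries the error text pbs_rsub prints for a rejected job."""


def check_eligible(job: Job) -> None:
    # Assumed order: array and reservation-job checks before the state check.
    if "[" in job.job_id:  # array job IDs look like 3[] or 3[1]
        raise RsubError("Reservation cannot be created from an array job")
    if job.in_resv_queue:
        raise RsubError("Reservation cannot be created from a reservation job")
    if job.state != "R" or job.substate != RUNNING_SUBSTATE:
        raise RsubError("request invalid for job state")


def create_resv_from_job(job: Job, next_id: int, server: str) -> Reservation:
    check_eligible(job)
    name = f"R{next_id}"  # R<next_available_id>
    resv = Reservation(
        resv_id=f"{name}.{server}",
        reserve_owner=job.job_owner,        # Job_Owner   -> Reserve_Owner
        schedselect=job.schedselect,        # schedselect -> schedselect
        resv_nodes=job.exec_vnode,          # exec_vnode  -> resv_nodes
        start=job.start_time,               # start time copied from the job
        end=job.start_time + job.walltime,  # end = start + walltime
        reserve_job=job.job_id,             # new reserve_job attribute
    )
    # The job is moved into the new reservation queue, named after the reservation.
    job.queue = name
    job.in_resv_queue = True
    return resv


def pbs_rsub_job(job: Job, next_id: int, server: str) -> str:
    """Return the line 'pbs_rsub --job' prints in this design's examples."""
    try:
        resv = create_resv_from_job(job, next_id, server)
    except RsubError as err:
        return f"pbs_rsub: {err}"
    return f"{resv.resv_id} {resv.state}"


job = Job(
    job_id="1.d_server", state="R", substate=42, job_owner="root@d_server",
    schedselect="3:ncpus=1",  # illustrative value
    exec_vnode="(vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)+(vnode[0]:ncpus=1)",
    start_time=datetime(2020, 1, 10, 22, 58), walltime=timedelta(hours=1),
    queue="workq",  # illustrative queue name
)
print(pbs_rsub_job(job, next_id=2, server="d_server"))  # R2.d_server CONFIRMED
print(job.queue)                                          # R2
print(pbs_rsub_job(job, next_id=3, server="d_server"))
# pbs_rsub: Reservation cannot be created from a reservation job
```

The second call fails because the first call moved the job into reservation queue R2, which is the situation Interface 4 covers. The design does not say which message wins when a job breaks more than one rule, so the ordering in check_eligible is only one possible choice.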