Node maintenance window enhancement

Nodes often need maintenance, and the admin usually knows the maintenance window in advance. PBS currently makes it difficult to plan such maintenance windows. This new feature enhances reservations so that they can serve as proper maintenance windows.

Interface: New '--hosts' option to PBS command 'pbs_rsub'

  • Visibility: public

  • Change Control: Stable

  • Synopsis: The new option submits a special reservation that is allowed to 'run' on unavailable nodes.

  • Details: Only managers and operators may run pbs_rsub with the '--hosts' option. The resources 'place' and 'select' are generated automatically and must not be combined with '--hosts'; combining them with '--hosts' prints the 'usage' help.

    • The '--hosts' option of pbs_rsub requires a list of hosts: '<host1> <host2> <host3> ...'

    • The placement of this reservation is always: '-l place=exclhost'

    • The select is generated from the hosts as follows: '-l select=host=<host1>:ncpus=<ncpus_host1>+host=<host2>:ncpus=<ncpus_host2>+host=<host3>:ncpus=<ncpus_host3>+...'

    • The resv_nodes of this reservation is created so that it requests all the ncpus of all vnodes on the requested hosts.

    • This reservation is confirmed immediately after submission by the pbs_rsub command. Overlapping reservations are degraded and will be reconfirmed in the next scheduler iteration.

    • The resv_nodes of each overlapping reservation is modified: the requested vnodes are removed from it. For running reservations, this means that no new job will start on the overlapping nodes.

    • Overlapping running jobs are ignored and it is up to the administrators to deal with these jobs.

    • A reservation submitted with '--hosts' ignores the resv_enable attribute on nodes.

    • The reservation prefix is 'M', which stands for maintenance.

    • Submitting this reservation does not trigger a scheduler iteration.
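
The generation of 'select' and 'place', and the removal of vnodes from an overlapping reservation, can be sketched as follows. This is only an illustrative Python sketch, not the actual implementation: the function names, the host-to-ncpus mapping passed in as input, and the assumed resv_nodes text format '(vnode:ncpus=N)+(vnode:ncpus=N)' are hypothetical.

```python
def build_maintenance_request(host_ncpus):
    """Build the 'select' and 'place' values generated for '--hosts'.

    host_ncpus: mapping of host name -> total ncpus on that host.
    This input is hypothetical; the real values come from the server.
    """
    if not host_ncpus:
        raise ValueError("at least one host is required")
    # One chunk per host, requesting all of its ncpus.
    chunks = [f"host={host}:ncpus={ncpus}" for host, ncpus in host_ncpus.items()]
    return {"select": "+".join(chunks), "place": "exclhost"}


def prune_resv_nodes(resv_nodes, maintenance_vnodes):
    """Remove the maintenance vnodes from an overlapping reservation.

    Assumes resv_nodes looks like '(vn1:ncpus=4)+(vn2:ncpus=4)'.
    """
    kept = []
    for chunk in resv_nodes.split("+"):
        vnode = chunk.strip("()").split(":", 1)[0]
        if vnode not in maintenance_vnodes:
            kept.append(chunk)
    return "+".join(kept)


request = build_maintenance_request({"node01": 16, "node02": 32})
remaining = prune_resv_nodes("(node01:ncpus=4)+(node05:ncpus=4)", {"node01", "node02"})
```

With these inputs, 'request' holds the select string for both hosts together with the 'exclhost' placement, and 'remaining' keeps only the chunk on node05, which is why no new job of the overlapping reservation can start on node01.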

Interface: New extend parameter 'm' to IFL function 'char *pbs_submit_resv(int connect, struct attropl *attrib, char *extend)'

  • Visibility: public

  • Change Control: Stable

  • Synopsis: The new extend parameter modifies the reservation id prefix.

  • Details: When the extend parameter includes 'm', the returned reservation id is prefixed with 'M'. This is available only to managers and operators; for all other users, PBSE_PERM (15007) is returned.
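
The prefix selection described above can be modeled roughly as follows. This Python sketch only illustrates the rule and is not the server code: the function name, the exception class, and the default prefix 'R' for an ordinary reservation are assumptions made for the example.

```python
PBSE_PERM = 15007  # error code quoted in the Details above


class PbsError(Exception):
    """Hypothetical exception carrying a PBS error code."""

    def __init__(self, code):
        super().__init__(f"PBS error {code}")
        self.code = code


def reservation_prefix(extend, is_manager_or_operator):
    """Return the reservation id prefix for a submit request.

    extend: the extend string passed to pbs_submit_resv (may be None).
    The default prefix 'R' is an assumption for ordinary reservations.
    """
    if extend and "m" in extend:
        if not is_manager_or_operator:
            # Only managers and operators may create 'M' reservations.
            raise PbsError(PBSE_PERM)
        return "M"
    return "R"
```

For example, reservation_prefix("m", True) yields 'M', while the same call from an unprivileged user raises an error carrying PBSE_PERM.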

Interface: New reservation substate 'RESV_IN_CONFLICT'

  • Visibility: public

  • Change Control: Stable

  • Synopsis: The substate means that the reservation overlaps an 'M' reservation.

  • Details: RESV_IN_CONFLICT (abbreviated 'IC') has a similar effect to the substate RESV_DEGRADED. The difference is that a reservation in the new substate is reconfirmed even if all the nodes of the reservation are up.
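
The difference between the two substates can be expressed as a small decision function. This is a simplified Python sketch under stated assumptions: the function name and the boolean input are hypothetical, and it assumes that a RESV_DEGRADED reservation is reconfirmed only when at least one of its nodes is not up, as the text above implies.

```python
def needs_reconfirmation(substate, all_nodes_up):
    """Decide whether a reservation should be reconfirmed.

    substate: substate name as a string, e.g. 'RESV_IN_CONFLICT'.
    all_nodes_up: True if every node of the reservation is up.
    """
    if substate == "RESV_IN_CONFLICT":
        # Reconfirmed regardless of node state.
        return True
    if substate == "RESV_DEGRADED":
        # Assumption: reconfirmed only when some node is not up.
        return not all_nodes_up
    return False
```

Under this model, a reservation that overlaps an 'M' reservation is reconfirmed even when its nodes are healthy, which is the behavior that distinguishes it from a degraded reservation.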