This is to enhance the "node ramp down" feature by introducing a new option, "-k <node count>" ("k" for "keep"), to the pbs command "pbs_release_nodes". It allows users or admins to retain some of the sister nodes (exec_host) while performing a node ramp down operation; the number of sister nodes to keep is given by the argument to this new option.
pbs_release_nodes [-j <job ID>] <vnode> [<vnode> [<vnode>] ...]
pbs_release_nodes [-j <job ID>] -a
pbs_release_nodes [-j <job ID>] -k <select statement>
pbs_release_nodes [-j <job ID>] -k <node count>
pbs_release_nodes --version
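The synopsis above implies a set of option-combination rules (an explicit vnode list, -a, and -k are mutually exclusive ways of naming the nodes to release). The following is a hypothetical sketch of that validation in Python; the function name `parse_release_nodes_args` and the use of argparse are illustrative assumptions, not the actual pbs_release_nodes implementation.

```python
import argparse
import sys

def parse_release_nodes_args(argv):
    """Illustrative parser mirroring the pbs_release_nodes synopsis above."""
    parser = argparse.ArgumentParser(prog="pbs_release_nodes", add_help=False)
    parser.add_argument("-j", dest="job_id")
    parser.add_argument("-a", dest="release_all", action="store_true")
    # -k takes either a node count ("3") or a select statement ("select=3")
    parser.add_argument("-k", dest="keep")
    parser.add_argument("vnodes", nargs="*")
    args = parser.parse_args(argv)

    # -a and -k are mutually exclusive.
    if args.release_all and args.keep is not None:
        sys.exit("pbs_release_nodes: -a and -k options cannot be used together")
    # An explicit vnode list cannot be combined with -k.
    if args.keep is not None and args.vnodes:
        sys.exit("pbs_release_nodes: cannot supply node list with -k option")
    return args
```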
$ qsub -l select=4:model=abc:ncpus=5+3:model=abc:bigmem=true:ncpus=1+2:model=def:ncpus=32 job.scr
121.pbssrv
Now, grepping for the assigned vnodes, we may see:
$ qstat -f 121 | egrep "exec_vnode|exec_host"
exec_host = nd_abc_1/0*5+nd_abc_2/0*5+nd_abc_3/0*5+nd_abc_3/1*5+nd_abc_4_bm/0*1+nd_abc_5_bm/0*1+nd_abc_6_bm/0*1+nd_def_1/0*32+nd_def_2/0*32
exec_vnode = (nd_abc_1:ncpus=5)+(nd_abc_2:ncpus=5)+(nd_abc_3[0]:ncpus=5)+(nd_abc_3[1]:ncpus=5)+(nd_abc_4_bm:ncpus=1)+(nd_abc_5_bm:ncpus=1)+(nd_abc_6_bm:ncpus=1)+(nd_def_1:ncpus=32)+(nd_def_2:ncpus=32)
Here, a total of 9 vnodes across 8 hosts are assigned to the job: the mother superior node holding the first chunk "(nd_abc_1:ncpus=5)", plus 8 sister vnodes. (Note that the host "nd_abc_3" appears twice in exec_host.)
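For illustration, the exec_vnode string above can be split into (vnode, ncpus) chunks with a small helper. This parser is an assumption for the examples in this document (where each chunk carries a single ncpus resource), not a general PBS exec_vnode grammar:

```python
import re

def parse_exec_vnode(exec_vnode):
    """Split '(nd_abc_1:ncpus=5)+(nd_abc_3[0]:ncpus=5)+...' into (vnode, ncpus) pairs."""
    chunks = []
    for chunk in exec_vnode.split("+"):
        m = re.match(r"\(([^:]+):ncpus=(\d+)\)", chunk)
        if m:
            chunks.append((m.group(1), int(m.group(2))))
    return chunks
```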
and the node statuses as:
$ pbsnodes -av
nd_abc_1
     Mom = nd_abc_1.pbspro.com
     state = job-busy
     jobs = 121.pbssrv/0
     resources_available.model = abc
     resources_available.ncpus = 5
     resources_assigned.ncpus = 5

nd_abc_2
     Mom = nd_abc_2.pbspro.com
     state = job-busy
     jobs = 121.pbssrv/0
     resources_available.model = abc
     resources_available.ncpus = 5
     resources_assigned.ncpus = 5

nd_abc_3[0]
     Mom = nd_abc_3.pbspro.com
     state = job-busy
     jobs = 121.pbssrv/0
     resources_available.model = abc
     resources_available.ncpus = 5
     resources_assigned.ncpus = 5

nd_abc_3[1]
     Mom = nd_abc_3.pbspro.com
     state = job-busy
     jobs = 121.pbssrv/0
     resources_available.model = abc
     resources_available.ncpus = 5
     resources_assigned.ncpus = 5

nd_abc_4_bm
     Mom = nd_abc_4_bm.pbspro.com
     state = job-busy
     jobs = 121.pbssrv/0
     resources_available.bigmem = True
     resources_available.model = abc
     resources_available.ncpus = 1
     resources_assigned.ncpus = 1

nd_abc_5_bm
     Mom = nd_abc_5_bm.pbspro.com
     state = job-busy
     jobs = 121.pbssrv/0
     resources_available.bigmem = True
     resources_available.model = abc
     resources_available.ncpus = 1
     resources_assigned.ncpus = 1

nd_abc_6_bm
     Mom = nd_abc_6_bm.pbspro.com
     state = job-busy
     jobs = 121.pbssrv/0
     resources_available.bigmem = True
     resources_available.model = abc
     resources_available.ncpus = 1
     resources_assigned.ncpus = 1

nd_def_1
     Mom = nd_def_1.pbspro.com
     state = job-busy
     jobs = 121.pbssrv/0
     resources_available.model = def
     resources_available.ncpus = 32
     resources_assigned.ncpus = 32

nd_def_2
     Mom = nd_def_2.pbspro.com
     state = job-busy
     jobs = 121.pbssrv/0
     resources_available.model = def
     resources_available.ncpus = 32
     resources_assigned.ncpus = 32
Now, if we run pbs_release_nodes with the new "-k" option and an argument of "3":
$ pbs_release_nodes -j 121 -k 3
it may release the nodes (nd_abc_2:ncpus=5)+(nd_abc_3[0]:ncpus=5)+(nd_abc_3[1]:ncpus=5)+(nd_def_1:ncpus=32)+(nd_def_2:ncpus=32) from the job, while retaining the nodes (nd_abc_1:ncpus=5)+(nd_abc_4_bm:ncpus=1)+(nd_abc_5_bm:ncpus=1)+(nd_abc_6_bm:ncpus=1).
The new phase of the job will have the following vnodes associated with it:
$ qstat -f 121 | egrep "exec_vnode|exec_host"
exec_host = nd_abc_1/0*5+nd_abc_4_bm/0*1+nd_abc_5_bm/0*1+nd_abc_6_bm/0*1
exec_vnode = (nd_abc_1:ncpus=5)+(nd_abc_4_bm:ncpus=1)+(nd_abc_5_bm:ncpus=1)+(nd_abc_6_bm:ncpus=1)
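Given a parsed chunk list and a decision on which sisters to keep, reassembling the retained exec_vnode is straightforward. Which sisters are selected for "-k <count>" is a policy choice the example above deliberately hedges ("may release"), so this sketch takes the keep set as an input rather than assuming a selection rule; the function `rebuild_exec_vnode` is illustrative, not the server's implementation:

```python
def rebuild_exec_vnode(chunks, keep_sisters):
    """chunks: list of (vnode, ncpus) pairs, first entry being the mother superior.

    The mother superior is always retained; sisters are retained only if
    their vnode name is in keep_sisters.
    """
    kept = [chunks[0]] + [c for c in chunks[1:] if c[0] in keep_sisters]
    return "+".join("({}:ncpus={})".format(name, n) for name, n in kept)
```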
The same result as in the previous example can be achieved by passing a select string as the argument to the "-k" option (see Ref[3]):
$ pbs_release_nodes -j 121 -k select=3
The following error messages may be displayed:
pbs_release_nodes: <sub select string>
pbs_release_nodes: -a and -k options cannot be used together
pbs_release_nodes: cannot supply node list with -k option
pbs_release_nodes: Server returned error 15010 for job