PP-748: Move the scheduler configuration files to qmgr
Draft Version: Not ready to be posted on the forum yet
Introduction:
Use case:
Admins want to configure the scheduler without modifying one or more of the 4 root-owned files and without needing root permission to restart or HUP the scheduler.
PBS managers want to change the scheduler configuration without needing root access.
Gist of design proposal:
Move the scheduler configuration files into qmgr and allow PBS managers to modify the settings.
The design proposal below is intended to address both of these problems.
Interface 1: New policy object
- Visibility: Public
- Change Control: Stable
- Details:
- Admins will now be allowed to create policy objects and give each policy object a name.
- Admins can then assign these policy objects to specific schedulers. One policy object can be assigned to more than one scheduler.
- PBS managers will be able to create, modify, and delete policy objects.
- A policy object can be deleted only when it is not assigned to any scheduler.
- Example:
qmgr -c "c policy p1"
qmgr -c "s p p1 by_queue=False, strict_ordering=True"
qmgr -c "s sched scheduler1 policy=p1"
Below is the list of policies that reside in the policy attribute of the scheduler.
| Policy name | Type | Default value | Example |
|---|---|---|---|
| round_robin | Boolean | round_robin=False | qmgr -c "s policy p1 round_robin=True" |
| by_queue | Boolean | by_queue=True | qmgr -c "s policy p1 by_queue=True" |
| strict_ordering | Boolean | strict_ordering=False | qmgr -c "s policy p1 strict_ordering=True" |
| help_starving_jobs | Boolean | help_starving_jobs=True | qmgr -c "s policy p1 help_starving_jobs=True" |
| max_starve | string | max_starve="24:00:00" | qmgr -c "s policy p1 max_starve=24:00:00" |
| node_sort_key | array_string | node_sort_key = "sort_priority HIGH" | qmgr -c 's policy p1 node_sort_key="sort_priority HIGH, ncpus HIGH"' |
| provision_policy | string | provision_policy="aggressive_provision" | qmgr -c 's policy p1 provision_policy="aggressive_provision"' |
| exclude_resources | array_string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 exclude_resources="vmem, color"' |
| load_balancing | Boolean | load_balancing=False | qmgr -c "s policy p1 load_balancing=True" |
| fairshare | Boolean | fairshare=False | qmgr -c "s policy p1 fairshare=True" |
| fairshare_group | fs_group | NOT_SET_BY_DEFAULT | qmgr -c "s policy p1 fairshare_group=fs_grp1" |
| preemption | Boolean | preemption=True | qmgr -c "s policy p1 preemption=True" |
| preempt_queue_prio | integer | preempt_queue_prio=150 | qmgr -c "s policy p1 preempt_queue_prio=190" |
| preempt_prio | string | preempt_prio="express_queue, normal_jobs" | qmgr -c 's policy p1 preempt_prio="starving_jobs, normal_jobs, starving_jobs+fairshare"' |
| preempt_order | string | preempt_order="SCR" | qmgr -c 's policy p1 preempt_order="SCR 70 SC 30"' |
| preempt_sort | string | preempt_sort="min_time_since_start" | qmgr -c 's policy p1 preempt_sort="min_time_since_start"' |
| peer_queue | array_string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 peer_queue="workq workq@svr1"' |
| server_dyn_res | array_string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 server_dyn_res="mem !/bin/get_mem"' |
| dedicated_queues | array_string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 dedicated_queues="queue1,queue2"' |
| log_event | integer | log_event=4607 | qmgr -c "s policy p1 log_event=255" |
| job_sort_formula | string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 job_sort_formula="ncpus*walltime"' |
| backfill_depth | integer | backfill_depth=1 | qmgr -c 's policy p1 backfill_depth=1' |
| job_sort_key | array_string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 job_sort_key="ncpus HIGH, mem LOW"' |
| time_window_spill (formerly prime_spill) | string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 time_window_spill="01:00:00"' |
| prime_exempt_anytime_queues | Boolean | prime_exempt_anytime_queues=false | qmgr -c 's policy p1 prime_exempt_anytime_queues=false' |
| backfill_prime | Boolean | backfill_prime=false | qmgr -c 's policy p1 backfill_prime=false' |
| comment | string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 comment="afterhours policy"' |
- The following configurations are moved/removed:
- mom_resources - removed (mom periodic hooks can update custom resources)
- unknown_shares - moved to resource_group file.
- smp_cluster_dist - It was already deprecated, removed now
- sort_queues - It was already deprecated, removed now
- nonprimetime_prefix - New policy object does not differentiate between prime/non-prime time
- primetime_prefix - New policy object does not differentiate between prime/non-prime time
- resources - The new policy object instead lists the resources that need to be excluded from scheduling (exclude_resources). By default all resources will be used for scheduling.
- dedicated_prefix - The new policy object exposes "dedicated_queues", which is a list of queues associated with dedicated time.
- preemptive_sched - Renamed to "preemption".
- log_filter - Renamed to "log_event" to be in sync with the option the server object exposes.
- Admins will now be allowed to assign different policy objects for prime and non-prime time.
- If a value of the "policy" scheduler attribute is prefixed with "p:", it will be considered a prime-time policy.
- If a value of the "policy" scheduler attribute is prefixed with "np:", it will be considered a non-prime-time policy.
- A policy name specified without any prefix will be used as the all time policy.
- Admins will not be allowed to set a prime/non-prime policy unless an "all time policy" (without any prefix) is specified. Attempting to do so will throw the following error:
- qmgr -c "s sched sched1 policy+='p:p1'"
Cannot set prime/non-prime time policy without setting an all time policy
- More than one policy object can be specified at the same time in the scheduler's policy attribute.
- example: qmgr -c "s sched sched1 policy=p3,p:p1,np:p2" (p3 is the unprefixed all time policy that must accompany prime/non-prime policies)
- A prime time policy, non-prime time policy, or all time policy cannot be specified more than once when setting the scheduler's policy attribute.
- During dedicated time, if both prime and non-prime time policies are defined, the scheduler will use the prime time policy to schedule jobs from dedicated queues; otherwise it will apply the all time policy.
- To use the policies from the old sched_config file, keep a copy of that file in the directory named by the "sched_priv" attribute.
- If both a policy object and a sched_config file are present, the sched_config file will be ignored.
- All policies can be unset in one shot using qmgr -c "unset sched <sched_name> policy"; this makes the scheduler read the sched_config file in the next iteration.
- Any change to a policy object takes effect in the very next cycle that its corresponding scheduler(s) run. Schedulers do not need a SIGHUP for the change to take effect.
- PBS server will create a default policy object with the name "default_policy".
- In case of upgrades from a PBS version that lacks this multi-scheduler functionality, PBS server will read the sched_config policy file from sched_priv directory and create a "default_policy" object.
- If sched_config file has different policies for prime/non-prime time then server will create "default_policy_prime", "default_policy_non_prime" policy objects and assign them to default scheduler.
- During upgrades, all the prime/non-prime/dedicated time queues will have their execution_type attribute modified to "execution_prime", "execution_non_prime", "execution_dedicated" as mentioned in Interface 6 of this document.
- On a fresh installation, the PBS server will provide "default_policy" with all the default values present in Interface 3 of this document.
- The PBS server will assign this default policy to the default scheduler object. It will configure only an all time policy and no prime/non-prime time policies.
- The default_policy cannot be deleted.
- The fairshare attribute can be set to True only if fairshare_group has been set to an fs_group object.
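The prefix and validation rules for the scheduler's policy attribute can be sketched as follows. This is only an illustration under stated assumptions, not the server implementation: the function names, the slot names ("all", "prime", "non_prime"), and the fallback to the all time policy when no matching prime/non-prime policy is set are hypothetical and not part of the proposal.

```python
from typing import Dict, List

# Prefixes described in the text; an entry without a prefix is the all time policy.
PREFIX_TO_SLOT = {"p": "prime", "np": "non_prime"}


def parse_policy_attribute(entries: List[str]) -> Dict[str, str]:
    """Map each slot ('all', 'prime', 'non_prime') to a policy object name."""
    slots: Dict[str, str] = {}
    for entry in entries:
        prefix, sep, name = entry.partition(":")
        if sep:
            if prefix not in PREFIX_TO_SLOT:
                raise ValueError(f"unknown policy prefix: {prefix!r}")
            slot = PREFIX_TO_SLOT[prefix]
        else:
            slot, name = "all", entry
        # Each kind of policy may be specified only once.
        if slot in slots:
            raise ValueError(f"{slot} policy specified more than once")
        slots[slot] = name
    # Prime/non-prime policies require an all time policy.
    if slots and "all" not in slots:
        raise ValueError(
            "Cannot set prime/non-prime time policy without setting an all time policy"
        )
    return slots


def select_policy(slots: Dict[str, str], period: str) -> str:
    """Pick the policy for 'prime', 'non_prime' or 'dedicated' time."""
    if period == "dedicated":
        # Dedicated time uses the prime policy only if both prime and
        # non-prime policies are defined; otherwise the all time policy.
        if "prime" in slots and "non_prime" in slots:
            return slots["prime"]
        return slots["all"]
    # Assumption: fall back to the all time policy when no specific one is set.
    return slots.get(period, slots["all"])
```

For instance, `parse_policy_attribute(["p3", "p:p1", "np:p2"])` yields one policy per slot, while `parse_policy_attribute(["p:p1"])` is rejected with the error message quoted in the text.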
Interface 2: New time_window object
- Visibility: Public
- Change Control: Stable
- Details:
- Admins will now be allowed to create time_window objects and name these objects.
- Admins can then assign these time_window objects to specific queues. Multiple time_window objects can be assigned to a given queue as long as they do not overlap.
- PBS managers will be able to create, modify, and delete time_window objects.
- A time_window object can be deleted only when it is not assigned to any queue.
- Example:
qmgr -c "c time_window t1"
qmgr -c "s t t1 rrule='some valid RRULE'"
qmgr -c "s t t1 start_time=0600"
Below is the list of attributes that reside in the time_window object.
| Attribute | Type | Default | Example |
|---|---|---|---|
| rrule | string | NOT_SET_BY_DEFAULT | s t t1 rrule="RRULE:FREQ=WEEKLY;WKST=SU;BYDAY=MO,TU,WE,TH,FR" |
| applies_on_holidays | Boolean | True | s t t1 applies_on_holidays = False |
| start_time | time | NOT_SET_BY_DEFAULT | s t t1 start_time=06:00:00 |
| end_time | time | NOT_SET_BY_DEFAULT | s t t1 end_time=09:00:00 |
| type | string | standard | s t t1 type=dedicated |
| comment | string | NOT_SET_BY_DEFAULT | s t t1 comment="nightly data processing" |
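The rule that several time_window objects may share a queue only if they do not overlap can be sketched as below. This is a simplified illustration, not the proposed server check: it compares only start_time and end_time in HH:MM:SS form, assumes both windows recur on the same days (rrule is ignored), assumes a window does not wrap past midnight, and treats a window as half-open so that one ending at 09:00:00 does not collide with one starting at 09:00:00. The function names are hypothetical.

```python
from typing import List, Tuple

Window = Tuple[str, str]  # (start_time, end_time) as "HH:MM:SS"


def to_seconds(hms: str) -> int:
    """Convert "HH:MM:SS" to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def windows_overlap(a: Window, b: Window) -> bool:
    """True if the half-open intervals [start, end) intersect."""
    a_start, a_end = to_seconds(a[0]), to_seconds(a[1])
    b_start, b_end = to_seconds(b[0]), to_seconds(b[1])
    return a_start < b_end and b_start < a_end


def can_assign(existing: List[Window], new: Window) -> bool:
    """A new window may be added only if it overlaps no existing window."""
    return not any(windows_overlap(window, new) for window in existing)
```

A production check would also have to expand each rrule and honor applies_on_holidays before comparing times.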
Interface 3: New non_work_days attribute
- Visibility: Public
- Change Control: Stable
- Details:
- Admins will now be allowed to set a new string_array attribute non_work_days at the server object level.
- PBS managers will be able to set, modify, and unset the non_work_days server attribute.
Example:
qmgr >> set server non_work_days = "FREQ=YEARLY;BYMONTH=1;BYMONTHDAY=1" (First of the year - Maybe we should allow 01/01 and we generate this rule)
qmgr >> set server non_work_days += "FREQ=YEARLY;BYDAY=MO;BYSETPOS=-1;BYMONTH=5"
qmgr >> set server non_work_days += "FREQ=WEEKLY;BYDAY=FR;INTERVAL=2"
Interface 4: New fs_group object
- Visibility: Public
- Change Control: Stable
- Details:
- Admins will now be allowed to create an fs_group object.
- PBS managers will be able to create, modify, and delete fs_group objects.
Example:
qmgr >> create fs_group f1
qmgr >> set fs_group f1 usage_formula = "cput*fs_factor"
qmgr >> set fs_group f1 decay_time=18:00:00
Below is the list of attributes that reside in the fs_group object.
| Attribute | Type | Default | Example |
|---|---|---|---|
| tree_element | string_array | NOT_SET_BY_DEFAULT | set fs_group f1 tree_element = [ group01:root = 30 ] |
| usage_formula | string | NOT_SET_BY_DEFAULT | set fs_group f1 usage_formula = "ncpus*walltime" |
| entity | string | euser | set fs_group f1 entity = queue |
| decay_time | duration | 24:00:00 | set fs_group f1 decay_time = 06:00:00 |
| decay_factor | float | .5 | set fs_group f1 decay_factor = .8 |
| enforce_no_shares | Boolean | False | set fs_group f1 enforce_no_shares = True |
| unknown_shares | integer | 10 | set fs_group f1 unknown_shares = 20 |
| comment | string | NOT_SET_BY_DEFAULT | s fs_group f1 comment="Big data fair share settings" |
- Upon creation, an fs_group object will contain the defaults and can immediately be put into use.
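To illustrate how decay_time and decay_factor might interact, here is a minimal sketch. It assumes these attributes behave like the existing fairshare decay settings in sched_config, where accumulated usage is multiplied by the decay factor once per elapsed decay period; the proposal itself does not spell this out, and the function name is hypothetical.

```python
def decayed_usage(
    usage: float,
    elapsed_seconds: int,
    decay_time_seconds: int = 24 * 3600,  # default decay_time of 24:00:00
    decay_factor: float = 0.5,            # default decay_factor of .5
) -> float:
    """Apply one decay step for every full decay period that has elapsed."""
    full_periods = elapsed_seconds // decay_time_seconds
    return usage * decay_factor ** full_periods
```

With the defaults from the table, usage of 100 would drop to 50 after one day and to 25 after two days.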
Interface 5: Changes to sched object
- Visibility: Public
- Change Control: Stable
- Details:
- The scheduler now has additional attributes which can be set in order to run it.
- policy - a collection of the attributes described in Interface 1, which can be used to configure the scheduler.
- The scheduler can now accept a set of policies to work with:
- A policy can be specified using the command qmgr -c "s sched <sched_name> policy=<policy object>".
Interface 6: Changes to PBS server.
- Visibility: Public
- Change Control: Stable
- Details:
- backfill_depth will also be an attribute of the scheduler's policy object.
- If the scheduler is configured to use sched_config instead of a policy object, it will take the value of backfill_depth from the server object.
- If the scheduler is configured to use a policy object instead of the sched_config file, it will take the value of backfill_depth from the scheduler's policy object.
- If backfill_depth is set at the queue level, that value will take precedence over the value set in the scheduler's policy object or the server object.
- These attributes now belong to the scheduler's policy object and need to be set on a named policy object:
- qmgr -c "s policy p1 backfill_depth = 3"
- Setting these attributes on the server will result in the following warning:
- qmgr -c "s s backfill_depth=3"
- qmgr: Warning: backfill_depth in server is deprecated. Set backfill_depth in a policy object.
- The job_sort_formula attribute has been moved from the server object to the scheduler's policy object.
- A policy change event (prime to non-prime, or non-prime to prime) for any of the running schedulers will now trigger a scheduling cycle.
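The backfill_depth precedence described in this interface can be summarized in a short sketch. It is illustrative only: the function and parameter names are hypothetical, `None` stands for "not set", and the fallback value of 1 is taken from the default listed for backfill_depth in the policy table.

```python
from typing import Optional


def effective_backfill_depth(
    queue_depth: Optional[int],
    policy_depth: Optional[int],
    server_depth: Optional[int],
    uses_policy_object: bool,
    default: int = 1,
) -> int:
    """Resolve backfill_depth for a queue using the precedence in the text."""
    # A value set on the queue always wins.
    if queue_depth is not None:
        return queue_depth
    # Otherwise the source depends on how the scheduler is configured:
    # the policy object if one is in use, else the server object.
    configured = policy_depth if uses_policy_object else server_depth
    return default if configured is None else configured
```

For example, a scheduler that uses a policy object with backfill_depth=3 ignores a server-level value of 5 unless a queue overrides both.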
Interface 10: New ifl api to list all the policy objects.
- Visibility: Public
- Change Control: Stable
- Details:
- A new IFL call is added to query the policy object from the server:
struct batch_status *pbs_statpolicy(int, char *, struct attrl *, char *);
- The first argument is the server connection handle, the second is the policy name, the third provides the attributes to query, and the fourth passes any extended parameters.
- This IFL call returns a batch_status as the response.
- If the server is unable to find the requested policy object, the IFL API returns NULL and pbs_errno is set to "15212" (PBSE_POLICY_NOT_FOUND).
Interface 11: New ifl api to list all the time_window objects.
- Visibility: Public
- Change Control: Stable
- Details:
- A new IFL call is added to query the time_window object from the server:
struct batch_status *pbs_stattimewindow(int, char *, struct attrl *, char *);
- The first argument is the server connection handle, the second is the time_window name, the third provides the attributes to query, and the fourth passes any extended parameters.
- This IFL call returns a batch_status as the response.
- If the server is unable to find the requested time_window object, the IFL API returns NULL and pbs_errno is set to "15212" (PBSE_TIME_WINDOW_NOT_FOUND).
Interface 12: New ifl api to list all the fs_group objects.
- Visibility: Public
- Change Control: Stable
- Details:
- A new IFL call is added to query the fs_group object from the server:
struct batch_status *pbs_statfsgroup(int, char *, struct attrl *, char *);
- The first argument is the server connection handle, the second is the fs_group name, the third provides the attributes to query, and the fourth passes any extended parameters.
- This IFL call returns a batch_status as the response.
- If the server is unable to find the requested fs_group object, the IFL API returns NULL and pbs_errno is set to "15212" (PBSE_FS_GROUP_NOT_FOUND).