PP-748: Move the scheduler configuration files to qmgr

Draft Version: Not ready to be posted on the forum yet

Introduction:

Use case: 

Admins want to configure the scheduler without modifying one or more of the four root-owned files and then needing root permission to restart or HUP the scheduler.

PBS managers want to change the scheduler configuration without needing root access.

Gist of design proposal:

Move the scheduler configuration files into qmgr and allow PBS managers to modify the settings.


The design proposal below is intended to address both of these use cases.

Forum discussion


  • Interface 1: New policy object
    • Visibility: Public
    • Change Control: Stable
    • Details:
      • Admins can create policy objects and name them.
      • Admins can then assign policy objects to specific schedulers; one policy object can be assigned to more than one scheduler.
      • PBS managers can create, modify, and delete policy objects.
      • A policy object can be deleted only when it is not assigned to any scheduler.
      • Example: 
        qmgr -c "c policy p1"
        qmgr -c "s p p1 by_queue=False, strict_ordering=True"
        qmgr -c "s sched scheduler1 policy=p1"

      • Below is the list of settings that reside in a policy object, which is assigned to a scheduler through its policy attribute.

        | Policy name | Type | Default value | Example |
        | --- | --- | --- | --- |
        | round_robin | Boolean | round_robin=False | qmgr -c "s policy p1 round_robin=True" |
        | by_queue | Boolean | by_queue=True | qmgr -c "s policy p1 by_queue=True" |
        | strict_ordering | Boolean | strict_ordering=False | qmgr -c "s policy p1 strict_ordering=True" |
        | help_starving_jobs | Boolean | help_starving_jobs=True | qmgr -c "s policy p1 help_starving_jobs=True" |
        | max_starve | string | max_starve="24:00:00" | qmgr -c "s policy p1 max_starve=24:00:00" |
        | node_sort_key | array_string | node_sort_key = "sort_priority HIGH" | qmgr -c 's policy p1 node_sort_key="sort_priority HIGH, ncpus HIGH"' |
        | provision_policy | string | provision_policy="aggressive_provision" | qmgr -c 's policy p1 provision_policy="aggressive_provision"' |
        | exclude_resources | array_string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 exclude_resources="vmem, color"' |
        | load_balancing | Boolean | load_balancing=False | qmgr -c "s policy p1 load_balancing=True" |
        | fairshare | Boolean | fairshare=False | qmgr -c "s policy p1 fairshare=True" |
        | fairshare_group | fs_group | NOT_SET_BY_DEFAULT | qmgr -c "s policy p1 fairshare_group=fs_grp1" |
        | preemption | Boolean | preemption=True | qmgr -c "s policy p1 preemption=True" |
        | preempt_queue_prio | integer | preempt_queue_prio=150 | qmgr -c "s policy p1 preempt_queue_prio=190" |
        | preempt_prio | string | preempt_prio="express_queue, normal_jobs" | qmgr -c 's policy p1 preempt_prio="starving_jobs, normal_jobs, starving_jobs+fairshare"' |
        | preempt_order | string | preempt_order="SCR" | qmgr -c 's policy p1 preempt_order="SCR 70 SC 30"' |
        | preempt_sort | string | preempt_sort="min_time_since_start" | qmgr -c 's policy p1 preempt_sort="min_time_since_start"' |
        | peer_queue | array_string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 peer_queue="workq workq@svr1"' |
        | server_dyn_res | array_string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 server_dyn_res="mem !/bin/get_mem"' |
        | dedicated_queues | array_string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 dedicated_queues="queue1,queue2"' |
        | log_event | integer | log_event=4607 | qmgr -c "s policy p1 log_event=255" |
        | job_sort_formula | string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 job_sort_formula="ncpus*walltime"' |
        | backfill_depth | integer | backfill_depth=1 | qmgr -c 's policy p1 backfill_depth=1' |
        | job_sort_key | array_string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 job_sort_key="ncpus HIGH, mem LOW"' |
        | time_window_spill (formerly prime_spill) | string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 time_window_spill="01:00:00"' |
        | prime_exempt_anytime_queues | Boolean | prime_exempt_anytime_queues=false | qmgr -c 's policy p1 prime_exempt_anytime_queues=false' |
        | backfill_prime | Boolean | backfill_prime=false | qmgr -c 's policy p1 backfill_prime=false' |
        | comment | string | NOT_SET_BY_DEFAULT | qmgr -c 's policy p1 comment="afterhours policy"' |
      • Following are the configurations that are moved/removed:
        • mom_resources - removed (mom periodic hooks can update custom resources)
        • unknown_shares - moved to resource_group file.
        • smp_cluster_dist - It was already deprecated, removed now
        • sort_queues - It was already deprecated, removed now
        • nonprimetime_prefix - New policy object does not differentiate between prime/non-prime time 
        • primetime_prefix - New policy object does not differentiate between prime/non-prime time 
        • resources - Replaced by "exclude_resources": the new policy object lists the resources that need to be excluded from scheduling. By default, all resources are used for scheduling.
        • dedicated_prefix - New policy object will expose "dedicated_queues" which is a list of queues associated with dedicated time.
        • preemptive_sched - This has been renamed to "preemption".
        • log_filter - Renamed to "log_event" to match the option that the server object exposes.
      • Admins can assign different policy objects for prime and non-prime time.
        • If a value of the scheduler's "policy" attribute is prefixed with "p:", it is treated as the prime-time policy.
        • If a value of the scheduler's "policy" attribute is prefixed with "np:", it is treated as the non-prime-time policy.
        • A policy name specified without any prefix is used as the all-time policy.
        • Admins cannot set a prime or non-prime policy unless an all-time policy (without any prefix) is also specified. Attempting to do so produces the following error:
          • qmgr -c "s sched sched1 policy+='p:p1'"
            Cannot set prime/non-prime time policy without setting an all time policy
        • More than one policy object can be specified at the same time in the scheduler's policy attribute.
          • example: qmgr -c "s sched sched1 policy=p:p1,np:p2"
          • A prime-time, non-prime-time, or all-time policy cannot be specified more than once when setting the scheduler's policy attribute.
      • During dedicated time, if both prime and non-prime time policies are defined, the scheduler will use the prime-time policy to schedule jobs from dedicated queues; otherwise it will apply the all-time policy.
      • To keep using the policies from the old sched_config file, keep a copy of that file in the directory named by the "sched_priv" attribute.
      • If both a policy object and a sched_config file are present, the sched_config file will be ignored.
      • All policies can be unset at once using qmgr -c "unset sched <sched_name> policy"; this makes the scheduler read the sched_config file in its next iteration.
      • A change to a policy object takes effect in the very next cycle that its corresponding scheduler(s) run. Schedulers do not need a SIGHUP for the change to take effect.
      • PBS server will create a default policy object with the name "default_policy".
        • In case of upgrades from a PBS version that lacks this multi-scheduler functionality, the PBS server will read the sched_config file from the sched_priv directory and create a "default_policy" object.
          • If the sched_config file has different policies for prime/non-prime time, the server will create "default_policy_prime" and "default_policy_non_prime" policy objects and assign them to the default scheduler.
          • During upgrades, all prime, non-prime, and dedicated time queues will have their execution_type attribute set to "execution_prime", "execution_non_prime", or "execution_dedicated" respectively, as mentioned in Interface 6 of this document.
        • On a fresh installation, the PBS server will provide "default_policy" with all the default values present in interface 3 of this document.
        • The PBS server will assign this default policy to the default scheduler object. It will configure only an all-time policy and no prime/non-prime time policies.
        • The default_policy cannot be deleted.
      • The fairshare attribute can be set to True only if fairshare_group has been set to an fs_group object.
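
The prefix rules for the scheduler's policy attribute can be sketched in Python as follows. This is only an illustration of the validation described above, not the server implementation: the function names, the slot names ("all_time", "prime", "non_prime"), and the fallback to the all-time policy when no prime or non-prime policy is set are assumptions made for this sketch.

```python
from typing import Dict

# Hypothetical slot names used only in this sketch.
PREFIX_TO_SLOT = {"p": "prime", "np": "non_prime"}
ERR_NO_ALL_TIME = (
    "Cannot set prime/non-prime time policy without setting an all time policy"
)


def parse_policy_attr(value: str) -> Dict[str, str]:
    """Split a policy attribute value such as "p0,p:p1,np:p2" into slots."""
    slots: Dict[str, str] = {}
    for item in value.split(","):
        item = item.strip()
        prefix, sep, name = item.partition(":")
        if sep:
            if prefix not in PREFIX_TO_SLOT:
                raise ValueError(f"unknown policy prefix: {prefix!r}")
            slot = PREFIX_TO_SLOT[prefix]
        else:
            # No prefix: this entry is the all-time policy.
            slot, name = "all_time", item
        if not name:
            raise ValueError("empty policy name")
        if slot in slots:
            raise ValueError(f"{slot} policy specified more than once")
        slots[slot] = name
    if "all_time" not in slots:
        raise ValueError(ERR_NO_ALL_TIME)
    return slots


def active_policy(slots: Dict[str, str], is_prime: bool,
                  is_dedicated: bool = False) -> str:
    """Pick the policy name that applies at the current time."""
    if is_dedicated:
        # Dedicated time uses the prime policy only when both prime and
        # non-prime policies are defined; otherwise the all-time policy.
        if "prime" in slots and "non_prime" in slots:
            return slots["prime"]
        return slots["all_time"]
    wanted = "prime" if is_prime else "non_prime"
    # Assumption: fall back to the all-time policy when no specific one is set.
    return slots.get(wanted, slots["all_time"])
```

For example, parse_policy_attr("p0,p:p1,np:p2") yields one entry per slot, while parse_policy_attr("p:p1") raises an error carrying the message quoted above.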


Interface 2: New time_window object

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • Admins can create time_window objects and name them.
    • Admins can then assign time_window objects to specific queues; multiple time_window objects can be assigned to a given queue as long as they don't overlap.
    • PBS managers can create, modify, and delete time_window objects.
    • A time_window object can be deleted only when it is not assigned to any queue.
    • Example: 
      qmgr -c "c time_window t1"
      qmgr -c "s t t1 rrule='some valid RRULE'"
      qmgr -c "s t t1 start_time=0600"

    • Below is the list of attributes that reside in the time_window object.

      | Attribute | Type | Default | Example |
      | --- | --- | --- | --- |
      | rrule | string | NOT_SET_BY_DEFAULT | s t t1 rrule="RRULE:FREQ=WEEKLY;WKST=SU;BYDAY=MO,TU,WE,TH,FR" |
      | applies_on_holidays | Boolean | True | s t t1 applies_on_holidays = False |
      | start_time | time | NOT_SET_BY_DEFAULT | s t t1 start_time=06:00:00 |
      | end_time | time | NOT_SET_BY_DEFAULT | s t t1 end_time=09:00:00 |
      | type | string | standard | s t t1 type=dedicated |
      | comment | string | NOT_SET_BY_DEFAULT | s t t1 comment="nightly data processing" |
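
The rule that time_window objects on the same queue must not overlap could be checked along these lines. This is a simplified Python sketch under stated assumptions: both windows are assumed to recur on the same days (the rrule is ignored), neither window wraps past midnight, and windows that merely touch (one ends when the other starts) are treated as non-overlapping. The function names are hypothetical.

```python
from datetime import time


def parse_hms(text: str) -> time:
    """Convert an HH:MM:SS string such as "06:00:00" into a time value."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return time(hours, minutes, seconds)


def windows_overlap(a_start: str, a_end: str, b_start: str, b_end: str) -> bool:
    """Return True when the half-open intervals [start, end) intersect."""
    return parse_hms(a_start) < parse_hms(b_end) and parse_hms(b_start) < parse_hms(a_end)
```

With these assumptions, a 06:00:00 to 09:00:00 window overlaps one running 08:00:00 to 10:00:00, but not one running 09:00:00 to 12:00:00.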

Interface 3: New non_work_days attribute

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • Admins can set a new string_array attribute, non_work_days, at the server object level.
    • PBS managers can set, modify, and unset the non_work_days server attribute.
    • Example: 
      qmgr >> set server non_work_days = "FREQ=YEARLY;BYMONTH=1;BYMONTHDAY=1" (First of the year - Maybe we should allow 01/01 and we generate this rule)
      qmgr >> set server non_work_days += "FREQ=YEARLY;BYDAY=MO;BYSETPOS=-1;BYMONTH=5"
      qmgr >> set server non_work_days += "FREQ=WEEKLY;BYDAY=FR;INTERVAL=2"
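
The idea noted in the first example, accepting a short date such as 01/01 and generating the yearly rule from it, could look like the following Python sketch. The function name and the MM/DD input format are assumptions for illustration; the proposal does not define them.

```python
from datetime import date


def yearly_rule_from_date(mm_dd: str) -> str:
    """Build a yearly recurrence rule string from an MM/DD date."""
    month_text, day_text = mm_dd.split("/")
    month, day = int(month_text), int(day_text)
    # Validate against a leap year so that 02/29 is accepted;
    # date() raises ValueError for impossible dates such as 13/01.
    date(2000, month, day)
    return f"FREQ=YEARLY;BYMONTH={month};BYMONTHDAY={day}"
```

Calling yearly_rule_from_date("01/01") produces the same rule string as the first example above.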


Interface 4: New fs_group object

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • Admins can create an fs_group object.
    • PBS managers can create, modify, and delete fs_group objects.
    • Example: 
      qmgr >> create fs_group f1
      qmgr >> set fs_group f1 usage_formula = "cput*fs_factor"
      qmgr >> set fs_group f1 decay_time=18:00:00

    • Below is the list of attributes that reside in the fs_group object.

      | Attribute | Type | Default | Example |
      | --- | --- | --- | --- |
      | tree_element | string_array | NOT_SET_BY_DEFAULT | set fs_group f1 tree_element = [ group01:root = 30 ] |
      | usage_formula | string | NOT_SET_BY_DEFAULT | set fs_group f1 usage_formula = "ncpus*walltime" |
      | entity | string | euser | set fs_group f1 entity = queue |
      | decay_time | duration | 24:00:00 | set fs_group f1 decay_time = 06:00:00 |
      | decay_factor | float | .5 | set fs_group f1 decay_factor = .8 |
      | enforce_no_shares | Boolean | False | set fs_group f1 enforce_no_shares = True |
      | unknown_shares | integer | 10 | set fs_group f1 unknown_shares = 20 |
      | comment | string | NOT_SET_BY_DEFAULT | s fs_group f1 comment="Big data fair share settings" |
    • Upon creation, an fs_group object contains the defaults and can immediately be put into use.
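
How decay_time and decay_factor might interact can be illustrated with a short Python sketch. It assumes that accumulated usage is multiplied by decay_factor once for every full decay_time period that has elapsed, which mirrors how fairshare decay is commonly described for PBS; the proposal itself does not spell this out, and the function names here are hypothetical.

```python
def hms_to_seconds(text: str) -> int:
    """Convert an HH:MM:SS duration such as "24:00:00" into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def decayed_usage(usage: float, elapsed_seconds: int,
                  decay_time: str = "24:00:00",
                  decay_factor: float = 0.5) -> float:
    """Apply decay_factor once per completed decay_time period (assumed model)."""
    periods = elapsed_seconds // hms_to_seconds(decay_time)
    return usage * decay_factor ** periods
```

Under this assumed model and the default values from the table, a usage of 1000 drops to 250 after 48 hours.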


Interface 5: Changes to sched object

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • The scheduler now has an additional attribute that can be set in order to run it:
      • policy - a collection of settings (listed in Interface 1) that can be used to configure the scheduler.
    • The scheduler can now accept a set of policies to work with:
      • A policy can be specified using the qmgr -c "s sched <sched_name> policy=<policy object>" command.


Interface 6: Changes to PBS server.

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • backfill_depth will also be an attribute of the scheduler's policy object.
      • If the scheduler is configured to use sched_config instead of a policy object, it will take the value of backfill_depth from the server object.
      • If the scheduler is configured to use a policy object instead of the sched_config file, it will take the value of backfill_depth from the scheduler's policy object.
      • If backfill_depth is set at the queue level, that value takes precedence over the value set in the scheduler's policy object or the server object.
    • These attributes now belong to a policy object and need to be set on the policy object using the policy name:
      • qmgr -c "s policy p1 backfill_depth = 3"
    • Setting these attributes on the server results in the following warning:
      • qmgr -c "s s backfill_depth=3"
      • qmgr: Warning: backfill_depth in server is deprecated. Set backfill_depth in a policy object.
    • The job_sort_formula attribute has been moved from the server to the scheduler's policy object.
    • A policy change event (prime to non-prime, non-prime to prime) for any of the running schedulers will now trigger a scheduling cycle.
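
The backfill_depth precedence described above can be summarized in a small Python sketch. The function and parameter names are hypothetical, and None stands for "not set"; the default of 1 is taken from the policy table in Interface 1.

```python
from typing import Optional


def effective_backfill_depth(queue_depth: Optional[int],
                             policy_depth: Optional[int],
                             server_depth: Optional[int],
                             uses_policy_object: bool,
                             default: int = 1) -> int:
    """Resolve backfill_depth: queue first, then policy object or server."""
    if queue_depth is not None:
        # A per-queue value always takes precedence.
        return queue_depth
    # A scheduler driven by a policy object reads the policy value;
    # a scheduler driven by sched_config reads the server value.
    source = policy_depth if uses_policy_object else server_depth
    return default if source is None else source
```

For instance, with a policy value of 3 and a server value of 5, a scheduler using a policy object resolves to 3, one using sched_config resolves to 5, and a queue-level value of 7 overrides both.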


Interface 10: New IFL API to list all the policy objects.

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • A new IFL call is added to query the policy object from the server:
      struct batch_status *pbs_statpolicy(int, char *, struct attrl *, char *);
    • The first argument is the server connection handle, the second is the policy name, the third is the list of attributes to query, and the fourth passes any extended parameters.
    • This IFL call returns a batch_status as its response.
    • If the server is unable to find the requested policy object, the IFL call returns NULL and pbs_errno is set to "15212" (PBSE_POLICY_NOT_FOUND).


Interface 11: New IFL API to list all the time_window objects.

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • A new IFL call is added to query the time_window object from the server:
      struct batch_status *pbs_stattimewindow(int, char *, struct attrl *, char *);
    • The first argument is the server connection handle, the second is the time_window name, the third is the list of attributes to query, and the fourth passes any extended parameters.
    • This IFL call returns a batch_status as its response.
    • If the server is unable to find the requested time_window object, the IFL call returns NULL and pbs_errno is set to "15212" (PBSE_TIME_WINDOW_NOT_FOUND).


Interface 12: New IFL API to list all the fs_group objects.

  • Visibility: Public
  • Change Control: Stable
  • Details:
    • A new IFL call is added to query the fs_group object from the server:
      struct batch_status *pbs_statfsgroup(int, char *, struct attrl *, char *);
    • The first argument is the server connection handle, the second is the fs_group name, the third is the list of attributes to query, and the fourth passes any extended parameters.
    • This IFL call returns a batch_status as its response.
    • If the server is unable to find the requested fs_group object, the IFL call returns NULL and pbs_errno is set to "15212" (PBSE_FS_GROUP_NOT_FOUND).