PP-337: Multiple schedulers servicing the PBS cluster

Introduction:

Use case: 

As clusters grow and workloads become more varied, it is becoming critical that jobs are evaluated in as short a time as possible to ensure that the correct workload is being run. Using multiple schedulers can address this by allowing different scheduling policies and quicker turnaround for large numbers of jobs or nodes.

Gist of design proposal:

The PBS scheduler in its current form can easily run as multiple instances on the same machine. There are only two major problems that we have to deal with:

  • Managing the schedulers - This includes starting the schedulers, configuring them, making the PBS server connect to each one of them, and then making them run on specific events.

  • Making sure that schedulers do not encroach on each other's territory, i.e. that each one runs on a clearly partitioned part of the complex in terms of jobs and nodes.


The design proposed below aims to address both of these problems.

Forum discussion

 

  • Interface 1: Extend PBS to support a list of scheduler objects

    • Visibility: Public

    • Change Control: Stable

    • Details:

      • PBS supports a list of scheduler objects, created using qmgr. This is similar to how nodes are created in the server.

      • The qmgr command can be used to create a scheduler object. It must be invoked by a PBS admin/manager.

      • To create a scheduler object and make it run, the user can set the following attributes:

        • A scheduler name is mandatory when creating a scheduler object.

          • qmgr -c "c sched multi_sched_1"

            • This will create/set the following attributes for the sched object

              • sched_port - port number on which multi_sched_1 is going to run. This is a mandatory parameter and must be set before this scheduler is started.

              • sched_host - host name on which multi_sched_1 is going to run. This is a mandatory parameter and must be set before this scheduler is started.

              • partition = "None" (default)*

              • sched_priv = $PBS_HOME/sched_priv_multi_sched_1 (default)*

              • sched_log = $PBS_HOME/sched_log_multi_sched_1 (default)*

              • scheduling = False (default)*

              • scheduler_iteration = 600 (default)*

              • comment 

                • Sites can use the comment field to:

                • be notified if a scheduler restarts 2-3 times within an hour due to potential crashes, for example (i.e. comment => “NEEDS_ATTENTION”)

                • tell when a particular scheduler is ready to function again, by setting the comment as follows:

                  • comment => “READY_TO_USE”
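The defaults listed above can be illustrated with a minimal Python sketch. This is only a model of the attribute list, not PBS code: the function names `default_sched_attrs` and `missing_mandatory`, the `/var/spool/pbs` value for `$PBS_HOME`, and the example host and port are all assumptions made for illustration.

```python
def default_sched_attrs(name, pbs_home="/var/spool/pbs"):
    """Model the attributes a newly created sched object carries.

    pbs_home stands in for $PBS_HOME; the default path here is an assumption.
    """
    return {
        "sched_host": None,  # mandatory, must be set before the scheduler starts
        "sched_port": None,  # mandatory, must be set before the scheduler starts
        "partition": "None",
        "sched_priv": f"{pbs_home}/sched_priv_{name}",
        "sched_log": f"{pbs_home}/sched_log_{name}",
        "scheduling": False,
        "scheduler_iteration": 600,
        "comment": None,
    }


def missing_mandatory(attrs):
    """Return the mandatory attributes that are still unset."""
    return [key for key in ("sched_host", "sched_port") if attrs.get(key) is None]


attrs = default_sched_attrs("multi_sched_1")
print(missing_mandatory(attrs))   # both mandatory attributes are still unset

attrs["sched_host"] = "host1"     # hypothetical host name
attrs["sched_port"] = 15050       # hypothetical port number
print(missing_mandatory(attrs))   # nothing missing, the scheduler could be started
```

A check like `missing_mandatory` mirrors the rule that sched_host and sched_port must be given before the scheduler is started, while every other attribute falls back to its default.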

 

Interface 2: Changes to PBS scheduler

Interface 3: Removed

 

Interface 4: Changes to PBS server.

  • Visibility: Public

  • Change Control: Stable

  • Details:

    • PBS does not allow attributes like scheduling, scheduler_iteration to be set on PBS server object.

    • scheduling and scheduler_iteration now belong to the sched object

      • During failover, when the secondary server takes control, it will try to connect to the schedulers by using their sched_host attribute.

        • If the secondary server is unable to connect to a scheduler running on a remote host, it will start that scheduler locally and update its "sched_host" attribute.

        • When the primary PBS server takes control back from the secondary, it will always check whether each scheduler's sched_host attribute matches its server name; if it does not, it will shut down the remote scheduler and spawn it locally on the primary server.

      • If set at the server level, the changes will be applied to the default sched object

    • For backward compatibility, PBS still allows attributes like scheduling and scheduler_iteration to be set on the PBS server object. Any changes made to these attributes are automatically reflected in the default scheduler. Similarly, any changes made to these attributes in the default scheduler are automatically reflected in the server object.

    • If at any point the server is not able to contact or reach the corresponding scheduler, one of the following messages is shown in server_logs:
             Unable to reach scheduler associated with partition [<partition id>]
             Unable to reach scheduler associated with job <job id>
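The failover behaviour described in this interface can be sketched in Python as follows. This is a simplified model under stated assumptions, not the server implementation: the function names, the `can_connect` callback, and the host names are hypothetical, and actually starting or shutting down a scheduler is reduced to updating `sched_host`.

```python
def on_secondary_takeover(scheds, secondary_host, can_connect):
    """Secondary takes control: try each sched_host, start locally on failure.

    scheds maps scheduler name -> attribute dict; can_connect is a
    hypothetical callback that reports whether a host is reachable.
    """
    started_locally = []
    for name, attrs in scheds.items():
        if not can_connect(attrs["sched_host"]):
            # Model "start that scheduler locally and update sched_host".
            attrs["sched_host"] = secondary_host
            started_locally.append(name)
    return started_locally


def on_primary_takeback(scheds, primary_host):
    """Primary regains control: move back any scheduler not on the primary."""
    moved = []
    for name, attrs in scheds.items():
        if attrs["sched_host"] != primary_host:
            # Model "shut down the remote scheduler and spawn it locally".
            attrs["sched_host"] = primary_host
            moved.append(name)
    return moved


scheds = {
    "default": {"sched_host": "primary"},
    "multi_sched_1": {"sched_host": "hostA"},
}
# Assume the primary host is down and only hostA is reachable.
print(on_secondary_takeover(scheds, "secondary", lambda host: host == "hostA"))
print(on_primary_takeback(scheds, "primary"))
```

Note that, read literally, the takeback rule also relocates a scheduler that was deliberately placed on a remote host, since only a match with the primary's server name is checked.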

 

Interface 5: Changes to PBS Nodes objects.

 

Interface 6: Changes to Queues.

Interface 7: How PBS server runs scheduler.

 

Interface 8: Changes to Reservations

Interface 9: Deleted

Interface 10: Fairshare

  • Visibility: Public

  • Change Control: Stable

 

  1. What is not supported when multiple scheduler objects are present.

       2. The server's backfill_depth will be the default value for all the schedulers in the complex.

            Ex: With the default server backfill_depth of 1, 1 job per scheduler will be backfilled.

                If the server's backfill_depth is set to 5, 5 jobs from each scheduler will be backfilled.

3. The pbs_statsched() IFL will return the status of all PBS schedulers. The return type is a pointer to a list of batch_status structures (one for each scheduler).
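The backfill_depth rule in item 2 can be illustrated with a small Python sketch. The function name `per_scheduler_backfill_depth` and the scheduler names are hypothetical; the sketch only models the statement that every scheduler takes the server-level value as its default.

```python
def per_scheduler_backfill_depth(server_backfill_depth, sched_names):
    """Give every scheduler the server-level backfill_depth as its default."""
    return {name: server_backfill_depth for name in sched_names}


scheds = ["default", "multi_sched_1", "multi_sched_2"]  # hypothetical names

# With the server default of 1, each scheduler backfills 1 job.
print(per_scheduler_backfill_depth(1, scheds))

# With backfill_depth set to 5 on the server, each scheduler backfills 5 jobs,
# so the complex as a whole can see up to 5 jobs per scheduler.
limits = per_scheduler_backfill_depth(5, scheds)
print(limits)
print(sum(limits.values()))
```

The limit is applied per scheduler rather than shared across the complex, so the total number of backfilled jobs grows with the number of scheduler objects.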