...

  • New sched attribute ‘job_run_wait’, which accepts the following values:

    • execjob_hook: the scheduler waits for an ACK from the server. The server sends that ACK only after it has run the runjob hook on the server side, sent the job to the mom, and received the mom's ACK, which the mom sends after running the execjob hooks on the mom side.

      • Implications:

        • Scheduling cycles can be much slower than the other options.

        • On the upside, the scheduler learns about any job rejections from the runjob or execjob_begin hooks, and can repurpose those resources to run some other job in the same cycle.

    • runjob_hook: the scheduler waits for an ACK from the server, and the server sends that ACK immediately after running any runjob hooks. The server does NOT wait until the job has been sent to the mom and the mom has sent back its ACK. This will be the default value.

      • If no runjob hooks are configured, there is nothing for the scheduler to wait for, so it internally behaves as if the value were ‘none’.

      • Implications:

        • Scheduling cycles will be slower than in the “none” mode below, but still faster than in the “execjob_hook” mode above.

        • When a mom-level hook rejects a job, the job’s run_count is incremented, so such jobs are eventually penalized by being held by the server. As a result, in this mode the admin should not need to penalize such jobs through their execjob hooks.

    • none: the scheduler does not wait for an ACK from the server at all. It sends the runjob request and moves on to the next job. This can make scheduling cycles an order of magnitude faster.

      • Implications:

        • To prevent the under-utilization situation described in the 2nd bullet under Caveats, sites that have runjob hooks in place should make those hooks stop repeatedly rejected jobs from causing under-utilization of resources. This can be done either by de-prioritizing such jobs or by putting them on hold.

    • Any change to this attribute takes effect from the next scheduling cycle of that scheduler.

    • Attribute permissions: readable by all, writable by Manager only.

  • pbs_asyrunjob() will be truly asynchronous:

    • There is no change to the function’s signature. The function simply no longer waits for any reply from the server.

  • The throughput_mode sched attribute will be deprecated:

    • The “job_run_wait=runjob_hook” will take over the role of “throughput_mode=True” value, and “job_run_wait=execjob_hook” will take over the role of “throughput_mode=False” value.

    • While throughput_mode remains part of PBS, setting either throughput_mode or job_run_wait will also set the other. The exception is when job_run_wait is set to none: in that case throughput_mode is unset, without being reset to its default.
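The job_run_wait values described above can be summarized as a small decision model. This is a hypothetical Python sketch, not PBS code. The enum, the helper functions, and the has_runjob_hooks flag are illustrative assumptions. It shows two things: which ACK each mode waits for, and how runjob_hook falls back to none when no runjob hooks are configured.

```python
from enum import Enum


class JobRunWait(Enum):
    """Hypothetical model of the job_run_wait sched attribute values."""
    EXECJOB_HOOK = "execjob_hook"
    RUNJOB_HOOK = "runjob_hook"
    NONE = "none"


# runjob_hook is documented as the default value.
DEFAULT_JOB_RUN_WAIT = JobRunWait.RUNJOB_HOOK


def effective_wait_mode(configured: JobRunWait, has_runjob_hooks: bool) -> JobRunWait:
    """Return the mode the scheduler actually behaves as (illustrative only)."""
    # With no runjob hooks configured, there is nothing for the server to
    # report before the job goes to the mom, so runjob_hook acts like none.
    if configured is JobRunWait.RUNJOB_HOOK and not has_runjob_hooks:
        return JobRunWait.NONE
    return configured


def waits_for_server_ack(mode: JobRunWait) -> bool:
    # Only "none" sends the runjob request without waiting for any reply.
    return mode is not JobRunWait.NONE


def ack_covers_mom_hooks(mode: JobRunWait) -> bool:
    # Only execjob_hook delays the server's ACK until the mom has run its
    # execjob hooks and acknowledged the job.
    return mode is JobRunWait.EXECJOB_HOOK


if __name__ == "__main__":
    for mode in JobRunWait:
        for hooks in (True, False):
            eff = effective_wait_mode(mode, hooks)
            print(f"{mode.value:13} hooks={hooks!s:5} -> {eff.value:13} "
                  f"wait={waits_for_server_ack(eff)} mom={ack_covers_mom_hooks(eff)}")
```

For example, effective_wait_mode(JobRunWait.RUNJOB_HOOK, has_runjob_hooks=False) returns JobRunWait.NONE, so the scheduler would not wait for a reply at all.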
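The mirroring between throughput_mode and job_run_wait during the deprecation period can be sketched as follows. This is a hypothetical Python model, not the PBS implementation. The SchedAttrs class and its setter names are assumptions, and None stands in for "unset." The initial state assumes the documented runjob_hook default, mirrored as throughput_mode=True.

```python
from typing import Optional

VALID_JOB_RUN_WAIT = ("execjob_hook", "runjob_hook", "none")


class SchedAttrs:
    """Hypothetical model of how the two sched attributes keep each other in sync."""

    def __init__(self) -> None:
        # Assumed initial state: job_run_wait default, mirrored into throughput_mode.
        self.job_run_wait: str = "runjob_hook"
        self.throughput_mode: Optional[bool] = True  # None represents "unset"

    def set_throughput_mode(self, value: bool) -> None:
        self.throughput_mode = value
        # throughput_mode=True maps to runjob_hook, False maps to execjob_hook.
        self.job_run_wait = "runjob_hook" if value else "execjob_hook"

    def set_job_run_wait(self, value: str) -> None:
        if value not in VALID_JOB_RUN_WAIT:
            raise ValueError(f"invalid job_run_wait value: {value!r}")
        self.job_run_wait = value
        if value == "none":
            # "none" has no throughput_mode equivalent: unset it rather than
            # resetting it to its default.
            self.throughput_mode = None
        else:
            self.throughput_mode = value == "runjob_hook"


if __name__ == "__main__":
    attrs = SchedAttrs()
    attrs.set_job_run_wait("none")
    print(attrs.job_run_wait, attrs.throughput_mode)  # none None
    attrs.set_throughput_mode(False)
    print(attrs.job_run_wait, attrs.throughput_mode)  # execjob_hook False
```

Modeling "unset" as None keeps it distinct from an explicit False. The design requires that distinction, because job_run_wait=none must not silently turn into throughput_mode=False, which would mean execjob_hook.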

...