Requirements: Introduce new provisioning capabilities

Background:

There are several distinct types of provisioning that might need to be done before a job can run. So far PBS has two:

  • Provisioning of the Application Operating Environment (aoe) - OS, applications, etc
  • Provisioning of the Power Operating Environment (eoe) - power awareness settings

This RFE would introduce a third class with added capabilities:

  • Provisioning of the "hardware" configuration

These three operations have a hierarchical relationship: "hardware", then aoe, then eoe.
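
As a minimal sketch of that hierarchy, the requested provisioning types could be sorted into the required order as shown below. The names PROVISIONING_ORDER and order_provisioning are hypothetical and exist only for illustration; they are not part of PBS.

```python
# Hypothetical names for illustration only; not a PBS API.
PROVISIONING_ORDER = ("hardware", "aoe", "eoe")


def order_provisioning(requested):
    """Return the requested provisioning types in hierarchy order."""
    unknown = set(requested) - set(PROVISIONING_ORDER)
    if unknown:
        raise ValueError(f"unknown provisioning type(s): {sorted(unknown)}")
    # Walk the fixed hierarchy and keep only the types that were requested.
    return [kind for kind in PROVISIONING_ORDER if kind in requested]
```

For example, a job that requests only eoe and "hardware" provisioning would have the "hardware" step run first and the eoe step second.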

The new capabilities required of this class of provisioning tasks would be:

  • Provision per chunk - one chunk specifying a particular host that needs to be provisioned with new "hardware" settings
  • Provision multi-vnoded hosts, specifically a Cray system where each login node includes many vnodes, each representing a compute node
    • It would be nice, but not critical, to extend this latter capability to all platforms

It will be left to the design discussion whether we extend the existing provisioning attribute (i.e. aoe) or create a new one (e.g. ioe). Note that Cray uses provisioning only for KNL configuration changes (no application/OS changes), while non-Cray platforms use provisioning only for application/OS image changes (no KNL/BIOS-level changes), so there should not be any conflicts should we choose to just extend aoe.

Use cases for "hardware" provisioning:
  • configuring KNL memory models
  • resetting the HPE system cache configuration
  • enabling/disabling hyperthreading in system BIOS

Requirements:
  • Provisioning shall follow all the same rules as provisioning an aoe, specifically:
    • A host shall list which "oe"s it supports
    • A host shall list which "oe" is currently instantiated (if known)
  • A job shall be able to request which "oe"(s) it needs to run on a per chunk basis
    • Different chunks can specify a different "oe" or no "oe" at all
    • Each "type" of node requested (e.g. cray_compute_knl) must request the same "oe"
      • "type" could be specified via the vntype node attribute
    • A job shall still be able to specify an aoe on a per-job basis, but not at the same time as a per-chunk request if we decide to extend aoe
  • It shall be possible to provision multi-vnode hosts with a new "oe" (specifically on a Cray)
    • This would be nice to have on other platforms as well
  • It would be nice to allow an admin to specify their own custom "oe"s, though this is not critical to this RFE
  • If more than one type of provisioning task is requested by a job, they shall take place in the following order:
    1) "hardware"
    2) aoe
    3) eoe
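
The per-chunk rules above can be illustrated with a small validation sketch. Everything in it is an assumption made for illustration: the function name validate_oe_request, the dictionary-based chunk model, and the keys "vntype" and "oe" are hypothetical and do not reflect PBS syntax or any existing API. The sketch also assumes the design extends aoe (so a per-job aoe excludes per-chunk requests) and treats a chunk with no "oe" as differing from a chunk of the same type that requests one.

```python
# Hypothetical data model for illustration only; not PBS syntax or API.
def validate_oe_request(chunks, job_aoe=None):
    """Check per-chunk "oe" requests against the rules listed above.

    chunks:  list of dicts with optional keys "vntype" and "oe".
    job_aoe: aoe requested for the whole job, or None.
    Returns a list of human-readable violations (empty if the request is valid).
    """
    errors = []

    # A per-job aoe and per-chunk requests are mutually exclusive
    # (assumes the design extends aoe rather than adding a new attribute).
    has_chunk_oe = any(c.get("oe") is not None for c in chunks)
    if job_aoe is not None and has_chunk_oe:
        errors.append("per-job aoe cannot be combined with per-chunk oe")

    # Every chunk of the same node type must request the same "oe".
    oes_by_type = {}
    for c in chunks:
        if c.get("vntype") is not None:
            oes_by_type.setdefault(c["vntype"], set()).add(c.get("oe"))
    for vntype, oes in sorted(oes_by_type.items()):
        if len(oes) > 1:
            errors.append(f"chunks of type {vntype} request different oe values")

    return errors
```

Chunks of different types may still request different "oe" values, or none at all, without triggering a violation.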