It is assumed that PBS will rely on external tools (scripts developed by customers or third-party applications) to collect metrics and to take actions such as switching the power state of a vnode.
The total amount of resources in the system is assumed to be constant for the scope of this UCR. No execution vnodes are dynamically created as a result of power management actions.
Power consumption accounting is assumed to be precise only if jobs are allocated exclusively (vnodes set with sharing=exclusive and/or jobs submitted with “place=excl”), and only to the extent that the platform management tools (scripts developed by customers or third-party applications) provide accurate information.
Use Cases:
Admins must be able to inhibit power related actions initiated by PBS Pro on selected nodes. This is to keep PBS Pro from potentially harming maintenance operations or outside-of-PBS user activity.
Submit a job so that it runs within a particular "power state"
Allow a job type (e.g., application, cpu intensive, memory intensive, network) to request a specific power profile suitable for achieving a power vs performance objective.
When a job running in a prescribed “power state” terminates, the resources it used are returned to a default power setting.
Administrator user will create power profiles associated with “power states” and make them available to users.
Administrator user sets the initial and default power profiles of execution nodes using the platform management tools and/or PBS Pro power related configuration parameters.
Administrator user can delete power profiles.
If a job does not request a specific power profile, PBS will not initiate any power profile change, and the job will run in whatever power environment is currently set on the selected execution hosts.
Show power status and power related metrics (e.g. temperatures) of vnodes.
What can be shown for each node depends on the available platform tools.
Track power consumption of nodes and jobs.
Requirements:
Support for vendor specific power interfaces
SGI HPE: limited to platforms that provide the required power activation functionality. Nominally, any system with IPNM and DCM functionality will work; currently these are SGI Altix ICE X and SGI Rackable systems. HPE MC990 (previously called SGI UV300) is out of scope because it cannot yet activate power profiles mapped to specific vnodes.
Cray (limited to the Intel based platforms): platform specific tools to activate power profiles (capmc) and Intel specific features (ALPS/BASIL interface to frequency lock and turbomode).
Generic: support site-defined power interfaces (i.e. site-defined scripts) to integrate with non-vendor-specific tools (IPMI, ipmitool, etc.). This is an internal design/architectural requirement.
Support heterogeneous power interfaces within the same complex. Example: the same pbs_server might host machines managed by different power management systems such as generic, HPE CMU, SGI HPE tools, Cray proprietary interfaces, etc.
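One way to picture the heterogeneous-interface requirement is a per-vnode dispatch to a backend that hides the vendor tool. The following is a minimal sketch only: the class names, the backend keys and the vnode-to-backend table are hypothetical and are not part of any PBS Pro API, and the backends return a string where a real one would call the vendor tool or a site-defined script.

```python
from abc import ABC, abstractmethod


class PowerInterface(ABC):
    """Hypothetical common interface over vendor-specific power tools."""

    @abstractmethod
    def activate_profile(self, vnode: str, profile: str) -> str:
        """Activate a power profile on one vnode and describe the action."""


class GenericScriptInterface(PowerInterface):
    def activate_profile(self, vnode: str, profile: str) -> str:
        # A real backend would run a site-defined script here.
        return f"generic:{vnode}:{profile}"


class CrayInterface(PowerInterface):
    def activate_profile(self, vnode: str, profile: str) -> str:
        # A real backend would call the platform-specific tool here.
        return f"cray:{vnode}:{profile}"


# Hypothetical configuration: which backend manages which vnode.
BACKENDS = {"generic": GenericScriptInterface(), "cray": CrayInterface()}
VNODE_BACKEND = {"node01": "generic", "nid00012": "cray"}


def activate(vnode: str, profile: str) -> str:
    """Route the request to the backend that manages this vnode."""
    backend = BACKENDS[VNODE_BACKEND[vnode]]
    return backend.activate_profile(vnode, profile)
```

With this shape, adding a new power management system to the complex means adding one backend class and one table entry, without touching the callers.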
Record job power usage in PBS Pro accounting logs.
All power related information pertaining to jobs and vnodes shall be visible when querying for job and vnodes status information.
The power value accounted for a job is the aggregate consumed by every execution host allocated to the job during the time the job was in the “running” state. Accounting reliability is compromised if the job was not allocated exclusively to its hosts. For accounting to work, the platform management tool must expose functions to query the power consumed by a host.
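The aggregation rule can be illustrated with a short sketch. It assumes that the platform management tool exposes a cumulative per-host counter that can be read when the job starts running and again when it ends; the HostSample type, its field names and the kWh unit are assumptions made for illustration, not something this document specifies.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class HostSample:
    """Hypothetical per-host reading taken at job start and job end."""
    host: str
    counter_at_start_kwh: float
    counter_at_end_kwh: float
    exclusive: bool  # True if the host was allocated exclusively to the job


def job_power_usage(samples: Iterable[HostSample]) -> Tuple[float, bool]:
    """Return (aggregate usage, reliable flag) for one job."""
    samples = list(samples)
    # Sum the per-host deltas over the time the job was running.
    total = sum(s.counter_at_end_kwh - s.counter_at_start_kwh for s in samples)
    # The value is only trustworthy if every host was exclusive to the job.
    reliable = all(s.exclusive for s in samples)
    return total, reliable
```

The reliable flag could be recorded next to the value in the accounting log so that consumers can tell precise records from approximate ones.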
All the power related functions are disabled by default and activation will require explicit administrator action (e.g. enable provisioning).
Enabling power related functionality shall not compromise other PBS Pro functionality unless explicitly stated otherwise.
Before enabling the power related functionality of PBS, the administrator needs to make sure that all the necessary settings are in place in the platform management tool as well (e.g. when needed, define power profiles as appropriate for the platform management tools and their integrations with PBS Pro).
Power profiles defined within a complex are expected to change over time. The administrator is responsible for maintaining them within the platform management infrastructure. PBS will validate the profile name at dispatch time. A job can be submitted requesting a power profile that might not exist when the job is actually run; in that case the inability to activate the profile on the execution hosts will generate proper error messages.
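A dispatch-time check of this kind could look like the sketch below. The exception type, the function name and the message wording are hypothetical; the set of available profiles is assumed to come from the platform management tool.

```python
from typing import Optional, Set


class ProfileNotAvailable(Exception):
    """Hypothetical error raised when a requested profile cannot be used."""


def validate_at_dispatch(job_id: str, requested: Optional[str],
                         available: Set[str]) -> Optional[str]:
    """Check the requested profile against what exists at dispatch time."""
    if requested is None:
        # No profile requested: PBS initiates no power profile change.
        return None
    if requested not in available:
        raise ProfileNotAvailable(
            f"job {job_id}: power profile '{requested}' is not available "
            "on the execution hosts"
        )
    return requested
```

Submission is deliberately not checked here, since the list of profiles can change between submission and execution.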
It is assumed that the default power state of an execution host is defined by means of the platform management tool and that PBS Pro will, depending on the particular platform management tool in place: 1. notify the management tool that the job which requested a particular power state has ended. 2. explicitly initiate the return to the default power state upon job termination when no other jobs are scheduled to run on the same nodes with non-default profiles. Depending on the platform management tool, any post-job action initiated by PBS Pro might be optional, as the platform management tool or the execution hosts might autonomously return to a default idle status when no workload is running.
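The condition for explicitly returning nodes to the default power state can be sketched as a set computation. The job representation (a dict with "profile" and "nodes" keys) and the "default" profile name are assumptions for illustration only.

```python
from typing import Dict, Iterable, List


def nodes_to_reset(finished_job_nodes: Iterable[str],
                   other_jobs: Iterable[Dict],
                   default_profile: str = "default") -> List[str]:
    """Return the nodes of a finished job that may go back to the default.

    A node is held back if another job scheduled on it requested a
    non-default power profile.
    """
    held = set()
    for job in other_jobs:
        if job["profile"] not in (None, default_profile):
            held.update(job["nodes"])
    return sorted(set(finished_job_nodes) - held)
```

On platforms where the management tool returns hosts to an idle default on its own, this step would be skipped and only the notification would be sent.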
The initial power status of execution nodes is set by operating on both the platform management tool and PBS Pro settings.
Communication between PBS Pro and the platform management tools (e.g. HPE Management Center) shall be secured and authenticated whenever possible.
PBS Professional must offer the platform management tool an interface to asynchronously perform actions on execution hosts and jobs (e.g. set a node offline; requeue a running job). This is most likely already covered by existing command line interfaces and APIs (exechost_periodic hook).
As the information about which energy profiles are available in the complex belongs to the platform management tool, a change made by the admin to the list of available profiles implicitly puts the PBS Pro configuration out of sync. Whenever the synchronization is performed by a periodic PBS Pro action (however it is implemented), the synchronization period shall have a reasonable default and be tunable by the admin.
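The periodic synchronization can be sketched as two small functions: one that decides whether a sync is due, and one that reconciles the profile list known to PBS Pro with the list reported by the platform management tool. The 300 second default, the function names and the fetch callback are all assumptions; the document does not prescribe a value or a mechanism.

```python
from typing import Callable, Iterable, Set, Tuple

# Assumed default; the requirement only asks for a reasonable, tunable value.
DEFAULT_SYNC_PERIOD_S = 300


def sync_due(last_sync: float, now: float,
             period: float = DEFAULT_SYNC_PERIOD_S) -> bool:
    """True when at least one full period has passed since the last sync."""
    return now - last_sync >= period


def sync_profiles(known: Set[str],
                  fetch_available: Callable[[], Iterable[str]]
                  ) -> Tuple[Set[str], Set[str], Set[str]]:
    """Reconcile the known profiles with the platform management tool.

    Returns (current, added, removed) so the caller can log the changes.
    """
    current = set(fetch_available())
    return current, current - known, known - current
```

An admin-supplied period would simply be passed as the period argument in place of the default.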