Community discussion is located here: http://community.pbspro.org/t/pp-719-enhance-setup-in-ptl-specifically-for-cray-platforms/463
Overview: PBSTestSuite.setUp()
PBSTestSuite.setUp() calls several functions. The settings changes these functions make are often undesirable on Cray, because the MoM, scheduler, and server settings get reverted to the plain Linux out-of-box configurations.
Out-of-box configuration of PBS on Cray:
1) MoM
- These lines should appear in PBS_HOME/mom_priv/config:
  - $vnodedef_additive 0
  - $alps_client <path to ALPS API> (e.g. /opt/cray/alps/default/bin/apbasil)
  - $usecp *:<Absolute path to /home directory> <Absolute path to /home directory>
Notes: The $usecp setting above enables local copy of the output of jobs that ran on compute nodes.
$vnodedef_additive 0 allows MoM to tell the server that vnodes are missing; the server then marks those vnodes as stale.
2) Scheduler
- PBS_HOME/sched_priv/sched_config will have 'vntype' in the resources line.
3) Server
- The hook PBS_HOME/server_priv/hooks/PBS_translate_mpp.HK is enabled.
- Based on what is returned by the ALPS inventory:
  - vnodes representing the compute nodes will get created
  - the Cray-specific custom resources (e.g. PBScraynid, PBScrayhost, etc.) will get created in PBS_HOME/server_priv/resourcedef
  - the appropriate Cray-specific custom resources will be added to the vnodes
  - since the MoM config setting $vnode_per_numa_node is unset by default, there will be no PBScrayseg attribute on the vnodes representing the compute nodes
- Server settings:
  - flatuid = true
  - scheduling = true
  - the default queue is workq
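The out-of-box server settings listed above can be expressed as a small expected-values table that a test compares against what the server reports. The following is only a sketch: the helper name `server_mismatches` and the shape of the `actual` dictionary are assumptions made for illustration, not part of PTL.

```python
# Expected out-of-box server settings on Cray, taken from the list above.
CRAY_SERVER_DEFAULTS = {
    "flatuid": "true",
    "scheduling": "true",
    "default_queue": "workq",
}


def server_mismatches(actual, expected=CRAY_SERVER_DEFAULTS):
    """Return {attribute: (actual_value, expected_value)} for every mismatch.

    `actual` is a hypothetical dictionary of server attributes, for
    example parsed from qmgr output. Values are compared ignoring case,
    because boolean attributes are commonly reported as "True"/"False".
    """
    mismatches = {}
    for name, want in expected.items():
        got = actual.get(name)
        if str(got).lower() != want.lower():
            mismatches[name] = (got, want)
    return mismatches
```

An empty result means the server matches the out-of-box Cray settings described in this section.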
Design of PBSTestSuite.setUp() for Cray
The affected functions are shown below:
1) MoM:
- In pbspro/test/fw/ptl/lib/pbs_testlib.py, class MoM, method __init__()
Interface: MoM.__init__()
- Visibility: Public
- Change Control: Stable
- Synopsis: Modifications to include initial MoM config settings for Cray.
- Standing of the interface : modified interface
- Details
- If on a real Cray or Cray ALPS simulator, then add these (key, value) pairs to the dictionary self.dflt_config:
  - ('$vnodedef_additive', 0)
  - ('$alps_client', '<path to apbasil>')
  - ('$usecp', '*:/home /home')
- This initializes PBS_HOME/mom_priv/config with the $vnodedef_additive, $alps_client, and $usecp lines.
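The (key, value) pairs above could be assembled roughly as follows. This is a minimal sketch, not the PTL implementation: `cray_mom_defaults` and `render_mom_config` are hypothetical helpers, and the `is_cray` flag stands in for however PTL detects a real Cray or a Cray ALPS simulator.

```python
def cray_mom_defaults(is_cray, alps_client_path, home_dir="/home"):
    """Return the extra default MoM config entries needed on Cray.

    `is_cray` (hypothetical flag) means "real Cray or Cray ALPS
    simulator". On any other platform no extra entries are returned.
    """
    if not is_cray:
        return {}
    return {
        "$vnodedef_additive": 0,
        "$alps_client": alps_client_path,
        # Local copy of job output for jobs that ran on compute nodes.
        "$usecp": "*:%s %s" % (home_dir, home_dir),
    }


def render_mom_config(config):
    """Render a config dictionary as lines for mom_priv/config."""
    return ["%s %s" % (key, value) for key, value in config.items()]
```

In a real change these entries would be merged into self.dflt_config; the rendering helper only illustrates what the resulting config lines look like.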
2) Scheduler:
- In pbspro/test/fw/ptl/lib/pbs_testlib.py, class Scheduler, method revert_to_defaults()
Interface: Scheduler.revert_to_defaults()
- Visibility: Public
- Change Control: Stable
- Synopsis: Modifications to include 'vntype' in sched_config out-of-box settings for Cray.
- Standing of the interface : modified interface
- Details
- After copying the out-of-box scheduler configuration in PBS_EXEC/etc/pbs_sched_config to PBS_HOME/sched_priv/sched_config:
  - If the platform is a real Cray or Cray ALPS simulator, then add "vntype" to the resources line of PBS_HOME/sched_priv/sched_config.
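Adding "vntype" to the resources line could look roughly like the sketch below. It assumes the resources line has the form `resources: "name1, name2, ..."` and that the file has already been read into a list of lines without trailing newlines; `add_sched_resource` is a hypothetical helper, not an existing PTL function.

```python
import re

# Matches an uncommented line such as:  resources: "ncpus, mem, arch"
_RESOURCES_RE = re.compile(r'^(\s*resources\s*:\s*")([^"]*)(".*)$')


def add_sched_resource(lines, resource="vntype"):
    """Return a copy of `lines` with `resource` in the resources line.

    Lines that are not a resources line (including commented-out ones)
    are returned unchanged. A resource that is already listed is not
    added a second time, so the function can be applied repeatedly.
    """
    result = []
    for line in lines:
        match = _RESOURCES_RE.match(line)
        if match:
            names = [n.strip() for n in match.group(2).split(",") if n.strip()]
            if resource not in names:
                names.append(resource)
            line = match.group(1) + ", ".join(names) + match.group(3)
        result.append(line)
    return result
```

Making the edit idempotent matters here because revert_to_defaults() may run before every test.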
3) Server:
3a) nodes
Interface: PBSTestSuite.revert_mom()
- Visibility: Public
- Change Control: Stable
- Synopsis: Modifications to add steps that delete the nodes and create them back, in order to retain Cray-specific resources.
- Standing of the interface : modified interface
- Details
- If on a Cray or Cray ALPS simulator, then perform below:
  - Delete all nodes and create them back. This recreates the out-of-box Cray-specific custom resources and vnodes, including the ones PBS creates from reading the ALPS inventory, by:
    - Checking that $alps_client and its value exist in PBS_HOME/mom_priv/config, and if so:
      - delete all the vnodes: qmgr -c "delete node @default"
      - add back the MoMs: qmgr -c "c n <MoM hostname>"
  - If successful, then:
    - the resourcedef file and the vnodes would have reverted to the out-of-box Cray settings
    - return True for revert_to_defaults()
  - The revert is not successful if there is an error during one of the following:
    - checking if $alps_client is properly configured
    - deleting all vnodes
    - adding MoMs
3b) hooks
In pbspro/test/fw/ptl/lib/pbs_testlib.py, class Server, method revert_to_defaults()
Interface: Server.revert_to_defaults()
- Visibility: Public
- Change Control: Stable
- Synopsis: Modifications to hooks settings for Cray.
- Standing of the interface : modified interface
- Details
- In the "if reverthooks:" clause, after all hooks are disabled, if on a Cray ALPS simulator, then enable the PBS_translate_mpp.HK hook. If there is an error during enabling of the hook, then an exception is raised. This will cause revert_to_defaults() to return False.
- If on a Cray or Cray ALPS simulator then perform below:
  - Restore the default 'PBS_translate_mpp' hook by:
    - Copying PBS_EXEC/lib/python/altair/pbs_hooks/PBS_translate_mpp.HK to PBS_HOME/server_priv/hooks/PBS_translate_mpp.HK
  - In this way, the shipped version (i.e. default version) of the hook will be in PBS_HOME for any PTL test to enable if desired.
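The node re-creation and hook restore steps described for the server can be sketched as pure helpers that only compute what would be run or copied. All three function names are hypothetical and nothing here calls PBS. The qmgr command text follows the commands quoted in this document, with "create node" written out as the long form of "c n".

```python
import posixpath


def alps_client_configured(mom_config_lines):
    """Return True if a '$alps_client <value>' line is present."""
    for line in mom_config_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "$alps_client":
            return True
    return False


def node_revert_commands(mom_hosts):
    """Return the qmgr commands that delete all vnodes and re-add the MoMs.

    Each string is meant to be passed as: qmgr -c "<command>"
    """
    commands = ["delete node @default"]
    commands.extend("create node %s" % host for host in mom_hosts)
    return commands


def hook_restore_paths(pbs_exec, pbs_home, hook_file="PBS_translate_mpp.HK"):
    """Return (source, destination) for restoring the shipped hook file."""
    source = posixpath.join(
        pbs_exec, "lib", "python", "altair", "pbs_hooks", hook_file)
    destination = posixpath.join(pbs_home, "server_priv", "hooks", hook_file)
    return source, destination
```

A caller would first check alps_client_configured(), then run the qmgr commands in order, and treat a failure of any step as an unsuccessful revert. How the file copy and the qmgr calls are performed is left open in this sketch.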