...
- Be clear about what you expect.
- Specify exactly what should happen to an attribute, resource, etc.
- Do not use snapshots of qstat, pbsnodes, or log output.
Cleanup
- Specify any steps needed to return the system to its vanilla (default) state.
Guidelines
- Add a summary to the test: give it a unique test ID and a summary of fewer than 50 characters. ### Is the ID part of the summary, or are these two separate steps?
- We recommend including the related issue ID in the test suite name for reference.
- Provide a high-level summary of what needs to be done in each step. Avoid listing commands, because they can be system-specific or may change in the future.
- Specify server, scheduler, and MoM options or log messages only if they are public (stable) or contractual. Do not specify any command that is private (unstable).
- For any upgrade or installation steps, refer to the PBS guides. DO NOT add upgrade or installation steps to the test. Also DO NOT reference a specific section of a PBS guide, because sections may change in the future.
- Similarly, DO NOT include setup for failover, peer servers, etc. in the test configuration.
...
- Refer to the related documentation. For example, if testing on a special branch, refer to the branch-specific documentation and EDD along with the generic PBS guides.
- New features are sometimes not documented in time. If you cannot find information in the PBS guides, refer to the respective EDD. You may also need to follow up with the project team that delivered the feature on the status of its documentation and previous testing.
- If a test fails because of missing documentation or because of product behavior, link the corresponding PBS documentation bug or product bug to the test.
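If test cases are tracked in a script or spreadsheet export, the ID, summary-length, and cleanup rules above can be checked mechanically. The following is a minimal sketch under that assumption: the `ManualTest` record, its fields, and `check_tests` are hypothetical illustrations, not part of any PBS or test-tool format.

```python
from dataclasses import dataclass, field

MAX_SUMMARY_LEN = 50  # guideline: summary of fewer than 50 characters


@dataclass
class ManualTest:
    """Hypothetical record for one manual test case (fields are illustrative)."""
    test_id: str
    summary: str
    steps: list = field(default_factory=list)
    cleanup: list = field(default_factory=list)


def check_tests(tests):
    """Return human-readable guideline violations for a list of ManualTest."""
    problems = []
    seen_ids = set()
    for t in tests:
        # Test IDs must be unique across the suite.
        if t.test_id in seen_ids:
            problems.append(f"{t.test_id}: duplicate test ID")
        seen_ids.add(t.test_id)
        # Summaries must be fewer than 50 characters.
        if len(t.summary) >= MAX_SUMMARY_LEN:
            problems.append(f"{t.test_id}: summary has {len(t.summary)} characters")
        # Every test needs steps that return the system to its vanilla state.
        if not t.cleanup:
            problems.append(f"{t.test_id}: no cleanup steps")
    return problems


tests = [
    ManualTest("T1", "Job Placement Should Respect ncpus on Nodes Marked Exclusive",
               cleanup=["Remove the vnode and reset the configuration to default"]),
    ManualTest("T1", "Exclusive placement respects ncpus"),  # hypothetical second test
]
for problem in check_tests(tests):
    print(problem)
# prints:
# T1: summary has 60 characters
# T1: duplicate test ID
# T1: no cleanup steps
```

Used on the first example title in this document, the check flags a 60-character summary.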
Examples ### most of these examples fail our recommendations; need better
Job Placement Should Respect ncpus on Nodes Marked Exclusive
...
- Submit a job requesting ncpus=1 with place=excl.
- Submit a second job requesting 3 chunks of ncpus=1 each, with place=excl.
- Check that the first job is running and the second job is queued with the following job comment:
Insufficient amount of resource: ncpus (R: 3 A: 2 T: 4)
Note: A is 2, not 3, because the first job holds its node exclusively, so the unused ncpus on that node are not available to other jobs.
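The arithmetic behind the note can be modeled in a few lines. The vnode configuration for this example is elided, so the sketch assumes two vnodes with 2 ncpus each (consistent with T: 4 and A: 2). It models the note's reasoning only and is not the scheduler's actual implementation.

```python
# Assumed (hypothetical) layout: two vnodes with 2 ncpus each, so T = 4.
vnodes = {
    "vn0": {"ncpus": 2, "assigned": 1, "exclusive": True},   # first job, place=excl
    "vn1": {"ncpus": 2, "assigned": 0, "exclusive": False},
}


def available_ncpus(vnodes):
    """Sum free ncpus, counting nothing on vnodes held exclusively by a job."""
    free = 0
    for info in vnodes.values():
        if info["exclusive"]:
            continue  # the whole vnode is unavailable to other jobs
        free += info["ncpus"] - info["assigned"]
    return free


def total_ncpus(vnodes):
    return sum(info["ncpus"] for info in vnodes.values())


requested = 3  # second job: 3 chunks of ncpus=1
print(f"R: {requested} A: {available_ncpus(vnodes)} T: {total_ncpus(vnodes)}")
# prints "R: 3 A: 2 T: 4"; ignoring exclusivity would give A: 3 (4 total - 1 assigned)
```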
Cleanup
Remove the vnode and reset the configuration to its defaults.
...
- Set the scheduler attribute opt_backfill_fuzzy to low.
- Set the server attribute backfill_depth to 2.
- Set the scheduler parameter strict_ordering to true.
- Submit a job requesting 1 ncpu with a walltime of 60 seconds.
- Submit a reservation that starts in 60 seconds and requests both ncpus.
- Submit a second job with a walltime of 120 seconds.
- Verify that the second job is calendared to start after the reservation end time by checking its estimated start time.
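Why the second job lands after the reservation follows from a simple timeline: only 1 ncpu is free now, but a 120-second job started now would overlap the reservation, which needs both ncpus. The sketch below models this with a basic earliest-fit calendar. The reservation duration is not given in the steps, so the 600 seconds used here is a hypothetical value; the model also ignores opt_backfill_fuzzy and strict_ordering and is not PBS's scheduling algorithm.

```python
def usage_at(t, busy):
    """ncpus in use at time t; busy holds (start, end, ncpus), end exclusive."""
    return sum(n for s, e, n in busy if s <= t < e)


def fits(t, walltime, ncpus, busy, capacity):
    # Usage only rises at interval starts, so checking t and every start
    # inside [t, t + walltime) finds the peak usage in that window.
    points = [t] + [s for s, e, n in busy if t < s < t + walltime]
    return all(usage_at(p, busy) + ncpus <= capacity for p in points)


def earliest_start(now, walltime, ncpus, busy, capacity):
    # The earliest feasible start is either now or the end of some busy interval.
    candidates = sorted({now} | {e for s, e, n in busy if e > now})
    for t in candidates:
        if fits(t, walltime, ncpus, busy, capacity):
            return t
    return None


RES_DURATION = 600  # hypothetical; not specified in the test steps
busy = [
    (0, 60, 1),                    # first job: 1 ncpu, walltime 60 s
    (60, 60 + RES_DURATION, 2),    # reservation: both ncpus, starts in 60 s
]
print(earliest_start(0, 120, 1, busy, capacity=2))  # prints 660 (reservation end)
print(earliest_start(0, 30, 1, busy, capacity=2))   # prints 0 (short job backfills now)
```

The second call shows the contrast: a job short enough to finish before the reservation starts could run immediately on the free ncpu.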
Cleanup
Reset ncpus to its default value and unset backfill_depth, opt_backfill_fuzzy, and strict_ordering.
...
- Set the server attribute backfill_depth to 2.
- Set the scheduler attribute sched_preempt_enforce_resumption to true.
- Submit a job requesting 2 ncpus with a walltime of 2 minutes.
- Use qalter to set topjob_ineligible=true on the running job.
- Create a high-priority queue.
- Submit 2 jobs to the high-priority queue, each with a walltime of 1 minute.
- Verify that the preempted job is calendared even though topjob_ineligible=true is set on it, i.e., the preempted job has its estimated start time set.
Cleanup
Reset the system to the default configuration.
...
Hook;pbs_python;requestor_host is (hostname)
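When verifying a stable log message like this one, match the message text with the expected hostname substituted instead of comparing whole log lines, in keeping with the "no log snapshots" rule. A minimal sketch follows; the sample log line (timestamp, event code, server field) is hypothetical, and only the message text comes from this example.

```python
import re


def expected_hook_message(hostname):
    # Fill the (hostname) placeholder of the expected message.
    return f"Hook;pbs_python;requestor_host is {hostname}"


def log_has_message(lines, hostname):
    # Anchor at end of line so "host1" does not match "host10".
    pattern = re.compile(re.escape(expected_hook_message(hostname)) + r"$")
    return any(pattern.search(line.rstrip("\n")) for line in lines)


# Hypothetical server log line, for illustration only.
sample = ["04/10/2020 10:00:00;0006;Server@svr1;Hook;pbs_python;requestor_host is svr1\n"]
print(log_has_message(sample, "svr1"))  # prints True
print(log_has_message(sample, "svr"))   # prints False
```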
Cleanup
Delete the hook and reset the system to the default configuration.
...
For PBS > 10.x: "cannot run job: Insufficient amount of resource: ncpus (R:1 A:0 T:2)"
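The insufficient-resource message appears in this section with two different spacings ("R: 3 A: 2 T: 4" and "R:1 A:0 T:2"), so an exact string comparison is brittle. A minimal sketch, assuming only the format shown in these two examples, extracts the resource name and the R, A, and T values so a check can compare numbers instead:

```python
import re

# Tolerates optional whitespace after "R:", "A:", and "T:".
COMMENT_RE = re.compile(
    r"Insufficient amount of resource: (?P<resource>\w+) "
    r"\(R:\s*(?P<R>\d+)\s+A:\s*(?P<A>\d+)\s+T:\s*(?P<T>\d+)\)"
)


def parse_insufficient(message):
    """Return (resource, R, A, T), or None if the message does not match."""
    m = COMMENT_RE.search(message)
    if m is None:
        return None
    return m["resource"], int(m["R"]), int(m["A"]), int(m["T"])


print(parse_insufficient(
    "cannot run job: Insufficient amount of resource: ncpus (R:1 A:0 T:2)"))
# prints ('ncpus', 1, 0, 2)
print(parse_insufficient("Insufficient amount of resource: ncpus (R: 3 A: 2 T: 4)"))
# prints ('ncpus', 3, 2, 4)
```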
Questions
Do we have a test suite tool for returning the system to its vanilla state?
...
...