Add a mock run option to pbs_mom for testing

Motivation: When testing, especially for performance profiling, developers often need to run thousands of jobs to load the scheduler and server heavily enough to expose bottlenecks. When this testing is done on a single machine rather than an actual cluster, pbs_mom itself becomes the bottleneck. The main reasons are:

  • pbs_mom forks off a separate process for each job that it runs. If we create a system with 10k vnodes that can run 100k jobs at once, the pbs_mom(s) will fork off 100k processes for these jobs at the same time, which overloads the machine.

  • pbs_mom also walks through the entire proc table on the system on a regular basis. This compounds the first problem: with hundreds of thousands of job processes on the machine, every walk is expensive.

  • pbs_mom also runs itself at a high scheduling priority (a low nice value), so the kernel gives it preference over other processes running on the same machine, such as the scheduler and server.

So, when testing or profiling the scheduler or server, it would be very useful to have a pbs_mom option that makes mom lightweight so that it does not burden the system, while keeping its communication with the server unchanged. The server and scheduler would then see no difference and behave exactly as they would in a real cluster.


Changes:

  • New -m option to pbs_mom to make it run in a “mock run” mode.

    • This option is intended only for testing PBSPro itself, so it will be hidden from end users; no documentation or man page changes are necessary.

  • While in the mock run mode, pbs_mom will behave as follows:

    • forking the job process: instead of forking the job process and actually running the job, mom will just add an event to send a job obit to the server once Resource_List.walltime has elapsed, or immediately if no walltime was requested.

    • proc table walk: this walk updates time-based resource usage such as cput and cpupercent for each job. Skipping it saves a lot of time at the expense of inaccurate cput, cpupercent, etc. in the job's E record, which should be acceptable for performance testing.

    • proc nice value: pbs_mom will not raise its own scheduling priority. This prevents it from hogging CPU time and leaves that time for the server and scheduler instead.
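The mock run behavior above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the actual pbs_mom code: the `mock_run` flag, `struct mock_job`, and every function below are hypothetical names. Instead of forking, each job records when its obit becomes due (start time plus Resource_List.walltime, or the start time itself if no walltime was requested), and a periodic pass sends obits for all jobs whose time has come. The two predicates at the end show where the main loop would skip the proc table walk and the priority change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical flag, set when pbs_mom is started with -m. */
static bool mock_run = false;

/* Hypothetical, simplified view of a job that mom has "started". */
struct mock_job {
    const char *jobid;
    time_t start_time;   /* when mom pretended to launch the job */
    bool has_walltime;   /* was Resource_List.walltime requested? */
    long walltime_secs;  /* Resource_List.walltime, in seconds */
    bool obit_sent;      /* has the obit already gone to the server? */
};

/* Hypothetical callback that would send the job obit to the server. */
typedef void (*obit_fn)(const char *jobid);

/* When the obit is due: after the requested walltime, else immediately. */
static time_t mock_obit_time(const struct mock_job *job)
{
    assert(job != NULL);
    if (job->has_walltime)
        return job->start_time + (time_t)job->walltime_secs;
    return job->start_time;
}

/* Periodic event pass: send an obit for every job that is due and has
 * not had one sent yet. Returns how many obits were sent on this pass. */
static size_t mock_send_due_obits(struct mock_job *jobs, size_t njobs,
                                  time_t now, obit_fn send_obit)
{
    size_t sent = 0;
    for (size_t i = 0; i < njobs; i++) {
        if (!jobs[i].obit_sent && now >= mock_obit_time(&jobs[i])) {
            if (send_obit != NULL)
                send_obit(jobs[i].jobid);
            jobs[i].obit_sent = true;
            sent++;
        }
    }
    return sent;
}

/* Main-loop decisions that change in mock run mode. */
static bool should_walk_proc_table(void) { return !mock_run; }
static bool should_raise_priority(void) { return !mock_run; }
```

Keeping the due time as a pure function of the job's own fields means no per-job process or timer is needed; a single pass over the pending jobs replaces both the fork and the proc table walk.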

Caveats:

  • Altering a job's walltime after it has been submitted will not change when mom ends the job.

  • the resources_used values reported by mom will be set as follows:

    • walltime: if Resource_List.walltime has been provided, mom still keeps track of the used walltime, so this will continue to work the same way.

    • cput/cpupercent: will be set assuming 100% cpu utilization

    • ncpus: will be set to Resource_List.ncpus

    • mem: will be set to Resource_List.mem, if provided

    • vmem: will be set to Resource_List.mem, if provided

  • session ID will not be set on the job

  • No output/error files will be generated for the job