Link to the discussion forum: https://community.openpbs.org/t/qdel-optimization-for-a-huge-number-of-jobs/2327

Overview:

qdel takes an enormous amount of time to delete tens of thousands of jobs from the PBS server, so its performance needs to be optimized. On analysis, we observed that the qdel client iterates over all the jobs and sends one pbs_deljob IFL request per job, serially — a conspicuous design problem. This approach works well for a small set of jobs, but it is not viable for a large number of jobs, say 1 million.

Technical details:

Need for a new IFL API, pbs_deljobbatch():

  1. To preserve backward compatibility, since the reply choice type differs between single-job deletion (pbs_deljob) and batch deletion (pbs_deljobbatch).

  2. Reservation deletion also generates multiple delete-job requests, so its reply processing would need refactoring as well.

  3. By adding a new IFL, the existing IFL calls (pbs_deljob) between Server->Mom can be kept as-is for job deletion; otherwise, it would take additional implementation time to support these changes in Server->Mom requests.

  4. Changing the existing IFL would impact the IFL wrappers (pbs_ifl_wrap.c & pbs_tclWrap.c) and would also invite more changes to hooks in the server and mom, because of the change in reply type. For a single job, the return value is an integer that indicates the status of the deletion; for multiple jobs, a reply struct needs to be processed.

  5. The test framework also needs to be refactored to accommodate the IFL signature change.

Interface 1: New IFL API to send a batch of job ids to delete.