To optimize job resource polling, discontinue reporting of resources_used values of PBS root jobs

Follow the PBS Pro Design Document Guidelines.

Overview

PBS normally reports a job's resources_used values by having pbs_mom examine every process found in /proc, determine which ones belong to PBS jobs, and tally resources_used information such as mem, vmem, and cput for each job. The pbs_mom code that walks the /proc filesystem and opens each /proc/<pid> file can be expensive, especially on larger systems with many system processes, particularly those run by root. This polling can be optimized by not inspecting root-owned processes.

Approach

The enhancement changes the pbs_mom code so that it considers and opens a /proc/<pid> file only if that file is not owned by root.
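The filtering idea can be illustrated with a minimal Python sketch. This is not the pbs_mom implementation (pbs_mom is written in C); the function name non_root_pids and its owner_of parameter are hypothetical, introduced only for illustration. The sketch assumes that the owner reported by stat for a /proc/<pid> entry is what identifies a root process.

```python
import os


def non_root_pids(proc_root="/proc", owner_of=None):
    """Yield PIDs under proc_root whose entry is not owned by root (uid 0).

    owner_of is a hypothetical hook that returns the owning uid for a path;
    by default it uses os.stat(). It exists so the filter can be exercised
    without a real /proc filesystem.
    """
    if owner_of is None:
        owner_of = lambda path: os.stat(path).st_uid

    for name in os.listdir(proc_root):
        # Only numeric entries such as /proc/1234 describe processes.
        if not name.isdigit():
            continue
        path = os.path.join(proc_root, name)
        try:
            uid = owner_of(path)
        except OSError:
            # The process exited between listing and stat; ignore it.
            continue
        if uid == 0:
            # Root-owned: skip it without opening any file under /proc/<pid>.
            continue
        yield int(name)


if __name__ == "__main__":
    # On a Linux host this prints the PIDs that would still be inspected.
    print(sorted(non_root_pids()))
```

The saving comes from the early "continue": a root-owned entry costs one stat call and none of the per-process file reads that usage tallying would otherwise need.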

As a result, root-owned PBS jobs no longer report their resource usage values, with the exception of 'walltime'. This is an acceptable caveat because:

  • Root jobs are not common, and they require the server attribute 'acl_roots = root' to be set.
  • The jobs still have their limits (as specified under Resource_List.<resources>) set by pbs_mom, and setting those limits does not depend on the polled information.
  • If the pbs_cgroups hook is enabled, root jobs have their resources_used values tracked via the cgroups mechanism.

With this change, root jobs in PBS behave as follows:

corretja:/home/bayucan/tmp # qmgr -c "set server acl_roots=root"    ← allow root jobs to be submitted

corretja:/home/bayucan/tmp # qsub -- /bin/sleep 300

1.corretja

corretja:/home/bayucan/tmp # qstat

Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - ----
1.corretja        STDIN            root              00:00:00 R workq    

corretja:/home/bayucan/tmp # qstat -f 1 | grep resources_used
    resources_used.cpupercent = 0           ← these values are not set
    resources_used.cput = 00:00:00
    resources_used.mem = 0kb
    resources_used.ncpus = 1
    resources_used.vmem = 0kb
    resources_used.walltime = 00:01:16       ← walltime is still accumulated

In the accounting_logs, the root job's resources_used values are zeroed out, except walltime:

3/25/2020 10:26:43;E;1.corretja;user=root group=root project=_pbs_project_default jobname=STDIN queue=workq ctime=1585146103 qtime=1585146103 etime=1585146103 start=1585146103 exec_host=corretja/1 exec_vnode=(corretja:ncpus=1) Resource_List.ncpus=1 Resource_List.nodect=1 Resource_List.place=pack Resource_List.select=1:ncpus=1 session=17245 end=1585146403 Exit_status=0 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.ncpus=1 resources_used.vmem=0kb resources_used.walltime=00:05:00 run_count=1
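To check a record like the one above without reading it by eye, the resources_used fields can be pulled out with a short script. The following is a minimal Python sketch, assuming the record layout seen here: four ';'-separated parts, with the last part holding space-separated key=value pairs. The function names parse_accounting_record and resources_used are hypothetical, and values containing spaces are not handled.

```python
def parse_accounting_record(line):
    """Split one accounting log line into (timestamp, type, job_id, fields)."""
    timestamp, record_type, job_id, message = line.rstrip("\n").split(";", 3)
    fields = {}
    for token in message.split():
        # Split on the first '=' only, so a value such as
        # "(corretja:ncpus=1)" keeps its embedded '='.
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return timestamp, record_type, job_id, fields


def resources_used(fields):
    """Return only the resources_used.* entries, keyed by resource name."""
    prefix = "resources_used."
    return {k[len(prefix):]: v for k, v in fields.items() if k.startswith(prefix)}


if __name__ == "__main__":
    # Shortened copy of the 'E' record for job 1.corretja shown above.
    record = (
        "3/25/2020 10:26:43;E;1.corretja;user=root group=root "
        "exec_vnode=(corretja:ncpus=1) Exit_status=0 "
        "resources_used.cput=00:00:00 resources_used.mem=0kb "
        "resources_used.vmem=0kb resources_used.walltime=00:05:00"
    )
    _, _, job_id, fields = parse_accounting_record(record)
    print(job_id, resources_used(fields))
```

For the record above, only walltime comes back with an accumulated value; cput, mem, and vmem come back as zero.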


If the pbs_cgroups hook is enabled on a system with cpusets, however, the root job's resources_used values are recorded as tracked by cgroups:

corretja:/home/bayucan/tmp # qsub -- /bin/sleep 300

4.corretja

corretja:/var/spool/pbs/mom_priv/hooks # qstat -f 4 | grep resources_used
    resources_used.cpupercent = 0
    resources_used.cput = 00:00:00
    resources_used.mem = 3224kb             ← values show up
    resources_used.ncpus = 1
    resources_used.vmem = 3224kb
    resources_used.walltime = 00:01:50


The mom_logs show that the cgroups mechanism is handling resource usage tracking:


03/25/2020 12:19:53;0080;pbs_python;Hook;pbs_python;Hook ended: pbs_cgroups, event_type 2048 (elapsed time: 0.1061)
03/25/2020 12:19:53;0008;pbs_mom;Job;4.corretja;Started, pid = 18095
03/25/2020 12:21:42;0008;pbs_python;Job;4.corretja;update_job_usage: CPU percent: 0
03/25/2020 12:21:42;0008;pbs_python;Job;4.corretja;update_job_usage: CPU usage: 0.059 secs
03/25/2020 12:21:42;0008;pbs_python;Job;4.corretja;update_job_usage: Memory usage: mem=3224kb
03/25/2020 12:21:42;0008;pbs_python;Job;4.corretja;update_job_usage: Memory usage: vmem=3224kb
03/25/2020 12:21:42;0008;pbs_python;Job;4.corretja;update_job_usage: vmem fail count: 0
03/25/2020 12:21:42;0080;pbs_python;Hook;pbs_python;Hook ended: pbs_cgroups, event_type 4096 (elapsed time: 0.1162)

At the end of the job, the accounting_logs show:

03/25/2020 12:24:54;E;4.corretja;user=root group=root project=_pbs_project_default jobname=STDIN queue=workq ctime=1585153193 qtime=1585153193 etime=1585153193 start=1585153193 exec_host=corretja/0 exec_vnode=(corretja:ncpus=1) Resource_List.ncpus=1 Resource_List.nodect=1 Resource_List.place=pack Resource_List.select=1:ncpus=1 session=18095 end=1585153494 Exit_status=0 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=3224kb resources_used.ncpus=1 resources_used.vmem=3224kb resources_used.walltime=00:05:00 run_count=1





