Overview:

There are many events in the lifecycle of a job. This EDD focuses on logging accounting records for a job's suspend and resume events. Currently, a job's resource usage is available when the job ends, in the "E" record written to the accounting logs. However, the resources requested and used can change during the life of the job. The objective of this EDD is to account for the correct usage of resources at the suspend and resume events.

'z' record:

  • Upon job suspension, a 'z' record shall be accounted.
  • The record shall consist of:
    • requestor
    • resources_released (if available)
    • resources_used

Example:

1. The resources_released list shall be available only if the server attribute restrict_res_to_release_on_suspend is set, for example:
qmgr -c 's s restrict_res_to_release_on_suspend+=ncpus'

a. Submit a job requesting ncpus

03/22/2020 18:04:21;z;0.pbsserver;requestor=root@pbsserver resources_released=(pbsserver:ncpus=1) resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=3240kb resources_used.ncpus=1 resources_used.vmem=336452kb resources_used.walltime=00:00:30

b. Submit a job requesting ncpus and memory

03/23/2020 19:44:34;z;7.pbsserver;requestor=root@pbsserver resources_released=(pbsserver:ncpus=4) resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.ncpus=4 resources_used.vmem=0kb resources_used.walltime=00:00:00


2. When restrict_res_to_release_on_suspend is unset, the record carries no resources_released list.

03/23/2020 19:16:05;z;6.pbsserver;requestor=root@pbsserver resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=3236kb resources_used.ncpus=1 resources_used.vmem=336452kb resources_used.walltime=00:00:27
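
The 'z' records shown above follow a "timestamp;type;job id;attributes" layout, so a consumer of the accounting log could split them into fields as in the minimal sketch below. The function name parse_accounting_record is hypothetical and not part of PBS, and the sketch assumes that attribute values contain no whitespace, which holds for the examples in this EDD but is not guaranteed in general.

```python
def parse_accounting_record(line):
    """Split one accounting line into timestamp, type, job id and attributes.

    Assumption: attributes are space-separated key=value pairs and no
    value contains whitespace (true for the examples in this EDD).
    """
    # Only the first three ';' are field separators; the rest is the message.
    timestamp, record_type, job_id, message = line.strip().split(";", 3)
    attrs = {}
    for token in message.split():
        # Partition on the first '=' so that a value such as
        # "(pbsserver:ncpus=1)" stays intact.
        key, _, value = token.partition("=")
        attrs[key] = value
    return {
        "timestamp": timestamp,
        "type": record_type,
        "job_id": job_id,
        "attrs": attrs,
    }


with_release = parse_accounting_record(
    "03/22/2020 18:04:21;z;0.pbsserver;requestor=root@pbsserver "
    "resources_released=(pbsserver:ncpus=1) resources_used.cpupercent=0 "
    "resources_used.cput=00:00:00 resources_used.mem=3240kb "
    "resources_used.ncpus=1 resources_used.vmem=336452kb "
    "resources_used.walltime=00:00:30"
)
without_release = parse_accounting_record(
    "03/23/2020 19:16:05;z;6.pbsserver;requestor=root@pbsserver "
    "resources_used.cpupercent=0 resources_used.cput=00:00:00 "
    "resources_used.mem=3236kb resources_used.ncpus=1 "
    "resources_used.vmem=336452kb resources_used.walltime=00:00:27"
)
# resources_released is optional, so look it up with .get().
print(with_release["attrs"].get("resources_released"))
print(without_release["attrs"].get("resources_released"))
```

Reading resources_released with .get() keeps the same code path working whether or not restrict_res_to_release_on_suspend is set.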

'r' record:

  • Upon resuming a job, an 'r' record shall be accounted.
  • The record shall consist of:
    • requestor

Example:

03/22/2020 18:05:17;r;0.pbsserver;requestor=Scheduler@pbsserver
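
One use of paired 'z' and 'r' records is to work out how long a job stayed suspended. The sketch below is illustrative only: suspended_seconds is a hypothetical helper, not part of PBS, and it assumes the timestamps are in MM/DD/YYYY HH:MM:SS form as in the examples, with records supplied in chronological order.

```python
from datetime import datetime

# Assumed timestamp layout, inferred from the example records.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"


def suspended_seconds(lines):
    """Return {job_id: total seconds between each 'z' and the next 'r'}."""
    suspended_at = {}  # job id -> time of the pending 'z' record
    totals = {}        # job id -> accumulated suspended seconds
    for line in lines:
        timestamp, record_type, job_id, _ = line.strip().split(";", 3)
        when = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
        if record_type == "z":
            suspended_at[job_id] = when
        elif record_type == "r" and job_id in suspended_at:
            delta = when - suspended_at.pop(job_id)
            totals[job_id] = totals.get(job_id, 0.0) + delta.total_seconds()
    return totals


records = [
    "03/22/2020 18:04:21;z;0.pbsserver;requestor=root@pbsserver "
    "resources_released=(pbsserver:ncpus=1) resources_used.ncpus=1 "
    "resources_used.walltime=00:00:30",
    "03/22/2020 18:05:17;r;0.pbsserver;requestor=Scheduler@pbsserver",
]
suspended = suspended_seconds(records)
print(suspended)
```

An 'r' record with no earlier 'z' record for the same job is ignored rather than treated as an error, since the start of the log may have been truncated.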