Overview:
A job goes through many events in its lifecycle. This EDD focuses on logging accounting records for a job's suspend and resume events. Currently, a job's resource usage is available when the job ends, in the "E" record of the accounting logs. However, the resources requested and used can change during the life of the job. The objective of this EDD is to capture the correct resource usage at suspend and resume events.
'z' record:
- Upon job suspension, a 'z' record shall be accounted.
- The record shall consist of:
  - requestor
  - resources_released (if available)
  - resources_used
Example:
1. The resources_released list shall be available only if the server attribute restrict_res_to_release_on_suspend is set:
qmgr -c 's s restrict_res_to_release_on_suspend+=ncpus'
a. Submit a job requesting ncpus
03/22/2020 18:04:21;z;0.pbsserver;requestor=root@pbsserver resources_released=(pbsserver:ncpus=1) resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=3240kb resources_used.ncpus=1 resources_used.vmem=336452kb resources_used.walltime=00:00:30
b. Submit a job requesting ncpus and memory (only ncpus is released, because only ncpus is listed in restrict_res_to_release_on_suspend)
03/23/2020 19:44:34;z;7.pbsserver;requestor=root@pbsserver resources_released=(pbsserver:ncpus=4) resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.ncpus=4 resources_used.vmem=0kb resources_used.walltime=00:00:00
2. restrict_res_to_release_on_suspend is unset
03/23/2020 19:16:05;z;6.pbsserver;requestor=root@pbsserver resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=3236kb resources_used.ncpus=1 resources_used.vmem=336452kb resources_used.walltime=00:00:27
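The 'z' record examples above can be read programmatically. The following is a minimal sketch, not part of PBS itself: parse_record is a hypothetical helper name, and it assumes the layout seen in the examples (four ';'-separated parts, with space-separated key=value pairs in the last part and no whitespace inside a value).

```python
from datetime import datetime


def parse_record(line):
    """Split one accounting log line into timestamp, type, job id and attributes.

    Assumption: attribute values contain no whitespace, as in the examples.
    """
    stamp, rtype, job_id, message = line.strip().split(";", 3)
    attrs = {}
    for token in message.split():
        # Split on the first '=' only, so values such as
        # "(pbsserver:ncpus=1)" keep their inner '='.
        key, _, value = token.partition("=")
        attrs[key] = value
    return {
        "time": datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S"),
        "type": rtype,
        "job": job_id,
        "attrs": attrs,
    }


record = parse_record(
    "03/22/2020 18:04:21;z;0.pbsserver;requestor=root@pbsserver "
    "resources_released=(pbsserver:ncpus=1) resources_used.ncpus=1 "
    "resources_used.walltime=00:00:30"
)
# resources_released is absent when restrict_res_to_release_on_suspend is unset,
# so callers should use .get() rather than indexing.
released = record["attrs"].get("resources_released")
```

Using .get() for resources_released reflects the "if available" wording of the record definition.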
'r' record:
- Upon resuming a job, an 'r' record shall be accounted.
- The record shall consist of:
  - requestor
Example:
03/22/2020 18:05:17;r;0.pbsserver;requestor=Scheduler@pbsserver
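One use of the 'z' and 'r' records together is to work out how long a job stayed suspended. The sketch below is illustrative only: suspended_seconds is a hypothetical helper, and it assumes records arrive in chronological order and that each 'r' record closes the most recent 'z' record for the same job.

```python
from datetime import datetime


def suspended_seconds(lines):
    """Sum, per job, the time between each 'z' record and the following 'r' record."""
    suspended_at = {}  # job id -> time of the open 'z' record
    totals = {}        # job id -> accumulated suspended seconds
    for line in lines:
        stamp, rtype, job_id, _ = line.strip().split(";", 3)
        when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        if rtype == "z":
            suspended_at[job_id] = when
        elif rtype == "r" and job_id in suspended_at:
            delta = when - suspended_at.pop(job_id)
            totals[job_id] = totals.get(job_id, 0.0) + delta.total_seconds()
    return totals


log = [
    "03/22/2020 18:04:21;z;0.pbsserver;requestor=root@pbsserver resources_used.walltime=00:00:30",
    "03/22/2020 18:05:17;r;0.pbsserver;requestor=Scheduler@pbsserver",
]
print(suspended_seconds(log))  # {'0.pbsserver': 56.0}
```

A job that is suspended but never resumed contributes nothing to the totals in this sketch; whether that is the desired accounting is a design choice left to the consumer of the logs.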