JOBOBIT Hook Event
Links
Link to discussion on Developer Forum: https://community.openpbs.org/t/design-document-for-endjob-hook-event/2525
Link to ALCF development repo: https://github.com/ericpershey/pbspro/compare/hook_event_jobobit
Link to pull requests:
Overview
To support the more detailed job accounting that some sites require, PBS must record job information and the time at which a job leaves execution. A jobobit hook event has therefore been added to the PBS server. Registered jobobit hook scripts are executed when a job or subjob leaves execution. Each script is provided with information about the job and its associated attributes.
Technical Details
Added Python constants:
pbs.HOOK_EVENT_QUEUEJOB
pbs.HOOK_EVENT_MODIFYJOB
pbs.HOOK_EVENT_RESVSUB
pbs.HOOK_EVENT_MODIFYRESV
pbs.HOOK_EVENT_MOVEJOB
pbs.HOOK_EVENT_RUNJOB
pbs.HOOK_EVENT_JOBOBIT
pbs.HOOK_EVENT_MANAGEMENT
pbs.HOOK_EVENT_MODIFYVNODE
pbs.HOOK_EVENT_PROVISION
pbs.HOOK_EVENT_RESV_END
pbs.HOOK_EVENT_RESV_BEGIN
pbs.HOOK_EVENT_RESV_CONFIRM
pbs.HOOK_EVENT_EXECJOB_BEGIN
pbs.HOOK_EVENT_EXECJOB_PROLOGUE
pbs.HOOK_EVENT_EXECJOB_EPILOGUE
pbs.HOOK_EVENT_EXECJOB_PRETERM
pbs.HOOK_EVENT_EXECJOB_END
pbs.HOOK_EVENT_EXECJOB_LAUNCH
pbs.HOOK_EVENT_EXECHOST_PERIODIC
pbs.HOOK_EVENT_EXECHOST_STARTUP
pbs.HOOK_EVENT_EXECJOB_ATTACH
pbs.HOOK_EVENT_EXECJOB_RESIZE
pbs.HOOK_EVENT_EXECJOB_ABORT
pbs.HOOK_EVENT_EXECJOB_POSTSUSPEND
pbs.HOOK_EVENT_EXECJOB_PRERESUME
pbs.HOOK_EVENT_MOM_EVENTS
pbs.HOOK_EVENT_PERIODIC
pbs.REVERSE_JOB_STATE
pbs.REVERSE_JOB_SUBSTATE
pbs.REVERSE_HOOK_EVENT
pbs.JOB_SUBSTATE_UNKNOWN
pbs.JOB_SUBSTATE_TRANSIN
pbs.JOB_SUBSTATE_TRANSICM
pbs.JOB_SUBSTATE_TRNOUT
pbs.JOB_SUBSTATE_TRNOUTCM
pbs.JOB_SUBSTATE_QUEUED
pbs.JOB_SUBSTATE_PRESTAGEIN
pbs.JOB_SUBSTATE_SYNCRES
pbs.JOB_SUBSTATE_STAGEIN
pbs.JOB_SUBSTATE_STAGEGO
pbs.JOB_SUBSTATE_STAGECMP
pbs.JOB_SUBSTATE_HELD
pbs.JOB_SUBSTATE_SYNCHOLD
pbs.JOB_SUBSTATE_DEPNHOLD
pbs.JOB_SUBSTATE_WAITING
pbs.JOB_SUBSTATE_STAGEFAIL
pbs.JOB_SUBSTATE_PRERUN
pbs.JOB_SUBSTATE_RUNNING
pbs.JOB_SUBSTATE_SUSPEND
pbs.JOB_SUBSTATE_SCHSUSP
pbs.JOB_SUBSTATE_EXITING
pbs.JOB_SUBSTATE_STAGEOUT
pbs.JOB_SUBSTATE_STAGEDEL
pbs.JOB_SUBSTATE_EXITED
pbs.JOB_SUBSTATE_ABORT
pbs.JOB_SUBSTATE_KILLSIS
pbs.JOB_SUBSTATE_RUNEPILOG
pbs.JOB_SUBSTATE_OBIT
pbs.JOB_SUBSTATE_TERM
pbs.JOB_SUBSTATE_DELJOB
pbs.JOB_SUBSTATE_RERUN
pbs.JOB_SUBSTATE_RERUN1
pbs.JOB_SUBSTATE_RERUN2
pbs.JOB_SUBSTATE_RERUN3
pbs.JOB_SUBSTATE_EXPIRED
pbs.JOB_SUBSTATE_BEGUN
pbs.JOB_SUBSTATE_PROVISION
pbs.JOB_SUBSTATE_WAITING_JOIN_JOB
pbs.JOB_SUBSTATE_TERMINATED
pbs.JOB_SUBSTATE_FINISHED
pbs.JOB_SUBSTATE_FAILED
pbs.JOB_SUBSTATE_MOVED
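The REVERSE_* constants listed above (pbs.REVERSE_JOB_STATE, pbs.REVERSE_JOB_SUBSTATE, pbs.REVERSE_HOOK_EVENT) map numeric values back to constant names, as the example hook script in this document shows. The following is a minimal sketch of how such a map relates to the named constants. It runs outside PBS, so FakePbs is a hypothetical stand-in for the pbs module and build_reverse_map is an illustrative helper, not the actual implementation. Only the value 53 for JOB_SUBSTATE_EXITED is taken from the server log excerpt in this document; the other value is a placeholder.

```python
class FakePbs:
    """Hypothetical stand-in for the pbs module (assumption, not the real API)."""
    JOB_SUBSTATE_EXITED = 53   # value seen in the server log excerpt
    JOB_SUBSTATE_RUNNING = 42  # placeholder value, assumed for illustration


def build_reverse_map(namespace, prefix):
    """Return {value: name} for every integer constant whose name starts with prefix."""
    return {
        value: name
        for name, value in vars(namespace).items()
        if name.startswith(prefix) and isinstance(value, int)
    }


REVERSE_JOB_SUBSTATE = build_reverse_map(FakePbs, "JOB_SUBSTATE_")

# Same lookup pattern the example hook script uses, with a default for unknown values.
print(REVERSE_JOB_SUBSTATE.get(53, "(None)"))
print(REVERSE_JOB_SUBSTATE.get(999, "(None)"))
```

Using .get() with a default, as the example hook does, avoids a KeyError when a hook meets a value that has no entry in the map.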
Details:
The jobobit event will contain job information
The job object attached to the jobobit event is read-only
The type of the jobobit event is pbs.HOOK_EVENT_JOBOBIT
jobobit hooks are run by the server
Hooks registered to the jobobit event will execute when:
a job ends
A call to process_hooks() has been added to the end_job function (req_jobobit.c)
a job array parent job ends
A call to process_hooks() has been added to the chk_array_doneness function (array_func.c)
a subjob ends
A call to process_hooks() has been added to the end_job function (req_jobobit.c)
a job needs to rerun
A call to process_hooks() has been added to the force_reque function (req_rerun.c)
A call to process_hooks() has been added to the on_job_rerun function (req_jobobit.c)
a running job is deleted and the MoM cannot be reached
A call to process_hooks() has been added to the post_discard_job function (node_manager.c)
a running job is force deleted
A call to process_hooks() has been added to the req_deletejob2 function (req_delete.c)
New job attribute "obittime"
Format: seconds since epoch time
Permissions: read-only for all users
Represents the time a job or subjob obit occurs (state 'F')
A reservation object is attached to the job object if a reservation is associated with the job
A pbs.event().accept() call terminates hook execution, as does pbs.event().reject(). The jobobit event object is unaffected by either call.
If the hook script raises an exception, the error is logged.
A new functional test for the jobobit hook event is defined in pbs_hook_jobobit.py and includes tests for:
single jobs
array jobs
jobs run under reservations
jobs that are requeued
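Because obittime is expressed in seconds since the epoch, as is the stime value printed by the example hook, the wall-clock time between job start and obit can be derived by subtraction. The sketch below shows one way to do this outside PBS. The helper names elapsed_seconds and format_epoch are hypothetical, and the sample values are the ones from the server log excerpt in this document.

```python
from datetime import datetime, timezone


def elapsed_seconds(stime, obittime):
    """Seconds between job start (stime) and obit (obittime), both epoch seconds."""
    start, obit = int(stime), int(obittime)
    if obit < start:
        raise ValueError("obittime precedes stime")
    return obit - start


def format_epoch(seconds):
    """Render epoch seconds as an ISO 8601 timestamp in UTC."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc).isoformat()


print(elapsed_seconds(1633982235, 1633982236))
print(format_epoch(1633982236))
```

Inside a real hook the two inputs would presumably come from job.stime and job.obittime; converting them with int() first keeps the arithmetic independent of the attribute type PBS returns.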
New code for jobobit hook event has been added, including:
New batch request structure:
/* JobObit */
struct rq_jobobit {
	struct job *rq_pjob;
	char rq_jid[PBS_MAXSVRJOBID + 1];
	char *rq_destin;
};
New event type:
HOOK_EVENT_JOBOBIT
Example Hook Script for jobobit:
import sys

import pbs

# Fetch the event before the try block so that e is always defined in the handler.
e = pbs.event()
try:
    job = e.job
    pbs.logjobmsg(job.id, 'jobobit hook, "%s" started' % (e.hook_name,))
    pbs.logjobmsg(job.id, 'jobobit hook, job starttime:%s' % (job.stime,))
    pbs.logjobmsg(job.id, 'jobobit hook, job obittime:%s' % (job.obittime,))
    pbs.logjobmsg(job.id, 'jobobit hook, job_state=%s' % (job.job_state,))
    pbs.logjobmsg(job.id, 'jobobit hook, job_substate=%s' % (job.substate,))
    # Translate the numeric state and substate into their constant names.
    state_desc = pbs.REVERSE_JOB_STATE.get(job.job_state, '(None)')
    substate_desc = pbs.REVERSE_JOB_SUBSTATE.get(job.substate, '(None)')
    pbs.logjobmsg(job.id, 'jobobit hook, job_state_desc=%s' % (state_desc,))
    pbs.logjobmsg(job.id, 'jobobit hook, job_substate_desc=%s' % (substate_desc,))
    # A reservation object is attached only when the job ran in a reservation.
    if hasattr(job, "resv") and job.resv:
        pbs.logjobmsg(job.id, 'jobobit hook, resv:%s' % (job.resv.resvid,))
        pbs.logjobmsg(job.id, 'jobobit hook, resv_nodes:%s' % (job.resv.resv_nodes,))
        pbs.logjobmsg(job.id, 'jobobit hook, resv_state:%s' % (job.resv.reserve_state,))
    else:
        pbs.logjobmsg(job.id, 'jobobit hook, resv:(None)')
    pbs.logjobmsg(job.id, 'jobobit hook, "%s" finished' % (e.hook_name,))
except Exception as err:
    # Log the exception type, message, file and line number, then reject.
    ty, _, tb = sys.exc_info()
    pbs.logmsg(pbs.LOG_ERROR, "jobobit hook, error: %s: %s (%s:%s)" %
               (ty.__name__, err, tb.tb_frame.f_code.co_filename, tb.tb_lineno))
    e.reject()
else:
    e.accept()
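Before the example hook script above can run, it has to be created in the server, bound to the event, and have its script imported. The sketch below only builds the qmgr command lines for those three steps and prints them; it does not execute anything. It assumes that the event is registered under the name jobobit in qmgr and that the script is saved as jobobit_example.py (the hook name is taken from the log excerpt in this document, the file name is hypothetical). The helper qmgr_commands is illustrative only.

```python
import shlex


def qmgr_commands(hook_name, script_path, event="jobobit"):
    """Build the qmgr invocations that create a hook, bind it to an event,
    and import its Python script. The event name is an assumption."""
    directives = [
        f"create hook {hook_name}",
        f"set hook {hook_name} event = {event}",
        f"import hook {hook_name} application/x-python default {script_path}",
    ]
    return [["qmgr", "-c", directive] for directive in directives]


for argv in qmgr_commands("jobobit_example", "jobobit_example.py"):
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Returning argument lists rather than single strings means the same values could be passed to subprocess.run() without shell quoting concerns, should a site prefer to script the registration.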
PBS server log excerpt of a jobobit hook executing:
10/11/2021 19:57:16.179481;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, "jobobit_example" started
10/11/2021 19:57:16.179498;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job starttime:1633982235
10/11/2021 19:57:16.179505;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job obittime:1633982236
10/11/2021 19:57:16.179511;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job_state=5
10/11/2021 19:57:16.179521;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job_substate=53
10/11/2021 19:57:16.179529;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job_state_desc=JOB_STATE_EXITING
10/11/2021 19:57:16.179534;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job_substate_desc=JOB_SUBSTATE_EXITED
10/11/2021 19:57:16.179540;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, resv:(None)
10/11/2021 19:57:16.179545;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, "jobobit_example" finished
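For accounting purposes, the start and obit times can be recovered from log lines like the ones above. The following is a minimal sketch that assumes the layout seen in this excerpt: six semicolon-separated fields, with the job ID in the fifth field and the hook message in the sixth. The function collect_job_times and the message prefixes it matches are specific to the example hook in this document and are not part of PBS.

```python
SAMPLE_LOG = """\
10/11/2021 19:57:16.179498;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job starttime:1633982235
10/11/2021 19:57:16.179505;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job obittime:1633982236
10/11/2021 19:57:16.179534;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job_substate_desc=JOB_SUBSTATE_EXITED
"""

PREFIX = "jobobit hook, "


def collect_job_times(log_text):
    """Return {job_id: {"starttime": int, "obittime": int}} from jobobit hook log lines."""
    jobs = {}
    for line in log_text.splitlines():
        # Split into at most six fields so semicolons inside the message survive.
        fields = line.split(";", 5)
        if len(fields) != 6 or not fields[5].startswith(PREFIX):
            continue
        job_id, message = fields[4], fields[5][len(PREFIX):]
        for key in ("starttime", "obittime"):
            marker = f"job {key}:"
            if message.startswith(marker):
                jobs.setdefault(job_id, {})[key] = int(message[len(marker):])
    return jobs


times = collect_job_times(SAMPLE_LOG)
for job_id, t in times.items():
    # Elapsed seconds between job start and obit.
    print(job_id, t["obittime"] - t["starttime"])
```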