JOBOBIT Hook Event

Overview

Some sites require more detailed job accounting, which means recording job information and the time at which a job leaves execution. To support this, a jobobit hook event has been added to the PBS server. Registered jobobit hook scripts are executed when a job or subjob leaves execution. Each script is provided with information about the job and its associated attributes.

Technical Details

Added Python constants:

  • pbs.HOOK_EVENT_QUEUEJOB
  • pbs.HOOK_EVENT_MODIFYJOB
  • pbs.HOOK_EVENT_RESVSUB
  • pbs.HOOK_EVENT_MODIFYRESV
  • pbs.HOOK_EVENT_MOVEJOB
  • pbs.HOOK_EVENT_RUNJOB
  • pbs.HOOK_EVENT_JOBOBIT
  • pbs.HOOK_EVENT_MANAGEMENT
  • pbs.HOOK_EVENT_MODIFYVNODE
  • pbs.HOOK_EVENT_PROVISION
  • pbs.HOOK_EVENT_RESV_END
  • pbs.HOOK_EVENT_RESV_BEGIN
  • pbs.HOOK_EVENT_RESV_CONFIRM
  • pbs.HOOK_EVENT_EXECJOB_BEGIN
  • pbs.HOOK_EVENT_EXECJOB_PROLOGUE
  • pbs.HOOK_EVENT_EXECJOB_EPILOGUE
  • pbs.HOOK_EVENT_EXECJOB_PRETERM
  • pbs.HOOK_EVENT_EXECJOB_END
  • pbs.HOOK_EVENT_EXECJOB_LAUNCH
  • pbs.HOOK_EVENT_EXECHOST_PERIODIC
  • pbs.HOOK_EVENT_EXECHOST_STARTUP
  • pbs.HOOK_EVENT_EXECJOB_ATTACH
  • pbs.HOOK_EVENT_EXECJOB_RESIZE
  • pbs.HOOK_EVENT_EXECJOB_ABORT
  • pbs.HOOK_EVENT_EXECJOB_POSTSUSPEND
  • pbs.HOOK_EVENT_EXECJOB_PRERESUME
  • pbs.HOOK_EVENT_MOM_EVENTS
  • pbs.HOOK_EVENT_PERIODIC
  • pbs.REVERSE_JOB_STATE
  • pbs.REVERSE_JOB_SUBSTATE
  • pbs.REVERSE_HOOK_EVENT
  • pbs.JOB_SUBSTATE_UNKNOWN
  • pbs.JOB_SUBSTATE_TRANSIN
  • pbs.JOB_SUBSTATE_TRANSICM
  • pbs.JOB_SUBSTATE_TRNOUT
  • pbs.JOB_SUBSTATE_TRNOUTCM
  • pbs.JOB_SUBSTATE_QUEUED
  • pbs.JOB_SUBSTATE_PRESTAGEIN
  • pbs.JOB_SUBSTATE_SYNCRES
  • pbs.JOB_SUBSTATE_STAGEIN
  • pbs.JOB_SUBSTATE_STAGEGO
  • pbs.JOB_SUBSTATE_STAGECMP
  • pbs.JOB_SUBSTATE_HELD
  • pbs.JOB_SUBSTATE_SYNCHOLD
  • pbs.JOB_SUBSTATE_DEPNHOLD
  • pbs.JOB_SUBSTATE_WAITING
  • pbs.JOB_SUBSTATE_STAGEFAIL
  • pbs.JOB_SUBSTATE_PRERUN
  • pbs.JOB_SUBSTATE_RUNNING
  • pbs.JOB_SUBSTATE_SUSPEND
  • pbs.JOB_SUBSTATE_SCHSUSP
  • pbs.JOB_SUBSTATE_EXITING
  • pbs.JOB_SUBSTATE_STAGEOUT
  • pbs.JOB_SUBSTATE_STAGEDEL
  • pbs.JOB_SUBSTATE_EXITED
  • pbs.JOB_SUBSTATE_ABORT
  • pbs.JOB_SUBSTATE_KILLSIS
  • pbs.JOB_SUBSTATE_RUNEPILOG
  • pbs.JOB_SUBSTATE_OBIT
  • pbs.JOB_SUBSTATE_TERM
  • pbs.JOB_SUBSTATE_DELJOB
  • pbs.JOB_SUBSTATE_RERUN
  • pbs.JOB_SUBSTATE_RERUN1
  • pbs.JOB_SUBSTATE_RERUN2
  • pbs.JOB_SUBSTATE_RERUN3
  • pbs.JOB_SUBSTATE_EXPIRED
  • pbs.JOB_SUBSTATE_BEGUN
  • pbs.JOB_SUBSTATE_PROVISION
  • pbs.JOB_SUBSTATE_WAITING_JOIN_JOB
  • pbs.JOB_SUBSTATE_TERMINATED
  • pbs.JOB_SUBSTATE_FINISHED
  • pbs.JOB_SUBSTATE_FAILED
  • pbs.JOB_SUBSTATE_MOVED
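
The example hook later on this page uses the REVERSE_* constants as dictionaries that map a numeric value back to its constant name. Because the pbs module is only importable inside a hook, the following is a minimal stand-in sketch, not the actual PBS definitions. The two numeric values are taken from the server log excerpt on this page, and make_reverse_map and describe are hypothetical helpers introduced for illustration.

```python
# Stand-in values only; inside a real hook these come from the pbs module.
# Both numbers are taken from the server log excerpt on this page.
JOB_STATES = {"JOB_STATE_EXITING": 5}
JOB_SUBSTATES = {"JOB_SUBSTATE_EXITED": 53}


def make_reverse_map(forward):
    """Hypothetical helper: invert a name -> value mapping."""
    return {value: name for name, value in forward.items()}


REVERSE_JOB_STATE = make_reverse_map(JOB_STATES)
REVERSE_JOB_SUBSTATE = make_reverse_map(JOB_SUBSTATES)


def describe(state, substate):
    """Return readable names, falling back to '(None)' as the example hook does."""
    return (REVERSE_JOB_STATE.get(state, "(None)"),
            REVERSE_JOB_SUBSTATE.get(substate, "(None)"))


print(describe(5, 53))
```

Using .get() with a default keeps the hook from raising if it sees a value that has no entry in the map.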


Details:

  • The jobobit event contains job information
  • The job object attached to the jobobit event is read-only
  • The type for the jobobit event is pbs.HOOK_EVENT_JOBOBIT
  • jobobit hooks are run by the server
  • Hooks registered to the jobobit event will execute when:
    • a job ends
      • A call to process_hooks() has been added to the end_job function (req_jobobit.c)
    • a job array parent job ends
      • A call to process_hooks() has been added to the chk_array_doneness function (array_func.c)
    • a subjob ends
      • A call to process_hooks() has been added to the end_job function (req_jobobit.c)
    • a job needs to rerun
      • A call to process_hooks() has been added to the force_reque function (req_rerun.c)
      • A call to process_hooks() has been added to the on_job_rerun function (req_jobobit.c)
    • a running job is deleted and the MoM cannot be reached
      • A call to process_hooks() has been added to the post_discard_job function (node_manager.c)
    • a running job is forcibly deleted
      • A call to process_hooks() has been added to the req_deletejob2 function (req_delete.c)
  • New job attribute "obittime"
    • Format: seconds since the epoch
    • Permissions: read-only for all users
    • Represents the time a job or subjob obit occurs (state 'F')
  • A reservation object is attached to the job object if a reservation is associated with the job
  • Calling pbs.event().accept() or pbs.event().reject() terminates hook execution. Neither call affects the jobobit event object.
    • If the hook script raises an exception, the error is logged.
  • A new functional test for the jobobit hook event is defined in pbs_hook_jobobit.py. It includes tests for:
    • single jobs
    • array jobs
    • jobs run under reservations
    • jobs that are requeued
  • New code for jobobit hook event has been added, including:
    • New batch request structure:
/* JobObit */
struct rq_jobobit {
    struct job *rq_pjob;
    char rq_jid[PBS_MAXSVRJOBID + 1];
    char *rq_destin;
};
    • New event type: HOOK_EVENT_JOBOBIT
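
Since obittime is given in seconds since the epoch, a site's accounting code would typically convert it to a readable timestamp and compare it with the job start time. The sketch below shows one way to do that outside of PBS. summarize_times is a hypothetical helper, the input values are copied from the server log excerpt on this page, and the choice of UTC is an assumption that a site may want to change.

```python
from datetime import datetime, timezone


def summarize_times(stime, obittime):
    """Return (start, obit, elapsed seconds) from epoch-second values.

    int() is applied so the helper also accepts numeric strings.
    """
    start_s = int(stime)
    obit_s = int(obittime)
    start = datetime.fromtimestamp(start_s, tz=timezone.utc)
    obit = datetime.fromtimestamp(obit_s, tz=timezone.utc)
    return start.isoformat(), obit.isoformat(), obit_s - start_s


# Values copied from the server log excerpt on this page.
start, obit, elapsed = summarize_times(1633982235, 1633982236)
print(start, obit, elapsed)
```

Inside a hook, the same helper could be given job.stime and job.obittime, assuming both convert cleanly with int().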


Example Hook Script for jobobit:

import pbs
import sys
try:
    e = pbs.event()
    job = e.job
    pbs.logjobmsg(job.id, 'jobobit hook, "%s" started' % (e.hook_name,))
    pbs.logjobmsg(job.id, 'jobobit hook, job starttime:%s' % (job.stime,))
    pbs.logjobmsg(job.id, 'jobobit hook, job obittime:%s' % (job.obittime,))
    pbs.logjobmsg(job.id, 'jobobit hook, job_state=%s' % (job.job_state,))
    pbs.logjobmsg(job.id, 'jobobit hook, job_substate=%s' % (job.substate,))
    state_desc = pbs.REVERSE_JOB_STATE.get(job.job_state, '(None)')
    substate_desc = pbs.REVERSE_JOB_SUBSTATE.get(job.substate, '(None)')
    pbs.logjobmsg(job.id, 'jobobit hook, job_state_desc=%s' % (state_desc,))
    pbs.logjobmsg(job.id, 'jobobit hook, job_substate_desc=%s' % (substate_desc,))
    if hasattr(job, "resv") and job.resv:
        pbs.logjobmsg(job.id, 'jobobit hook, resv:%s' % (job.resv.resvid,))
        pbs.logjobmsg(job.id, 'jobobit hook, resv_nodes:%s' % (job.resv.resv_nodes,))
        pbs.logjobmsg(job.id, 'jobobit hook, resv_state:%s' % (job.resv.reserve_state,))
    else:
        pbs.logjobmsg(job.id, 'jobobit hook, resv:(None)')
    pbs.logjobmsg(job.id, 'jobobit hook, "%s" finished' % (e.hook_name,))
except Exception as err:
    # Log the exception type, message, source file, and line number.
    ty, _, tb = sys.exc_info()
    pbs.logmsg(pbs.LOG_ERROR, "jobobit hook, error: %s: %s (%s:%d)" %
               (ty.__name__, err, tb.tb_frame.f_code.co_filename, tb.tb_lineno))
    e.reject()
else:
    e.accept()


PBS server log excerpt of a jobobit hook executing:

10/11/2021 19:57:16.179481;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, "jobobit_example" started
10/11/2021 19:57:16.179498;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job starttime:1633982235
10/11/2021 19:57:16.179505;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job obittime:1633982236
10/11/2021 19:57:16.179511;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job_state=5
10/11/2021 19:57:16.179521;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job_substate=53
10/11/2021 19:57:16.179529;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job_state_desc=JOB_STATE_EXITING
10/11/2021 19:57:16.179534;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job_substate_desc=JOB_SUBSTATE_EXITED
10/11/2021 19:57:16.179540;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, resv:(None)
10/11/2021 19:57:16.179545;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, "jobobit_example" finished
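
Each log line above is a set of semicolon-separated fields, so the hook's output can be collected per job with a few lines of Python. This is a minimal sketch based only on the layout visible in the excerpt. The field names in LogRecord and both helper functions are hypothetical labels chosen for illustration, not names defined by PBS.

```python
from collections import namedtuple

# Field names are illustrative labels for the six fields seen in the excerpt.
LogRecord = namedtuple(
    "LogRecord", "timestamp code server kind object_id message")


def parse_log_line(line):
    """Split one server log line into its six semicolon-separated fields."""
    # maxsplit=5 keeps any semicolons inside the message text intact.
    fields = line.rstrip("\n").split(";", 5)
    if len(fields) != 6:
        raise ValueError("unexpected log line layout: %r" % (line,))
    return LogRecord(*fields)


def jobobit_messages(lines, prefix="jobobit hook, "):
    """Group jobobit hook messages by job id, with the prefix stripped."""
    by_job = {}
    for line in lines:
        record = parse_log_line(line)
        if record.message.startswith(prefix):
            by_job.setdefault(record.object_id, []).append(
                record.message[len(prefix):])
    return by_job


sample = [
    "10/11/2021 19:57:16.179511;0008;Server@pdw-s1;Job;7.pdw-s1;"
    "jobobit hook, job_state=5",
    "10/11/2021 19:57:16.179521;0008;Server@pdw-s1;Job;7.pdw-s1;"
    "jobobit hook, job_substate=53",
]
print(jobobit_messages(sample))
```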


