JOBOBIT Hook Event

Overview

Some sites require more detailed job accounting, which means recording the job information and the time at which a job leaves execution. To support this, a jobobit hook event has been added to the PBS server. Registered jobobit hook scripts are executed when a job or subjob leaves execution. Each script is provided with information about the job and any associated attributes.

Technical Details

Added Python constants:

  • pbs.HOOK_EVENT_QUEUEJOB

  • pbs.HOOK_EVENT_MODIFYJOB

  • pbs.HOOK_EVENT_RESVSUB

  • pbs.HOOK_EVENT_MODIFYRESV

  • pbs.HOOK_EVENT_MOVEJOB

  • pbs.HOOK_EVENT_RUNJOB

  • pbs.HOOK_EVENT_JOBOBIT

  • pbs.HOOK_EVENT_MANAGEMENT

  • pbs.HOOK_EVENT_MODIFYVNODE

  • pbs.HOOK_EVENT_PROVISION

  • pbs.HOOK_EVENT_RESV_END

  • pbs.HOOK_EVENT_RESV_BEGIN

  • pbs.HOOK_EVENT_RESV_CONFIRM

  • pbs.HOOK_EVENT_EXECJOB_BEGIN

  • pbs.HOOK_EVENT_EXECJOB_PROLOGUE

  • pbs.HOOK_EVENT_EXECJOB_EPILOGUE

  • pbs.HOOK_EVENT_EXECJOB_PRETERM

  • pbs.HOOK_EVENT_EXECJOB_END

  • pbs.HOOK_EVENT_EXECJOB_LAUNCH

  • pbs.HOOK_EVENT_EXECHOST_PERIODIC

  • pbs.HOOK_EVENT_EXECHOST_STARTUP

  • pbs.HOOK_EVENT_EXECJOB_ATTACH

  • pbs.HOOK_EVENT_EXECJOB_RESIZE

  • pbs.HOOK_EVENT_EXECJOB_ABORT

  • pbs.HOOK_EVENT_EXECJOB_POSTSUSPEND

  • pbs.HOOK_EVENT_EXECJOB_PRERESUME

  • pbs.HOOK_EVENT_MOM_EVENTS

  • pbs.HOOK_EVENT_PERIODIC

  • pbs.REVERSE_JOB_STATE

  • pbs.REVERSE_JOB_SUBSTATE

  • pbs.REVERSE_HOOK_EVENT

  • pbs.JOB_SUBSTATE_UNKNOWN

  • pbs.JOB_SUBSTATE_TRANSIN

  • pbs.JOB_SUBSTATE_TRANSICM

  • pbs.JOB_SUBSTATE_TRNOUT

  • pbs.JOB_SUBSTATE_TRNOUTCM

  • pbs.JOB_SUBSTATE_QUEUED

  • pbs.JOB_SUBSTATE_PRESTAGEIN

  • pbs.JOB_SUBSTATE_SYNCRES

  • pbs.JOB_SUBSTATE_STAGEIN

  • pbs.JOB_SUBSTATE_STAGEGO

  • pbs.JOB_SUBSTATE_STAGECMP

  • pbs.JOB_SUBSTATE_HELD

  • pbs.JOB_SUBSTATE_SYNCHOLD

  • pbs.JOB_SUBSTATE_DEPNHOLD

  • pbs.JOB_SUBSTATE_WAITING

  • pbs.JOB_SUBSTATE_STAGEFAIL

  • pbs.JOB_SUBSTATE_PRERUN

  • pbs.JOB_SUBSTATE_RUNNING

  • pbs.JOB_SUBSTATE_SUSPEND

  • pbs.JOB_SUBSTATE_SCHSUSP

  • pbs.JOB_SUBSTATE_EXITING

  • pbs.JOB_SUBSTATE_STAGEOUT

  • pbs.JOB_SUBSTATE_STAGEDEL

  • pbs.JOB_SUBSTATE_EXITED

  • pbs.JOB_SUBSTATE_ABORT

  • pbs.JOB_SUBSTATE_KILLSIS

  • pbs.JOB_SUBSTATE_RUNEPILOG

  • pbs.JOB_SUBSTATE_OBIT

  • pbs.JOB_SUBSTATE_TERM

  • pbs.JOB_SUBSTATE_DELJOB

  • pbs.JOB_SUBSTATE_RERUN

  • pbs.JOB_SUBSTATE_RERUN1

  • pbs.JOB_SUBSTATE_RERUN2

  • pbs.JOB_SUBSTATE_RERUN3

  • pbs.JOB_SUBSTATE_EXPIRED

  • pbs.JOB_SUBSTATE_BEGUN

  • pbs.JOB_SUBSTATE_PROVISION

  • pbs.JOB_SUBSTATE_WAITING_JOIN_JOB

  • pbs.JOB_SUBSTATE_TERMINATED

  • pbs.JOB_SUBSTATE_FINISHED

  • pbs.JOB_SUBSTATE_FAILED

  • pbs.JOB_SUBSTATE_MOVED
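The REVERSE_* constants are used in the example hook script in this document as dictionaries that map a numeric value back to its constant name. The following is a minimal standalone sketch of that lookup pattern. It does not import the pbs module, which is only available inside a hook. The stand-in dictionaries and the describe() helper are hypothetical, and they hold only the two value/name pairs that appear in the server log excerpt at the end of this document.

```python
# Stand-ins for pbs.REVERSE_JOB_STATE and pbs.REVERSE_JOB_SUBSTATE.
# Assumption: only the pairs visible in the log excerpt are listed here;
# the real dictionaries are provided by the pbs module inside a hook.
REVERSE_JOB_STATE = {5: "JOB_STATE_EXITING"}
REVERSE_JOB_SUBSTATE = {53: "JOB_SUBSTATE_EXITED"}


def describe(value, reverse_map):
    """Return the constant name for value, or '(None)' if it is unknown."""
    return reverse_map.get(value, "(None)")


print(describe(5, REVERSE_JOB_STATE))
print(describe(53, REVERSE_JOB_SUBSTATE))
print(describe(99, REVERSE_JOB_SUBSTATE))  # unknown value falls back to '(None)'
```

Using .get() with a default keeps the hook from raising a KeyError when a value has no entry in the reverse dictionary.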

 

Details:

  • The jobobit event will contain job information

  • The job object attached to the jobobit event is read-only

  • The event type for jobobit is pbs.HOOK_EVENT_JOBOBIT

  • jobobit hooks are run by the server

  • Hooks registered to the jobobit event will execute when:

    • a job ends

      • A call to process_hooks() has been added to the end_job function (req_jobobit.c)

    • a job array parent job ends

      • A call to process_hooks() has been added to the chk_array_doneness function (array_func.c)

    • a subjob ends

      • A call to process_hooks() has been added to the end_job function (req_jobobit.c)

    • a job needs to rerun

      • A call to process_hooks() has been added to the force_reque function (req_rerun.c)

      • A call to process_hooks() has been added to the on_job_rerun function (req_jobobit.c)

    • a running job is deleted and the MoM cannot be reached

      • A call to process_hooks() has been added to the post_discard_job function (node_manager.c)

    • a running job is force-deleted

      • A call to process_hooks() has been added to the req_deletejob2 function (req_delete.c)

  • New job attribute "obittime"

    • Format: seconds since the epoch

    • Permissions: read-only for all users

    • Represents the time a job or subjob obit occurs (state 'F')

  • A reservation object is attached to the job object if a reservation is associated with the job

  • A pbs.event().accept() call terminates hook execution, as does pbs.event().reject(). The jobobit event object is unaffected by either call. 

    • If the hook script raises an exception, the error is logged.

  • A new functional test for the jobobit hook event is defined in pbs_hook_jobobit.py. It includes tests for:

    • single jobs

    • array jobs

    • jobs run under reservations

    • jobs that are requeued

  • New code for jobobit hook event has been added, including:

    • New batch request structure:

/* JobObit */
struct rq_jobobit {
    struct job *rq_pjob;
    char rq_jid[PBS_MAXSVRJOBID + 1];
    char *rq_destin;
};

    • New event type

HOOK_EVENT_JOBOBIT
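
Because the obittime attribute described above is expressed in seconds since the epoch, it can be compared directly with the job start time (stime) for accounting purposes. The following is a minimal standalone sketch, not part of the PBS code. The summarize() helper is hypothetical, and the two input values are taken from the server log excerpt in this document. Inside a real hook, job.stime and job.obittime may need converting with int() first; that is an assumption to verify against your PBS version.

```python
from datetime import datetime, timezone


def summarize(stime, obittime):
    """Return the obit time as a UTC ISO string and the run time in seconds."""
    ended = datetime.fromtimestamp(obittime, tz=timezone.utc)
    return {
        "ended_utc": ended.isoformat(),
        "run_seconds": obittime - stime,
    }


# Values copied from the log excerpt (job starttime / job obittime).
summary = summarize(1633982235, 1633982236)
print(summary)
```

Converting with an explicit UTC timezone avoids depending on the local timezone of the host where the accounting data is processed.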

 

Example Hook Script for jobobit:

import pbs
import sys

try:
    e = pbs.event()
    job = e.job  # read-only job object attached to the jobobit event
    pbs.logjobmsg(job.id, 'jobobit hook, "%s" started' % (e.hook_name,))
    pbs.logjobmsg(job.id, 'jobobit hook, job starttime:%s' % (job.stime,))
    pbs.logjobmsg(job.id, 'jobobit hook, job obittime:%s' % (job.obittime,))
    pbs.logjobmsg(job.id, 'jobobit hook, job_state=%s' % (job.job_state,))
    pbs.logjobmsg(job.id, 'jobobit hook, job_substate=%s' % (job.substate,))
    # Map the numeric state and substate back to their constant names
    state_desc = pbs.REVERSE_JOB_STATE.get(job.job_state, '(None)')
    substate_desc = pbs.REVERSE_JOB_SUBSTATE.get(job.substate, '(None)')
    pbs.logjobmsg(job.id, 'jobobit hook, job_state_desc=%s' % (state_desc,))
    pbs.logjobmsg(job.id, 'jobobit hook, job_substate_desc=%s' % (substate_desc,))
    # A reservation object is attached only if the job is associated with one
    if hasattr(job, "resv") and job.resv:
        pbs.logjobmsg(job.id, 'jobobit hook, resv:%s' % (job.resv.resvid,))
        pbs.logjobmsg(job.id, 'jobobit hook, resv_nodes:%s' % (job.resv.resv_nodes,))
        pbs.logjobmsg(job.id, 'jobobit hook, resv_state:%s' % (job.resv.reserve_state,))
    else:
        pbs.logjobmsg(job.id, 'jobobit hook, resv:(None)')
    pbs.logjobmsg(job.id, 'jobobit hook, "%s" finished' % (e.hook_name,))
except Exception as err:
    # Log the exception type, file name, and line number, then reject
    ty, _, tb = sys.exc_info()
    pbs.logmsg(pbs.LOG_ERROR, "jobobit hook, error: " + str(ty) + str(tb.tb_frame.f_code.co_filename) + str(tb.tb_lineno))
    e.reject()
else:
    e.accept()

 

PBS server log excerpt of a jobobit hook executing:

10/11/2021 19:57:16.179481;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, "jobobit_example" started
10/11/2021 19:57:16.179498;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job starttime:1633982235
10/11/2021 19:57:16.179505;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job obittime:1633982236
10/11/2021 19:57:16.179511;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job_state=5
10/11/2021 19:57:16.179521;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job_substate=53
10/11/2021 19:57:16.179529;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job_state_desc=JOB_STATE_EXITING
10/11/2021 19:57:16.179534;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job_substate_desc=JOB_SUBSTATE_EXITED
10/11/2021 19:57:16.179540;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, resv:(None)
10/11/2021 19:57:16.179545;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, "jobobit_example" finished
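
Each entry in the excerpt above is a semicolon-delimited record, so the hook's messages can be pulled out of the server log with a few lines of Python. The following is a minimal sketch for post-processing such entries. The parse_line() helper and the dictionary key names are hypothetical labels chosen for illustration, and the sample data is copied from two entries of the excerpt.

```python
from datetime import datetime

# Two entries copied from the log excerpt above.
SAMPLE = """\
10/11/2021 19:57:16.179498;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job starttime:1633982235
10/11/2021 19:57:16.179505;0008;Server@pdw-s1;Job;7.pdw-s1;jobobit hook, job obittime:1633982236
"""


def parse_line(line):
    """Split one log entry into its six semicolon-delimited fields."""
    # maxsplit=5 keeps any ';' inside the message text intact.
    stamp, code, server, obj_type, obj_id, message = line.split(";", 5)
    return {
        "time": datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S.%f"),
        "code": code,
        "server": server,
        "obj_type": obj_type,
        "obj_id": obj_id,
        "message": message,
    }


records = [parse_line(line) for line in SAMPLE.splitlines() if line]

# Collect the "name:value" pairs that the example hook logged per job.
PREFIX = "jobobit hook, job "
values = {}
for rec in records:
    if rec["message"].startswith(PREFIX):
        name, _, value = rec["message"][len(PREFIX):].partition(":")
        values[(rec["obj_id"], name)] = int(value)

print(values)
```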