Proposed Design for PP-465

forum discussion/EDD review

Interface: job_requeue_timeout

  • Visibility: Public
  • Change Control: Obsolete
  • Synopsis: The server attribute job_requeue_timeout will now be used to send a reply to the client stating that the job could not be rerun within the timeout period and that the rerun is still in progress.
  • Details:
    • The server attribute job_requeue_timeout has no clear specification of what happens to the job once the timeout is reached.
    • Currently, once the timeout is reached, the server returns an error to the client stating that the rerun has timed out.
    • However, the server continues rerunning the job even though an error has already been returned to the client.
    • Because this "timeout" does not abort the rerun but only produces a spurious message that the rerun has timed out, the current behaviour is incorrect.
    • The PBS scheduler relies on this delay. If the API were made to return immediately, there is a risk of over-subscription, since the scheduler may place new work on resources that the requeued job has not yet released.
    • Hence, the delay/timeout will remain in place, but the server attribute job_requeue_timeout will be marked obsolete.
    • The documentation will be updated to state that the attribute is obsolete and that the timeout error message is spurious.
    • The error message will be changed to indicate that the rerun is still in progress. The exact wording will be: "qrerun: Response timed out. Job rerun request still in progress for <jobid>.<server>"
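The intended semantics (the timeout ends the client's wait, not the rerun itself) can be illustrated with a minimal Python sketch. It assumes a simplified model in which the server performs the rerun in the background and the client waits at most a fixed period for the reply. The names request_rerun, format_timeout_message and rerun_fn are hypothetical and do not correspond to PBS server code or the IFL API; only the message text is taken from this design.

```python
import threading
import time
from typing import Callable, Optional, Tuple


def format_timeout_message(job_id: str, server: str) -> str:
    # Wording proposed in this design for the qrerun reply on timeout.
    return ("qrerun: Response timed out. "
            f"Job rerun request still in progress for {job_id}.{server}")


def request_rerun(job_id: str, server: str,
                  rerun_fn: Callable[[], None],
                  timeout_s: float) -> Tuple[Optional[str], threading.Event]:
    """Hypothetical model: start the rerun, wait up to timeout_s for it.

    Returns (reply, done). reply is None if the rerun finished in time,
    otherwise the "still in progress" message. On timeout the rerun is
    NOT aborted: it keeps running and sets `done` when it finishes.
    """
    done = threading.Event()

    def worker() -> None:
        rerun_fn()   # simulated requeue work; may outlast timeout_s
        done.set()   # signal completion whether or not the client still waits

    threading.Thread(target=worker, daemon=True).start()

    if done.wait(timeout_s):
        return None, done  # finished within the timeout: no error to report
    return format_timeout_message(job_id, server), done


if __name__ == "__main__":
    def slow_rerun() -> None:
        time.sleep(0.5)  # simulated rerun that takes longer than the timeout

    reply, done = request_rerun("123", "svr1", slow_rerun, timeout_s=0.1)
    print(reply)
    done.wait()  # the rerun still completes after the client got its reply
    print("rerun finished:", done.is_set())
```

The key design point mirrored here is that the timeout only bounds how long the client blocks; the background rerun is never cancelled, which is why the new message reports "still in progress" rather than a failure.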