This enhancement provides a way for an interactive job to survive even after the client host issuing 'qsub -I' loses connection to the execution host.
Possible reasons for the qsub disconnection include:
In all these cases, the proposal is to let the job continue running so that the user can reconnect to the interactive session from a different terminal and host.
Set PBS_REMOTE_VIEWER on the client host as follows to use the screen utility:
PBS_REMOTE_VIEWER=screen
The screen command in Linux can launch and use multiple shell sessions from a single shell session. That single session can then be detached interactively, preserving the terminal and any processes started by that screen session. At a later time, another process can re-attach to the original 'screen' session.
A full path to the utility can also be given:
PBS_REMOTE_VIEWER=/usr/bin/screen
With PBS_REMOTE_VIEWER set to "screen" on the client host, running qsub -I would show the submitting user this message when the interactive session runs:
Your interactive job is running 'screen' on <job-id> under host <exec_hostname>.
To disconnect, you can run:
    screen -d
You can reconnect to the job at a later time in one of 2 ways:
    1. ssh <exec_hostname>
       <exec_hostname>% screen -d -r <job-id>
    2. pbs_interact <job-id>
If you want to shut down the screen session, simply run:
    exit
Press <return>, <ctrl+D>, <ctrl+J>, or <ctrl+M> to continue...
Press <ctrl+C> to exit the job...
When the primary mom sees the PBS_REMOTE_VIEWER value of 'screen', it would execute:
/usr/bin/screen -S <job-id>
where <job-id> is the screen name.
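The mapping from a PBS_REMOTE_VIEWER value to the command line run by the primary mom could look roughly like the following. This is only a sketch, not the actual pbs_mom code: the function name build_viewer_command and the constant DEFAULT_SCREEN_PATH are hypothetical, and rejecting any value other than 'screen' or a full path ending in 'screen' is an assumption made for illustration.

```python
import shlex

# Assumption: a bare "screen" resolves to /usr/bin/screen, as in the text above.
DEFAULT_SCREEN_PATH = "/usr/bin/screen"


def build_viewer_command(viewer, job_id):
    """Return the argv a primary mom might execute for a 'screen' viewer.

    Hypothetical helper; the real pbs_mom logic is not shown in this document.
    """
    if viewer == "screen":
        path = DEFAULT_SCREEN_PATH
    elif viewer.startswith("/") and viewer.rsplit("/", 1)[-1] == "screen":
        # A full path such as /usr/bin/screen is used as given.
        path = viewer
    else:
        raise ValueError(f"unsupported PBS_REMOTE_VIEWER value: {viewer!r}")
    # -S names the screen session; the job id is used as that name.
    return [path, "-S", job_id]


if __name__ == "__main__":
    # "123.server" is a made-up job id used only for illustration.
    print(shlex.join(build_viewer_command("screen", "123.server")))
```

Returning an argument list instead of a single string keeps the job id from being reinterpreted by a shell.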
The screen session would continue to run in the background until the user runs "exit" in that session.
The primary mom would monitor for the existence of the 'screen' process.
The interactive job will end when the 'screen' process exits, the job has reached its 'walltime' limit, or a qdel has been issued.
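The three end conditions can be sketched as a small decision function that a monitoring loop might call periodically. The function name, its parameters, the returned labels, and the order in which the conditions are checked are all assumptions made for illustration; the document does not specify how the primary mom implements this.

```python
def job_end_reason(viewer_running, elapsed_s, walltime_s, qdel_issued):
    """Return why an interactive job should end, or None if it should keep running.

    Hypothetical sketch: the inputs would come from the primary mom's own
    bookkeeping (process monitoring, job timers, delete requests).
    """
    if qdel_issued:
        # A qdel has been issued for the job.
        return "qdel"
    if walltime_s is not None and elapsed_s >= walltime_s:
        # The job has reached its 'walltime' limit.
        return "walltime"
    if not viewer_running:
        # The 'screen' process has exited.
        return "viewer_exited"
    return None


if __name__ == "__main__":
    # A job whose user detached from screen but did not exit keeps running.
    print(job_end_reason(viewer_running=True, elapsed_s=60, walltime_s=3600,
                         qdel_issued=False))
```

Note that a detached session still counts as running here: only the exit of the 'screen' process itself ends the job.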
If an interactive job disconnects and then reconnects at a later time, the connected session would continue to be tracked by PBS for accounting purposes.
pbs_interact <job-id>
This would be equivalent to running:
ssh <primary_host>
<primary_host>% screen -d -r <job-id>
but instead of 'ssh' talking to the sshd daemon on the execution host, 'pbs_interact' would communicate with the primary pbs_mom executing <job-id>.
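For comparison, the manual ssh route can be collapsed into a single command line. The helper below is a hypothetical sketch, not part of pbs_interact: the function name is made up, and the ssh -t option (force pseudo-terminal allocation) is an addition not present in the original text, included because screen needs a terminal when started as a remote command.

```python
import shlex


def ssh_reattach_command(primary_host, job_id):
    """Build an argv that re-attaches to a job's screen session over ssh.

    Hypothetical helper for illustration only.
    """
    # screen -d -r detaches the session elsewhere if needed, then re-attaches it.
    return ["ssh", "-t", primary_host, "screen", "-d", "-r", job_id]


if __name__ == "__main__":
    # Host name and job id are made-up values.
    print(shlex.join(ssh_reattach_command("node01", "123.server")))
```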
"Unauthorized Request"
Allow PBS_REMOTE_VIEWER to accept other screen-like facilities, such as 'tmux'.