Man Page for pbs_dtj
Synopsis
pbs_dtj [-n <days>] [-r <command>] [-u <username>]
Description
pbs_dtj (Distributed TraceJob) is a command that gathers tracejob information from all of the nodes where a PBS Professional job ran. By default the script connects to the nodes with rsh; if PBS_SCP is set in the pbs.conf file, it uses ssh instead. Because it connects over rsh or ssh, the user running the script is assumed to have passwordless remote access to the nodes.
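The choice between rsh and ssh described above can be sketched as follows. This is only an illustration in Python, not the script's own code: the function name choose_remote_command is hypothetical, and it assumes pbs.conf consists of simple KEY=value lines with "#" comments.

```python
def choose_remote_command(conf_text):
    """Return "ssh" if PBS_SCP is set in the given pbs.conf text, else "rsh".

    Assumption: pbs.conf holds KEY=value lines; how pbs_dtj itself
    parses the file is not documented here.
    """
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "PBS_SCP" and value.strip():
            return "ssh"
    return "rsh"
```

Passing the file contents as a string keeps the sketch independent of where pbs.conf lives on a given system.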
The script first performs a tracejob on the PBS server system. From that output it extracts the nodes the job actually ran on, then runs tracejob on each of those nodes. All output goes to stdout.
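How the script identifies the nodes is not spelled out here. One plausible approach, shown as a sketch under stated assumptions, is to look for exec_host=... strings in the server's tracejob output, where hosts are joined by "+" and each host is followed by "/" and a slot specification. The function name hosts_from_tracejob and the sample text are hypothetical.

```python
import re

def hosts_from_tracejob(output):
    """Collect unique host names from exec_host=... strings, in order.

    Assumption: hosts appear as exec_host=nodeA/0+nodeB/0*2 in the
    tracejob output; the real script may use a different field.
    """
    hosts = []
    for match in re.finditer(r"exec_host=(\S+)", output):
        for chunk in match.group(1).split("+"):
            host = chunk.split("/", 1)[0]  # drop the "/slot" part
            if host and host not in hosts:
                hosts.append(host)
    return hosts
```

Each host returned would then be the target of one remote tracejob call.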
Options
-n days
This option specifies the number of days ago the job ran. It is analogous to the -n option to the tracejob command.
To put as little strain as possible on the execution nodes, pbs_dtj calculates the smallest number of days it can pass as the tracejob -n argument on those nodes. It takes the first date from the tracejob output on the server, then calculates how many days ago that date was. For example, if a job ran 10 days ago and pbs_dtj -n 30 is run, tracejob -n 30 is run on the PBS server/scheduler system, but tracejob -n 10 is run on the execution nodes. The default -n argument for the server/scheduler tracejob is 7 days.
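The day calculation can be illustrated with a short Python sketch. It assumes that tracejob output lines begin with a date in MM/DD/YYYY form and that at least one day is always requested; both are assumptions, and the name days_for_nodes is hypothetical.

```python
import re
from datetime import date, datetime

# Assumption: each tracejob log line starts with MM/DD/YYYY.
DATE_PREFIX = re.compile(r"^(\d{2}/\d{2}/\d{4})\s")

def days_for_nodes(server_output, today):
    """Return how many days ago the first dated line was, or None."""
    for line in server_output.splitlines():
        m = DATE_PREFIX.match(line)
        if m:
            first = datetime.strptime(m.group(1), "%m/%d/%Y").date()
            # Never ask for fewer than 1 day (an assumption of this sketch).
            return max((today - first).days, 1)
    return None  # no dated line found in the server output
```

With a first log date of 03/01/2024 and today set to date(2024, 3, 11), the function returns 10, matching the example above where the execution nodes run tracejob -n 10.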
-r rcommand
This option overrides the default choice of rsh or ssh (based on PBS_SCP in pbs.conf). Any command can be used here as long as it accepts the rsh-like syntax "rcommand username@remotehost command".
-u username
This option is useful if you are running the script as root, but you do not have passwordless remote access set up for the root account. You can specify a different username here to be used when connecting to the remote nodes.
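The -r and -u options combine into the rsh-like syntax "rcommand username@remotehost command". The Python sketch below builds such a command line as an argument list. It is an illustration only: build_remote_command, the job_id parameter, and the sample values are hypothetical, and the script itself may assemble the command differently.

```python
def build_remote_command(rcommand, username, host, days, job_id):
    """Build "rcommand username@host tracejob -n days job_id" as a list."""
    return [rcommand, f"{username}@{host}", "tracejob", "-n", str(days), job_id]

# Example: connect as "pbsadmin" over ssh instead of as root.
argv = build_remote_command("ssh", "pbsadmin", "nodeA", 10, "123.server")
print(" ".join(argv))
```

A list of arguments like this can be handed to subprocess.run without involving a shell, which avoids quoting problems in host or user names.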