Proposal for tm_spawn_multi
Follow the PBS Pro Design Document Guidelines.
Links
- Link to discussion on Developer Forum: https://community.openpbs.org/t/proposal-for-tm-spawn-multi/2550
- Link to pull request: <PR link if available>
Overview
tm_spawn() can only start an executable on one host per call.
pbsdsh, which calls tm_spawn(), therefore has to call it in a loop, once for every copy/host that needs to run the command. If there are 10 hosts to send the command to, pbsdsh makes 10 calls to tm_spawn().
We can make this more parallel by creating a new tm_spawn_multi() that sends the request to all MoMs at once, so that the request to start the executable is multicast to all other hosts in a single operation.
tm_spawn_multi(int argc, char **argv, char **envp, int list_size, tm_node_id where[], tm_task_id tids[], tm_event_t *event)
tm_spawn_multi() has three notable changes/additions to the tm_spawn() interface:
- tm_node_id where[]: an array of node ids, one for each node (or copy) needed
- int list_size: the number of entries in the where array
- tm_task_id *tids: a pointer to an array of list_size tm_task_id entries
tm_spawn_multi() sends a message to all MoMs that are part of the job to start a new task. The node ids of the host(s) to run the task are given in the where array. The parameters argc, argv and envp specify the program to run and its arguments and environment, very much like exec(). The full path of the program executable must be given by argv[0], and the number of elements in the argv array is given by argc. The array envp is NULL terminated. The argument event points to a tm_event_t variable which is filled in with an event number. When this event is returned by tm_poll, the tm_task_id array pointed to by tids will contain the task IDs of all the tasks newly created by the MoMs.
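The calling sequence described above can be sketched as follows. This is only an illustration under stated assumptions: the typedefs stand in for the real ones in tm.h, the int return type and TM_SUCCESS value are assumed to match tm_spawn(), the body of tm_spawn_multi() is a stub, and stub_poll() and spawn_on_nodes() are hypothetical helpers that exist only so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types so the sketch compiles without tm.h (assumptions). */
typedef int tm_node_id;
typedef unsigned int tm_task_id;
typedef int tm_event_t;
#define TM_SUCCESS 0

/* State the stub remembers between the spawn call and the poll. */
static tm_task_id *pending_tids;
static int pending_count;
static tm_event_t pending_event;

/* Stub with the proposed signature: records the request and hands
 * back an event number. No task IDs are available yet. */
int tm_spawn_multi(int argc, char **argv, char **envp, int list_size,
                   tm_node_id where[], tm_task_id tids[], tm_event_t *event)
{
    (void)argc; (void)argv; (void)envp; (void)where;
    assert(list_size > 0 && tids != NULL && event != NULL);
    pending_tids = tids;
    pending_count = list_size;
    pending_event = 1;              /* hypothetical event number */
    *event = pending_event;
    return TM_SUCCESS;
}

/* Hypothetical, simplified stand-in for polling: completes the pending
 * spawn and fills in the caller's tids array with fake task IDs. */
int stub_poll(tm_event_t *result_event)
{
    for (int i = 0; i < pending_count; i++)
        pending_tids[i] = 100u + (tm_task_id)i;
    *result_event = pending_event;
    return TM_SUCCESS;
}

/* Caller side: one request covers every copy, then wait for the event.
 * Returns the number of tasks started, or -1 on failure. */
int spawn_on_nodes(char *path, tm_node_id where[], tm_task_id tids[], int n)
{
    char *argv[] = { path, NULL };  /* argv[0] is the full program path */
    char *envp[] = { NULL };        /* NULL-terminated environment */
    tm_event_t event, done;

    if (tm_spawn_multi(1, argv, envp, n, where, tids, &event) != TM_SUCCESS)
        return -1;
    if (stub_poll(&done) != TM_SUCCESS || done != event)
        return -1;
    return n;                       /* tids[0..n-1] are now filled in */
}
```

The point of the sketch is the ordering: the tids array must stay valid until the event comes back, because it is only filled in at that moment.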
pbsdsh will also need to be modified to call the new tm_spawn_multi() instead of tm_spawn().
There is a pbsdsh option -s that causes the executable to finish running on one host before starting on the next host. This option no longer has meaning, since with tm_spawn_multi() executables are no longer started one at a time. Therefore, the -s option will be removed from pbsdsh.
A few design details:
Introduce a TM_SPAWN_MULTI protocol message type.
Create a new InterMoM command type, IM_SPAWN_MULTI.
tm_spawn_multi() will send to the primary execution host the number of entries in the array of node ids and the array of node ids, along with the argc, argv and envp arguments. The primary execution host MoM will go through the list and, for every entry that is not for her own host, open a stream to that host if it has not already been seen and add it to the group to be multicast to. Similar to what is done for tm_spawn, the primary execution host MoM will handle any requests that are to be executed on her own host as she goes through the list. Once the primary execution host has gone through the list, all of the information is multicast out to the other MoMs as an IM_SPAWN_MULTI command.
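The pass the primary execution host MoM makes over the list can be sketched roughly as below. This is not the MoM code: struct spawn_plan, plan_spawn(), the node_to_host lookup table and MAX_HOSTS are all hypothetical names introduced for illustration, and hosts are represented as plain integer indexes rather than open streams.

```c
#include <assert.h>

typedef int tm_node_id;          /* stand-in for the tm.h type (assumption) */

#define MAX_HOSTS 64             /* hypothetical limit for this sketch */

struct spawn_plan {
    int local_tasks;             /* entries the primary MoM runs herself */
    int group[MAX_HOSTS];        /* distinct remote hosts to multicast to */
    int group_size;
};

/* Walk the where[] list once. node_to_host[] is a hypothetical lookup
 * from node id to host index; my_host is the primary MoM's own host. */
void plan_spawn(const tm_node_id where[], int list_size,
                const int node_to_host[], int my_host,
                struct spawn_plan *plan)
{
    plan->local_tasks = 0;
    plan->group_size = 0;

    for (int i = 0; i < list_size; i++) {
        int host = node_to_host[where[i]];

        if (host == my_host) {   /* handled locally, as with tm_spawn */
            plan->local_tasks++;
            continue;
        }

        /* Add each remote host to the multicast group only once. */
        int seen = 0;
        for (int g = 0; g < plan->group_size; g++) {
            if (plan->group[g] == host) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            assert(plan->group_size < MAX_HOSTS);
            plan->group[plan->group_size++] = host;
        }
    }
}
```

In this sketch a host that owns several entries in the list still appears in the group once, which matches the idea that one IM_SPAWN_MULTI message per sister MoM is enough for her to find all of her own entries.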
Each of the sister MoMs (any MoM that is not on the head or first host of a multihost job) will go through the list of node ids, and every time she finds an entry for herself, she will create a task and temporarily save the task ID. When the sister MoM is done going through the list, she will send the total number of tasks created, along with the associated task IDs, back to the primary execution host MoM. The primary execution host MoM will store the task IDs and the associated node IDs in the job attributes ji_taskid_list and ji_nid_list. Only after all expected task IDs have been received will a tm_reply() be sent. The reply carries the total count of expected task/node IDs along with each task ID and node ID pair.
A TM_SPAWN_MULTI option will be added to tm_poll to handle the multiple task IDs and node IDs that are sent. new_task() is called in tm_poll for each task ID/node ID pair.
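A rough sketch of the client-side handling follows. The reply layout used here (a count followed by task ID/node ID pairs in a flat array) is an assumption made for illustration, not the actual wire format, and new_task_stub() and handle_spawn_multi_reply() are hypothetical stand-ins for the library's own bookkeeping.

```c
#include <assert.h>

typedef int tm_node_id;           /* stand-ins for the tm.h types */
typedef unsigned int tm_task_id;  /* (assumptions for this sketch) */

#define MAX_KNOWN 128             /* hypothetical limit */

/* Table of tasks the client library knows about (illustrative only). */
static struct { tm_task_id task; tm_node_id node; } known[MAX_KNOWN];
static int known_count;

/* Stand-in for the per-task bookkeeping done for each pair. */
static void new_task_stub(tm_node_id node, tm_task_id task)
{
    assert(known_count < MAX_KNOWN);
    known[known_count].task = task;
    known[known_count].node = node;
    known_count++;
}

/* Assumed decoded reply: reply[0] is the pair count, followed by
 * (task ID, node ID) pairs. Fills the caller's tids array and returns
 * the number of tasks, or -1 if the reply does not match the request. */
int handle_spawn_multi_reply(const unsigned int reply[], int reply_len,
                             tm_task_id tids[], int list_size)
{
    if (reply_len < 1)
        return -1;

    int count = (int)reply[0];
    if (count != list_size || reply_len != 1 + 2 * count)
        return -1;                /* malformed or incomplete reply */

    for (int i = 0; i < count; i++) {
        tm_task_id task = reply[1 + 2 * i];
        tm_node_id node = (tm_node_id)reply[2 + 2 * i];

        new_task_stub(node, task);  /* one record per pair */
        tids[i] = task;             /* hand the ID back to the caller */
    }
    return count;
}
```

Checking the count against list_size before touching tids keeps a short or oversized reply from writing outside the array the caller supplied.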