Proposal for tm_spawn_multi

Follow the PBS Pro Design Document Guidelines.

Overview

tm_spawn() can start an executable on only one host.
pbsdsh, which calls tm_spawn(), therefore has to call it in a loop, once for every copy/host that needs to run the command.  If there are 10 hosts to send the command to, pbsdsh makes 10 calls to tm_spawn().
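
The per-host loop described above can be sketched roughly as follows. This is not the pbsdsh source: the typedefs and TM_SUCCESS are stand-ins so the fragment compiles without tm.h, stub_tm_spawn() is a hypothetical stub that only mirrors the tm_spawn() parameter list, and spawn_one_by_one() is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in definitions so the sketch compiles without tm.h (assumptions). */
typedef int tm_node_id;
typedef unsigned int tm_task_id;
typedef int tm_event_t;
#define TM_SUCCESS 0

static int spawn_calls = 0;   /* number of requests the stub has seen */

/* Hypothetical stub with the tm_spawn() parameter list; real code
 * would call tm_spawn() from the TM library instead. */
static int stub_tm_spawn(int argc, char **argv, char **envp,
                         tm_node_id where, tm_task_id *tid, tm_event_t *event)
{
    (void)argc; (void)argv; (void)envp;
    spawn_calls++;
    *tid = 100u + (tm_task_id)where;   /* made-up task ID */
    *event = spawn_calls;              /* made-up event number */
    return TM_SUCCESS;
}

/* One request per node: N nodes cost N separate calls. */
static int spawn_one_by_one(int argc, char **argv, char **envp,
                            int nnodes, const tm_node_id nodes[],
                            tm_task_id tids[], tm_event_t events[])
{
    assert(nnodes >= 0);
    for (int i = 0; i < nnodes; i++) {
        int rc = stub_tm_spawn(argc, argv, envp, nodes[i],
                               &tids[i], &events[i]);
        if (rc != TM_SUCCESS)
            return rc;
    }
    return TM_SUCCESS;
}
```

With 10 entries in nodes[], the stub is entered 10 times, which is the cost the proposal aims to remove.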


We can make things more parallel by sending the request to all MoMs at once: a new tm_spawn_multi() call will cause the spawn request to be multicast to all other hosts at once.

tm_spawn_multi(int argc, char **argv, char **envp, int list_size, tm_node_id where[], tm_task_id tids[], tm_event_t *event)


tm_spawn_multi() has three notable changes/additions relative to the tm_spawn() interface:
tm_node_id where[] - an array of node ids, one for each node (or copy) needed
int list_size - the number of entries in the where array
tm_task_id tids[] - a pointer to an array with room for list_size tm_task_ids

tm_spawn_multi() sends a message to all MoMs that are part of the job to start a new task. The node ids of the host(s) that are to run the task are given in the where array.  The parameters argc, argv and envp specify the program to run and its arguments and environment, very much like exec(). The full path of the program executable must be given by argv[0], and the number of elements in the argv array is given by argc. The array envp is NULL terminated. The argument event points to a tm_event_t variable which is filled in with an event number. When this event is returned by tm_poll(), the array pointed to by tids will contain the task IDs of all the tasks newly created by the MoMs.
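
A caller-side sketch of this contract is given below, under several assumptions. The typedefs and constants are stand-ins for tm.h, the int return type is assumed because the proposal does not state one, and stub_tm_spawn_multi(), stub_tm_poll() and run_everywhere() are hypothetical names that only imitate the behaviour described above. The point to notice is that tids is filled in when the event comes back from the poll, not when the spawn call returns.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in definitions so the sketch compiles without tm.h (assumptions). */
typedef int tm_node_id;
typedef unsigned int tm_task_id;
typedef int tm_event_t;
#define TM_SUCCESS 0
#define TM_NULL_EVENT 0

/* Stub state: the request that is waiting to complete. */
static tm_task_id *pending_tids = NULL;
static int pending_count = 0;
static tm_event_t pending_event = TM_NULL_EVENT;

/* Hypothetical stub using the parameter list from this proposal. */
static int stub_tm_spawn_multi(int argc, char **argv, char **envp,
                               int list_size, tm_node_id where[],
                               tm_task_id tids[], tm_event_t *event)
{
    (void)envp; (void)where;
    /* argv[0] must be a full path; -1 is a made-up error value. */
    if (argc < 1 || argv[0] == NULL || argv[0][0] != '/' || list_size < 1)
        return -1;
    pending_tids = tids;
    pending_count = list_size;
    pending_event = 1;
    *event = pending_event;
    return TM_SUCCESS;
}

/* Hypothetical stub poll: completes the pending request and only then
 * writes one made-up task ID per entry of the where array. */
static int stub_tm_poll(tm_event_t poll_event, tm_event_t *result_event,
                        int wait, int *tm_errno)
{
    (void)poll_event; (void)wait;
    *tm_errno = TM_SUCCESS;
    *result_event = pending_event;
    for (int i = 0; i < pending_count; i++)
        pending_tids[i] = 1000u + (tm_task_id)i;
    pending_count = 0;
    pending_event = TM_NULL_EVENT;
    return TM_SUCCESS;
}

/* Caller: one request for the whole list, then wait for its event. */
static int run_everywhere(char *path, tm_node_id where[],
                          tm_task_id tids[], int list_size)
{
    char *argv[] = { path, NULL };
    char *envp[] = { NULL };          /* envp is NULL terminated */
    tm_event_t spawn_event, done_event;
    int tm_errno;

    assert(tids != NULL);             /* caller provides list_size slots */
    int rc = stub_tm_spawn_multi(1, argv, envp, list_size,
                                 where, tids, &spawn_event);
    if (rc != TM_SUCCESS)
        return rc;
    rc = stub_tm_poll(TM_NULL_EVENT, &done_event, 1, &tm_errno);
    if (rc != TM_SUCCESS || tm_errno != TM_SUCCESS)
        return -1;
    return done_event == spawn_event ? TM_SUCCESS : -1;
}
```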

pbsdsh will also need to be modified to call the new tm_spawn_multi() instead of tm_spawn().
There is a pbsdsh option -s that causes the executable to finish running on one host before starting on the next host.  This option no longer has meaning, since with tm_spawn_multi() executables will no longer be started one at a time.  Therefore, the -s option will be removed from pbsdsh.


A few design details:


Introduce a new TM_SPAWN_MULTI protocol message type.
Introduce a new InterMoM command type, IM_SPAWN_MULTI.

tm_spawn_multi() will send the primary execution host the number of entries in the array of node ids and the array of node ids itself, along with the argc, argv and envp arguments.  The primary execution host MoM will go through the list and, for every entry that isn't for her, open a stream to each new host found in the list and add it to the group to be multicast to.  Similar to what is done for tm_spawn(), the primary execution host MoM will handle any requests that are to be executed on her as she goes through the list.  Once the primary execution host MoM has gone through the list of hosts, all of the information is multicast out to the other MoMs as an IM_SPAWN_MULTI command.
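
The list walk on the primary execution host can be pictured with the sketch below. It is illustrative only and not MoM code: struct spawn_plan, plan_spawn_multi(), the host_of[] lookup table (node id to host index) and MAX_HOSTS are all hypothetical, and opening streams is reduced to recording each distinct remote host once.

```c
#include <assert.h>
#include <stdbool.h>

typedef int tm_node_id;     /* stand-in for the tm.h typedef (assumption) */
#define MAX_HOSTS 64        /* hypothetical cap for this sketch */

struct spawn_plan {
    int local_tasks;        /* entries the primary MoM starts herself */
    int group[MAX_HOSTS];   /* distinct remote hosts in the multicast group */
    int group_size;
};

/* Walk the where array once. host_of[nid] gives the host index of a node
 * id and must cover every id in where[] (hypothetical lookup table). */
static void plan_spawn_multi(const tm_node_id where[], int list_size,
                             const int host_of[], int my_host,
                             struct spawn_plan *plan)
{
    bool in_group[MAX_HOSTS] = { false };

    plan->local_tasks = 0;
    plan->group_size = 0;
    for (int i = 0; i < list_size; i++) {
        int host = host_of[where[i]];
        assert(host >= 0 && host < MAX_HOSTS);
        if (host == my_host) {
            plan->local_tasks++;            /* handled locally */
        } else if (!in_group[host]) {
            in_group[host] = true;          /* first time this host is seen */
            plan->group[plan->group_size++] = host;
        }
    }
}
```

A host that appears several times in the list joins the group only once; the sister MoM on that host is expected to find all of her entries when she walks the list herself.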

Each of the sister MoMs (any MoM that is not on the head or first host of a multihost job) will go through the list of node ids, and every time she finds herself, she will create a task and temporarily save the task ID.  When the sister MoM is done going through the list, she will send the total number of tasks created, along with the associated task IDs, back to the primary execution host MoM.  The primary execution host MoM will store the task IDs and the associated node IDs in the job attributes ji_taskid_list and ji_nid_list.  Only after all expected task IDs are received will a tm_reply() be sent. The reply will carry the total count of expected task/node IDs along with each task ID and node ID pair.
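
The "reply only when everything has arrived" rule can be sketched as follows. This is a simplified model under stated assumptions, not the MoM implementation: struct spawn_multi_state, state_init(), record_report() and MAX_TASKS are hypothetical, and the two arrays merely play the role of ji_taskid_list and ji_nid_list.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the tm.h typedefs (assumptions). */
typedef int tm_node_id;
typedef unsigned int tm_task_id;
#define MAX_TASKS 128       /* hypothetical cap for this sketch */

struct spawn_multi_state {
    int expected;                       /* entries in the where array */
    int received;                       /* pairs stored so far */
    tm_task_id taskid_list[MAX_TASKS];  /* role of ji_taskid_list */
    tm_node_id nid_list[MAX_TASKS];     /* role of ji_nid_list */
};

static void state_init(struct spawn_multi_state *st, int expected)
{
    assert(expected >= 0 && expected <= MAX_TASKS);
    st->expected = expected;
    st->received = 0;
}

/* Store one MoM's report of count task ID/node ID pairs. Returns true
 * once every expected pair is in, i.e. when the reply may be sent. */
static bool record_report(struct spawn_multi_state *st, int count,
                          const tm_task_id tids[], const tm_node_id nids[])
{
    for (int i = 0; i < count; i++) {
        assert(st->received < st->expected);
        st->taskid_list[st->received] = tids[i];
        st->nid_list[st->received] = nids[i];
        st->received++;
    }
    return st->received == st->expected;
}
```

For a request with three entries, a first report of two pairs returns false and a later report with the last pair returns true, at which point the stored lists hold all three pairs in arrival order.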

A TM_SPAWN_MULTI option will be added to tm_poll() to handle the multiple task IDs and node IDs that are sent.
new_task() is called in tm_poll() for each task ID/node ID pair.







