MOM will initiate the dialogue sequence with server (code committed)

 

Follow the PBS Pro Design Document Guidelines.

 

Motivation

1. Generic design

Ideally, clients should reach out to the server with connection requests. In PBS Pro, however, the server has to reach out to the scheduler and the moms with connection requests, and it has to deal with connection failures and non-responding moms and schedulers. The future vision is to have all daemons and clients reach out to the server to establish the connection. This work targets reversing the direction of the mom connection.

2. Offloading the Server

The server has to maintain information about each mom it has contacted. If it does not receive a response from a mom, it must resend a hello periodically. This can happen when the mom is down or the network has failed. All of this adds considerable computing overhead to the server, which can be alleviated by initiating the connection from the mom.

3. Sharding mom to server connection sequence in multi-server architecture

Having a server reach out to every mom it knows makes little sense if that load can be shared between servers, especially when the initial dialogue between server and moms is long and computationally expensive. The proposed design allows the mom to choose a server and continue the dialogue with it, essentially sharing the load between servers.

Overview

A sequence of dialogue takes place between the server and mom when they come up. This change focuses on reversing the initial exchange to reduce the overhead on the server and to make it adaptable for future scalability enhancements.

Initial Dialogue Exchange:

The initial dialogue exchange is the set of messages exchanged between server and mom before the server shows the node as up. This change reduces the number of messages exchanged. Following is a sequence diagram of how the mom-server interaction appeared before and how it appears now.

IS_HELLOSVR

Previously, when the server came up, it sent IS_HELLO to all the moms in the cluster, and each mom replied with a list of jobs and state information. This is not efficient, because the server has to keep track of the moms that have not responded yet and keep resending IS_HELLO to them at a repeated interval. The same result can be achieved without that overhead on the server by sending the first hello message from mom. Mom will initiate the dialogue with IS_HELLOSVR. The server will then check whether the network-resolved hostname of the sending host corresponds to a mom in its database.

IS_REPLYHELLO

The server will send IS_REPLYHELLO in response to IS_HELLOSVR from mom. It contains the rpp values and, in single-server mode, the IP addresses of all moms. These were previously sent as part of IS_NULL and IS_CLUSTER_ADDRS.

IS_REGISTERMOM

Mom will send IS_REGISTERMOM in response to IS_REPLYHELLO. It will contain job and vnode details. This will serve the purpose of IS_HELLO4, IS_UPDATE2 and IS_MOM_READY.
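
The three-message exchange described above can be illustrated with a small in-memory simulation. This is only a sketch, not the PBS Pro implementation: the message names come from this document, while the `Server` class, its fields, the payload layout, the placeholder rpp values and the choice to send no reply to an unknown host are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Message names are taken from this document; payloads are illustrative only.
IS_HELLOSVR = "IS_HELLOSVR"
IS_REPLYHELLO = "IS_REPLYHELLO"
IS_REGISTERMOM = "IS_REGISTERMOM"


@dataclass
class Server:
    known_moms: set                 # hostnames present in the server database
    rpp_retry: int = 10             # placeholder value (assumption)
    rpp_highwater: int = 1024       # placeholder value (assumption)
    node_state: dict = field(default_factory=dict)
    registered: dict = field(default_factory=dict)

    def handle(self, msg, sender, body=None):
        """Process one incoming message and return the reply, or None."""
        if msg == IS_HELLOSVR:
            if sender not in self.known_moms:
                return None         # assumption: unknown hosts get no reply
            self.node_state[sender] = "hello-received"
            return (IS_REPLYHELLO, {
                "rpp_retry": self.rpp_retry,
                "rpp_highwater": self.rpp_highwater,
                "mom_addrs": sorted(self.known_moms),
            })
        if msg == IS_REGISTERMOM and self.node_state.get(sender) == "hello-received":
            self.registered[sender] = body   # job and vnode details
            self.node_state[sender] = "up"   # node is shown as up only now
        return None


def mom_handshake(server, hostname, jobs, vnodes):
    """Drive the exchange from the mom side; return True if the node ends up 'up'."""
    reply = server.handle(IS_HELLOSVR, hostname)
    if reply is None:
        return False
    msg, _settings = reply
    if msg != IS_REPLYHELLO:
        return False
    server.handle(IS_REGISTERMOM, hostname, {"jobs": jobs, "vnodes": vnodes})
    return server.node_state.get(hostname) == "up"
```

In this sketch the server keeps no retry state for moms that have not yet said hello, which is the overhead the reversed direction is meant to remove.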

IS_NULL

IS_NULL is a repeated heartbeat from the server to mom that carries rpp values. The message body contains rpp_retry and rpp_highwater.
Whenever rpp values are updated in the server, the server needs to send the new values to all the moms. The periodic ping, however, will be removed, as it serves no remaining use case.

IS_CLUSTERADDRS

IS_CLUSTERADDRS sends a list of addresses from server to mom, which can be used for inter-mom communication (More info on IS_CLUSTERADDRS). This will not be enabled for multi-server mode. Users have to use a supported alternative security model there, as reserved-port-based authentication is not secure for inter-mom communication.

Frequency of HELLOSVR

Mom will start with quick bursts of IS_HELLOSVR followed by slower intervals. This strategy ensures a fast connection if the server is already up and avoids flooding the network with requests. In the worst case, mom will attempt to connect every 64 seconds.
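
One way to get quick bursts followed by slower intervals is a doubling delay with a cap. The sketch below assumes a 1-second starting delay and a doubling factor. Neither is stated in this document. Only the 64-second worst case is. `hello_delays` is a hypothetical helper name.

```python
import itertools


def hello_delays(initial=1, cap=64):
    """Yield the wait in seconds before each IS_HELLOSVR attempt.

    Assumption: the delay doubles after every attempt until it reaches `cap`.
    """
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, cap)


print(list(itertools.islice(hello_delays(), 9)))
# [1, 2, 4, 8, 16, 32, 64, 64, 64]
```

With these assumed parameters, a server that is already up is reached within a few seconds, while a server that stays down receives at most one attempt per mom every 64 seconds.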

Failover:

Mom will treat the two servers in its list with equal precedence and will send hello to one of them at random. It can receive a reply only from the active server. Mom will follow the same approach in the future when active-active failover is available.
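
The random selection between servers can be sketched as follows. The function name `connect`, the `is_active` callback (standing in for "a reply was received") and the host names in the usage example are hypothetical, and the attempt limit exists only to keep the example finite.

```python
import random


def connect(servers, is_active, max_attempts=10, rng=random):
    """Send hello to a randomly chosen server until one replies.

    Returns (server, attempts_used); server is None if nobody replied.
    `is_active(server)` models whether that server answers the hello.
    """
    for attempt in range(1, max_attempts + 1):
        target = rng.choice(servers)     # equal precedence: uniform choice
        if is_active(target):            # only the active server replies
            return target, attempt
    return None, max_attempts


# Usage: the secondary is passive, so only the primary ever answers.
server, attempts = connect(["primary", "secondary"], lambda s: s == "primary",
                           max_attempts=50)
```

Because mom does not need to know which server is active, the same selection logic would still apply in an active-active setup, where either choice could reply.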

Server talking to an alien Mom:

When the server gets a job request from the scheduler for a mom it does not know yet (hence an alien mom), it will look up the required information in the database. It then opens a connection stream to that mom and sends the job.

The server attempts to talk to the mom before mom attempts to connect

This can happen when operations such as delete_job or run_job reach the server before mom has reached out. The server will then attempt to open a connection via tpp and send the request to mom. Upon receiving the request, mom will initiate the hello exchange sequence. This will succeed only if the mom is up and connected to comm.

Server ping rate is no longer used

pbs_server used to accept the -P option to set the server ping rate. This option will be removed as it is no longer applicable.

 
