...

In larger clusters with more than 10K moms, this mechanism is inefficient: the message must be broadcast to every mom in the cluster each time a mom is added or removed. The problem is worse in clusters that grow and shrink dynamically, as with cloud bursting.

B. Does not work with a multi-server architecture.

The future of PBS is a multi-server architecture, where PBS scales horizontally by deploying more than one instance of the PBS server. If the nodes are sharded (each set of moms reports to only one server), then no single server has the complete set of IP addresses in the cluster. It is therefore difficult to rely on IS_CLUSTERADDR2 in a multi-server architecture.

Proposal:

Every server and mom will have a common secret key, manually configured in a file in a directory readable only by root. The sender uses this key to encrypt a payload (the sender's IP address) and sends it to the receiver. Because both sides hold the same shared key, the receiver can confirm that the sender is part of the same cluster.
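The exchange can be illustrated with a minimal Python sketch. It is not the PBS implementation: the proposal calls for an encrypted payload, whereas this sketch swaps in an HMAC-SHA256 tag over the sender's IP address, because the Python standard library has no symmetric cipher. The function names make_token and verify_token and the "ip|tag" layout are hypothetical, and replay protection is not addressed.

```python
import hashlib
import hmac


def make_token(shared_key: bytes, sender_ip: str) -> bytes:
    """Sender side: bind the sender's IP address to the shared key.

    Assumption: an HMAC tag stands in for the encrypted payload
    described in the proposal.
    """
    ip_bytes = sender_ip.encode("ascii")
    tag = hmac.new(shared_key, ip_bytes, hashlib.sha256).hexdigest()
    return ip_bytes + b"|" + tag.encode("ascii")


def verify_token(shared_key: bytes, token: bytes, peer_ip: str) -> bool:
    """Receiver side: accept only if the tag was made with the same key
    and the claimed IP matches the address the connection came from."""
    ip_bytes, sep, tag = token.partition(b"|")
    if not sep:
        return False  # malformed token, no separator found
    expected = hmac.new(shared_key, ip_bytes, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched
    tag_ok = hmac.compare_digest(tag, expected.encode("ascii"))
    return tag_ok and ip_bytes.decode("ascii", "replace") == peer_ip
```

A receiver holding a different key, or seeing the token arrive from a different address than the one it names, rejects the sender; no cluster-wide address list is needed.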

...

This scheme requires the same password to be present in the pbs.key file in every $PBS_HOME/$daemon_priv/ directory, where the daemon is either server or mom.
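As a rough sketch of how a daemon might locate and read that file, assuming Python for illustration: the helper names key_path and load_key are hypothetical, and the check that the file itself grants no group or other permissions is an added assumption (the proposal only states that the directory is readable by root alone).

```python
import os
import stat


def key_path(pbs_home: str, daemon: str) -> str:
    """Build $PBS_HOME/$daemon_priv/pbs.key for daemon 'server' or 'mom'."""
    if daemon not in ("server", "mom"):
        raise ValueError("daemon must be 'server' or 'mom'")
    return os.path.join(pbs_home, f"{daemon}_priv", "pbs.key")


def load_key(pbs_home: str, daemon: str) -> bytes:
    """Read the shared key, refusing files open to group or others.

    Assumption: the permission check is illustrative, not part of
    the proposal text.
    """
    path = key_path(pbs_home, daemon)
    mode = os.stat(path).st_mode
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} must not be accessible to group/other")
    with open(path, "rb") as f:
        key = f.read().strip()  # drop a trailing newline left by an editor
    if not key:
        raise ValueError(f"{path} is empty")
    return key
```

For example, load_key("/var/spool/pbs", "mom") would read /var/spool/pbs/mom_priv/pbs.key, where the PBS_HOME value shown is only an example.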


Pros:

  1. Scalable; fits with a multi-server architecture
  2. Easy to implement and maintain

...