It is possible to indefinitely delay the server ping_nodes using a client that registers/deregisters with pbs_comm

Description

Per Alexis:

It is possible to indirectly force the server to go through net_restore_handler, by issuing commands that set up TPP connections to pbs_comm. One example utility that does this is pbs_rmget, but obviously it's also possible to use the RM API to build your own.
When the handler is called, the server currently schedules a node ping exactly three seconds in the future. That makes perfect sense, since pbs_comm has told it that something changed about the connected clients.

It also makes sense to ignore ping_nodes that were set up earlier and to force a delay, since otherwise you could mount a denial of service attack by issuing pbs_rmget as fast as you can (which would then cause ping_nodes to be called as fast as possible too). And if things are really in flux, it does make sense to wait until "the dust settles" before you issue a ping_nodes, and three seconds is as good a value for that as any.

BUT since we always schedule the ping 3 seconds in the future and discard any ping that was set up earlier, any user can delay ping_nodes indefinitely simply by issuing one pbs_rmget every two seconds: the pending ping is always replaced before it comes due.
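The starvation can be illustrated with a small toy model. This is only a sketch, not the server code (which is C): it assumes time advances in whole seconds, that at most one ping task is pending, and that every trigger replaces the pending task with one three seconds out. The names `simulate` and `PING_DELAY` are hypothetical and do not come from the PBS sources.

```python
PING_DELAY = 3  # seconds; the fixed delay described in this ticket


def simulate(trigger_times, horizon):
    """Return the times at which the ping would fire in this toy model.

    trigger_times: whole-second times at which net_restore_handler is
                   (indirectly) invoked, e.g. by a pbs_rmget call.
    horizon:       last second to simulate (inclusive).
    """
    fired = []
    pending = None  # due time of the single pending ping task, if any
    triggers = set(trigger_times)
    for now in range(horizon + 1):
        # A pending ping that has come due runs first.
        if pending is not None and pending <= now:
            fired.append(now)
            pending = None
        # A trigger discards any earlier ping and schedules a new one.
        if now in triggers:
            pending = now + PING_DELAY
    return fired


# One trigger: the ping runs three seconds later.
single = simulate([0], 60)
# A trigger every two seconds: the ping never comes due within the minute.
spammed = simulate(range(0, 60, 2), 60)
```

In this model a single trigger produces one ping, while a trigger every two seconds produces none for as long as the triggers keep arriving.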
Clearly, ping_nodes needs to remember the last time we actually pinged nodes, and setup_ping then needs to see if we haven't pinged for a long time (e.g. the server's node ping time). If so, then it should indeed submit an immediate ping_nodes task instead of a timed one in three seconds. That would ensure that even if users mount an attack by spamming pbs_mom and thus indirectly the server, ping_nodes would still run at the "regular" intervals it usually runs.
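A minimal sketch of the proposed behaviour, under stated assumptions: time is modelled in whole seconds, `next_ping_time` stands in for the decision setup_ping would make, and `ping_interval` stands in for the server's node ping time. All function and parameter names here are hypothetical and are not taken from the PBS sources; the real change would be made in the C server code.

```python
def next_ping_time(now, last_ping, ping_interval, delay=3):
    """Decide when the next ping should run.

    If no ping has run for at least ping_interval seconds, run one
    immediately; otherwise keep the usual fixed delay.
    """
    if now - last_ping >= ping_interval:
        return now  # overdue: do not let triggers postpone it further
    return now + delay


def simulate_with_fix(trigger_times, horizon, ping_interval, delay=3):
    """Return the times at which the ping fires in this toy model."""
    fired = []
    last_ping = 0   # time of the last ping that actually ran
    pending = None  # due time of the single pending ping task, if any
    triggers = set(trigger_times)
    for now in range(horizon + 1):
        # A trigger replaces the pending task, as in the current design.
        if now in triggers:
            pending = next_ping_time(now, last_ping, ping_interval, delay)
        # Run the ping once it is due and remember when it ran.
        if pending is not None and pending <= now:
            fired.append(now)
            last_ping = now
            pending = None
    return fired


# A trigger every two seconds no longer starves the ping: with an assumed
# 20-second ping interval it still runs at the regular interval.
fixed = simulate_with_fix(range(0, 61, 2), 60, ping_interval=20)
```

With these assumed values the ping fires at 20, 40 and 60 seconds despite the constant triggers, and a lone trigger still gets the normal three-second delay.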

Alternatively, we could change how pbs_comm decides when to make the server call net_restore_handler.

Acceptance Criteria

None

Activity

Shrinivas Harapanahalli
October 4, 2017, 8:45 AM

There is no direct way to test this fix. The easiest way is to watch the stdout messages using a debug binary run in standalone mode.
Attaching the server stdout captured before and after the fix.

PTL has no way to test the network communication protocol, so this was tested manually via the debug stdout messages.

Shrinivas Harapanahalli
October 4, 2017, 9:17 AM

Attached the server output from before and after the fix.

Shrinivas Harapanahalli
October 4, 2017, 9:58 AM

Attached the unit test steps used for verification.

Assignee

Shrinivas Harapanahalli

Reporter

Scott Campbell

Severity

3-High

OS

None

Start Date

None

Pull Request URL

None

Story Points

1

Components

Affects versions

Priority

High