When log_match is used to check for the non-existence of a message, it goes through all 60 attempts, because log_match now defaults max_attempts to 60.
For instance, pbs_testlib.py has a log_match that checks for the non-existence of "Error reading line" after a daemon is sent a HUP, and it searches for all 60 attempts.
This slows down the whole test run: every single test case takes much longer than it should.
If we at least set max_attempts to 1 when existence=False, it will save a lot of time.
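A minimal sketch of why a negative match is so costly, assuming log_match polls the log once per attempt and waits between attempts. The function `poll_log`, its parameters and the in-memory list standing in for the log file are hypothetical, not the actual PTL implementation; it keeps 60 as the default for positive matches and applies the proposed default of 1 when existence=False.

```python
import time


def poll_log(lines, msg, existence=True, max_attempts=None, interval=0.0):
    """Hypothetical stand-in for log_match: return (found, attempts_used).

    `lines` is an in-memory list standing in for a daemon log file.
    """
    if max_attempts is None:
        # Proposed behaviour: one attempt is enough to confirm absence;
        # positive matches keep the current default of 60.
        max_attempts = 60 if existence else 1
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        if any(msg in line for line in lines):
            # A positive match can stop as soon as the message shows up.
            return True, attempts
        if attempts < max_attempts:
            time.sleep(interval)
    # The message never appeared, so every attempt was spent before giving up.
    return False, attempts
```

With an explicit max_attempts=60, checking that "Error reading line" is absent uses all 60 attempts, while the proposed default stops after one.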
When we ran the smoke tests today, we observed that the test cases took longer than expected to complete. This is caused by the following code change in _log_match():
    if max_attempts is None:
        max_attempts = 60
Any PTL test case that calls set_sched_config() is affected: set_sched_config() calls apply_config(), which, when sending a HUP signal to the scheduler, calls log_match() to make sure that the "Error reading line" message is not present in sched_logs. In this case the code above is hit and the loop iterates 60 times, the default value shown above. Most PTL test cases go through this path, so the impact is very high.
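A back-of-the-envelope estimate of the overhead. It rests on hypothetical assumptions: each set_sched_config() call triggers exactly one negative log_match, and each attempt costs `interval_s` seconds (1 s here). The real PTL polling interval and call counts may differ.

```python
def negative_match_overhead(calls, max_attempts, interval_s=1.0):
    """Seconds spent on negative log matches under a hypothetical cost model.

    Assumes one negative log_match per set_sched_config() call and a fixed
    cost of `interval_s` seconds per attempt.
    """
    return calls * max_attempts * interval_s


# Hypothetical run with 100 set_sched_config() calls.
print(negative_match_overhead(100, 60))  # 6000.0 seconds with the default of 60
print(negative_match_overhead(100, 1))   # 100.0 seconds with max_attempts=1
```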
The default value of max_attempts should be kept to a minimum, and changing it should have very little impact.
Is this the same as PP-1063?
This ticket is related to PP-1063, but at first look at the code they do not seem to be exactly the same. We should also make sure that the default value of max_attempts is kept to a minimum even when matching a line that is present in the logs. If a test case or test suite needs more attempts, it can always pass the desired value as an argument to override the default.
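A minimal sketch of the "minimal default, caller overrides" policy. The constant `DEFAULT_MAX_ATTEMPTS` (set to 1 here) and the helper `resolve_max_attempts` are hypothetical names for illustration, not part of pbs_testlib.py.

```python
# Hypothetical default: the smallest useful number of attempts.
DEFAULT_MAX_ATTEMPTS = 1


def resolve_max_attempts(max_attempts=None):
    """Return the attempt count a log_match-style helper should use.

    None means the caller did not ask for anything special, so fall back to
    the minimal default; an explicit value from a test case or test suite
    overrides it.
    """
    if max_attempts is None:
        return DEFAULT_MAX_ATTEMPTS
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    return max_attempts
```

A test that really needs to wait for a slow log line would pass, say, max_attempts=30 explicitly, while every other caller pays only the minimal cost.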