Improvement to server logging on a failed call to go_to_background()

Description

I installed PBS from scratch on a machine and the server failed to start. The logs did not indicate that anything was wrong. After some debugging I realized that it was a memory issue on the machine I was installing on: go_to_background() failed while forking a process because there wasn't enough swap space. We don't log anything when go_to_background() fails, so we should log a helpful message. go_to_background() already logs an error message itself when a call to setsid fails; it could do something similar when fork fails.
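
A minimal sketch of the idea, not the actual PBS code: go_to_background_sketch() and format_errno_msg() are hypothetical names, and the message is written to stderr with plain stdio calls rather than through the PBS logging functions. It only illustrates reporting errno when fork fails, in the same way as for a failed setsid.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a PBS function): builds a message of the form
 * "<strerror text> (<errno>) in <id>, <text>". */
static void format_errno_msg(char *buf, size_t len, int errnum,
                             const char *id, const char *text)
{
    snprintf(buf, len, "%s (%d) in %s, %s",
             strerror(errnum), errnum, id, text);
}

/* Hypothetical stand-in for the daemonizing step.
 * Returns 0 in the child once it is detached, -1 on failure.
 * The parent process exits after a successful fork. */
static int go_to_background_sketch(const char *id)
{
    char msg[256];
    pid_t pid = fork();

    if (pid == -1) {
        /* Report the fork failure instead of returning silently. */
        format_errno_msg(msg, sizeof(msg), errno, id, "fork failed");
        fprintf(stderr, "%s: %s\n", id, msg);
        return -1;
    }
    if (pid > 0)
        _exit(0); /* parent: the child carries on in the background */

    if (setsid() == -1) {
        /* Same reporting pattern for a failed setsid. */
        format_errno_msg(msg, sizeof(msg), errno, id, "setsid failed");
        fprintf(stderr, "%s: %s\n", id, msg);
        return -1;
    }
    return 0;
}
```

A caller would treat a return value of -1 as a startup failure and exit with a non-zero status, so the init script still reports the failure but the cause is now visible.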

Acceptance Criteria

None

Activity

Ravi Agrawal
August 15, 2016, 9:16 PM

Pull request: https://github.com/PBSPro/pbspro/pull/139

Before the change:
[ravi@heisenberg ~]$ sudo /etc/init.d/pbs start
Starting PBS
PBS comm
/opt/pbs/sbin/pbs_comm ready (pid=3162), Proxy Name:heisenberg:17001, Threads:4
PBS mom
PBS sched
Connecting to PBS dataservice.....connected to PBS dataservice@heisenberg
Licenses valid for 1000000 Floating hosts
pbs_server startup failed, exit 2 aborting.

After the change:
[ravi@heisenberg ~]$ sudo /etc/init.d/pbs start
Starting PBS
PBS comm
/opt/pbs/sbin/pbs_comm ready (pid=3162), Proxy Name:heisenberg:17001, Threads:4
PBS mom
PBS sched
Connecting to PBS dataservice.....connected to PBS dataservice@heisenberg
Licenses valid for 1000000 Floating hosts
Server@heisenberg: Cannot allocate memory (12) in Server@heisenberg, fork failed
pbs_server startup failed, exit 2 aborting.

Ravi Agrawal
August 15, 2016, 9:19 PM

The same error message is now also written to the server logs:

08/15/2016 20:56:24;0080;Server@heisenberg;Hook;print_hook;ALLHOOKS hook[1] = {PBS_ibwins, order=0, type=1, enabled=0 user=0, debug=(0) fail_action=(1), event=(queuejob), alarm=30, freq=120}
08/15/2016 20:56:24;0080;Server@heisenberg;Hook;print_hook;queuejob hook[0] = {PBS_ibwins, order=0, type=1, enabled=0 user=0, debug=(0) fail_action=(1), event=(queuejob), alarm=30, freq=120}
08/15/2016 20:56:24;0080;Server@heisenberg;Hook;print_hook;queuejob hook[1] = {PBS_translate_mpp, order=1000, type=1, enabled=0 user=0, debug=(0) fail_action=(1), event=(queuejob,resvsub), alarm=90, freq=120}
08/15/2016 20:56:24;0080;Server@heisenberg;Hook;print_hook;resvsub hook[0] = {PBS_translate_mpp, order=1000, type=1, enabled=0 user=0, debug=(0) fail_action=(1), event=(queuejob,resvsub), alarm=90, freq=120}
08/15/2016 20:56:24;0086;Server@heisenberg;Svr;pbs_python_ext_quick_shutdown_interpreter;--> Stopping Python interpreter <--
08/15/2016 20:56:26;0001;Server@heisenberg;Svr;Server@heisenberg;Cannot allocate memory (12) in Server@heisenberg, fork failed

Earlier, with the default log level, the logs would end with the "Stopping Python interpreter" line.

Assignee

Ravi Agrawal

Reporter

Ravi Agrawal

Severity

None

OS

None

Start Date

None

Pull Request URL

Story Points

1

Fix versions

Priority

Low