After a server restart, all running jobs disappear from pbsnodes

Description

After a server restart, all running jobs disappear from the pbsnodes output. pbsnodes then shows only jobs started after the restart, and those disappear in turn at the next server restart.

Steps to reproduce:

  • torque4 is the server.

  • torque5 is a node.

1) starting a job:
(JESSIE)vchlum@torque5:~$ qsub -I -l select=vnode=torque5
qsub: waiting for job 854.torque4.ics.muni.cz to start
qsub: job 854.torque4.ics.muni.cz ready

(JESSIE)vchlum@torque5:~$

2) pbsnodes shows the job:
(JESSIE)root@torque4:~# pbsnodes -v torque5 | grep jobs
jobs = 854.torque4.ics.muni.cz/0
(JESSIE)root@torque4:~#

3) restart the server:
(JESSIE)root@torque4:~# systemctl restart pbs
(JESSIE)root@torque4:~#

4) pbsnodes no longer shows any jobs:
(JESSIE)root@torque4:~# pbsnodes -a | grep jobs
(JESSIE)root@torque4:~#

5) qstat output is not affected:
(JESSIE)root@torque4:~# qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
854.torque4 STDIN vchlum 00:00:00 R default
(JESSIE)root@torque4:~#
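
The manual comparison in steps 2 to 5 could be automated roughly as follows. This is only a sketch, not part of PBS Pro: the function names (node_jobs, running_jobs, missing_from_nodes) are hypothetical, and it assumes the default output layouts shown above, namely that pbsnodes prints "jobs = <id>/<core>, ..." lines and that the job state is the fifth column of plain qstat output. Because qstat truncates the server name, jobs are matched on the numeric sequence number only.

```shell
#!/bin/sh
# Hypothetical helpers; they only parse saved command output.

# Read pbsnodes output on stdin and print the sequence numbers of all
# jobs listed in "jobs = ..." lines, one per line, without duplicates.
node_jobs() {
    sed -n 's/^[[:space:]]*jobs = //p' |
        tr ',' '\n' |
        sed -e 's/^[[:space:]]*//' -e 's|/.*$||' |
        cut -d. -f1 |
        sort -u
}

# Read default qstat output on stdin and print the sequence numbers of
# jobs whose state column (assumed to be field 5) is R.
running_jobs() {
    awk '$5 == "R" { split($1, id, "."); print id[1] }' | sort -u
}

# $1: file with saved pbsnodes output, $2: file with saved qstat output.
# Prints every running job that no node reports.
missing_from_nodes() {
    nodes=$(node_jobs < "$1")
    running_jobs < "$2" | while read -r id; do
        printf '%s\n' "$nodes" | grep -qxF "$id" || printf '%s\n' "$id"
    done
}

# Example use on the server after a restart:
#   pbsnodes -av > nodes.txt
#   qstat > qstat.txt
#   missing_from_nodes nodes.txt qstat.txt
```

With this bug present, every running job should be printed after a restart; on an unaffected server the output should be empty.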

Acceptance Criteria

None

Activity

Subhasis Bhattacharya
November 28, 2017, 1:14 PM

Hi, I just tested on master and it seems to be working fine. What version of PBS did you test on?

Václav Chlumský
November 28, 2017, 1:20 PM

I also use the latest master. I will investigate...

Václav Chlumský
November 28, 2017, 6:47 PM

I am surprised you are not able to reproduce this problem. I tested this on Debian (with and without a wiped spool directory), and I also ran a fresh installation of PBS Pro on fresh CentOS images in our cloud. I can still reproduce the problem in both cases. I am using the master branch of PBS Pro.

Scott Campbell
November 28, 2017, 8:16 PM

I see it as well:

[root@centos7-2 pbspro-master]# qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
4.centos7-2 STDIN user1 00:00:00 R workq
5.centos7-2 STDIN user1 00:00:00 R workq
6.centos7-2 STDIN user1 0 Q workq
7.centos7-2 STDIN user1 0 Q workq
[root@centos7-2 pbspro-master]# pbsnodes -av | grep jobs
jobs = 4.centos7-2/0, 5.centos7-2/1
[root@centos7-2 pbspro-master]# qterm
[root@centos7-2 pbspro-master]# pbs_server
Connecting to PBS dataservice.....connected to PBS dataservice@centos7-2
Licenses valid for 10000000 Floating hosts
[root@centos7-2 pbspro-master]# pbsnodes -av | grep jobs
[root@centos7-2 pbspro-master]# qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
4.centos7-2 STDIN user1 00:00:00 R workq
5.centos7-2 STDIN user1 00:00:00 R workq
6.centos7-2 STDIN user1 0 Q workq
7.centos7-2 STDIN user1 0 Q workq
[root@centos7-2 pbspro-master]#

Subhasis Bhattacharya
November 29, 2017, 5:25 AM

It looks like I was not using the very latest master, and I found that one of our most recent check-ins has caused this problem. If this is true, we should be able to fix it really quickly. Thanks for reporting this.

Assignee

Bhroam Mann

Reporter

Václav Chlumský

Severity

None

OS

None

Start Date

None

Pull Request URL

None

Story Points

1

Fix versions

Affects versions

Priority

High