qrun of job fails to preempt running jobs

Description

Running qrun on a queued job fails to preempt running jobs. Steps to reproduce:

1. On a single-node setup, submit one or more normal jobs that take all resources
2. Submit another job that gets queued due to resource shortage
3. As root, qrun the queued job
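
The steps above can be sketched as a small shell function. This is only a sketch under assumptions: the function name `repro_qrun_preempt` is hypothetical, the CPU count passed in must match the node's total ncpus, and a single filler job is used instead of several. It must be called as root for qrun to be allowed.

```shell
# Reproduction sketch (hypothetical helper, not part of PBS).
# Argument: total ncpus on the single node (an assumption the caller supplies).
repro_qrun_preempt() {
    ncpus="$1"

    # Step 1: a normal job that takes all CPUs on the node.
    filler=$(echo "sleep 600" | qsub -l select=1:ncpus="$ncpus") || return 1

    # Step 2: a second job that stays queued due to the resource shortage.
    queued=$(echo "sleep 600" | qsub -l select=1:ncpus="$ncpus") || return 1

    # Step 3: force the queued job to run and record qrun's exit status.
    status=0
    qrun "$queued" || status=$?

    echo "filler=$filler queued=$queued qrun_exit=$status"
    return "$status"
}
```

For example, `repro_qrun_preempt 8` on an 8-CPU node. In the failing transcript in this report, qrun exits with status 2; on a build where preemption works, qrun succeeds and `qstat -s` shows the filler job in state S.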

Expected:
A running job should be suspended to make room for the job that is qrun.

Actual:
qrun fails with the message 'Not enough free nodes available' and exit status 2.

Fails on the latest code:

x19-64-sles11-altix:/home/saritah # qmgr -c "s q workq2 priority=150"
x19-64-sles11-altix:/home/saritah # qstat -s

x19-64-sles11-altix:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
4.x19-64-sles11 saritah  workq    STDIN       28582   1   3     -- 00:08 R 00:00
   Job run at Sun Oct 29 at 23:30 on (x19-64-sles11-altix:ncpus=3)
5.x19-64-sles11 saritah  workq3   STDIN       28585   1   3     -- 00:08 R 00:00
   Job run at Sun Oct 29 at 23:30 on (x19-64-sles11-altix:ncpus=3)
6.x19-64-sles11 saritah  workq3   STDIN       28588   1   2     -- 00:08 R 00:00
   Job run at Sun Oct 29 at 23:30 on (x19-64-sles11-altix:ncpus=2)
7.x19-64-sles11 saritah  workq    STDIN          --   1   3     -- 00:08 Q    --
   Not Running: Not enough free nodes available
x19-64-sles11-altix:/home/saritah # qrun 7
qrun: Not enough free nodes available 7.x19-64-sles11-altix
x19-64-sles11-altix:/home/saritah # echo $?
2
x19-64-sles11-altix:/home/saritah # tail /var/spool/pbs/sched_logs/20171029
10/29/2017 23:31:19;0800;pbs_sched;Sched;lim_get;entlim_get(g:altair) returned NULL
10/29/2017 23:31:19;0800;pbs_sched;Sched;lim_get;entlim_get(g:PBS_GENERIC) returned NULL
10/29/2017 23:31:19;0800;pbs_sched;Job;check_server_max_group_run_soft;6.x19-64-sles11-altix max_*group_run_soft are unset
10/29/2017 23:31:19;0800;pbs_sched;Sched;lim_get;entlim_get(p:_pbs_project_default) returned NULL
10/29/2017 23:31:19;0800;pbs_sched;Sched;lim_get;entlim_get(p:PBS_GENERIC) returned NULL
10/29/2017 23:31:19;0800;pbs_sched;Job;check_server_max_project_run_soft;6.x19-64-sles11-altix max_*project_run_soft are unset
10/29/2017 23:31:19;0008;pbs_sched;Job;7.x19-64-sles11-altix;Received qrun request
10/29/2017 23:31:19;0080;pbs_sched;Job;7.x19-64-sles11-altix;Considering job to run
10/29/2017 23:31:19;0040;pbs_sched;Job;7.x19-64-sles11-altix;Not enough free nodes available
10/29/2017 23:31:19;0080;pbs_sched;Req;;Leaving Scheduling Cycle

Passes on code from the October 5, 2017 build:

x19-64-sles11-altix:/home/saritah # qstat --version
pbs_version = 18.2.0.20171005010546
x19-64-sles11-altix:/home/saritah # qstat -s

x19-64-sles11-altix:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
0.x19-64-sles11 saritah  workq    STDIN       10853   1   2     -- 00:08 R 00:00
   Job run at Mon Oct 30 at 01:29 on (x19-64-sles11-altix:ncpus=2)
1.x19-64-sles11 saritah  workq    STDIN          --   1   2     -- 00:08 Q    --
   Not Running: Not enough free nodes available
x19-64-sles11-altix:/home/saritah # qrun 1
x19-64-sles11-altix:/home/saritah # qstat -s

x19-64-sles11-altix:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
0.x19-64-sles11 saritah  workq    STDIN       10853   1   2     -- 00:08 S 00:00
   Not Running: Not enough free nodes available
1.x19-64-sles11 saritah  workq    STDIN       10940   1   2     -- 00:08 R 00:00
   Job run at Mon Oct 30 at 01:29 on (x19-64-sles11-altix:ncpus=2)
x19-64-sles11-altix:/home/saritah # tail /var/spool/pbs/sched_logs/20171030
10/30/2017 01:29:43;0008;pbs_sched;Job;1.x19-64-sles11-altix;Received qrun request
10/30/2017 01:29:43;0080;pbs_sched;Job;1.x19-64-sles11-altix;Considering job to run
10/30/2017 01:29:44;0040;pbs_sched;Job;0.x19-64-sles11-altix;Job preempted by suspension
10/30/2017 01:29:44;0040;pbs_sched;Job;1.x19-64-sles11-altix;Job run
10/30/2017 01:29:44;0080;pbs_sched;Req;;Leaving Scheduling Cycle
10/30/2017 01:29:44;0080;pbs_sched;Req;;Starting Scheduling Cycle
10/30/2017 01:29:44;0004;pbs_sched;Fil;holidays;The holiday file is out of date; please update it.
10/30/2017 01:29:44;0080;pbs_sched;Job;0.x19-64-sles11-altix;Considering job to run
10/30/2017 01:29:44;0040;pbs_sched;Job;0.x19-64-sles11-altix;Not enough free nodes available
10/30/2017 01:29:44;0080;pbs_sched;Req;;Leaving Scheduling Cycle
x19-64-sles11-altix:/home/saritah #
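
The difference between the failing and passing scheduler logs can be checked mechanically. The sketch below is a hypothetical helper (`classify_qrun` is not a PBS command) that reads scheduler log lines on stdin and reports what followed the "Received qrun request" entry for a given job. It assumes the semicolon-separated layout and the exact message strings seen in the logs in this report, and it treats any "Job preempted by suspension" entry after the qrun request as evidence of preemption.

```shell
# Classify the outcome of a qrun request from pbs_sched log lines on stdin.
# Argument: full job ID as it appears in the log (e.g. 7.x19-64-sles11-altix).
# Prints "preempted", "rejected", or "unknown".
classify_qrun() {
    awk -F';' -v job="$1" '
        # Field 5 is the object (job ID), field 6 is the message.
        $5 == job && $6 == "Received qrun request" { seen = 1; next }

        # After the qrun request: some job was suspended to make room.
        seen && $6 == "Job preempted by suspension" { result = "preempted"; exit }

        # After the qrun request: the qrun job itself could not be placed.
        seen && $5 == job && $6 == "Not enough free nodes available" { result = "rejected"; exit }

        END { print (result == "" ? "unknown" : result) }
    '
}
```

A possible invocation is `tail /var/spool/pbs/sched_logs/20171029 | classify_qrun 7.x19-64-sles11-altix`, which should print `rejected` for the failing log in this report.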

Acceptance Criteria

None

Activity

sarita kh
October 31, 2017, 5:16 AM

Marking this as a duplicate of PP-1061, since it is the same issue.

Assignee: Unassigned
Reporter: sarita kh
Severity: None
OS: None
Start Date: None
Pull Request URL: None
Story Points: 1
Components:
Affects versions:
Priority: Low