Jobs don't run properly when MoM is started with the -N option [WINDOWS]

Description

pbs_mom does not execute jobs when it is started with the -N option. The job goes directly into the 'E' state without the job script being executed, because MoM sends the job obit the moment it receives the run request.

C:\Users\pbsadmin.PBSWORKS>qstat --version
pbs_version = PBSPro_13.1.0.160576

======================================
===== Started mom through services.msc =====
===== Job will execute properly and finish =====
======================================
C:\Users\pbsadmin.PBSWORKS>pbsnodes -av
blrecvm-purify
Mom = blrecvm-purify
Port = 15002
pbs_version = PBSPro_13.1.0.171043
ntype = PBS
state = free
pcpus = 8
resources_available.arch = windows
resources_available.host = blrecvm-purify
resources_available.mem = 16830004kb
resources_available.ncpus = 8
resources_available.vnode = blrecvm-purify
resources_assigned.accelerator_memory = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.netwins = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared

C:\Users\pbsadmin.PBSWORKS>qmgr -c "p s"
#
# Create queues and set their attributes.
#
#
# Create and define queue workq
#
create queue workq
set queue workq queue_type = Execution
set queue workq enabled = True
set queue workq started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_roots = pbsadmin
set server default_queue = workq
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.ncpus = 1
set server default_chunk.ncpus = 1
set server scheduler_iteration = 600
set server resv_enable = True
set server node_fail_requeue = 310
set server max_array_size = 10000
set server single_signon_password_enable = True
set server pbs_license_info = 6200@204.235.23.207
set server pbs_license_min = 1
set server pbs_license_max = 2147483647
set server pbs_license_linger_time = 31536000
set server license_count = Avail_Global:3208238 Avail_Local:1 Used:0 High_Use:8
    Avail_Sockets:0 Unused_Sockets:0
set server eligible_time_enable = False
set server max_concurrent_provision = 5

C:\Users\pbsadmin.PBSWORKS>qstat

C:\Users\pbsadmin.PBSWORKS>qsub -- pbs-sleep 30
409.blrecvm-purify

C:\Users\pbsadmin.PBSWORKS>qstat -s 409

blrecvm-purify:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
409.blrecvm-pur pbsadmin workq    STDIN        5220   1   1     --    -- R 00:00

Job run at Fri Nov 17 at 02:07 on (blrecvm-purify:ncpus=1)

C:\Users\pbsadmin.PBSWORKS>tracejob 409

Job: 409.blrecvm-purify

11/17/2017 02:07:26 L Considering job to run
11/17/2017 02:07:26 L Job run
11/17/2017 02:07:26 S Job Queued at request of pbsadmin@blrecvm-purify,
owner = pbsadmin@blrecvm-purify, job name = STDIN,
queue = workq
11/17/2017 02:07:26 S Job Run at request of Scheduler@blrecvm-purify on
exec_vnode (blrecvm-purify:ncpus=1)
11/17/2017 02:07:26 S Job Modified at request of Scheduler@blrecvm-purify
11/17/2017 02:07:27 M allowed pbsadmin to access window station and desktop,

User pbsadmin passworded
11/17/2017 02:07:27 M Started, pid = 5220
11/17/2017 02:07:27 M Started, pid = 5220
11/17/2017 02:07:57 S Obit received momhop:1 serverhop:1 state:4 substate:42

11/17/2017 02:07:57 M task 1 terminated
11/17/2017 02:07:57 M Terminated
11/17/2017 02:07:57 M task 00000001 cput= 0:00:00
11/17/2017 02:07:57 M kill_job
11/17/2017 02:07:57 M BLRECVM-PURIFY cput= 0:00:00 mem=1876kb
11/17/2017 02:07:57 M Obit sent
11/17/2017 02:07:57 M copy file cred request received
11/17/2017 02:07:58 M User pbsadmin passworded
11/17/2017 02:07:58 M staged 2 items out over 0:00:00
11/17/2017 02:08:13 M post_cpyfile: entered 409.blrecvm-purify
11/17/2017 02:08:13 S Exit_status=0 resources_used.cput=00:00:00
resources_used.mem=1876kb
resources_used.walltime=00:00:44
11/17/2017 02:08:13 M post_cpyfile: done 409.blrecvm-purify
11/17/2017 02:08:13 M delete job request received
11/17/2017 02:08:13 M kill_job

==================================
==== Started mom with 'pbs_mom -N' ====
==== Job will directly go into 'E' state =====
==================================

C:\Users\pbsadmin.PBSWORKS>pbsnodes -av
blrecvm-purify
Mom = blrecvm-purify
Port = 15002
pbs_version = PBSPro_13.1.0.171043
ntype = PBS
state = free
pcpus = 8
resources_available.arch = windows
resources_available.host = blrecvm-purify
resources_available.mem = 16830004kb
resources_available.ncpus = 8
resources_available.vnode = blrecvm-purify
resources_assigned.accelerator_memory = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.netwins = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared

C:\Users\pbsadmin.PBSWORKS>qsub -- pbs-sleep 30
410.blrecvm-purify

C:\Users\pbsadmin.PBSWORKS>qstat -s 410

blrecvm-purify:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
410.blrecvm-pur pbsadmin workq    STDIN          --   1   1     --    -- E 00:00

Job run at Fri Nov 17 at 02:11 on (blrecvm-purify:ncpus=1)

C:\Users\pbsadmin.PBSWORKS>

C:\Users\pbsadmin.PBSWORKS>tracejob 410

Job: 410.blrecvm-purify

11/17/2017 02:11:23 L Considering job to run
11/17/2017 02:11:23 S Job Queued at request of pbsadmin@blrecvm-purify,
owner = pbsadmin@blrecvm-purify, job name = STDIN,
queue = workq
11/17/2017 02:11:23 L Job run
11/17/2017 02:11:23 S Job Run at request of Scheduler@blrecvm-purify on
exec_vnode (blrecvm-purify:ncpus=1)
11/17/2017 02:11:23 S Job Modified at request of Scheduler@blrecvm-purify
11/17/2017 02:11:24 M allowed pbsadmin to access window station and desktop,

User pbsadmin passworded
11/17/2017 02:11:24 S Obit received momhop:1 serverhop:1 state:4 substate:41

11/17/2017 02:11:24 M CreateProcess(AsUser) error=1314
11/17/2017 02:11:24 M CreateProcess(AsUser) error=1314
11/17/2017 02:11:24 M task 00000001 cput= 0:00:00
11/17/2017 02:11:24 M kill_job
11/17/2017 02:11:24 M BLRECVM-PURIFY cput= 0:00:00 mem=0kb
11/17/2017 02:11:24 M Obit sent
11/17/2017 02:11:24 M copy file cred request received
11/17/2017 02:11:31 M User pbsadmin passworded
11/17/2017 02:12:03 M Unable to copy file
C:/PROGRA~2/PBSPRO~1/home/spool/410.blrecvm-purify.OU
to
blrecvm-purify:C:/Users/pbsadmin.PBSWORKS/STDIN.o410
11/17/2017 02:12:35 M Unable to copy file
C:/PROGRA~2/PBSPRO~1/home/spool/410.blrecvm-purify.ER
to
blrecvm-purify:C:/Users/pbsadmin.PBSWORKS/STDIN.e410
11/17/2017 02:12:35 M staged 2 items out over 0:01:04
11/17/2017 02:12:42 M post_cpyfile: entered 410.blrecvm-purify
11/17/2017 02:12:42 M post_cpyfile: done 410.blrecvm-purify
11/17/2017 02:12:42 M delete job request received
11/17/2017 02:12:42 M kill_job
11/17/2017 02:12:42 S Post job file processing error
11/17/2017 02:12:42 S Exit_status=-1 resources_used.cput=00:00:00
resources_used.mem=0kb
resources_used.walltime=00:00:00

This is for pbs_version = PBSPro_13.1.0.160576
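
Note on the error code: 1314 is the Win32 code ERROR_PRIVILEGE_NOT_HELD ("A required privilege is not held by the client"). Microsoft's documentation for CreateProcessAsUser says the caller typically needs SE_INCREASE_QUOTA_NAME and may need SE_ASSIGNPRIMARYTOKEN_NAME. One plausible reading, not confirmed anywhere in this report, is that a pbs_mom started from a console with -N lacks privileges that the service-started MoM has. The Python sketch below is only a triage helper for pasted tracejob output. The function name and the small error table are illustrative assumptions and are not part of PBS.

```python
import re

# A few well-known Win32 error codes; extend as needed.
WIN32_ERRORS = {
    5: "ERROR_ACCESS_DENIED",
    1314: "ERROR_PRIVILEGE_NOT_HELD",
    1326: "ERROR_LOGON_FAILURE",
}

# Matches the MoM log message seen in the tracejob output above.
CREATEPROCESS_RE = re.compile(r"CreateProcess\(AsUser\) error=(\d+)")


def find_createprocess_errors(tracejob_text):
    """Return sorted, de-duplicated (code, name) pairs for spawn failures."""
    codes = {int(m.group(1)) for m in CREATEPROCESS_RE.finditer(tracejob_text)}
    return [(code, WIN32_ERRORS.get(code, "unknown")) for code in sorted(codes)]


if __name__ == "__main__":
    sample = (
        "11/17/2017 02:11:24 M CreateProcess(AsUser) error=1314\n"
        "11/17/2017 02:11:24 M CreateProcess(AsUser) error=1314\n"
        "11/17/2017 02:11:24 M Obit sent\n"
    )
    for code, name in find_createprocess_errors(sample):
        print(f"CreateProcess(AsUser) failed with {code} ({name})")
```

Duplicate log lines, such as the repeated error=1314 entries for job 410, collapse to one result.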

Acceptance Criteria

Jobs should run to completion when MoM is started with -N, just as they do when MoM is started as a service.
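
A rough way to check this criterion against tracejob output is sketched below in Python. It assumes the markers seen in the two runs above: a "Started, pid = N" line from MoM, no CreateProcess(AsUser) error, and Exit_status=0 for the pbs-sleep job. The helper name job_ran_successfully is hypothetical, and a real test would also need to submit the job and collect the tracejob text.

```python
import re


def job_ran_successfully(tracejob_text):
    """Heuristic pass/fail check on the text printed by `tracejob <jobid>`."""
    # MoM logs "Started, pid = N" once the job process has been created.
    started = re.search(r"M\s+Started, pid = \d+", tracejob_text) is not None
    # The failing run logs this message instead of starting the job.
    spawn_failed = "CreateProcess(AsUser) error=" in tracejob_text
    # The server records the final exit status in the accounting line.
    match = re.search(r"Exit_status=(-?\d+)", tracejob_text)
    exit_ok = match is not None and int(match.group(1)) == 0
    return started and not spawn_failed and exit_ok


if __name__ == "__main__":
    good = (
        "11/17/2017 02:07:27 M Started, pid = 5220\n"
        "11/17/2017 02:08:13 S Exit_status=0 resources_used.cput=00:00:00\n"
    )
    bad = (
        "11/17/2017 02:11:24 M CreateProcess(AsUser) error=1314\n"
        "11/17/2017 02:12:42 S Exit_status=-1 resources_used.cput=00:00:00\n"
    )
    print("service run passes:", job_ran_successfully(good))
    print("-N run passes:", job_ran_successfully(bad))
```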

Activity

Siddharth Sahay
January 17, 2018, 12:02 PM

Attaching valgrind output for server.

Assignee

Siddharth Sahay

Reporter

Siddharth Sahay

Severity

None

OS

None

Start Date

None

Pull Request URL

None

Story Points

1

Fix versions

Affects versions

Priority

Low