pbs_mom dumped core in tpp_em_destroy

Description

The core file was produced by a mainline build from April 20th, running on a CentOS 7 system. It is not clear what triggered the crash; the core file was simply found in mom_priv. The RPMs and the core file are attached.

(gdb) where
#0 0x00007f950dcd11f7 in raise () from /lib64/libc.so.6
#1 0x00007f950dcd28e8 in abort () from /lib64/libc.so.6
#2 0x00007f950dd10f47 in __libc_message () from /lib64/libc.so.6
#3 0x00007f950dd18619 in _int_free () from /lib64/libc.so.6
#4 0x00000000004828f4 in tpp_em_destroy (em_ctx=0x25ffd30)
at ../../../../src/lib/Libtpp/tpp_em.c:180
#5 0x00000000004854f6 in tpp_transport_terminate ()
at ../../../../src/lib/Libtpp/tpp_transport.c:2428
#6 0x000000000047db6c in tpp_terminate () at ../../../../src/lib/Libtpp/tpp_client.c:1591
#7 0x00007f950dd5b3bf in fork () from /lib64/libc.so.6
#8 0x0000000000451ec3 in rmtmpdir (jobid=jobid@entry=0x266c998 "273.swdev")
at ../../../src/resmom/start_exec.c:1048
#9 0x0000000000420420 in job_purge (pjob=0x266c810) at ../../../src/server/job_func.c:823
#10 0x000000000041df25 in main (argc=1, argv=<optimized out>)
at ../../../src/resmom/mom_main.c:9982

Acceptance Criteria

None

Activity

Subhasis Bhattacharya
April 25, 2018, 2:53 AM
Edited

Ignore the earlier comment (now deleted). I still can't find the cause or reproduce the crash. Another mystery. The line that does the free has been there since the start of TPP.

Scott Campbell
May 24, 2018, 1:31 AM

We believe that the fix in https://github.com/PBSPro/pbspro/pull/698 addresses this bug, though we have no way to prove it. I am assigning this ticket to , associating it with that PR, and closing it.

Thanks for the contribution!

Assignee

Václav Chlumský

Reporter

Michael Karo

Severity

None

OS

None

Start Date

None

Pull Request URL

Story Points

1

Components

Fix versions

Affects versions

Priority

Highest