Miscellaneous fixes on TPP usage in net_server.c

Description

In -DDEBUG mode, the server prints the following errors:

close_conn: Not the creator. Not closing connection
close_conn: Not the creator. Not closing connection

At qterm, it prints:
close_conn: Not the creator. Not closing connection
Server@centosvm1: cleanup_conn, tpp_em_del_fd, Remove from poll list failed for sock 10, errno=9

close_conn: Not the creator. Not closing connection

From the code:

[subhas@centosvm1 pbspro]$ find . -type f -exec grep "cn_pid" {} \; -print
pid_t cn_pid; /* process id of the creator */
./src/include/net_connect.h
if (svr_conn[conn_idx]->cn_pid != getpid()) { /* Close connection only if I am the process who created the connection */
./src/lib/Libnet/net_server.c

Acceptance Criteria

None

Activity

Subhasis Bhattacharya
December 24, 2017, 5:36 AM

Assigning to myself. Discovered a potential problem with the epoll() implementation in the kernel that makes epoll() quite dangerous when used with fork (a logic hazard, not a memory error). We fork all the time.

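If net_close() is called in a forked child (and it is in fact called from every fork_me), it calls close() on all the socket descriptors, and each close also calls epoll_ctl(DEL_FD) to remove the descriptor from the epoll fd. However, the epoll fd inherited across fork still refers to the parent's epoll instance, so this removes the descriptor from the parent's poll list as well and leaves the parent unable to receive events on those sockets.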
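
The hazard can be reproduced outside PBS with a pipe and a bare epoll fd. This is a minimal sketch, not the net_server.c code: the function name events_seen_by_parent is made up for illustration, and a pipe stands in for a server socket. It returns how many events the parent gets back from epoll_wait() after a forked child has optionally issued EPOLL_CTL_DEL on the inherited epoll fd.

```c
#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo helper (not part of PBS).
 * Registers the read end of a pipe with epoll, forks, and lets the child
 * optionally delete the registration. Returns the number of events the
 * parent then sees once the pipe becomes readable, or -1 on setup failure.
 */
int events_seen_by_parent(int child_deletes)
{
    int pfd[2];
    int epfd, n, status;
    pid_t pid;
    struct epoll_event ev = {0};
    struct epoll_event out;

    if (pipe(pfd) != 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: epfd refers to the same epoll instance as in the parent,
         * so this DEL edits the parent's interest list too. */
        if (child_deletes)
            epoll_ctl(epfd, EPOLL_CTL_DEL, pfd[0], &ev);
        _exit(0);
    }
    waitpid(pid, &status, 0);

    /* Parent: make the pipe readable, then poll with a 200 ms timeout. */
    if (write(pfd[1], "x", 1) != 1)
        n = -1;
    else
        n = epoll_wait(epfd, &out, 1, 200);

    close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```

Calling events_seen_by_parent(0) should report one event, while events_seen_by_parent(1) should time out with zero events, which is the "parent goes deaf" behaviour described above.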

Solution:
1. Remove the use of cn_pid in the connection structure.
2. Move the creator-process check transparently into the tpp_em_create/destroy/add/del functionality. If possible, add a fork handler so this happens transparently.
3. Refactor a few variable names in net_server.c
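
One way items 1 and 2 could look is sketched below. Everything here is an assumption for illustration: em_ctx_t, em_create, em_add_fd, em_del_fd, em_destroy and guarded_del_demo are hypothetical names and do not reflect the real tpp_em_* signatures. The idea is only that the event-monitor context remembers the pid that created it, and a delete issued from any other process becomes a no-op, so callers no longer need cn_pid.

```c
#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical event-monitor context: epoll fd plus the creating pid. */
typedef struct {
    int epfd;
    pid_t creator;
} em_ctx_t;

int em_create(em_ctx_t *em)
{
    em->epfd = epoll_create1(0);
    em->creator = getpid();
    return em->epfd < 0 ? -1 : 0;
}

int em_add_fd(em_ctx_t *em, int fd)
{
    struct epoll_event ev = {0};

    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(em->epfd, EPOLL_CTL_ADD, fd, &ev);
}

int em_del_fd(em_ctx_t *em, int fd)
{
    struct epoll_event ev = {0};

    /* Only the creating process may edit the shared interest list. */
    if (em->creator != getpid())
        return 0;
    return epoll_ctl(em->epfd, EPOLL_CTL_DEL, fd, &ev);
}

void em_destroy(em_ctx_t *em)
{
    /* Closing the epoll fd only drops this process's reference. */
    close(em->epfd);
    em->epfd = -1;
}

/* Returns events seen by the parent after a child ran the guarded cleanup. */
int guarded_del_demo(void)
{
    em_ctx_t em;
    int pfd[2], n, status;
    pid_t pid;
    struct epoll_event out;

    if (pipe(pfd) != 0 || em_create(&em) != 0 || em_add_fd(&em, pfd[0]) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        em_del_fd(&em, pfd[0]);   /* no-op: child is not the creator */
        close(pfd[0]);
        em_destroy(&em);
        _exit(0);
    }
    waitpid(pid, &status, 0);

    if (write(pfd[1], "x", 1) != 1)
        n = -1;
    else
        n = epoll_wait(em.epfd, &out, 1, 200);

    em_destroy(&em);
    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```

With the guard in place, guarded_del_demo() should still report one event in the parent even though the child ran its full cleanup. A fork handler (for example via pthread_atfork) could achieve the same effect without the per-call pid comparison; that variant is not shown here.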

Assignee

Subhasis Bhattacharya

Reporter

Subhasis Bhattacharya

Severity

None

OS

None

Start Date

None

Pull Request URL

None

Story Points

1

Components

Fix versions

Priority

High