PDA

View Full Version : SLES 11 SP3 delay of 10+ secs between TCP/IP connect and return of poll



gurucubano
17-Mar-2016, 12:29
Hello,

We have a server application running which has a LISTEN on port 8115 and waits with the poll(2) syscall for incoming client connections:


# LANG=C netstat -an | fgrep 8115
tcp 0 0 0.0.0.0:8115 0.0.0.0:* LISTEN
tcp 0 0 10.146.200.41:8115 10.146.202.252:1106 ESTABLISHED
tcp 0 0 10.146.200.41:8115 10.146.202.19:62295 ESTABLISHED
tcp 0 0 10.146.200.41:8115 10.146.201.54:63310 ESTABLISHED
...

From time to time it happens that the TCP connection is established fine and the client is sending already data, as you can see in the TCPDUMP:


# tcpdump -n -S -r 8115.tcp.02 | fgrep 10.146.201.54.59273
reading from file 8115.tcp.02, link-type EN10MB (Ethernet)
09:18:39.694636 IP 10.146.201.54.59273 > 10.146.200.41.8115: S 3430710974:3430710974(0) win 8192 <mss 1452,nop,wscale 2,nop,n
op,sackOK>
09:18:39.694649 IP 10.146.200.41.8115 > 10.146.201.54.59273: S 2986597784:2986597784(0) ack 3430710975 win 14600 <mss 1460,no
p,nop,sackOK,nop,wscale 7>
09:18:39.750540 IP 10.146.201.54.59273 > 10.146.200.41.8115: . ack 2986597785 win 16698
09:18:39.751543 IP 10.146.201.54.59273 > 10.146.200.41.8115: P 3430710975:3430711016(41) ack 2986597785 win 16698
09:18:40.055344 IP 10.146.201.54.59273 > 10.146.200.41.8115: P 3430710975:3430711016(41) ack 2986597785 win 16698
09:18:40.655377 IP 10.146.201.54.59273 > 10.146.200.41.8115: P 3430710975:3430711016(41) ack 2986597785 win 16698
09:18:41.855471 IP 10.146.201.54.59273 > 10.146.200.41.8115: P 3430710975:3430711016(41) ack 2986597785 win 16698
09:18:41.855496 IP 10.146.200.41.8115 > 10.146.201.54.59273: . ack 3430711016 win 115
09:18:42.715302 IP 10.146.200.41.8115 > 10.146.201.54.59273: P 2986597785:2986597799(14) ack 3430711016 win 115
09:18:42.757938 IP 10.146.201.54.59273 > 10.146.200.41.8115: P 3430711016:3430711036(20) ack 2986597799 win 16694
09:18:42.758026 IP 10.146.200.41.8115 > 10.146.201.54.59273: . ack 3430711036 win 115
09:18:42.766887 IP 10.146.200.41.8115 > 10.146.201.54.59273: P 2986597799:2986597914(115) ack 3430711036 win 115
09:18:43.035501 IP 10.146.201.54.59273 > 10.146.200.41.8115: . ack 2986597914 win 16665
...

but the server needs seconds more to be waked up in its poll():


13455 08:25:59.162706 poll([{fd=3, events=POLLIN|POLLPRI}, {fd=-1}], 2, 4294967295 <unfinished ...>
13455 09:18:41.613083 <... poll resumed> ) = 1 ([{fd=3, revents=POLLIN}]) <3162.450367>
13455 09:18:41.613250 --- SIGCHLD (Child exited) @ 0 (0) ---
13455 09:18:41.613789 waitpid(-1, <unfinished ...>
13455 09:18:41.613853 <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 13491 <0.000057>
13455 09:18:41.613950 gettimeofday( <unfinished ...>
13455 09:18:41.613970 <... gettimeofday resumed> {1458029921, 613958}, NULL) = 0 <0.000013>
13455 09:18:41.613992 stat64("/etc/localtime", <unfinished ...>
13455 09:18:41.614015 <... stat64 resumed> {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0 <0.000013>
13455 09:18:41.614060 write(1, "15.03.2016 09:18:41.613 Sip2Sln"..., 97 <unfinished ...>
13455 09:18:41.614148 <... write resumed> ) = 97 <0.000060>
13455 09:18:41.614172 waitpid(-1, <unfinished ...>
13455 09:18:41.614240 <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 13501 <0.000053>
13455 09:18:41.614293 gettimeofday( <unfinished ...>
13455 09:18:41.614334 <... gettimeofday resumed> {1458029921, 614327}, NULL) = 0 <0.000029>
13455 09:18:41.614365 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0 <0.000012>
13455 09:18:41.614439 write(1, "15.03.2016 09:18:41.614 Sip2Sln"..., 97) = 97 <0.000012>
13455 09:18:41.614486 waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 13515 <0.000027>
13455 09:18:41.614547 gettimeofday({1458029921, 614563}, NULL) = 0 <0.000009>
13455 09:18:41.614592 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0 <0.000010>
13455 09:18:41.614657 write(1, "15.03.2016 09:18:41.614 Sip2Sln"..., 97) = 97 <0.000011>
13455 09:18:41.614703 waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 13524 <0.000024>
13455 09:18:41.614881 gettimeofday( <unfinished ...>
13455 09:18:41.614913 <... gettimeofday resumed> {1458029921, 614898}, NULL) = 0 <0.000020>
13455 09:18:41.614960 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0 <0.000013>
13455 09:18:41.615505 write(1, "15.03.2016 09:18:41.614 Sip2Sln"..., 97 <unfinished ...>
13455 09:18:41.615982 <... write resumed> ) = 97 <0.000468>
13455 09:18:41.616064 waitpid(-1, <unfinished ...>
13455 09:18:41.616097 <... waitpid resumed> 0xff81e938, WNOHANG) = 0 <0.000027>
13455 09:18:41.616120 sigreturn( <unfinished ...>
13455 09:18:41.616155 <... sigreturn resumed> ) = 1 <0.000029>
13455 09:18:41.616186 accept(3, <unfinished ...>
13455 09:18:41.616244 <... accept resumed> {sa_family=AF_INET, sin_port=htons(59903), sin_addr=inet_addr("10.146.201.55")}, [16]) = 8 <0.000051>
...
13455 09:18:42.236844 accept(3, <unfinished ...>
13455 09:18:42.236892 <... accept resumed> {sa_family=AF_INET, sin_port=htons(59273), sin_addr=inet_addr("10.146.201.54")}, [16]) = 8 <0.000036>

What could cause this? The interfaces are:


# ifconfig -a
bond0 Link encap:Ethernet Hardware Adresse 40:F2:E9:21:FB:6A
inet Adresse:10.146.200.41 Bcast:10.146.200.255 Maske:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:12802814 errors:0 dropped:2230 overruns:0 frame:0
TX packets:24974438 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 Sendewarteschlangenlänge:0
RX bytes:1444539093 (1377.6 Mb) TX bytes:28234489868 (26926.5 Mb)

bond0:L00 Link encap:Ethernet Hardware Adresse 40:F2:E9:21:FB:6A
inet Adresse:10.146.200.40 Bcast:10.146.200.255 Maske:255.255.255.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1

eth0 Link encap:Ethernet Hardware Adresse 40:F2:E9:21:FB:6A
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:67641 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 Sendewarteschlangenlänge:1000
RX bytes:4436676 (4.2 Mb) TX bytes:0 (0.0 b)
Speicher:c5a60000-c5a80000

eth1 Link encap:Ethernet Hardware Adresse 40:F2:E9:21:FB:6A
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:12735173 errors:0 dropped:0 overruns:0 frame:0
TX packets:24974438 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 Sendewarteschlangenlänge:1000
RX bytes:1440102417 (1373.3 Mb) TX bytes:28234489868 (26926.5 Mb)
Speicher:c5a40000-c5a60000

Thanks in advance

Matthias

jmozdzen
17-Mar-2016, 13:05
Hi Matthias,

> and the client is sending already data, as you can see in the TCPDUMP:

do you have the full capture, and looked at it in i.e. wireshark (including error diagnostics)? Where was the capture run, on the server or on the client? I believe to see 3 retries for the first "data" packet (starting at 09:18:39.751543), maybe it's some network issue?

Regards,
Jens

gurucubano
17-Mar-2016, 13:32
Hi Matthias,

> and the client is sending already data, as you can see in the TCPDUMP:

do you have the full capture, and looked at it in i.e. wireshark (including error diagnostics)? Where was the capture run, on the server or on the client? I believe to see 3 retries for the first "data" packet (starting at 09:18:39.751543), maybe it's some network issue?


Hi Jens,

Yes, I have the full capture from both sides and they show both the same packages; and yes, the client is already after the successful SYN-SYN-ACK sending the first data package of 41 bytes and due to the missing ACK's it's repeating the same package three times more until the ACK of the server is sent (late) at 09:18:41.855496.

The real question for me is: the SYN-SYN-ACK connection establishment is fine, also from the Linux side; and the kernel should have waked-up with sending the ACK the listening process. Why this is not happening?

Regards

matthias

jmozdzen
17-Mar-2016, 14:04
Hi Matthias,

since the traces on both network ends are identical I would rule out network components as a problem source, too.

The strace does imply that your server application is working with child processes, which may be contributing to scheduling delays - both timing- and signal-wise. I've never seen (major) delays when using simple process environment, do you have the option to run a similar test case where the server process is just waiting for the call/data, without handling child processes?

Delays in the mentioned range (10+ seconds) are not what I would assume to be "regular" delays, which would call for real-time extension to circumvent. My gut feeling is rather pointing at (side) effects from running child processes.

Regards,
Jens

gurucubano
18-Mar-2016, 11:59
Hi,

A close look into the strace output of all involved processes (daemon and its ~15 childs) show that there are time windows in where none of the processes issues
any system call and after this the duration of the syscalls blocked in the gap is displayed as the length of the time gap of 10-20 secs. A look in /var/log/warn shows messages about
aborted SAN commands like this:


# tail /var/log/warn
Mar 17 19:28:24 l000sa23 kernel: [141382.222387] qla2xxx [0000:8b:00.0]-801c:8: Abort command issued nexus=8:2:1 -- 1 2002.
Mar 17 19:28:24 l000sa23 kernel: [141382.222616] qla2xxx [0000:8b:00.0]-801c:8: Abort command issued nexus=8:2:1 -- 1 2002.
Mar 18 00:05:32 l000sa23 kernel: [157977.984232] qla2xxx [0000:8b:00.0]-801c:8: Abort command issued nexus=8:2:1 -- 1 2002.
Mar 18 01:08:04 l000sa23 kernel: [161722.869694] qla2xxx [0000:8b:00.0]-801c:8: Abort command issued nexus=8:2:1 -- 1 2002.
Mar 18 01:08:04 l000sa23 kernel: [161722.870041] qla2xxx [0000:8b:00.0]-801c:8: Abort command issued nexus=8:2:1 -- 1 2002.
Mar 18 03:15:30 l000sa23 kernel: [169354.064396] qla2xxx [0000:8b:00.0]-801c:8: Abort command issued nexus=8:2:1 -- 1 2002.
Mar 18 05:21:21 l000sa23 kernel: [176890.252273] qla2xxx [0000:8b:00.0]-801c:8: Abort command issued nexus=8:2:1 -- 1 2002.
Mar 18 05:40:40 l000sa23 kernel: [178047.053723] qla2xxx [0000:8b:00.0]-801c:8: Abort command issued nexus=8:2:1 -- 1 2002.
Mar 18 10:15:30 l000sa23 kernel: [194505.164046] qla2xxx [0000:8b:00.0]-801c:8: Abort command issued nexus=8:2:1 -- 1 2002.
Mar 18 11:22:16 l000sa23 kernel: [198503.443273] qla2xxx [0000:8b:00.0]-801c:8: Abort command issued nexus=8:2:1 -- 1 2002.

which coinciden with the time of the gaps.

We can close this thread here.
Thanks

matthias

jmozdzen
18-Mar-2016, 13:30
Hi Matthias,

> A look in /var/log/warn shows messages about aborted SAN commands

that's definitely a potential explanation... waiting for I/O.

Thank you for reporting back & good luck sorting out the SAN issues!

Regards,
Jens