Results 1 to 6 of 6

Thread: Delayed first network connection

Hybrid View

  1. #1

    Delayed first network connection

    Hi,

    on an new installed SLES 11 SP3 server (fully pachted) I have a strange phenomenon:

    Any first network connection (ssh, vnc, http) results in a wait of about 8 to 20 seconds and then get a reply.
    When I close the ssh connection on the client and try another connection (ssh, vnc or http), I immediately get
    a response to login / the web page.

    I replaced the network card, nothing helps. We have a pure IPv4 network. Client and server are in the same subnet.

    For the network specialists: I montitored the network traffic between the client (136.199.34.34 bib-edv34.uni-trier.de)
    and the server (136.199.34.247 ub-zcm.uni-trier.de) with an ssh connection:

    -----------------------------------------------------------------------------------------------------------------
    ub-zcm:/tmp # netstat -pantc | grep 136.199.34.34
    tcp 0 21 136.199.34.247:22 136.199.34.34:1304 VERBUNDEN 29658/sshd: [accept
    tcp 0 21 136.199.34.247:22 136.199.34.34:1304 VERBUNDEN 29658/sshd: [accept
    tcp 0 21 136.199.34.247:22 136.199.34.34:1304 VERBUNDEN 29658/sshd: [accept
    tcp 0 21 136.199.34.247:22 136.199.34.34:1304 VERBUNDEN 29658/sshd: [accept
    tcp 0 21 136.199.34.247:22 136.199.34.34:1304 VERBUNDEN 29658/sshd: [accept
    tcp 0 21 136.199.34.247:22 136.199.34.34:1304 VERBUNDEN 29658/sshd: [accept
    tcp 0 21 136.199.34.247:22 136.199.34.34:1304 VERBUNDEN 29658/sshd: [accept
    tcp 0 21 136.199.34.247:22 136.199.34.34:1304 VERBUNDEN 29658/sshd: [accept
    tcp 0 21 136.199.34.247:22 136.199.34.34:1304 VERBUNDEN 29658/sshd: [accept
    tcp 0 0 136.199.34.247:22 136.199.34.34:1304 VERBUNDEN 29658/sshd: root [p
    tcp 0 0 136.199.34.247:22 136.199.34.34:1304 VERBUNDEN 29658/sshd: root [p

    ub-zcm:/etc/init.d # tcpdump host 136.199.34.34
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on p6p1, link-type EN10MB (Ethernet), capture size 96 bytes
    07:19:29.743860 arp who-has ub-zcm.uni-trier.de tell bib-edv34.uni-trier.de
    07:19:29.743877 arp reply ub-zcm.uni-trier.de is-at 00:0a:f7:3e:39:20 (oui Unknown)
    07:19:29.744049 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: S 2026223864:2026223864(0) win 65535 <mss 1460,nop,nop,sackOK>
    07:19:29.744140 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: S 4197396408:4197396408(0) ack 2026223865 win 14600 <mss 1460,nop,nop,sackOK>
    07:19:29.744304 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: . ack 1 win 65535
    07:19:29.776946 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 1:22(21) ack 1 win 14600
    07:19:32.783125 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 1:22(21) ack 1 win 14600
    07:19:34.759187 arp who-has bib-edv34.uni-trier.de tell ub-zcm.uni-trier.de
    07:19:34.759332 arp reply bib-edv34.uni-trier.de is-at 00:1e:4f:f4:8b:90 (oui Unknown)
    07:19:38.791128 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 1:22(21) ack 1 win 14600
    07:19:38.791291 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: . ack 22 win 65514
    07:19:38.850267 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: P 1:639(638) ack 22 win 65514
    07:19:38.850313 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: . ack 639 win 15312
    07:19:38.852104 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: . 22:1482(1460) ack 639 win 15312
    07:19:38.852124 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 1482:1566(84) ack 639 win 15312
    07:19:38.852359 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: . ack 1566 win 65535
    07:19:38.852818 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: P 639:655(16) ack 1566 win 65535
    07:19:38.859502 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 1566:1846(280) ack 655 win 15312
    07:19:38.961925 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: P 655:927(272) ack 1846 win 65255
    07:19:38.972329 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 1846:2694(848) ack 927 win 16588
    07:19:39.163815 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: P 927:943(16) ack 2694 win 64407
    07:19:39.203127 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: . ack 943 win 16588
    07:19:39.203303 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: P 943:995(52) ack 2694 win 64407
    07:19:39.203343 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: . ack 995 win 16588
    07:19:39.203407 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 2694:2746(52) ack 995 win 16588
    07:19:39.225987 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: P 995:1063(68) ack 2746 win 64355
    07:19:39.228870 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 2746:2814(68) ack 1063 win 16588
    07:19:39.229640 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: P 1063:1163(100) ack 2814 win 64287
    07:19:39.237201 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 2814:2882(68) ack 1163 win 16588
    07:19:39.503134 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 2814:2882(68) ack 1163 win 16588
    07:19:39.503311 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: . ack 2882 win 64219
    ^C
    31 packets captured
    31 packets received by filter
    0 packets dropped by kernel
    -----------------------------------------------------------------------------------------------------------------

    The Send-Q ist filled about 9 seconds and then empty. This is exact the time I have to wait for getting the password
    prompt.

    Does somebody know this phenomenon?
    How ca I analyse the Send-Q problem?

    If I transfer a 3 GB file to the server, it interupts after about 1.4 GB with a network error.

    I have no ideas anymore to solve the problem.

    Regards
    Harald Sauerwein

  2. Re: Delayed first network connection

    Hi Harald,

    > Any first network connection (ssh, vnc, http) results in a wait of about 8 to 20 seconds and then get a reply.
    > When I close the ssh connection on the client and try another connection (ssh, vnc or http), I immediately get
    > a response to login / the web page.

    such descriptions trigger my "DNS reverse resolution" association

    Is the target server able to resolve the client's IP address to a matching host name in time? For a test, you might want to enter an according entry to /etc/hosts, make sure "files" are checked before "DNS", and rerun your test. Or simply try to resolve a new client's IP address from the server command line before the client's first connection attempt.

    Once the first DNS query was run, the result is most probably cached on the server (by nscd), hence the speedier reaction of the server on future requests.

    The aborting file transfer would have nothing to do with DNS, though.

    Regards,
    Jens
    From the times when today's "old school" was "new school"

    If you find this post helpful and are logged into the web interface, show your appreciation and click on the star below...

  3. #3

    Re: Delayed first network connection

    Hi Jens,

    DNS requests are fast and the client's IP name and or address is resolved in time. I also tested to add the client in the hosts file, same behaviour.
    With ' make sure "files" are checked before "DNS" ' you mean the entries in nsswitch.conf? They are:
    hosts:files dns
    networks: files dns

    Do you have any idea how to debug my problem?

    Regards,
    Harald

  4. Re: Delayed first network connection

    Hi Harald,
    Quote Originally Posted by sauerwein View Post
    Hi Jens,

    DNS requests are fast and the client's IP name and or address is resolved in time. I also tested to add the client in the hosts file, same behaviour.
    With ' make sure "files" are checked before "DNS" ' you mean the entries in nsswitch.conf? They are:
    hosts:files dns
    networks: files dns
    yes, all that is like I meant it - I sorry that this didn't solve the problem.

    Quote Originally Posted by sauerwein View Post
    Do you have any idea how to debug my problem?
    A key element is to understand what is happening where. From your initial report:
    Code:
    ub-zcm:/etc/init.d # tcpdump host 136.199.34.34
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on p6p1, link-type EN10MB (Ethernet), capture size 96 bytes
    07:19:29.743860 arp who-has ub-zcm.uni-trier.de tell bib-edv34.uni-trier.de
    07:19:29.743877 arp reply ub-zcm.uni-trier.de is-at 00:0a:f7:3e:39:20 (oui Unknown)
    07:19:29.744049 IP bib-edv34.uni-trier.de.boomerang >  ub-zcm.uni-trier.de.ssh: S 2026223864:2026223864(0) win 65535 <mss  1460,nop,nop,sackOK>
    07:19:29.744140 IP ub-zcm.uni-trier.de.ssh >  bib-edv34.uni-trier.de.boomerang: S 4197396408:4197396408(0) ack  2026223865 win 14600 <mss 1460,nop,nop,sackOK>
    07:19:29.744304 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: . ack 1 win 65535
    07:19:29.776946 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 1:22(21) ack 1 win 14600
    07:19:32.783125 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 1:22(21) ack 1 win 14600
    07:19:34.759187 arp who-has bib-edv34.uni-trier.de tell ub-zcm.uni-trier.de
    07:19:34.759332 arp reply bib-edv34.uni-trier.de is-at 00:1e:4f:f4:8b:90 (oui Unknown)
    07:19:38.791128 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 1:22(21) ack 1 win 14600
    07:19:38.791291 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: . ack 22 win 65514
    07:19:38.850267 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: P 1:639(638) ack 22 win 65514
    07:19:38.850313 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: . ack 639 win 15312
    If I read this right, the trace was taken on the server side ("ub-zcm") and is filtering traffic concerning the client host (136.199.34.34, "bib-edv34").

    - The client send an ARP request to get to the server (so they are on the same physical network/broadcast domain) and receives an answer
    - The initial session establishment takes place (up to time stamp 07:19:29.776946)
    - then somehow, the packets from the server to the client get dropped:
    Code:
    07:19:29.776946 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 1:22(21) ack 1 win 14600
    07:19:32.783125 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 1:22(21) ack 1 win 14600
    07:19:34.759187 arp who-has bib-edv34.uni-trier.de tell ub-zcm.uni-trier.de
    07:19:34.759332 arp reply bib-edv34.uni-trier.de is-at 00:1e:4f:f4:8b:90 (oui Unknown)
    07:19:38.791128 IP ub-zcm.uni-trier.de.ssh > bib-edv34.uni-trier.de.boomerang: P 1:22(21) ack 1 win 14600
    07:19:38.791291 IP bib-edv34.uni-trier.de.boomerang > ub-zcm.uni-trier.de.ssh: . ack 22 win 65514
    You can see that the server's ack is initially sent on 07:19:29.776946, then retransmitted on 07:19:32.783125. The server then re-runs an ARP sequence for bib-edv34, retransmits the ACK another time (07:19:38.791128) and now receives the proper response(s) immediately (07:19:38.791291).

    There may be several reasons for this. First simple check would be a trace on bib-edv34, to see what is received, and when - do the packets from the server get dropped (do not reach the client), or are the client's responses dropped on their way to the server?

    I'm *asssuming* that the MAC address (as returned by the ARP sequence at 07:19:34.759332) is the same as during the initial communication.

    Next step, after determining which packets actually get dropped, would be to check the network switch(es) for details on the according network path and indications of errors. Port statistics (any errors on the ports along the path), ARP storms, unexpected adapter relocations, ... anything unusual.

    Regards,
    Jens
    From the times when today's "old school" was "new school"

    If you find this post helpful and are logged into the web interface, show your appreciation and click on the star below...

  5. #5

    Re: Delayed first network connection

    Hi Jens,

    the problem is solved!
    After creating wireshark network captures on client and server at the same time we saw:
    2 13:43:40.975 Dell_f4:8b:90 Broadcast ARP Who has 136.199.34.247? Tell 136.199.34.34
    3 13:43:40.976 Broadcom_3e:39:20 Dell_f4:8b:90 ARP 136.199.34.247 is at 00:0a:f7:3e:39:20
    ...
    8 13:43:41.006 DellEsgP_f6:e6:48 Dell_f4:8b:90 ARP 136.199.34.247 is at 00:13:72:f6:e6:48

    The problem was an old DELL PowerEdge 2800 server where the remote access card had the same IP as ub-zcm.
    The new dell PowerEdge servers do respond to a ping at there iDRAC remote access cards IP, the old ones do not.

    Normally when using an dupklicat ip I'm used to be getting informed with messages like "duplicate ip" or something else.

    Thanks a lot for your proactivity and tips the help me solving this unusual problem!

    Kind regards
    Harald Sauerwein

  6. Re: Delayed first network connection

    Hi Harald,

    great to see you got it working, and thank you for reporting back the root cause!

    Duplicate IP addresses can be a *****, fortunately you're one of those that can read a tcpdump trace to identify the issue... and that duplicate IP is most probably also the cause of the aborting transfer you mentioned in your initial message.

    With regards
    Jens
    From the times when today's "old school" was "new school"

    If you find this post helpful and are logged into the web interface, show your appreciation and click on the star below...

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •