Thread: very bad network performance

  1. very bad network performance

    Dear experts,

    We are struggling with bad network performance on a SLES 11 SP4 installation running as a PowerLinux LPAR. The LPARs have 10 Gbit/s network adapters assigned to them. Currently our project communicates with AIX sandbox systems that only have 1 Gbit/s. We were able to increase the throughput between those systems by 12 MB/s simply by upgrading the Linux kernel from 3.0.101-63-ppc64 to 3.0.101-77-ppc64 via zypper; ethtool -k eth0 now shows that TSO and GRO are on, which is what gave us the boost.

    However, after the kernel upgrade we tested the network connection between SLES on 10 Gbit/s and a productive AIX machine on 10 Gbit/s, both residing in the same data center on different IBM Power servers, but in the same subnet, so traceroute shows no intermediate hops. We measured only 1.2 Gbit/s, whereas between two AIX machines on 10 Gbit/s we measure > 7 Gbit/s on the same network.

    When I query the devices on the SLES machine via ethtool eth0 and ethtool eth1, the supported ports are detected as FIBRE, the supported link mode is 1000baseT/Full, the advertised link mode is 1000baseT/Full, auto-negotiation is on, duplex is full and the speed is only 1000 Mb/s. However, that cannot really be the case, because as stated above we measured 1.2 Gbit/s between SLES (10 Gbit/s) and AIX (10 Gbit/s).
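
    For reference, this is roughly how we query the adapters (eth0 as an example, eth1 looks the same; the grep just filters the two offloads mentioned above):

    # link speed, duplex and advertised modes as reported by the driver
    ethtool eth0
    # offload settings; after the kernel upgrade TSO and GRO show up as "on"
    ethtool -k eth0 | grep -E 'tcp-segmentation-offload|generic-receive-offload'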

    My concern is: shouldn't SLES be able to advertise a speed of 10000 Mb/s? Why does it only advertise 1000 Mb/s?
    Last edited by dafrk; 30-Jun-2016 at 11:28.

  2. Re: very bad network performance

    On 06/30/2016 04:34 AM, dafrk wrote:
    >
    > We are struggling with bad network performance on a SLES 11 SP4
    > installation running as a PowerLinux LPAR. The LPARs have 10 Gbit/s
    > network adapters assigned to them. Currently our project communicates
    > with AIX sandbox systems that only have 1 Gbit/s. We were able to
    > increase the throughput between those systems by 12 MB/s simply by
    > upgrading the Linux kernel from 3.0.101-63-ppc64 to 3.0.101-77-ppc64
    > via zypper; ethtool -k eth0 now shows that TSO and GRO are on, which
    > is what gave us the boost.


    How fast were the SLES-to-AIX connections before the kernel upgrade? I
    see the max speed, and I see some increase, but I do not see the actual
    speed prior to the update.

    > However, after the kernel upgrade we tested the network connection
    > between SLES on 10 Gbit/s and a productive AIX machine on 10 Gbit/s,
    > both residing in the same data center on different IBM Power servers,
    > but in the same subnet, so traceroute shows no intermediate hops. We
    > measured only 1.2 Gbit/s, whereas between two AIX machines on 10
    > Gbit/s we measure > 7 Gbit/s on the same network.


    What kind of speed do you get between two LPARs on the same host?

    > When I query the devices on the SLES machine via ethtool eth0 and
    > ethtool eth1, the supported ports are detected as FIBRE, the supported
    > link mode is 1000baseT/Full, the advertised link mode is
    > 1000baseT/Full, auto-negotiation is on, duplex is full and the speed
    > is only 1000 Mb/s. However, that cannot really be the case, because as
    > stated above we measured 1.2 Gbit/s between SLES (10 Gbit/s) and AIX
    > (10 Gbit/s).


    In other virtual environments I have seen SLES send far faster than the
    advertised speed, because it is all virtual: bits moving between virtual
    systems are not really limited by 1 Gbit hardware, particularly when
    sending among VMs on the same host. Hence my question above about
    intra-host communication among VMs.

    --
    Good luck.

    If you find this post helpful and are logged into the web interface,
    show your appreciation and click on the star below...

  3. Re: very bad network performance

    Quote Originally Posted by ab View Post
    How fast were the SLES-to-AIX connections before the kernel upgrade?

    [...]

    What kind of speed do you get between two LPARs on the same host?

    Hello,

    First of all, let me thank you for your fast reply.

    Before the kernel upgrade we were at about 600 Mbit/s, afterwards at ~700 Mbit/s. Before that we were at only 200 Mbit/s, but we improved that simply by upgrading our IBM VIO server to version 2.4.2.32. We also weren't able to restore backups from Tivoli Storage Manager until we upgraded that as well. It seems the Power kernel for Linux and the IBM support for PowerLinux are still a work in progress.

    Between two LPARs on the same host we are getting 3948 Mbit/s. Note that this should be much faster as well, because the communication between those LPARs only passes through the hypervisor and never touches the physical network.
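
    In case it matters, one thing I can still check is whether retransmissions or adapter drops show up during such a test run; a rough sketch (eth0 just as an example):

    # TCP retransmissions counted by the stack
    netstat -s | grep -i retrans
    # driver/adapter statistics; look for drops and errors
    ethtool -S eth0 | grep -i -E 'drop|err'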

    By the way, I seem to get the best results out of the network with the following settings:
    ethtool -K <device> tso on
    ethtool -K <device> gro on
    sysctl -w net.ipv4.tcp_sack=0
    sysctl -w net.ipv4.tcp_fack=0
    sysctl -w net.ipv4.tcp_window_scaling=1
    sysctl -w net.ipv4.tcp_no_metrics_save=1
    sysctl -w net.core.rmem_max=12582912
    sysctl -w net.core.wmem_max=12582912
    sysctl -w net.core.netdev_max_backlog=9000
    sysctl -w net.core.somaxconn=512
    sysctl -w net.ipv4.tcp_rmem="4096 87380 9437184"
    sysctl -w net.ipv4.tcp_wmem="4096 87380 9437184"
    sysctl -w net.ipv4.ipfrag_low_thresh=393216
    sysctl -w net.ipv4.ipfrag_high_thresh=544288
    sysctl -w net.ipv4.tcp_max_syn_backlog=8192
    sysctl -w net.ipv4.tcp_synack_retries=3
    sysctl -w net.ipv4.tcp_retries2=6
    sysctl -w net.ipv4.tcp_keepalive_time=1000
    sysctl -w net.ipv4.tcp_keepalive_probes=4
    sysctl -w net.ipv4.tcp_keepalive_intvl=20
    sysctl -w net.ipv4.tcp_tw_recycle=1
    sysctl -w net.ipv4.tcp_tw_reuse=1
    sysctl -w net.ipv4.tcp_fin_timeout=30
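
    To keep these settings across reboots they could also go into /etc/sysctl.conf (just a sketch with a subset of the values above; "sysctl -p" reloads the file):

    # /etc/sysctl.conf (excerpt)
    net.ipv4.tcp_sack = 0
    net.ipv4.tcp_window_scaling = 1
    net.core.rmem_max = 12582912
    net.core.wmem_max = 12582912
    net.core.netdev_max_backlog = 9000
    net.ipv4.tcp_rmem = 4096 87380 9437184
    net.ipv4.tcp_wmem = 4096 87380 9437184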

    With these settings I was able to boost the SLES 10 Gbit/s to AIX 10 Gbit/s communication from 1.2 Gbit/s to 1.95 Gbit/s, but no further.

    Also, I cannot use a wide variety of ethtool options, such as ethtool -a, which I can on Intel-based Linux installations.

    So yes, I guess you are right that the advertised speed has nothing to do with it. However, something is still fishy.

    Best regards
    Last edited by dafrk; 04-Jul-2016 at 10:44.

  4. Re: very bad network performance

    Sorry for the double post.

    I just wanted to add that my problem is that AIX to AIX over 10 Gbit/s currently achieves about 3 Gbit/s. So my boss is asking me why SLES, even after tuning parameters, is still losing about 1 Gbit/s compared to that, while the AIX machines work fine when talking to each other.

  5. Re: very bad network performance

    Hi dafrk,
    Quote Originally Posted by dafrk View Post
    I just wanted to add that my problem is that AIX to AIX over 10 Gbit/s currently achieves about 3 Gbit/s. So my boss is asking me why SLES, even after tuning parameters, is still losing about 1 Gbit/s compared to that, while the AIX machines work fine when talking to each other.
    there's more to network performance than the parameters you posted. And I couldn't see how you measured the throughput, which would tell a lot about the nature of the network traffic:

    - If you're transferring large amounts of data, using large packets increases the overall throughput. For 10G, you may have large Ethernet packets (jumbo frames) enabled and usable on the AIX side of things, while the Linux LPARs use the standard size (a quick check is sketched after this list)

    - When transferring using a windowed protocol (where transmission of further packets requires a receiver's acknowledgment of the previously sent packets), different window sizing algorithms may affect the throughput

    - Using different protocols for the tests (or simply measuring link utilization) would of course produce incomparable results (this is only for future readers, as I'm sure you already took that into account)

    - Of course, comparing virtualized environments with OSes running natively would also have to take virtualization-specific effects into account (you didn't say whether the "production servers" are LPARed AIX instances, so I'm just mentioning it)
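
    As a quick check for the first point (just a sketch: eth0 is only an example device, and jumbo frames have to be supported along the whole path, including the VIOS and the switches):

    # show the current MTU on the Linux side
    ip link show dev eth0 | grep mtu
    # jumbo frames would typically mean MTU 9000 - only if the whole path supports it
    ip link set dev eth0 mtu 9000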

    Maybe you'd be able to spot differences by looking at the actual packet transfer - that would give you facts about packet and window sizes to compare AIX-to-AIX vs. AIX-to-SLES?
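
    For example (again just a sketch; interface, host and port are placeholders), a short capture during a transfer would show the negotiated window scaling, the effective window sizes and the segment lengths:

    # capture 200 packets of a transfer on the SLES side
    tcpdump -i eth0 -c 200 -w transfer.pcap host aix10g and port 5001
    # print TCP flags, window sizes and segment lengths from the capture
    tcpdump -r transfer.pcap -nn -v tcp | head -50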

    Regards,
    J
    From the times when today's "old school" was "new school"

    If you find this post helpful and are logged into the web interface, show your appreciation and click on the star below...
