
Thread: Slow network response if 1 CPU is on 100% load

  1. Re: Slow network response if 1 CPU is on 100% load

    Hi Tamas,

    I don't know much about the internals of the current ESX versions, but network packet handling (passing packets between the physical and the virtual device) is done in software. Could it be that, by unlucky coincidence, the same physical CPU is used both for the loaded vCPU and for the bridging software at the hypervisor level?

    Regards,
    Jens
    From the times when today's "old school" was "new school"


  2. Re: Slow network response if 1 CPU is on 100% load

    Hello Jens,

    True, the same physical CPU core could be handling the network traffic for the physical host while I load that core with the vCPU process. But three times in a row? I don't think this is bad luck.
    I will continue testing the workaround on another ESXi host.
    I will post the results.

    Regards,

    Tamas

  3. Re: Slow network response if 1 CPU is on 100% load

    uracst wrote:

    >
    > This makes no sense, because the recommended network card for SLES11
    > is vmxnet3.
    > Anyway, e1000 is obsolete and will be removed in future versions.
    >
    > I just moved the VM between physical servers to see whether there is any
    > difference. I ran my tests and found that I cannot reproduce the
    > fix with e1000. I even moved the VM back to its original location,
    > and even there I cannot solve the issue just by changing the NIC
    > type.
    > Well, this isn't funny at all.


    While I believe it's better to diagnose the problem before trying to
    fix it, sometimes it's easier to try a few quick fixes.

    Since your issue is related to your network card(s) and there have been
    some performance issues with TCP offloading, especially in a VM, you
    may want to look into temporarily disabling this feature.

    https://www.suse.com/support/kb/doc.php?id=7005304

    While I have not seen this feature cause excessive CPU utilisation, it
    is easy enough to turn it off to see.
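    A minimal Python sketch of this quick test, assuming the older `ethtool -k` output format shown later in this thread (one `name: on/off` line per feature). It lists the offloads reported as "on" and builds the matching `ethtool -K` command from the short keywords documented in the ethtool man page (`rx`, `tx`, `sg`, `tso`, `gso`, `gro`, `lro`, ...). The helper names are hypothetical; features marked `[fixed]` are skipped because the driver does not allow changing them, and feature names outside the table are ignored.

```python
# Map the long names printed by older `ethtool -k` to the short keywords
# accepted by `ethtool -K` (per the ethtool man page).
SHORT_NAMES = {
    "rx-checksumming": "rx",
    "tx-checksumming": "tx",
    "scatter-gather": "sg",
    "tcp-segmentation-offload": "tso",
    "udp-fragmentation-offload": "ufo",
    "generic-segmentation-offload": "gso",
    "generic-receive-offload": "gro",
    "large-receive-offload": "lro",
    "rx-vlan-offload": "rxvlan",
    "tx-vlan-offload": "txvlan",
    "ntuple-filters": "ntuple",
    "receive-hashing": "rxhash",
}


def enabled_offloads(ethtool_output: str) -> list:
    """Return feature names reported as 'on' (and not '[fixed]') by `ethtool -k`."""
    enabled = []
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        tokens = value.split()
        # Header lines such as "Offload parameters for eth0:" have no value.
        if tokens and tokens[0] == "on" and "[fixed]" not in tokens:
            enabled.append(name.strip())
    return enabled


def disable_command(iface: str, features: list) -> list:
    """Build an argv list for `ethtool -K` that turns the given features off."""
    argv = ["ethtool", "-K", iface]
    for feature in features:
        if feature in SHORT_NAMES:
            argv += [SHORT_NAMES[feature], "off"]
    return argv
```

    To use it, capture the output of `ethtool -k eth0` (for example with `subprocess.run(..., capture_output=True, text=True)`), pass it to `enabled_offloads`, and run the command returned by `disable_command` as root. Settings changed with `ethtool -K` do not survive a reboot, which suits a temporary test like this one.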

    --
    Kevin Boyle - Knowledge Partner

  4. Re: Slow network response if 1 CPU is on 100% load

    Hello Kevin,

    Thank you for the answer.
    Here is the output from ethtool:
    ethtool -k eth0
    Offload parameters for eth0:
    rx-checksumming: on
    tx-checksumming: on
    scatter-gather: on
    tcp-segmentation-offload: on
    udp-fragmentation-offload: off
    generic-segmentation-offload: on
    generic-receive-offload: on
    large-receive-offload: on
    rx-vlan-offload: on
    tx-vlan-offload: on
    ntuple-filters: off
    receive-hashing: off

    Do you see anything suspicious?

    Regards,

    Tamas
    Last edited by uracst; 27-Mar-2014 at 07:23.

  5. Re: Slow network response if 1 CPU is on 100% load

    Hello Kevin,

    I turned off everything I could:
    ethtool -k eth0
    Offload parameters for eth0:
    rx-checksumming: off
    tx-checksumming: off
    scatter-gather: off
    tcp-segmentation-offload: off
    udp-fragmentation-offload: off
    generic-segmentation-offload: off
    generic-receive-offload: off
    large-receive-offload: off
    rx-vlan-offload: on
    tx-vlan-offload: off
    ntuple-filters: off
    receive-hashing: off

    The problem is still there. I will update my VMware SR (support request) today.

    Tamas

  6. Re: Slow network response if 1 CPU is on 100% load

    Thanks for the feedback, Tamas. IMO, it was worth a try and didn't cost
    anything other than a few minutes of your time.

    --
    Kevin Boyle - Knowledge Partner

  7. Re: Slow network response if 1 CPU is on 100% load

    I know this thread is already quite old, but we faced the same issue over the last few days, and since I didn't find an answer to it anywhere online, I would like to share our solution.

    Background:
    We have a farm of 70+ VMware hosts running Windows and Linux VMs.
    We experienced poor latency on quite a few Linux VMs when one of their CPUs was busy (this also happened on multi-core VMs).
    Strangely enough, this didn't happen on all VMs.
    It was also not consistent within a single cluster or ESXi host.

    Solution:
    When comparing the *.vmx files from the VMs we found a parameter called sched.cpu.latencySensitivity.
    Code:
    sched.cpu.latencySensitivity = "low"
    On the systems with poor latency this was set to "low", while the others had "normal".
    Changing this parameter to normal solved the problem!
    Code:
    sched.cpu.latencySensitivity = "normal"
    1. Check in the *.vmx file whether the parameter sched.cpu.latencySensitivity is set to "low".
    2. If so, shut down the VM.
    3. Either edit the *.vmx file directly, or go to "Edit Settings" -> "Options" -> "General" -> "Configuration Parameters" and change the parameter there. Note: this is only available while the VM is powered off.
    4. Start the VM.
    5. Test (e.g. with the script mentioned in the earlier posts).
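    Steps 1 and 3 can be sketched in Python as follows. This is a hypothetical helper, not a VMware tool: it assumes the *.vmx file uses the `key = "value"` line format shown above, reports the current sched.cpu.latencySensitivity value, and rewrites it to "normal". If the parameter is absent, the text is returned unchanged. As in the steps, edit the file only while the VM is powered off, and keep a backup copy.

```python
import re
from typing import Optional

# Matches a line such as: sched.cpu.latencySensitivity = "low"
PATTERN = re.compile(r'^(\s*sched\.cpu\.latencySensitivity\s*=\s*)"([^"]*)"\s*$')


def latency_sensitivity(vmx_text: str) -> Optional[str]:
    """Return the configured latencySensitivity value, or None if it is not set."""
    for line in vmx_text.splitlines():
        match = PATTERN.match(line)
        if match:
            return match.group(2)
    return None


def set_latency_sensitivity(vmx_text: str, value: str = "normal") -> str:
    """Return the .vmx text with every latencySensitivity line set to `value`."""
    out = []
    for line in vmx_text.splitlines():
        match = PATTERN.match(line)
        if match:
            # Keep the original key and spacing, replace only the quoted value.
            line = f'{match.group(1)}"{value}"'
        out.append(line)
    return "\n".join(out) + "\n"
```

    A typical use is to read the file with `pathlib.Path("vm.vmx").read_text()`, check `latency_sensitivity(...)`, and write the result of `set_latency_sensitivity(...)` back only if it returned "low".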

    have fun

  8. Re: Slow network response if 1 CPU is on 100% load

    Hi Arnkit,

    Thank you, I can confirm that this advanced VMware setting solves the problem for us too.

    KR,

    Tamas
