Page 1 of 2
Results 1 to 10 of 14

Thread: SLES 10 SP4 guest slow in SLES 11 SP4 host - high si

  1. SLES 10 SP4 guest slow in SLES 11 SP4 host - high si

    Hi,

    I have a SLES 10 SP4 guest running slowly in a SLES 11 SP4 host. The guest runs a small web application with a MySQL database, an Apache web server, and some Perl scripts. The database is not really busy, maybe a few hundred requests per day.
    But the system performs slowly, especially on a console. When connected via ssh, top needs between 2 and 3 seconds to refresh its output. It's really no fun to work on the console.
    The guest has 4 virtual CPUs and 8 GB of RAM. It was a physical system before and was migrated to a VM. The physical system had just 2 CPUs but ran fine. The host is very performant, with 8 cores and 96 GB of RAM; it runs fine, with no performance problems, and the guest is not a heavy load for it. It's KVM, and the vmx flag is set on the host cores.
    What I see is that the guest constantly shows high si (softirq time) in top.

    Here a typical example:

    top - 14:12:31 up 38 min, 9 users, load average: 0.81, 0.69, 0.60
    Tasks: 111 total, 2 running, 109 sleeping, 0 stopped, 0 zombie
    Cpu0 : 0.0%us, 1.6%sy, 0.0%ni, 95.6%id, 0.0%wa, 0.0%hi, 2.8%si, 0.0%st
    Cpu1 : 1.1%us, 1.1%sy, 0.0%ni, 86.8%id, 3.9%wa, 0.0%hi, 7.1%si, 0.0%st
    Cpu2 : 0.6%us, 0.6%sy, 0.0%ni, 58.7%id, 0.0%wa, 5.2%hi, 34.8%si, 0.0%st
    Cpu3 : 0.3%us, 0.3%sy, 0.0%ni, 85.8%id, 0.0%wa, 0.3%hi, 13.2%si, 0.0%st
    Mem: 7995120k total, 1122420k used, 6872700k free, 75368k buffers
    Swap: 2104472k total, 0k used, 2104472k free, 790208k cached

    You see, the system isn't doing much: very little I/O, just a little load in user and system context, but a lot of si.
    How can I find out where this si comes from? If I recall correctly, I have had this problem before. I updated the system completely to SLES 11 SP4, and the high si was history. But that was just a test; the system needs to stay on SLES 10 SP4.
    Maybe the kernel or some modules are the culprit? I tried to update only the kernel, but that's not possible: tons of dependencies.
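    One way to narrow this down is to break the si time down by softirq type. A rough sketch, assuming the kernel provides /proc/softirqs (added around kernel 2.6.31, so the stock SLES 10 kernel may not have it):

```shell
# Sample /proc/softirqs twice and print the per-type delta, summed
# across all CPUs, so the busiest softirq type floats to the top.
snap() { awk 'NR>1 {s=0; for (i=2; i<=NF; i++) s+=$i; print $1, s}' /proc/softirqs; }
if [ -r /proc/softirqs ]; then
    snap > /tmp/si.1
    sleep 2                      # sampling interval; adjust as needed
    snap > /tmp/si.2
    paste /tmp/si.1 /tmp/si.2 | awk '{printf "%-12s %d\n", $1, $4 - $2}' | sort -k2 -rn
fi
```

    A large delta on NET_RX/NET_TX would point at network traffic, TIMER at clock interrupts, and so on.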

    Thanks for any help.


    Bernd

  2. #2

    Re: SLES 10 SP4 guest slow in SLES 11 SP4 host - high si

    I have not seen this, but I also do not know that I have ever had SLES 10
    as a KVM guest, where I have had SLES 11 and 12 as such, both with at
    least the 3.x kernel.

    One thread I found online said another user was helped with a similar
    kernel (version) by changing the clock settings on the guest:

    https://www.centos.org/forums/viewtopic.php?t=17663

    I have not had to hack kernel clock settings for a while, but "a while"
    means back to the SLES 9 or 10 days, so maybe this is valid for you after
    all; if nothing else it's a simple test.
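    Checking the clock settings is quick; here is a sketch, assuming the guest kernel exposes the clocksource sysfs interface (newer 2.6 kernels do; the SLES 10 kernel may instead use the older clock= boot parameter discussed in the linked thread):

```shell
# Show the clocksources the guest kernel offers and the one in use.
cs=/sys/devices/system/clocksource/clocksource0
if [ -r "$cs/current_clocksource" ]; then
    cat "$cs/available_clocksource" "$cs/current_clocksource"
fi
# To switch at runtime (as root), e.g.:
#   echo acpi_pm > $cs/current_clocksource
# or persistently via a kernel boot parameter, e.g. clocksource=acpi_pm
```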

    I found another thread that indicated a qemu patch had been submitted to
    help with si being high due to network traffic, but your box doesn't sound
    busy enough to warrant that.... still who knows. If it is "just" a simple
    LAMP box, have you considered upgrading? In my experience these types of
    upgrades are really easy and low risk, with the hardest part being keeping
    downtime to a minimum as you move over the last (if applicable) database
    changes going from the old to the new systems, but even that can be really
    simple depending on the application's use of the DB.

    --
    Good luck.

    If you find this post helpful and are logged into the web interface,
    show your appreciation and click on the star below...

  3. Re: SLES 10 SP4 guest slow in SLES 11 SP4 host - high si

    Hi ab,

    unfortunately that didn't help; still high si. Is there a way to find the reason for it? It's "just" a simple LAMP box, but
    although it's not heavily utilized, it's absolutely business-critical. Downtime during the day must not happen.
    And the developer who took care of it has left us, and we will get no substitute. My colleague and I really don't want to upgrade it,
    because we can't be absolutely sure that new versions of MySQL, Perl, and Apache won't interfere with the application.
    And neither of us is a Perl developer. And the app is very powerful; it has a lot of functionality. Testing every function is hell,
    and we will surely overlook some. That's the problem. Downtime in the evening or at the weekend could last for hours;
    that's not the issue.

    Bernd

  4. #4

    Re: SLES 10 SP4 guest slow in SLES 11 SP4 host - high si

    Bummer; I have never, as far as I know, seen an issue with si being high,
    so I do not have a lot of experience there, thus my results all being from
    Google last time.

    I understand the simple-yet-critical argument; having the original
    developer gone is painful, but that is where you are. If you can come up
    with a couple of reliable ways to monitor the system and test the critical
    functions fairly quickly, I think your best bet is probably still to
    upgrade. The nice thing about this type of setup, usually, is that you
    can copy it to a new system, pound it with tests for an hour/day/week, and
    then when you are ready to swap over just refresh the data in the database
    and update DNS pointers (hopefully clients are using DNS) or change IP
    addresses over and that's it. Falling back to the original system is just
    as easy, in case you find something was missed after some time.

    --
    Good luck.


  5. #5

    Re: SLES 10 SP4 guest slow in SLES 11 SP4 host - high si

    What do you get from the following command?

    Code:
    cat /proc/interrupts

    I'm trying to find ways to identify the causes and types of interrupts, and perhaps this virtual file will help us.
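    Since the counters in /proc/interrupts only ever grow, the absolute numbers say little; what matters is how fast each line grows. A small sketch to capture that:

```shell
# Sample /proc/interrupts twice over a known interval so the
# counter deltas per interrupt line are directly comparable.
cat /proc/interrupts > /tmp/irq.1
sleep 3                              # sampling interval; adjust as needed
cat /proc/interrupts > /tmp/irq.2
diff /tmp/irq.1 /tmp/irq.2 || true   # non-zero exit just means counters moved; expected
```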

    --
    Good luck.


  6. Re: SLES 10 SP4 guest slow in SLES 11 SP4 host - high si

    I would like to upgrade. But as I said, the system is business-critical. Really. We are a research institute, and the DB is for our mouse breeding. We have about 5,000 mice, and if the DB is inconsistent or buggy, we are completely lost.
    And the source code is not really well written; we have already seen some lines that are horrible. And the DB, which was not developed by us, has no relations! Only one table has a primary key. So now you know what we are talking about.
    The Perl scripts total about 60,000 lines of code; that's about 2,000 pages. That's not really big, and surely there is a mass of bigger applications, but it's big enough that we hesitate.
    If someone could promise me that an upgrade will run 100% smoothly, OK. Do you know someone who will promise that? If you see any way, tell me.

    Bernd

  7. Re: SLES 10 SP4 guest slow in SLES 11 SP4 host - high si

    I did it three times so that you get a better insight:

    vm58820-4:~ # cat /proc/interrupts
    CPU0 CPU1 CPU2 CPU3
    0: 439757 0 0 0 IO-APIC-edge timer
    1: 53 0 285 0 IO-APIC-edge i8042
    8: 1 0 0 0 IO-APIC-edge rtc
    9: 0 0 0 0 IO-APIC-level acpi
    10: 1 0 0 0 IO-APIC-level virtio2, uhci_hcd:usb1
    12: 104 0 0 0 IO-APIC-edge i8042
    15: 122 0 0 10637 IO-APIC-edge ide1
    177: 0 0 0 0 PCI-MSI-X virtio1-config
    185: 6906 26624 0 0 PCI-MSI-X virtio1-requests
    193: 0 0 0 0 PCI-MSI-X virtio0-config
    201: 146 0 38029 0 PCI-MSI-X virtio0-input
    209: 86 0 0 1551 PCI-MSI-X virtio0-output
    NMI: 0 0 0 0
    LOC: 881734 905777 481458 882053
    ERR: 0
    MIS: 0
    vm58820-4:~ # cat /proc/interrupts
    CPU0 CPU1 CPU2 CPU3
    0: 440141 0 0 0 IO-APIC-edge timer
    1: 53 0 285 0 IO-APIC-edge i8042
    8: 1 0 0 0 IO-APIC-edge rtc
    9: 0 0 0 0 IO-APIC-level acpi
    10: 1 0 0 0 IO-APIC-level virtio2, uhci_hcd:usb1
    12: 104 0 0 0 IO-APIC-edge i8042
    15: 122 0 0 10643 IO-APIC-edge ide1
    177: 0 0 0 0 PCI-MSI-X virtio1-config
    185: 6906 26636 0 0 PCI-MSI-X virtio1-requests
    193: 0 0 0 0 PCI-MSI-X virtio0-config
    201: 146 0 38057 0 PCI-MSI-X virtio0-input
    209: 86 0 0 1556 PCI-MSI-X virtio0-output
    NMI: 0 0 0 0
    LOC: 882538 906581 481864 882757
    ERR: 0
    MIS: 0
    vm58820-4:~ # cat /proc/interrupts
    CPU0 CPU1 CPU2 CPU3
    0: 440338 0 0 0 IO-APIC-edge timer
    1: 53 0 285 0 IO-APIC-edge i8042
    8: 1 0 0 0 IO-APIC-edge rtc
    9: 0 0 0 0 IO-APIC-level acpi
    10: 1 0 0 0 IO-APIC-level virtio2, uhci_hcd:usb1
    12: 104 0 0 0 IO-APIC-edge i8042
    15: 122 0 0 10649 IO-APIC-edge ide1
    177: 0 0 0 0 PCI-MSI-X virtio1-config
    185: 6906 26636 0 0 PCI-MSI-X virtio1-requests
    193: 0 0 0 0 PCI-MSI-X virtio0-config
    201: 146 0 38075 0 PCI-MSI-X virtio0-input
    209: 86 0 0 1563 PCI-MSI-X virtio0-output
    NMI: 0 0 0 0
    LOC: 882988 907053 482069 883108
    ERR: 0
    MIS: 0
    vm58820-4:~ #

    Bernd

  8. #8

    Re: SLES 10 SP4 guest slow in SLES 11 SP4 host - high si

    Quote Originally Posted by berndgsflinux:
    I would like to upgrade. But as I said, the system is business-critical. Really. We are a research institute, and the DB is for our mouse breeding. We have about 5,000 mice, and if the DB is inconsistent or buggy, we are completely lost.
    And the source code is not really well written; we have already seen some lines that are horrible. And the DB, which was not developed by us, has no relations! Only one table has a primary key. So now you know what we are talking about.
    The Perl scripts total about 60,000 lines of code; that's about 2,000 pages. That's not really big, and surely there is a mass of bigger applications, but it's big enough that we hesitate.
    If someone could promise me that an upgrade will run 100% smoothly, OK. Do you know someone who will promise that? If you see any way, tell me.

    Bernd
    Hi Bernd,

    I can understand why you don't want to make changes to an old, unsupported, poorly developed, business-critical system, but this system has an issue that needs to be resolved, and resolving it will likely require some changes.

    In a previous post, ab (Aaron) said:
    The nice thing about this type of setup, usually, is that you
    can copy it to a new system, pound it with tests for an hour/day/week, and
    then when you are ready to swap over just refresh the data in the database
    and update DNS pointers (hopefully clients are using DNS) or change IP
    addresses over and that's it. Falling back to the original system is just
    as easy, in case you find something was missed after some time.
    If you follow his suggestion, you will not be making any changes to your business critical system!
    • Make a clone of your VM.
    • In the evening, shut down your production VM and start the clone.
    • Make sure there are no IP address conflicts so the production VM can be started and left running.
    • Upgrade the clone to SLES11 SP4.

    This will help determine whether your issue can be resolved by upgrading SLES to a newer version. The newer version has many patches which may resolve your issue. You have not changed your production system and can decide at a later time whether or not you should update your production VM.
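    On a libvirt/KVM host, the clone step could be sketched roughly like this (the domain name "sles10-web" is a made-up example; adjust names, storage, and networking to your setup):

```shell
# Host-side sketch: clone the guest for offline upgrade testing.
clone_for_testing() {
    virsh shutdown sles10-web                       # clean shutdown of prod guest
    virt-clone --original sles10-web \
               --name sles10-web-test --auto-clone  # copy domain definition + disks
    # Put the clone on an isolated network (or remove its NIC) before
    # starting it, so it cannot conflict with the production IP.
    virsh start sles10-web-test
    virsh start sles10-web                          # bring production back up
}
```

    virt-clone needs the guest shut down, which is why the sketch stops production briefly; plan that for the evening window.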

    A note about MySQL:
    While a single table is probably not the best database design, it means that this is a very simple database that is likely exploiting very few MySQL features. While performance may suffer, there is a much better chance it will be compatible with a newer version of MySQL.

    We are fortunate to have Aaron helping in these forums. His Linux skills are extensive. If anyone can help you with your issue, it will be he.

    I do have one suggestion for you:
    When posting the output from commands, please use code tags (click on the "#" icon at the top of the reply box) and paste the results between the [CODE] and [/CODE] tags. This will make your information much easier to read.
    Kevin Boyle - Knowledge Partner

  9. #9

    Re: SLES 10 SP4 guest slow in SLES 11 SP4 host - high si

    LOC stands for 'Local timer interrupts' according to my laptop's version
    of the file (openSUSE, newer version).

    Some reading online says this has to do with multi-CPU process handling,
    and that it's not a bad thing generally. Did you run those commands one
    second apart? One minute? One hour? Other?

    I suppose you could try something crazy and decrease the number of
    assigned cores. They're all virtual anyway, so maybe your idle box hits an
    older bug that causes a lot of unnecessary overhead due to the number of
    processors, so that they cause your problem instead of helping
    performance. That would be ironic.
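    Shrinking the vCPU count is a host-side change and easy to revert; a rough sketch with libvirt (again, "sles10-web" is a made-up domain name):

```shell
# Host-side sketch: halve the guest's vCPU count and reboot it.
reduce_vcpus() {
    virsh shutdown sles10-web
    virsh setvcpus sles10-web 2 --config   # persistent change; applies at next boot
    virsh start sles10-web
}
```

    If si drops with 2 vCPUs, that would point at per-CPU timer/IPI overhead rather than real device load.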

    None of your other numbers seem to be that big, or growing very quickly,
    but again I do not know the time intervals involved.

    --
    Good luck.

    If you find this post helpful and are logged into the web interface,
    show your appreciation and click on the star below...

  10. #10

    Re: SLES 10 SP4 guest slow in SLES 11 SP4 host - high si

    On 02/27/2017 10:04 AM, berndgsflinux wrote:
    >
    > I would like to upgrade. But as I said, the system is
    > business-critical. Really. We are a research institute, and the DB is
    > for our mouse breeding. We have about 5,000 mice, and if the DB is
    > inconsistent or buggy, we are completely lost.


    Yes, this is not abnormal though. Servers are usually somewhere between
    important and mission-critical, and yet upgrades happen.

    As Kevin mentioned, the option I'm proposing is to build a new box (copy
    the existing one or build fresh, whatever), upgraded to current code,
    without it being the currently-running prod box.

    > And the source code is not really well written; we have already seen
    > some lines that are horrible. And the DB, which was not developed by
    > us, has no relations! Only one table has a primary key. So now you
    > know what we are talking about.


    That's not abnormal either, which is why I proposed building a new box
    and testing it there, without it being the box that serves the
    mission-critical work, until you are sure the upgrade is a success.

    > The Perl scripts total about 60,000 lines of code; that's about
    > 2,000 pages. That's not really big, and surely there is a mass of bigger


    I presume you mean 200 pages; 30 lines per page seems a bit short. Not
    that lines per page mean anything, since pages only count if you print,
    and then it's influenced by nonsense like font size, and nobody prints
    source code.

    > applications. But it's big enough that we hesitate.
    > If someone could promise me that an upgrade will run 100% smoothly,
    > OK. Do you know someone who will promise that? If you see any way,
    > tell me.


    You should hesitate; that's why you have the job you do. A guarantee of
    success is probably impossible, but I would recommend the other approach
    for that very reason, especially when combined with the need for this
    service to persist reliably.

    1. Build a new SLES 11 or SLES 12 box.
    2. Be sure you have Perl/MySQL/Apache2(httpd) (or MariaDB instead of
    MySQL) as needed.
    3. Copy over the data: Perl scripts, configuration files, the DB itself, etc.
    4. Test it, while your existing system keeps happily-ish performing
    badly, but otherwise doing its job.
    5. When ready, update the DB on the new box, point clients there, and
    turn off the old box.
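    Steps 3 and 5 could be sketched as shell, with everything here a hypothetical placeholder ("oldbox"/"newbox" hostnames, the paths, and the database name "appdb"):

```shell
# Sketch: copy the app to the new box, then do the final DB refresh
# during the cutover window. Defined as a function; run it deliberately.
migrate_app() {
    rsync -a oldbox:/srv/www/     newbox:/srv/www/      # Perl scripts + web root
    rsync -a oldbox:/etc/apache2/ newbox:/etc/apache2/  # review configs before use
    # final data refresh just before pointing clients at the new box:
    ssh oldbox "mysqldump --single-transaction appdb" | ssh newbox "mysql appdb"
}
```

    Running the rsync steps repeatedly beforehand keeps the final sync, and therefore the downtime, short.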

    --
    Good luck.

    If you find this post helpful and are logged into the web interface,
    show your appreciation and click on the star below...

