I/O error

OS: SLES 12 for SAP SP 3 + fully updated/patched.

About once a week, the system (application and database) becomes unresponsive due to I/O errors.
While the I/O errors are occurring, we can still ping the system and log in via SSH/PuTTY, but none of the standard Linux commands run successfully; each one fails with an I/O error:

:~ # top
:~ # /usr/bin/top: Input/output error
:~ # dmesg
:~ # /usr/bin/dmesg: Input/output error
:~ # tail -f /var/log/messages
:~ # /usr/bin/tail: Input/output error

Interestingly, the issue is always fixed (for the next 4-5 days) simply by hard rebooting the server (the system does not even reboot via command), and it then runs without any issue until the next I/O error (this repeats every 5-6 days).

Not a single FS error has ever been reported in the logs on this system. We have also run file system checks.

SUSE Support advised us:
There are no errors or messages indicating an issue at the file system level so far. I/O errors are not only related to the file system.
By looking at the logs, it seems the cached memory keeps increasing and is not getting freed. I would recommend tuning the memory as indicated in the TID below:
https://www.suse.com/support/kb/doc/?id=7021211

Please let me know once this is done and send me a new supportconfig.

We will then monitor the server behavior.

I am unable to understand how memory tuning would prevent the I/O errors. Interestingly, this is the SAP HANA replication target, i.e. this system is the passive node, while we never see I/O errors on the master/primary SAP server.

Comments

  • suseluckycement New or Quiet Member
    # df -h
    Filesystem      Size  Used Avail Use% Mounted on
    devtmpfs        252G     0  252G   0% /dev
    tmpfs           393G   80K  393G   1% /dev/shm
    tmpfs           252G  9.9M  252G   1% /run
    tmpfs           252G     0  252G   0% /sys/fs/cgroup
    /dev/sda4       788G  266G  522G  34% /
    /dev/sda2       985M   74M  860M   8% /boot
    /dev/sda6       1.0T   28G  997G   3% /hana/log
    /dev/sda7       297G   24G  273G   8% /hana/shared
    /dev/sda5       2.0T  321G  1.7T  16% /hana/data
    tmpfs            51G     0   51G   0% /run/user/485
    tmpfs            51G     0   51G   0% /run/user/1000
    tmpfs            51G     0   51G   0% /run/user/1006
    tmpfs            51G   16K   51G   1% /run/user/487
    tmpfs            51G     0   51G   0% /run/user/1004
    
    # /usr/bin/free -h
                 total       used       free     shared    buffers     cached
    Mem:          503G       140G       363G       7.5G        85M       8.9G
    -/+ buffers/cache:       131G       372G
    Swap:          20G         0B        20G
    
    
  • malcolmlewis Knowledge Partner
    Hi
    With that amount of memory available, consider tweaking any swap usage?

    I use:
    cat /etc/sysctl.d/98-grover.conf
    
    # minimize swapping (a low swappiness value does not disable swap)
    vm.swappiness=1
    vm.vfs_cache_pressure=50
    

    This will ensure RAM is actually used before hitting the swap space.

    So disks are all ok, memory has all been tested, filesystem checks run?

    Have you run iotop rather than top? I would leave it (iotop) running in a session...
  • suseluckycement New or Quiet Member
    Nice advice; however, I would like to know whether tuning memory could actually prevent the I/O errors from occurring. To me, it is hard to imagine.
  • malcolmlewis Knowledge Partner
    Hi
    Well, is the system swapping when the I/O issues occur? Have you checked the disks and filesystems as well as the RAM?

    Running iotop may give a better indication of what is happening.
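
One way to answer the "is the system swapping when the I/O issues occur?" question is to sample swap usage continuously before the fault hits. The following is only a minimal sketch, not something posted in the thread: the function names and the log path /dev/shm/swap-watch.log are hypothetical, and it assumes bash 4.2 or later for the `%(...)T` time format. It uses only shell builtins and writes to tmpfs, on the assumption that RAM-backed files stay writable while disk reads fail.

```bash
#!/bin/bash
# Print swap in use (kB) from meminfo-formatted text on stdin.
# Only shell builtins are used, so no external binary is loaded from disk.
swap_used_kb() {
    local key value unit total=0 free=0
    while read -r key value unit; do
        case "$key" in
            SwapTotal:) total=$value ;;
            SwapFree:)  free=$value ;;
        esac
    done
    echo $(( total - free ))
}

# Hypothetical log location on tmpfs (RAM-backed); override by setting LOG.
LOG=${LOG:-/dev/shm/swap-watch.log}

# Append one timestamped sample to $LOG.
log_swap_sample() {
    local used
    used=$(swap_used_kb < /proc/meminfo)
    # %(...)T is a bash builtin time format; -1 means "now".
    printf '%(%F %T)T swap_used_kb=%s\n' -1 "$used" >> "$LOG"
}
```

After sourcing the functions in a long-lived session, a loop such as `while log_swap_sample && sleep 60; do :; done` records one sample per minute. Note that `sleep` is an external binary, so the loop will probably stop once the I/O errors begin, but the samples leading up to the fault remain in the log.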