mce detected memory error

Hi Experts,
I found an interesting issue in /var/log/messages. I installed SLES 12 SP3 for SAP on a Lenovo X3850 X6.

2019-10-17T18:02:15.441498+07:00 hostname mcelog[5163]: Running trigger `socket-memory-error-trigger'
2019-10-17T18:02:15.441503+07:00 hostname mcelog[5163]: Hardware event. This is not a software error.
2019-10-17T18:02:15.441548+07:00 hostname mcelog[5163]: Corrected error
2019-10-17T18:02:15.441575+07:00 hostname mcelog[5163]: Transaction: Memory read error
2019-10-17T18:02:15.441579+07:00 hostname mcelog[5163]: MemCtrl: Corrected memory read error


After searching for a solution, I found this article (https://www.suse.com/support/kb/doc/?id=7022118) and added the kernel option from it (mce=ignore_ce).
After that, the following messages appeared in the log:

2019-10-29T00:33:32.041734+07:00 hostname kernel: [ 9.938351] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
2019-10-29T00:33:32.041973+07:00 hostname kernel: [ 15.612800] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 0.
2019-10-29T00:33:32.041974+07:00 hostname kernel: [ 15.612801] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 1.
2019-10-29T00:33:32.041975+07:00 hostname kernel: [ 15.612801] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 2.
2019-10-29T00:33:32.041980+07:00 hostname kernel: [ 15.612802] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 3.
2019-10-29T00:33:32.041981+07:00 hostname kernel: [ 15.612803] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 4.
2019-10-29T00:33:32.041982+07:00 hostname kernel: [ 15.612804] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 5.
2019-10-29T00:33:32.041983+07:00 hostname kernel: [ 15.612805] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 6.
2019-10-29T00:33:32.041984+07:00 hostname kernel: [ 15.612806] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 7.
2019-10-29T00:33:32.041985+07:00 hostname kernel: [ 15.612806] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 8.
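
To confirm that the mce= option really reached the running kernel, the boot command line can be inspected. This is only a minimal sketch in Python: mce_options is a hypothetical helper name, and it assumes the kernel command line is readable at /proc/cmdline, the usual location on Linux.

```python
from pathlib import Path
from typing import List


def mce_options(cmdline: str) -> List[str]:
    """Return the value of every mce= parameter found on a kernel command line."""
    return [
        token.split("=", 1)[1]
        for token in cmdline.split()
        if token.startswith("mce=")
    ]


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        print(mce_options(proc_cmdline.read_text()))
```

An empty list would suggest the option was not passed at boot, for example because the bootloader configuration was not regenerated.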

How can I solve this?


Thanks and regards

Thomson Malau

Comments

  • malcolmlewis Knowledge Partner
    Hi
    Are you sure it's not a real hardware problem with the RAM? Have you tested the RAM or reseated it?
  • Thomson Rihad Dody New or Quiet Member
    Hi Malcolm,
    Thanks for your reply.
    "Are you sure it's not a real hardware problem with the RAM?"

    When we checked the hardware log, there was no memory error or any other failure.

    "Have you tested the ram, reseated it?"

    I have already tested the system with a stress-test tool (stress-ng), and it ran fine.


    Thomson Malau
  • malcolmlewis Knowledge Partner
    Hi
    Then perhaps you can go in and tweak the trigger?

    The configuration files are in /etc/mcelog/. For details, see the man pages (man mcelog.triggers, man mcelog.conf, etc.).
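
Before changing the trigger, it may help to see how often corrected errors are actually being logged. The following is a rough sketch in Python, not a definitive tool: corrected_errors_per_day is a hypothetical helper, and it assumes the syslog line format shown in the question (ISO timestamp, hostname, then mcelog[PID]: message) and that the log lives at /var/log/messages.

```python
import re
from collections import Counter
from pathlib import Path

# Matches lines such as:
# 2019-10-17T18:02:15.441548+07:00 hostname mcelog[5163]: Corrected error
MCELOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2})T\S+ \S+ mcelog\[\d+\]: (.*)$")


def corrected_errors_per_day(lines):
    """Count mcelog 'Corrected error' events, grouped by calendar date."""
    counts = Counter()
    for line in lines:
        match = MCELOG_LINE.match(line.strip())
        # Only the exact "Corrected error" line is counted, so each event
        # is counted once even though mcelog prints several lines per event.
        if match and match.group(2) == "Corrected error":
            counts[match.group(1)] += 1
    return dict(counts)


if __name__ == "__main__":
    log = Path("/var/log/messages")  # assumed location; usually needs root to read
    if log.exists():
        with log.open(errors="replace") as handle:
            for day, count in sorted(corrected_errors_per_day(handle).items()):
                print(day, count)
```

A count that keeps rising day after day would point more toward a hardware issue than toward a trigger that only needs tuning.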