
Thread: Server crashes with a long BTRFS error list


  1. #1

    Server crashes with a long BTRFS error list

    My SLES 12 SP4 server suddenly started printing the dump below shortly after boot, and then the system becomes unavailable.

    When I repair the root filesystem from another guest VM (it cannot be repaired in place because the root filesystem is mounted), the root partition is then reported as full, even though df still shows unused space.

    What happened, and how can I solve it? The strange thing is that when I restore a three-week-old backup (from when everything was OK) into a new VM, I get the same error messages.

    [ 37.505018] invalid opcode: 0000 [#1] SMP NOPTI
    [ 37.505170] CPU: 10 PID: 568 Comm: systemd-journal Not tainted 4.12.14-95.13-default #1 SLE12-SP4
    [ 37.505476] task: ffff880003bacc00 task.stack: ffffc90041620000
    [ 37.505701] RIP: e030:create_reloc_root+0x295/0x2a0 [btrfs]
    [ 37.505853] RSP: e02b:ffffc90041623b98 EFLAGS: 00010282
    [ 37.505997] RAX: 00000000ffffffef RBX: ffff8800f731ae00 RCX: 0000000000000001
    [ 37.506205] RDX: 0000000000000003 RSI: ffff8800f5323460 RDI: 0000000000000200
    [ 37.506486] RBP: ffff8800f965f000 R08: ffff8800ef659cb0 R09: ffffc900416239d8
    [ 37.506678] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88000376a2d0
    [ 37.506873] R13: ffff8800f58a0000 R14: 0000000000000110 R15: ffff880003bacc00
    [ 37.507071] FS: 00007fa2c23a0880(0000) GS:ffff8800faa80000(0000) knlGS:0000000000000000
    [ 37.507319] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 37.507532] CR2: 00007fa2bef25550 CR3: 00000000872fa000 CR4: 0000000000040660
    [ 37.507796] Call Trace:
    [ 37.507923] btrfs_init_reloc_root+0x8e/0xa0 [btrfs]
    [ 37.508130] record_root_in_trans+0xa9/0xf0 [btrfs]
    [ 37.508333] btrfs_record_root_in_trans+0x4a/0x70 [btrfs]
    [ 37.508539] start_transaction+0xab/0x440 [btrfs]
    [ 37.508691] btrfs_dirty_inode+0x49/0xe0 [btrfs]
    [ 37.508839] file_update_time+0xa6/0xf0
    [ 37.508972] btrfs_page_mkwrite+0x129/0x490 [btrfs]
    [ 37.509109] ? vsnprintf+0x1e5/0x4b0
    [ 37.509212] do_page_mkwrite+0x31/0x70
    [ 37.509373] do_wp_page+0x43f/0x570
    [ 37.509473] __handle_mm_fault+0x793/0xef0
    [ 37.509601] handle_mm_fault+0xc4/0x1d0
    [ 37.509719] __do_page_fault+0x1f3/0x4c0
    [ 37.509831] do_page_fault+0x2b/0x70
    [ 37.509934] ? do_syscall_64+0x9a/0x150
    [ 37.510044] ? page_fault+0x2f/0x50
    [ 37.510172] page_fault+0x45/0x50
    [ 37.510301] RIP: 0510:0x7ffefbe30518
    [ 37.510424] RSP: 0024:00005575011ab0a0 EFLAGS: 5575011a46b0
    [ 37.510427] Code: 48 83 c6 02 41 83 e8 02 66 89 4f fe e9 37 fe ff ff 8b 0e 48 83 c7 04 48 83 c6 04 41 83 e8 04 89 4f fc e9 2b fe ff ff 0f 0b 0f 0b <0f> 0b 0f 0b 0f 0b 0f 1f 44 00 00 0f 1f 44 00 00 48 89 f9 45 31
    [ 37.511135] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache af_packet iscsi_ibft iscsi_boot_sysfs xenfs xen_privcmd intel_rapl sb_edac x86_pkg_temp_thermal coretemp crc32_pclmul ghash_clmulni_intel pcbc xen_netfront aesni_intel aes_x86_64 crypto_simd glue_helper cryptd pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc btrfs xor raid6_pq xen_blkfront crc32c_intel sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
    [ 37.512270] Supported: Yes
    [ 37.512419] ---[ end trace ab510ab54e7d565e ]---
    [ 37.512563] RIP: e030:create_reloc_root+0x295/0x2a0 [btrfs]
    [ 37.512721] RSP: e02b:ffffc90041623b98 EFLAGS: 00010282
    [ 37.512724] RAX: 00000000ffffffef RBX: ffff8800f731ae00 RCX: 0000000000000001
    [ 37.512726] RDX: 0000000000000003 RSI: ffff8800f5323460 RDI: 0000000000000200
    [ 37.512728] RBP: ffff8800f965f000 R08: ffff8800ef659cb0 R09: ffffc900416239d8
    [ 37.512730] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88000376a2d0
    [ 37.512732] R13: ffff8800f58a0000 R14: 0000000000000110 R15: ffff880003bacc00
    [ 37.512740] FS: 00007fa2c23a0880(0000) GS:ffff8800faa80000(0000) knlGS:0000000000000000
    [ 37.512744] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 37.512750] CR2: 00007fa2bef25550 CR3: 00000000872fa000 CR4: 0000000000040660
    Last edited by AAEBHolding; 14-May-2019 at 09:20.
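
One detail in the register dump can be decoded by hand: RAX holds 00000000ffffffef. The sketch below reads the low 32 bits of that value as a signed C int and looks it up in the errno table. That RAX carries the return code which tripped the BUG in create_reloc_root is an assumption here, not something the thread confirms, and the helper names are made up for illustration.

```python
import errno


def decode_signed32(reg_value: int) -> int:
    """Interpret the low 32 bits of a register value as a signed C int."""
    low = reg_value & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


def describe_errno(reg_value: int) -> str:
    """Render a register value as '<code> (-<ERRNO NAME>)' when it is negative."""
    code = decode_signed32(reg_value)
    if code >= 0:
        return f"{code} (not a negative errno)"
    name = errno.errorcode.get(-code, "unknown")
    return f"{code} (-{name})"


if __name__ == "__main__":
    # RAX exactly as printed in the oops above.
    print(describe_errno(0x00000000FFFFFFEF))
```

On Linux this yields -17, which is EEXIST ("already exists"). If the assumption holds, the kernel tried to insert something that was already present in the filesystem, which would point at on-disk state rather than at the running workload.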

  2. #2

    Re: Server crashes with a long BTRFS error list

    This is the output from btrfs check --repair /dev/xvdc2

    enabling repair mode
    Checking filesystem on /dev/xvdc2
    UUID: c88dbf5b-3513-4966-b3d6-5bb6c9b7717e
    checking extents
    Fixed 0 roots.
    checking free space cache
    cache and super generation don't match, space cache will be invalidated
    checking fs roots
    checking csums
    checking root refs
    found 9380937728 bytes used err is 0
    total csum bytes: 8061100
    total tree bytes: 230670336
    total fs tree bytes: 181600256
    total extent tree bytes: 34013184
    btree space waste bytes: 41120097
    file data blocks allocated: 10417725440
    referenced 8217731072


    Then I get this:

    Filesystem 1MB-blocks Used Available Use% Mounted on
    /dev/xvda2 11249MB 9918MB 0MB 100% /
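
To put the two outputs above side by side, here is a small arithmetic sketch using only the numbers posted. It assumes that the "MB" suffix in the df output means decimal megabytes (10^6 bytes); the variable and function names are invented for this example.

```python
MB = 10**6  # assumption: df was run with decimal megabyte blocks

# Figures copied from the df output above.
DF_TOTAL_MB = 11249
DF_USED_MB = 9918
DF_AVAIL_MB = 0

# Figures copied from the btrfs check output above.
CHECK_USED_BYTES = 9380937728
CHECK_TREE_BYTES = 230670336


def summarize() -> dict:
    """Compare what df reports with what btrfs check counted."""
    check_used_mb = CHECK_USED_BYTES / MB
    return {
        # Space df counts neither as used nor as available.
        "df_unaccounted_mb": DF_TOTAL_MB - DF_USED_MB - DF_AVAIL_MB,
        "check_used_mb": round(check_used_mb, 1),
        "check_tree_mb": round(CHECK_TREE_BYTES / MB, 1),
        # How much more df calls "used" than btrfs check found.
        "used_gap_mb": round(DF_USED_MB - check_used_mb, 1),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

So df shows roughly 1.3 GB that is neither used nor available. On a mounted btrfs filesystem, `btrfs filesystem usage <mountpoint>` breaks the same space down into allocated data and metadata chunks and unallocated space, which is usually more informative than df alone.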

  3. Re: Server crashes with a long BTRFS error list

    Hi
    The df tools in SP4 are not btrfs friendly...

    See this thread: http://forums.suse.com/showthread.php?t=13627

    And also https://www.suse.com/documentation/s...s_volfull.html
    Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
    SUSE SLE, openSUSE Leap/Tumbleweed (x86_64) | GNOME DE
    If you find this post helpful and are logged into the web interface,
    please show your appreciation and click on the star below... Thanks!

  4. #4

    Re: Server crashes with a long BTRFS error list

    Quote Originally Posted by malcolmlewis View Post
    Hi
    The df tools in SP4 are not btrfs friendly...

    See this thread: http://forums.suse.com/showthread.php?t=13627

    And also https://www.suse.com/documentation/s...s_volfull.html
    @df: You are right, but this is not the problem.
    @volume full: This is not the problem either, and it does not solve my issue.

    I may not have been precise enough: the root filesystem is totally damaged. When I start the server for the first time (from a three-week-old backup), I get the above error messages after a while.
    When I try to start the server again, a lot of error messages appear and the server becomes unavailable. I also found this post where the same error messages are reported.

    Please note that the first line of the dump, 'kernel BUG at ../fs/btrfs/relocation.c:1449!', points to a bug in the kernel.

    Again: the server is idle, meaning that nothing special is running apart from the normal services. Suddenly the error messages from the first post appear on the console and then the VM is broken!

    It looks to me as if one of the previous updates introduced a very serious bug into the kernel.
    Last edited by AAEBHolding; 14-May-2019 at 15:42.

  5. Re: Server crashes with a long BTRFS error list

    Hi
    At GRUB, can you select advanced boot options and boot to an earlier snapshot?
    Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
    SUSE SLE, openSUSE Leap/Tumbleweed (x86_64) | GNOME DE

  6. #6

    Re: Server crashes with a long BTRFS error list

    Quote Originally Posted by malcolmlewis View Post
    Hi
    At GRUB, can you select advanced boot options and boot to an earlier snapshot?
    It is a VM running under XenServer. The only snapshots I have are the ones provided by XenServer; the SLES snapshots are not available.

    I think I should mention the following because it may be the real cause of this issue. The root partition was running out of space, so I did this to enlarge it:
    1. Resized the root partition in XenCenter.
    2. Detached the SLES 12.4 VM.
    3. Attached the root partition to another SLES 12.4 VM.
    4. Resized the root partition in the other VM with yast/Partition Manager.
    5. Detached the root partition from the helper VM.
    6. Attached the resized root partition to the original VM.


    All steps (resizing, detaching and attaching) were performed while the VMs were shut down.

    Since then, even an old backup crashes a few seconds after the VM starts.
    Is it possible that some lower-level information is stored on the disk, so that even when I restore a snapshot from before the resize, it no longer matches and the problems start?

    Does this help?
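
After a resize like the one described above, one basic sanity check is whether the size of the block device and the size the filesystem reports still agree. The following is only a rough sketch of that comparison: `xvda2` and `/mnt` are the device and rescue mount point mentioned in this thread, the function names are hypothetical, and for btrfs the statvfs total is only an approximation of what `btrfs filesystem show` reports per device.

```python
import os

SECTOR_BYTES = 512  # /sys/class/block/<dev>/size is expressed in 512-byte units


def device_bytes(dev_name: str, sys_root: str = "/sys/class/block") -> int:
    """Size of a block device in bytes, read from sysfs."""
    with open(os.path.join(sys_root, dev_name, "size")) as fh:
        return int(fh.read().strip()) * SECTOR_BYTES


def filesystem_bytes(mountpoint: str) -> int:
    """Total size the mounted filesystem reports through statvfs."""
    st = os.statvfs(mountpoint)
    return st.f_blocks * st.f_frsize


def size_gap(device_size: int, fs_size: int) -> int:
    """Positive: device is larger than the filesystem. Negative: the reverse."""
    return device_size - fs_size


if __name__ == "__main__":
    dev, mnt = "xvda2", "/mnt"  # names taken from the thread; adjust as needed
    if os.path.exists(os.path.join("/sys/class/block", dev)) and os.path.ismount(mnt):
        gap = size_gap(device_bytes(dev), filesystem_bytes(mnt))
        print(f"device minus filesystem: {gap} bytes")
    else:
        print("device or mount point not present on this machine")
```

A positive gap would simply mean the filesystem was never grown to fill the partition. A negative gap, where the filesystem believes it is larger than the device underneath it, would be worth reporting in the thread.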

  7. Re: Server crashes with a long BTRFS error list

    On Wed 15 May 2019 08:24:01 AM CDT, AAEBHolding wrote:

    malcolmlewis;57684 Wrote:
    > Hi
    > At GRUB, can you select advanced boot options and boot to an earlier
    > snapshot?


    It is a VM running under XenServer. I have only snapshots available
    which are provided by XenServer - the SLES snapshots are not available.

    I think I should provide this information because it may be the real
    reason for this issue; the root partition was running out of space so I
    did this in order to increase the root partition:

    - Resized in XenCenter the root partition.
    - Detached the SLES 12.4 VM.
    - Attached the root partition to another SLES 12.4 VM.
    - Resized the root partition in the other VM with yast/Partition Manager.
    - Detached from the helper VM the root partition.
    - Attached the resized root partition to the origin VM.


    All steps like increasing, detaching and attaching have been performed
    when the VMs have been down.

    Since then even an old backup crashes after few seconds when the VM
    started.
    Is it possible there is a more -deeper- information stored in the disk
    and when even I restore to a snapshot where the disk hasn't been resized
    so it doesn't match and the problems starts?

    Does it help?


    Hi
    It does make it clearer. Are you in a position to raise an SR (Support
    Request)?

    --
    Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
    Tumbleweed 20190512 | GNOME Shell 3.32.1 | 5.0.13-1-default


  8. #8

    Re: Server crashes with a long BTRFS error list

    Quote Originally Posted by malcolmlewis View Post
    Hi
    It does make it clearer. Are you in a position to raise an SR (Support
    Request)?
    Not really, I have never raised an SR. I run my business alone and maintain my servers and the XenServer host on my own. This has worked more or less perfectly for four years, but this issue is completely beyond me.

  9. Re: Server crashes with a long BTRFS error list

    Hi
    So if you boot the system to runlevel 1 (at GRUB, add 1 to the boot options), can you mount the / partition and look at the logs to see what's failing? Or boot the system in rescue mode.
    Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
    SUSE SLE, openSUSE Leap/Tumbleweed (x86_64) | GNOME DE

  10. #10

    Re: Server crashes with a long BTRFS error list

    Quote Originally Posted by malcolmlewis View Post
    Hi
    So if you boot the system to runlevel 1 (at GRUB, add 1 to the boot options), can you mount the / partition and look at the logs to see what's failing? Or boot the system in rescue mode.
    Which logs should I check? I am in rescue mode and have mounted the root partition at /mnt.
    Can I try to fix it somehow? Running btrfs check --repair /dev/xvda2 doesn't solve the problem. As I wrote, when I then boot normally, the / partition is out of space.
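
For the "which logs" question, one possible starting point is to pull only the btrfs and kernel BUG lines out of the system log on the mounted root. This is a sketch under assumptions: the path /mnt/var/log/messages presumes the default syslog file of a SLES 12 install seen through the rescue mount at /mnt, and the pattern list is a guess based on the dump in the first post, not an exhaustive filter.

```python
import os
import re

# Markers seen in the dump from the first post, plus generic btrfs messages.
PATTERN = re.compile(r"btrfs|kernel BUG at|invalid opcode|Call Trace", re.IGNORECASE)


def interesting_lines(lines):
    """Return the lines that mention btrfs or a kernel oops marker."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan_log(path):
    """Filter one log file; undecodable bytes are replaced instead of raising."""
    with open(path, errors="replace") as fh:
        return interesting_lines(fh)


if __name__ == "__main__":
    log_path = "/mnt/var/log/messages"  # assumed location, see the note above
    if os.path.exists(log_path):
        for hit in scan_log(log_path):
            print(hit)
    else:
        print(f"{log_path} not found")
```

The lines just before the first 'kernel BUG at' entry are usually the most useful ones to post, since they show what btrfs was doing when the crash was triggered.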
