Bootup takes a very long time

Hello Team,
I am running SLES15 SP2 on my host, and I am getting NVMe devices as storage from a SAN box. When I have very few devices (<50), the bootup time is reasonable at < 1m. But as I increase the number of devices (>1000), the bootup time increases significantly. With about 6000 devices configured, the bootup time is about 21m, and at 8000 devices it goes up to 33 minutes. I checked systemd-analyze blame, and the service taking the most time is lvm2-monitor, but I couldn't figure out why it is taking so long. Moreover, dev-ttyS0 times out.
Any clue on how I can speed up the bootup time?
Attaching the bootup logs with 6000 devices.
Regards

Comments

  • malcolmlewis Knowledge Partner

    @Smash Hi, if possible can you upload the boot log to a paste site, e.g. https://paste.opensuse.org? Unverified tarballs wouldn't (or shouldn't) be downloaded from the forum ;)
    Is it just lvm2-monitor, or other lvm2 services as well? One wonders if it's hitting a race condition.

  • Smash New or Quiet Member

    I have put the boot logs at https://paste.opensuse.org/46113668

  • malcolmlewis Knowledge Partner

    @Smash can you add the following to the GRUB kernel options: crashkernel=162M,high crashkernel=72M,low?

    Since you have multipath disabled, have you modified /etc/lvm/lvm.conf?

    If not, can you read and modify the following lines in the above file:

    - multipath_component_detection = 1
    + multipath_component_detection = 0
    
    - md_component_detection = 1
    + md_component_detection = 0
    
    - udev_sync = 1
    + udev_sync = 0
    
    - udev_rules = 1
    + udev_rules = 0
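
For anyone applying the same four lvm.conf changes on more than one host, a minimal Python sketch of the edit might look like the following. It assumes each setting appears as an uncommented `key = value` line, the name `apply_overrides` is made up for illustration, and it only transforms text in memory; it does not read or write /etc/lvm/lvm.conf itself.

```python
import re

# The four settings and values suggested in the reply above.
OVERRIDES = {
    "multipath_component_detection": "0",
    "md_component_detection": "0",
    "udev_sync": "0",
    "udev_rules": "0",
}


def apply_overrides(conf_text, overrides=OVERRIDES):
    """Return conf_text with every uncommented 'key = value' line set to the override.

    Hypothetical helper: commented-out lines (starting with '#') are left alone,
    and settings that are absent from the text are not added.
    """
    for key, value in overrides.items():
        # Match optional indentation, the key, '=', then the current value.
        pattern = re.compile(
            rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*)\S+", re.MULTILINE
        )
        conf_text = pattern.sub(lambda m, v=value: m.group(1) + v, conf_text)
    return conf_text


if __name__ == "__main__":
    sample = (
        "devices {\n"
        "\tmultipath_component_detection = 1\n"
        "\t# md_component_detection = 1\n"
        "\tmd_component_detection = 1\n"
        "}\n"
        "activation {\n"
        "\tudev_sync = 1\n"
        "\tudev_rules = 1\n"
        "}\n"
    )
    print(apply_overrides(sample))
```

Reviewing the printed result (or a diff against the original file) before copying anything into place is the safer route, since these options change how LVM detects devices.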
    
  • Smash New or Quiet Member

    Hello Malcolm, thanks a lot for this tip. This greatly improved the bootup time, reducing it from 33 minutes to 5 minutes. But I see that udev is still configuring the devices even after bootup has completed. Is there any way to increase the udev threads or speed up creation of those udev files?
    I have added blame and boot logs for 8000 devices.
    blame - https://paste.opensuse.org/39558159
    boot log - https://paste.opensuse.org/56897874
    Apart from that, I am still trying to find out how to avoid ttyS0 timing out. It starts timing out as soon as 2000 devices are added. Any clues?

  • malcolmlewis Knowledge Partner

    @Smash Hi, the maintenance service is transient. For postfix, are you using IPv6? If not, /etc/postfix/main.cf needs an edit to change inet_protocols from all to ipv4.
    Since you're not using plymouth, I would remove all the installed plymouth packages (about 10), add a zypper lock, and rebuild the initrd. See how that goes for the moment.

  • Smash New or Quiet Member

    Hello Malcolm,
    I have made the required changes, and also incorporated some changes for my own needs (like the addition of docker). I have added new logs for 8000 devices. systemd-analyze reported:
    Startup finished in 4.029s (kernel) + 2min 30.783s (initrd) + 3min 13.203s (userspace) = 5min 48.016s
    Blame[8000]: https://paste.opensuse.org/64440302
    Boot log[8000]: https://paste.opensuse.org/86475707
    But when I had 16K devices configured, the time to boot up shot up to 12 min:
    Startup finished in 4.029s (kernel) + 9min 9.478s (initrd) + 3min 21.455s (userspace) = 12min 34.962s
    Blame [16000]: https://paste.opensuse.org/570656
    Boot log [16000]: https://paste.opensuse.org/99398385
    Any idea why "dracut-initqueue.service" is taking so long with 16K devices?
    Overall, things are in much better shape now, except that I am not able to log in to the console because ttyS0 times out, for which I don't have a solution yet.
    Thanks
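
The two systemd-analyze summaries quoted above can be compared phase by phase to see where the extra time goes between 8000 and 16K devices. The following is a minimal Python sketch; the helper names `to_seconds` and `parse_summary` are hypothetical, and it only handles the `h`, `min`, `s` and `ms` units that appear in lines of this shape.

```python
import re

# Seconds per unit for the duration tokens seen in 'Startup finished in ...' lines.
UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001}


def to_seconds(duration):
    """Convert a duration such as '2min 30.783s' to seconds."""
    total = 0.0
    # 'min' and 'ms' are listed before 's' so the longer units win.
    for number, unit in re.findall(r"([\d.]+)\s*(min|ms|s|h)", duration):
        total += float(number) * UNIT_SECONDS[unit]
    return total


def parse_summary(line):
    """Return {phase: seconds} for a 'Startup finished in ... = ...' line."""
    body = line.split(" in ", 1)[1].split(" = ", 1)[0]
    phases = {}
    for part in body.split(" + "):
        match = re.match(r"(.+?)\s*\((\w+)\)$", part.strip())
        phases[match.group(2)] = to_seconds(match.group(1))
    return phases


if __name__ == "__main__":
    # Both lines are copied from the post above.
    boot_8k = parse_summary(
        "Startup finished in 4.029s (kernel) + 2min 30.783s (initrd) "
        "+ 3min 13.203s (userspace) = 5min 48.016s"
    )
    boot_16k = parse_summary(
        "Startup finished in 4.029s (kernel) + 9min 9.478s (initrd) "
        "+ 3min 21.455s (userspace) = 12min 34.962s"
    )
    for phase in boot_8k:
        growth = boot_16k[phase] - boot_8k[phase]
        print(f"{phase}: {boot_8k[phase]:.3f}s -> {boot_16k[phase]:.3f}s ({growth:+.3f}s)")
```

On the figures from this thread, nearly all of the growth is in the initrd phase (about 150.8 s to 549.5 s), while userspace changes by only about 8 s, which fits the observation about dracut-initqueue.service.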
