SLES 12 VM in ESXi hang after msg.snapshot.error-QUIESCINGERROR



swadm
15-Jan-2016, 15:48
In the course of daily VM backups (actually with VDP, but that should not matter), we occasionally get msg.snapshot.error-QUIESCINGERROR when the backup creates snapshots. A short time later the VMs freeze: they still answer ping, but any other interaction is impossible, and we eventually have to power-cycle them.

VMware support recommended setting the following:


/etc/vmware-tools/tools.conf
[vmbackup]
enableSyncDriver = false
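If the setting has to be rolled out to many VMs, a small script can apply it. The sketch below is only an illustration: it assumes tools.conf is plain INI syntax readable by Python's configparser, and the function name disable_sync_driver is hypothetical. Note that configparser drops comments when it rewrites the file, and it must run as root to write under /etc.

```python
import configparser
from pathlib import Path

# Path taken from the VMware support recommendation above.
TOOLS_CONF = Path("/etc/vmware-tools/tools.conf")


def disable_sync_driver(conf_path: Path = TOOLS_CONF) -> None:
    """Set [vmbackup] enableSyncDriver = false, keeping other sections and keys.

    Hypothetical helper; comments in the existing file are not preserved.
    """
    parser = configparser.ConfigParser()
    # ConfigParser lower-cases option names by default; keep "enableSyncDriver" as written.
    parser.optionxform = str
    if conf_path.exists():
        parser.read(conf_path)
    if not parser.has_section("vmbackup"):
        parser.add_section("vmbackup")
    parser.set("vmbackup", "enableSyncDriver", "false")
    with conf_path.open("w") as fh:
        parser.write(fh)
```

Usage (as root): call `disable_sync_driver()`. Whether the running VMware Tools daemon picks up the change without a restart is not covered here.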


There seem to be quite a few Linux users experiencing this freeze, and it seems to have to do with the vmsync driver,

"causing a deadlock if vmsync is used during a VMware Snapshot with quiesce enabled. The source of the deadlock occurs because of a problem with the FIFREEZE/FITHAW ioctl() feature within the Linux guest that is utilized to quiesce the filesystem." (https://www.veritas.com/support/en_US/article.000021419)

Does anybody have enough insight to judge whether this is likely to be addressed by either open-vm-tools engineering or SLES engineering?

"enableSyncDriver = false" may help, but it carries a risk of inconsistencies in the backup.

Thanks, Thomas

jmozdzen
19-Jan-2016, 14:35
Hi Thomas,

> Does anybody have enough insight to judge whether this is likely to be addressed by either open-vm-tools engineering or SLES engineering?

according to VMware (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2038606) this is a kernel-level problem, hence it will likely need to be handled by SUSE engineering:


Certain kernel versions of RedHat Enterprise Linux have a known problem with the vmsync driver causing a deadlock if vmsync is used during a VMware Snapshot with quiesce enabled. The source of the deadlock occurs because of a problem with the FIFREEZE/FITHAW ioctl() feature within the Linux guest that is utilized to quiesce the filesystem.

According to https://communities.vmware.com/blogs/vgeeks/2014/11/09/snapshot-quiescing-fails-on-linux-vms this affects kernel 2.6.32 and lower, while SLES12SP1 runs kernel 3.12.49. But looking around on the net, it seems that people are also running into difficulties with vmsync on much newer kernels, e.g. 3.6.3. There is an interesting piece of advice in http://sourceforge.net/p/open-vm-tools/tracker/141/#f12e : "Just disable vmsync on newer kernels, this driver is not needed as long as kernel provides FIFREEZE/FITHAW ioctls."
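The advice above boils down to "does this kernel provide FIFREEZE/FITHAW?". A rough, hypothetical check is sketched below. It assumes the mainline version in which the ioctls appeared (2.6.29) is a usable threshold. Distribution kernels backport features, so this is a heuristic and not an authoritative test. The function names are illustrative only.

```python
import platform
import re
from typing import Optional, Tuple

# Mainline Linux gained FIFREEZE/FITHAW in 2.6.29; vendor kernels may differ (heuristic only).
FIFREEZE_MIN_VERSION = (2, 6, 29)


def kernel_version(release: str) -> Tuple[int, int, int]:
    """Extract the numeric (major, minor, patch) prefix of a release like '3.12.49-11-default'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def sync_driver_probably_unneeded(release: Optional[str] = None) -> bool:
    """True if the running (or given) kernel is at least the FIFREEZE threshold."""
    release = release or platform.release()
    return kernel_version(release) >= FIFREEZE_MIN_VERSION
```

For the SLES12SP1 kernel mentioned above, `sync_driver_probably_unneeded("3.12.49-11-default")` returns True.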

So maybe disabling the sync driver is not a bad idea after all, as it could be that the doubled, intertwined action (FIFREEZE + vmsync) is causing the deadlock?

Regards,
Jens

swadm
20-Jan-2016, 09:21
Jens, Thanks!

As I believe that quite a few SLES customers are running on ESXi, an official recommendation by Novell / Micro Focus would be good.

Kind regards, Thomas

thsundel
20-Jan-2016, 09:36
> As I believe that quite a few SLES customers are running on ESXi, an official recommendation by Novell / Micro Focus would be good.

Since these forums are not officially monitored by MF/SUSE, you probably won't get an official recommendation through this channel. If possible, open an SR and ask, and also kindly ask them to publish a TID about this issue.

Thomas