
Xen VirtualDomain OCF Heartbeat resource not starting VM



6529034
05-Jan-2015, 07:09
I have been testing SLES 12 with the High Availability Extension pack installed.
I have 2 servers/nodes with all the latest patches installed.

I have configured DRBD as my shared storage and this is working fine - both nodes can start the VM successfully via the VMM.

I have configured the VirtualDomain resource with the following parameters: config=<full path and filename of the VM's XML configuration file>, hypervisor=xen:///, migration_transport=ssh.

The monitor, start, and stop operations use the default settings.
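For reference, a primitive with the parameters described above could be defined roughly like this in the crm shell. This is a sketch, not a verified configuration: the resource name vm1 and the path /srv/vm-config/vm1.xml are hypothetical placeholders, the op timeouts are my understanding of the agent's advertised defaults, and meta allow-migrate=true is an addition that Pacemaker needs before it will try a live migration instead of a stop/start.

```shell
# Hypothetical names: resource "vm1", config file /srv/vm-config/vm1.xml.
# The op values are assumed defaults; compare them with the output of
#   crm ra info ocf:heartbeat:VirtualDomain
crm configure primitive vm1 ocf:heartbeat:VirtualDomain \
    params config="/srv/vm-config/vm1.xml" \
           hypervisor="xen:///" \
           migration_transport="ssh" \
    op start timeout=90s interval=0 \
    op stop timeout=90s interval=0 \
    op monitor timeout=30s interval=10s \
    meta allow-migrate="true"
```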

I have tried xen:///system and xen:///session for the hypervisor setting, but noticed no difference. I have also tried a number of op settings for monitor, start, and stop; again, no difference.

My 2 nodes are configured for passwordless ssh login - this has been tested successfully.
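A quick way to check both halves of that setup from each node is sketched below: non-interactive ssh, plus a libvirt connection to the peer over the same ssh transport the resource uses. The function names remote_uri and check_peer are hypothetical helpers, and the sketch assumes the cluster connects as root. The URI they build follows libvirt's driver[+transport]://[user@]host/[path] format.

```shell
# Hypothetical helper: turn a local libvirt URI (e.g. xen:/// or
# qemu:///system) into a remote one for the given transport and host.
remote_uri() {
    local uri="$1" transport="$2" host="$3"
    local scheme="${uri%%://*}"   # "xen" or "qemu"
    local path="${uri#*://}"      # "/" or "/system" (host part is empty locally)
    path="${path#*/}"             # drop the empty host part: "" or "system"
    printf '%s+%s://%s/%s\n' "$scheme" "$transport" "$host" "$path"
}

# Hypothetical check: can this node log in to the peer without a password,
# and can it talk to the peer's libvirtd over ssh?
check_peer() {
    local peer="$1"
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ! ssh -o BatchMode=yes "root@${peer}" true; then
        echo "ssh to ${peer} failed or needs a password" >&2
        return 1
    fi
    # List all domains defined on the peer via libvirt's ssh transport.
    virsh -c "$(remote_uri 'xen:///' ssh "root@${peer}")" list --all
}
```

After sourcing the functions, run check_peer node2 on node 1 and check_peer node1 on node 2; both directions matter because a migration can start from either node.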

When I start the resource, it appears as stopped in Hawk's list of configured cluster resources; however, its meta-attributes show the primitive as started.

When I select 'View details' on the resource, it shows target-role Started and a fail count of 0 on both nodes. No exit reason is listed.
Under 'View recent events' there are three entries: 'Success' on node 1, 'Success' on node 2, and 'Success' on node 1 again. No errors are reported anywhere.

I have used the Xen OCF resource before without issues; in SLES 12, however, only the OCF VirtualDomain resource is available.

Can anyone help me set up this resource so that the cluster can start the VM?

I have also noticed something interesting: if I try to live-migrate a VM from the command line with virsh migrate --live <vm name> xen+ssh://<node 2>.corp (I also tried the node's IP address), the VM ends up running on both nodes, and virsh reports: error: operation failed: Failed to unpause domain.
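When this happens it is worth confirming what state the domain is in on each node (running, paused, shut off) straight after the failed migration, and shutting one copy down quickly, since two running instances writing to the same shared disk can corrupt it. A small sketch of such a check follows; report_domain_state is a hypothetical helper, and it assumes root ssh access to both nodes and the xen+ssh transport.

```shell
# Hypothetical helper: report the state of domain $1 on each node listed
# after it, e.g.  report_domain_state vm1 node1 node2
report_domain_state() {
    local vm="$1" node state
    shift
    for node in "$@"; do
        # domstate prints a state such as "running", "paused" or "shut off";
        # a non-zero exit means the domain is unknown there or the host is unreachable.
        if ! state=$(virsh -c "xen+ssh://root@${node}/" domstate "$vm" 2>/dev/null); then
            state="unavailable"
        fi
        printf '%s: %s\n' "$node" "$state"
    done
}
```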

Can anyone shed any light on why this might be?

Thank you for any help,

John

Automatic reply
11-Jan-2015, 14:30
6529034,

It appears that in the past few days you have not received a response to your
posting. That concerns us, and has triggered this automated reply.

Has your issue been resolved? If not, you might try one of the following options:

- Visit http://www.suse.com/support and search the knowledgebase and/or check all
the other support options available.
- You could also try posting your message again. Make sure it is posted in the
correct newsgroup. (http://forums.suse.com)

Be sure to read the forum FAQ about what to expect in the way of responses:
http://forums.suse.com/faq.php

If this is a reply to a duplicate posting, please ignore and accept our apologies
and rest assured we will issue a stern reprimand to our posting bot.

Good luck!

Your SUSE Forums Team
http://forums.suse.com

6529034
26-Mar-2015, 00:24
I have overcome my issue with starting a Xen VM via an HA resource in SLES 12. It turned out to be a cluster configuration setting: the 'placement strategy' needed to be set to 'default' (I had it on 'balanced'). The dual-master DRBD resources had been running fine with the 'balanced' setting; only the VM resource would not start.
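For anyone hitting the same symptom, the change can be made from the crm shell along these lines (a sketch; I have not checked it against every crmsh version). One plausible explanation, which is an assumption and not confirmed in this thread, is that with any strategy other than default Pacemaker takes utilization attributes into account, and the VirtualDomain agent can set cpu and hv_memory utilization on the VM resource automatically (its autoset_utilization_cpu and autoset_utilization_hv_memory parameters). A node without matching capacity defined would then not be eligible to run the VM, while the DRBD resources, which carry no utilization, are unaffected.

```shell
# Set the cluster-wide placement strategy back to "default"
# (Pacemaker also accepts utilization, minimal and balanced).
crm configure property placement-strategy=default

# Confirm the value the cluster is now using.
crm_attribute --type crm_config --name placement-strategy --query
```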

This is great, of course.

However, with Xen I still have the problem (whether using the HA VM resource or virsh on the command line) that live migration fails. Both methods return the same error, 'Migration failed: Could not unpause domain', and the VM ends up running on both nodes simultaneously.

I researched at length how libvirt should be configured for live migration and could not find anything I had configured incorrectly. I did find that the 'tun' service should be enabled for live migration and to prevent two instances of the VM running simultaneously, so I installed and enabled it; however, this did not resolve the issue.

I then rebooted both nodes into the normal (non-Xen) kernel, on which I had already installed KVM, and tried the HA resource and live migration using KVM. The HA resource started the VM successfully (as it now does with the Xen VM), and live migration worked perfectly, both through the cluster and with virsh on the command line.
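For comparison, switching an existing VirtualDomain resource over to KVM and asking the cluster to live-migrate it might look like the sketch below. The resource name vm1 and node name node2 are hypothetical, and the config parameter would also have to point at a KVM domain XML rather than the Xen one.

```shell
# Hypothetical resource "vm1" and node "node2"; adjust to your setup.
# Point the existing VirtualDomain resource at the local KVM/QEMU driver.
crm resource param vm1 set hypervisor qemu:///system

# Let Pacemaker live-migrate rather than stop and restart the VM.
crm resource meta vm1 set allow-migrate true

# Ask the cluster to move vm1 to node2; this adds a location constraint.
crm resource migrate vm1 node2

# Once the move has completed, remove that constraint again.
crm resource unmigrate vm1

# Alternatively, outside the cluster, run on the source node:
#   virsh migrate --live vm1 qemu+ssh://node2/system
```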

So, to wrap up: KVM works fine with HA and live migration, while Xen does not live-migrate at all. All of this was tested on the same two nodes running SLES 12 with the HA extension installed.


I guess I will be using KVM on SLES 12 until the Xen issues are ironed out.


However, does anyone know of a solution to this Xen live-migration issue on SLES 12?

Regards,
John