
Thread: xen live migration for fully virtualized domUs hangs

  1. #1

    xen live migration for fully virtualized domUs hangs

    Version Info:
    -------------------------------------------------
    drbd-8.4.4-0.22.9
    xen-4.2.4_02-0.7.1
    lvm2-2.02.98-0.29.1 <---- updated recently
    lvm2-clvm-2.02.98-0.29.1 <---- updated recently

    I have a two-node cluster running SLES 11 SP3 with the HA Extension.
    It worked fine for 14 domUs, including four Windows Server 2008 VMs.
    We do the online updates once a month.

    For the past two or three weeks, live migration of the fully virtualized Windows domUs has been failing.
    The Linux VMs have no problem.

    I tried to migrate manually, independent of Pacemaker, with the command:

    migrate winsrv2008 ha1infra -live

    and had the same problem: the VM moves from node ha2infra to ha1infra and then hangs.
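
    (For reference, that is the migrate subcommand as entered at the xm prompt; from a plain dom0 shell the equivalent would be something like the line below, using xm's standard -l/--live flag.)

    xm migrate -l winsrv2008 ha1infra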

    It shows the Windows screen but is not reachable.
    The xend.log on the 'migrate_to' node ends with:

    [2014-08-13 11:00:17 10651] DEBUG (image:981) args: boot, val: c
    [2014-08-13 11:00:17 10651] DEBUG (image:981) args: fda, val: None
    [2014-08-13 11:00:17 10651] DEBUG (image:981) args: fdb, val: None
    [2014-08-13 11:00:17 10651] DEBUG (image:981) args: soundhw, val: None
    [2014-08-13 11:00:17 10651] DEBUG (image:981) args: localtime, val: 1
    [2014-08-13 11:00:17 10651] DEBUG (image:981) args: serial, val: ['pty']
    [2014-08-13 11:00:17 10651] DEBUG (image:981) args: std-vga, val: 0
    [2014-08-13 11:00:17 10651] DEBUG (image:981) args: isa, val: 0
    [2014-08-13 11:00:17 10651] DEBUG (image:981) args: acpi, val: 1
    [2014-08-13 11:00:17 10651] DEBUG (image:981) args: usb, val: 1
    [2014-08-13 11:00:17 10651] DEBUG (image:981) args: usbdevice, val: tablet
    [2014-08-13 11:00:17 10651] DEBUG (image:981) args: gfx_passthru, val: None
    [2014-08-13 11:00:17 10651] DEBUG (image:981) args: watchdog, val: None
    [2014-08-13 11:00:17 10651] DEBUG (image:981) args: watchdog-action, val: reset
    [2014-08-13 11:00:17 10651] INFO (image:909) Need to create platform device.[domid:15]
    [2014-08-13 11:00:17 10651] INFO (image:505) spawning device models: /usr/lib/xen/bin/qemu-dm ['/usr/lib/xen/bin/qemu-dm', '-d', '15', '-domain-name', 'winsrv2008', '-videoram', '4', '-k', 'de', '-vnc', '127.0.0.1:0', '-vncunused', '-vcpus', '2', '-vcpu_avail', '0x3L', '-boot', 'c', '-localtime', '-serial', 'pty', '-acpi', '-usb', '-usbdevice', 'tablet', '-watchdog-action', 'reset', '-net', 'none', '-M', 'xenfv', '-loadvm', '/var/lib/xen/qemu-resume.15']
    [2014-08-13 11:00:17 10651] INFO (image:554) device model pid: 1277
    [2014-08-13 11:00:17 10651] DEBUG (XendDomainInfo:1908) Storing domain details: {'console/port': '5', 'description': 'None', 'console/limit': '1048576', 'vm': '/vm/95ae0edb-feaf-e439-535c-b9b6a463fd30-2', 'domid': '15', 'store/port': '4', 'console/type': 'ioemu', 'cpu/0/availability': 'online', 'memory/target': '4194304', 'control/platform-feature-multiprocessor-suspend': '1', 'store/ring-ref': '1044476', 'cpu/1/availability': 'online', 'control/platform-feature-xs_reset_watches': '1', 'image/suspend-cancel': '1', 'name': 'winsrv2008'}

    [2014-08-13 11:00:17 10651] INFO (image:677) waiting for sentinel_fifo
    [2014-08-13 11:00:17 10651] DEBUG (XendDomainInfo:3165) XendDomainInfo.completeRestore done
    [2014-08-13 11:00:17 10651] DEBUG (XendDomainInfo:1995) XendDomainInfo.handleShutdownWatch
    [2014-08-13 11:00:17 10651] DEBUG (DevController:139) Waiting for devices tap2.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:139) Waiting for devices vif.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:144) Waiting for 0.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:671) hotplugStatusCallback /local/domain/0/backend/vif/15/0/hotplug-status.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:685) hotplugStatusCallback 1.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:139) Waiting for devices vkbd.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:139) Waiting for devices ioports.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:139) Waiting for devices tap.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:139) Waiting for devices vif2.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:139) Waiting for devices console.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:144) Waiting for 0.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:139) Waiting for devices vscsi.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:139) Waiting for devices vbd.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:144) Waiting for 768.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:671) hotplugStatusCallback /local/domain/0/backend/vbd/15/768/hotplug-status.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:685) hotplugStatusCallback 1.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:139) Waiting for devices irq.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:139) Waiting for devices vfb.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:139) Waiting for devices pci.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:139) Waiting for devices vusb.
    [2014-08-13 11:00:17 10651] DEBUG (DevController:139) Waiting for devices vtpm.
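
    (Side note: the log shows qemu-dm being restored with '-loadvm /var/lib/xen/qemu-resume.15', so the device-model log on the target node might show where it gets stuck. A sketch of where one could look, assuming the default xend log locations on SLES; the exact qemu-dm log file name is an assumption:)

    xm list winsrv2008                                  # domain state on the target after the hang
    tail -n 50 /var/log/xen/qemu-dm-winsrv2008.log      # device-model log on the target node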


    Any clue or idea?
    Live migration works for paravirtualized Linux domUs,
    but fails for fully virtualized Windows domUs.

  2. #2

    Re: xen live migration for fully virtualized domUs hangs

    Quote Originally Posted by karo80 View Post
    ...For the past two or three weeks, live migration of the fully virtualized Windows domUs has been failing.
    The Linux VMs have no problem....
    Which version of the VMDP pack do you have installed on those Windows VMs?

    This reminds me of an older issue I've come across (with VMDP 1.5/1.7, IIRC) where a freeze/hang happens on the domU during live migration of Windows domUs that have only one vCPU. So I'm curious: how many vCPUs do the Windows domUs have?
    If the hang/freeze also happens with Windows VMs that already have 2 or more vCPUs, it's probably not that.

    I have one Xen site whose hosts were also patched a few weeks ago, and I haven't seen issues there. I don't use the HAE extension on my setups, though. I'll check which lvm package versions I have running there, but I can't do that right now (no access to that site at the moment).

    Cheers,
    Willem
    Knowledge Partner (voluntary sysop)
    ---
    If you find a post helpful and are logged into the web interface,
    please show your appreciation and click on the star below it. Thanks!

  3. #3

    Re: xen live migration for fully virtualized domUs hangs

    My Windows domUs have 2 vCPUs; the dom0 host has 20 CPUs.

    Here are some lines from the /etc/xen/vm/... file:

    name="winsrv2008"
    description="None"
    uuid="95ae0edb-feaf-e439-535c-b9b6a463fd30"
    memory=4096
    maxmem=4096
    vcpus=2 <----- the domU has two vCPUs
    cpus="2-19" <----- cpu0 and cpu1 are reserved for dom0
    on_poweroff="destroy"
    ...
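
    (The pinning can be double-checked at runtime with the standard xm subcommand below; it lists each vCPU and the physical CPUs it is allowed to run on.)

    xm vcpu-list winsrv2008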


    karl

  4. #4

    Re: xen live migration for fully virtualized domUs hangs

    Quote Originally Posted by karo80 View Post
    My Windows domUs have 2 vCPUs; the dom0 host has 20 CPUs. ...
    OK, so that seems to be different from what I was seeing.

    And which VMDP version do you have running on those Windows domUs?

    -Willem
    Knowledge Partner (voluntary sysop)
    ---
    If you find a post helpful and are logged into the web interface,
    please show your appreciation and click on the star below it. Thanks!

  5. #5

    Re: xen live migration for fully virtualized domUs hangs

    I have VMDP-WIN-2.1

    karl

  6. #6

    Re: xen live migration for fully virtualized domUs hangs

    Some news.

    I moved the Windows domU to a test cluster at the same software level,
    but with a shared SAN instead of DRBD.
    The migration works!
    So the problem could lie between DRBD and cLVM.

    My DRBD configuration is appended below.

    The software stack (a few inspection commands follow after the configuration below):
    - the DRBD resource r0 is the PV for the clustered VG, and
    - the LVs are the Xen disk images.

    -------------------------------------------------
    resource r0 {
        startup {
            become-primary-on both;
        }
        net {
            allow-two-primaries;
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
            verify-alg md5;
            max-buffers 8192;
            max-epoch-size 8192;
            sndbuf-size 512k;
            ko-count 4;
        }
        device /dev/drbd_r0 minor 0;
        meta-disk internal;
        on ha1infra {
            address 172.17.232.11:7788;
            disk /dev/disk/by-id/dm-uuid-part1-mpath-360080e50001c150e00000bee52eb1a88;
        }
        on ha2infra {
            address 172.17.232.12:7788;
            disk /dev/disk/by-id/scsi-360080e500036de180000035151d0f3e5-part1;
        }
        syncer {
            rate 50M;
        }
    }
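
    As mentioned above, the stack can be inspected from either node with standard DRBD/LVM commands; r0 is the only name known here, the VG and LV names are whatever clvmd reports:

    cat /proc/drbd        # connection and sync state
    drbdadm role r0       # must be Primary/Primary for live migration
    pvs                   # the DRBD device should show up as the PV
    vgs                   # the clustered VG on top of it
    lvs                   # one LV per domU disk image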


    My problem appeared about two weeks ago, around the time
    these two lvm updates arrived:

    lvm2-2.02.98-0.29.1
    lvm2-clvm-2.02.98-0.29.1
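
    (In case anyone wants to compare: the installed builds can be checked with rpm, and on SLES a suspect update can be rolled back with zypper's --oldpackage switch. The previous build number below is a placeholder and would have to be looked up in the repository.)

    rpm -q lvm2 lvm2-clvm
    zypper install --oldpackage lvm2-<previous-build> lvm2-clvm-<previous-build>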

    karl
