Hi all,

after updating our SLES11SP1 xen server from
xen-tools-4.0.1_21326_06-0.4.1 to xen-tools-4.0.2_21511_02-0.7.1
(together with the other packages as well, of course) we were unable to
start our NPIV-based DomUs.

The same upgrade on a test server with only file-based DomUs works like
a charm.

The error manifests in xend.log:


Code:
--------------------
[2011-08-31 18:25:13 8020] DEBUG (XendDomainInfo:103) XendDomainInfo.create(['vm', ['name', 'nde02'], ['memory', 1536], ['maxmem', 1536], ['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['on_crash', 'restart'], ['on_xend_start', 'ignore'], ['on_xend_stop', 'ignore'], ['vcpus', 4], ['uuid', '9f5b3095-2032-6a4f-581f-9490b742649d'], ['oos', 1], ['bootloader', '/usr/bin/pygrub'], ['bootloader_args', '-q'], ['image', ['linux', ['videoram', 4], ['args', ' '], ['tsc_mode', 0], ['nomigrate', 0]]], ['s3_integrity', 1], ['device', ['vbd', ['uname', 'npiv:100000c0dd1306bb-29001100aa001002-29011100aa011002-2100001b32125f50-0'], ['dev', 'xvda'], ['mode', 'w']]], ['device', ['vbd', ['uname', 'npiv:100000c0dd1306bb-29001100aa001002-29011100aa000000-2100001b32125f50-1'], ['dev', 'xvdb'], ['mode', 'w']]], ['device', ['vif', ['bridge', 'br300'], ['mac', '00:16:3e:00:10:02']]], ['device', ['vkbd']], ['device', ['vfb', ['vncunused', '1'], ['vnc', '1'], ['vncdisplay', '2'], ['xauthority', '/root/.Xauthority']]]])
[2011-08-31 18:25:13 8020] DEBUG (XendDomainInfo:4531) Acquiring lock for domain nde02
[2011-08-31 18:25:13 8020] DEBUG (XendDomainInfo:2548) XendDomainInfo.constructDomain
[2011-08-31 18:25:13 8020] DEBUG (balloon:239) Balloon: 22805152 KiB free; need 16384; done.
[2011-08-31 18:25:13 8020] DEBUG (XendDomain:482) Adding Domain: 1
[2011-08-31 18:25:13 8020] DEBUG (XendDomainInfo:2876) XendDomainInfo.initDomain: 1 256
[2011-08-31 18:25:13 8020] INFO (XendDomainInfo:3322) Mounting 100000c0dd1306bb-29001100aa001002-29011100aa011002-2100001b32125f50-0 on /dev/xvdz.
[2011-08-31 18:25:13 8020] DEBUG (DevController:95) DevController: writing {'virtual-device-ext': '268441856', 'backend-id': '0', 'state': '1', 'device-type': 'disk', 'backend': '/local/domain/0/backend/vbd/0/268441856'} to /local/domain/0/device/vbd/268441856.
[2011-08-31 18:25:13 8020] DEBUG (DevController:97) DevController: writing {'domain': 'Domain-0', 'frontend': '/local/domain/0/device/vbd/268441856', 'uuid': 'ddfd1987-d48c-e53e-d89e-407bc498305d', 'bootable': '0', 'dev': '/dev/xvdz', 'state': '1', 'params': '100000c0dd1306bb-29001100aa001002-29011100aa011002-2100001b32125f50-0', 'mode': 'w', 'online': '1', 'frontend-id': '0', 'type': 'npiv'} to /local/domain/0/backend/vbd/0/268441856.
[2011-08-31 18:25:13 8020] DEBUG (DevController:144) Waiting for 268441856.
[2011-08-31 18:25:13 8020] DEBUG (DevController:671) hotplugStatusCallback /local/domain/0/backend/vbd/0/268441856/hotplug-status.
[2011-08-31 18:25:22 8020] DEBUG (DevController:671) hotplugStatusCallback /local/domain/0/backend/vbd/0/268441856/hotplug-status.
[2011-08-31 18:25:22 8020] DEBUG (DevController:685) hotplugStatusCallback 1.
[2011-08-31 18:25:22 8020] DEBUG (DevController:616) frontendStatusCallback /local/domain/0/device/vbd/268441856/state = 4
[2011-08-31 18:25:22 10381] DEBUG (XendBootloader:130) Launching bootloader as ['/usr/bin/pygrub', '--args= ', '--output=/var/run/xend/boot/xenbl.587', '-q', '/dev/xvdz'].
[2011-08-31 18:25:22 8020] INFO (XendDomainInfo:3348) Unmounting /dev/xvdz from /dev/xvdz.
[2011-08-31 18:25:22 8020] ERROR (XendDomainInfo:489) VM start failed
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 475, in start
XendTask.log_progress(31, 60, self._initDomain)
File "/usr/lib64/python2.6/site-packages/xen/xend/XendTask.py", line 209, in log_progress
retval = func(*args, **kwds)
File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 2878, in _initDomain
self._configureBootloader()
File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 3349, in _configureBootloader
_, vbd_info = dom0.info['devices'][mounted_vbd_uuid]
KeyError: 0
[2011-08-31 18:25:22 8020] DEBUG (XendDomainInfo:3117) XendDomainInfo.destroy: domid=1
[2011-08-31 18:25:22 8020] DEBUG (XendDomainInfo:2456) No device model
[2011-08-31 18:25:22 8020] DEBUG (XendDomainInfo:2458) Releasing devices
[2011-08-31 18:25:22 8020] DEBUG (XendDomainInfo:4560) Releasing lock for domain nde02
[2011-08-31 18:25:22 8020] ERROR (XendDomainInfo:108) Domain construction failed
--------------------


I have checked XendDomainInfo.py and believe to have found a porting
bug... seems like someone backported a change from 2010
(http://xen.1045712.n5.nabble.com/xen...2639427.html):


Code:
--------------------
*mounted_vbd_uuid = 0*
def _shouldMount(types):
if types[0] in ('file', 'phy'):
return False
if types[0] in ('tap', 'tap2'):
if types[1] in ('aio', 'sync'):
return False
else:
return True
return os.access('/etc/xen/scripts/block-%s' % types[0], os.X_OK)

mounted = _shouldMount(types)
if mounted:
# This is a file, not a device. pygrub can cope with a
# file if it's raw, but if it's QCOW or other such formats
# used through blktap, then we need to mount it first.
# Try all possible loopback_devices
for loopback_device in BOOTLOADER_LOOPBACK_DEVICES:
log.info("Mounting %s on %s." % (fn, loopback_device))
vbd = { 'mode' : 'RW', 'device' : loopback_device, }
try:
from xen.xend import XendDomain
dom0 = XendDomain.instance().privilegedDomain()
*_vbd_uuid_=_dom0.create_vbd(vbd,_disk)_*
*_dom0._waitForDeviceFrontUUID(vbd_uuid)_*
fn = loopback_device
break
except VmError, e:
if str(e).find('already connected.') != -1:
continue
elif str(e).find('isn\'t accessible') != -1:
dom0.destroyDevice('vbd', loopback_device, force = True, rm_cfg = True)
continue
else:
raise
else:
raise

try:
blcfg = bootloader(blexec, fn, self, False,
bootloader_args, kernel, ramdisk, args)
finally:
if mounted:
log.info("Unmounting %s from %s." %
(fn, loopback_device))
* _, vbd_info = dom0.info['devices'][mounted_vbd_uuid]*
if devtype in ['tap', 'tap2']:
dom0.destroyDevice('tap', loopback_device, rm_cfg = True)
else:
dom0.destroyDevice('vbd', loopback_device, rm_cfg = True)
--------------------


The two bold lines (not underlined) were added in 4.0.2, introducing
the variable "mounted_vbd_uuid" as present in the referenced change. The
two bold+underlined changes were intruduced in the referenced change,
too, but were available in SLES' xen-tools 4.0.1 already.

BUT: The SLES versions use the variable name "vbd_uuid" with the
dom0.create_vbd(), while the referenced change consistently uses
"mounted_vbd_uuid" on all four lines. I found no other place where these
variables are used, so it's not very surprising that "mounted_vbd_uuid"
is still "0" when the logging statement is invoked.

As the logging call bails out, the dom0.destroyDevice() never gets
invoked, which fits to our symptoms (the VBD device name gets
decremented by one on each call (xvdz, xvdy, xvdx,...). Bailing out also
breaks the creation process, that's why the DomU never gets created
(only the virtual disk gets connected to Dom0 so the boot image can be
(and is) extracted).

Could somebody verify this? We're going to test-upgrade a server again
and change the script to use a consistent variable name and I'll report
back, but still somebody should make the developers aware of the
situation...

Regards,
Jens


--
from the times when today's "old school" was "new school"
------------------------------------------------------------------------
jmozdzen's Profile: http://forums.novell.com/member.php?userid=32246
View this thread: http://forums.novell.com/showthread.php?t=444635