SLES 11 SP3 Strange situation - DomU running, but cannot be used



clausbc
06-Feb-2016, 21:49
Hi all,

I have a physical host running SLES 11 SP3 with the Xen hypervisor and four DomUs running SLES 11 SP2. They have been running well, without issues, for quite a while, but during the last month I have been hit by a very strange situation:
Mail server
- Can be pinged
- Responds to ssh and prompts for a password. After the password is entered, no prompt appears; pressing Ctrl-C brings up the DomU's prompt
- Checking the mail services with rcgrpwise shows them all running
- However, no mail gets through, neither outbound nor inbound
- A simple top command shows nothing and has to be interrupted with Ctrl-C
File and print server
- Can be pinged
- Responds to ssh and prompts for a password. After the password is entered, no prompt appears; pressing Ctrl-C brings up the DomU's prompt
- A simple top command shows nothing and has to be interrupted with Ctrl-C
- Clients cannot access any files, since no drives can be mapped

So the DomUs seem to be running, but they cannot be used: they cannot launch anything like top, and no mail passes. I am really confused. The other DomUs on the same host run without issues at the same time.

Any ideas or suggestion on where to start looking?

jmozdzen
09-Feb-2016, 12:41
Hi Claus,

as you seem to be able to access a shell on the systems, try to run "dmesg" to see if anything turns up there.

I've seen similar symptoms when there were file system problems within the VM, mostly when the backing device hung or when remote file systems were mounted but not responding.
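If a backing device really hangs and the kernel's hung-task detector is enabled, the kernel usually logs lines like "INFO: task NAME:PID blocked for more than 120 seconds." (120 s is the usual default timeout). A minimal sketch for pulling those out of the dmesg output follows; it assumes Python 3.7 or later (the stock Python on SLES 11 is older, so you may have to run it elsewhere against saved dmesg output), and the function name is made up for this example:

```python
import re
import subprocess

# Hung-task warnings look like:
# "INFO: task jbd2/xvda1-8:245 blocked for more than 120 seconds."
HUNG_TASK = re.compile(r"task (\S+):(\d+) blocked for more than (\d+) seconds")


def find_hung_tasks(lines):
    """Return (task name, pid, seconds) for every hung-task warning in lines."""
    hits = []
    for line in lines:
        m = HUNG_TASK.search(line)
        if m:
            hits.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return hits


def read_dmesg():
    """Return the kernel ring buffer as text ("" if dmesg is unavailable)."""
    try:
        # Reading the ring buffer may require root; errors go to stderr.
        return subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        return ""


if __name__ == "__main__":
    for name, pid, secs in find_hung_tasks(read_dmesg().splitlines()):
        print(f"{name} (pid {pid}) blocked for more than {secs}s")
```

An empty result does not rule out a hung device; it only means the detector has not reported anything.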

How's the storage for these VMs set up: are the disks local, from Dom0's point of view? Are there any relevant messages in the DomU log files on Dom0? Can you "see" / "access" (read-only, please) the back-end devices/files from Dom0?
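One way to test the read-only access from Dom0 without risking a hung shell is a read with a timeout. A minimal sketch, assuming Python 3.6 or later on the machine running it; `timed_read` is a hypothetical helper, and the device path you pass in (e.g. a Dom0 partition such as /dev/sda5) is only an illustration. Note that the data may come from Dom0's page cache, so a successful read is only weak evidence that the device itself is healthy:

```python
import os
import sys
import threading


def timed_read(path, nbytes=4096, timeout=10.0):
    """Read up to nbytes from the start of path, opened read-only.

    Returns the number of bytes read, or None if the read did not finish
    within timeout seconds (e.g. because the device behind it hangs).
    """
    result = {}

    def worker():
        try:
            fd = os.open(path, os.O_RDONLY)  # never opened for writing
            try:
                result["n"] = len(os.read(fd, nbytes))
            finally:
                os.close(fd)
        except OSError as exc:
            result["err"] = exc

    # Daemon thread: the script can still exit if the read never returns.
    worker_thread = threading.Thread(target=worker, daemon=True)
    worker_thread.start()
    worker_thread.join(timeout)
    if "err" in result:
        raise result["err"]
    return result.get("n")


if __name__ == "__main__":
    for dev in sys.argv[1:]:
        n = timed_read(dev)
        if n is None:
            print(f"{dev}: no answer within the timeout")
        else:
            print(f"{dev}: read {n} bytes")
```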

Regards,
Jens

clausbc
09-Feb-2016, 21:50
Hi Jens,
thank you for replying.


> Hi Claus,
>
> as you seem to be able to access a shell on the systems, try to run "dmesg" to see if anything turns up there.

I have tried that; nothing whatsoever is logged.


> I've seen similar symptoms when there were file system problems within the VM, mostly when the backing device hung or when remote file systems were mounted but not responding.
>
> How's the storage for these VMs set up: are the disks local, from Dom0's point of view? Are there any relevant messages in the DomU log files on Dom0? Can you "see" / "access" (read-only, please) the back-end devices/files from Dom0?

The DomU disks are Dom0 partitions. Yes, they are online and can be seen.
In the DomU logs on Dom0 I haven't found anything. Where would you look?

I have this in the xen-debug.log:

Xend started at Sat Feb 6 21:15:19 2016.
/usr/lib64/python2.6/site-packages/xen/xend/XendAPI.py:551: DeprecationWarning: object.__new__() takes no parameters
  return object.__new__(cls, *args, **kwds)
xc: info: VIRTUAL MEMORY ARRANGEMENT:
  Loader:        0000000000100000->000000000019cbe4
  Modules:       0000000000000000->0000000000000000
  TOTAL:         0000000000000000->00000000c0000000
  ENTRY ADDRESS: 0000000000100000
xc: info: PHYSICAL MEMORY ALLOCATION:
  4KB PAGES: 0x0000000000008a00
  2MB PAGES: 0x00000000000005bb
  1GB PAGES: 0x0000000000000000

jmozdzen
10-Feb-2016, 08:47
Hi Claus,

> nothing whatsoever is logged

Then this is strange indeed. Do you have remote mounts active in these DomUs that could somehow cause trouble? A completely different route may be name resolution that is somehow hanging (e.g. when top resolves UIDs to user names).
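To check the name resolution theory, you could time the UID lookups for all running processes. A minimal sketch, assuming Python 3.7 or later and the glibc `getent` tool: `getent passwd UID` goes through the NSS sources configured in /etc/nsswitch.conf, which, as far as I know, is also the path top takes when it turns UIDs into user names. The helper names are made up for this example:

```python
import os
import subprocess
import time


def lookup_user(uid, timeout=5.0):
    """Resolve uid via NSS with `getent passwd`; return (name, seconds).

    name is None if the lookup did not finish within timeout, and "?"
    if getent found no entry for the uid.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(["getent", "passwd", str(uid)],
                              capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None, time.monotonic() - start
    elapsed = time.monotonic() - start
    if proc.returncode != 0:  # getent exits with 2 when the key is not found
        return "?", elapsed
    return proc.stdout.split(":", 1)[0], elapsed


def process_uids():
    """Owner UIDs of all running processes, taken from /proc (Linux only)."""
    uids = set()
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                uids.add(os.stat(f"/proc/{entry}").st_uid)
            except FileNotFoundError:  # the process exited meanwhile
                pass
    return sorted(uids)


if __name__ == "__main__":
    for uid in process_uids():
        name, secs = lookup_user(uid)
        print(f"uid {uid}: {name or 'TIMEOUT'} ({secs:.2f}s)")
```

Running the lookup in a separate process keeps the check itself killable, so a hanging LDAP or NIS backend shows up as TIMEOUT instead of freezing the script the way top freezes.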

> In the DomU logs on Dom0 I haven't found anything. Where would you look?

In /var/log/xen/qemu-dm-*.log
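To skim all of those files at once, a small sketch (Python 3; `tail_logs` is a made-up helper) that collects the last lines of every file matching that pattern, in name order:

```python
import glob
from collections import deque


def tail_logs(pattern="/var/log/xen/qemu-dm-*.log", n=20):
    """Return {path: last n lines} for every file matching pattern."""
    tails = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as f:
            # deque with maxlen keeps only the last n lines while streaming
            tails[path] = [line.rstrip("\n") for line in deque(f, maxlen=n)]
    return tails


if __name__ == "__main__":
    for path, lines in tail_logs().items():
        print(f"==> {path} <==")
        print("\n".join(lines))
```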

It'd be interesting to see when this starts happening (if it is repeatable). Maybe you can find some way to monitor it (e.g. by remotely querying the process list via SNMP; if that hangs like "top" does, it'd be a good indicator). You'd then know which point in time to look at on, e.g., other systems for symptoms that may be related or even point at the root cause.
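Such a monitor could be as simple as running a probe command from another machine in a loop and logging a timestamp whenever the result changes. A minimal sketch, assuming Python 3.7 or later on the monitoring host; the probe command is passed on the command line, so with net-snmp and an snmpd exposing HOST-RESOURCES-MIB on the DomU it might be `snmpwalk -v2c -c public mailserver HOST-RESOURCES-MIB::hrSWRunName`, where the community "public" and the host name "mailserver" are placeholders:

```python
import subprocess
import sys
import time
from datetime import datetime


def probe(cmd, timeout):
    """Run cmd once and classify the result as "ok", "error" or "timeout"."""
    try:
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout"  # the command hung, like "top" does on the DomU
    return "ok" if proc.returncode == 0 else "error"


def monitor(cmd, timeout=30.0, interval=60.0):
    """Probe forever; print a timestamped line whenever the state changes."""
    last = None
    while True:
        state = probe(cmd, timeout)
        if state != last:
            stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {state}: {' '.join(cmd)}", flush=True)
            last = state
        time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    monitor(sys.argv[1:])
```

Separating "error" from "timeout" matters here: an unreachable host fails fast, while the symptom described above is a command that simply never returns.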

Best regards,
Jens