
SLES 12 SP1: share disks between VMs on different hosts



dsferz
22-Apr-2016, 08:27
Hello everyone!

I have the following problem.
What has been done:
1. We deployed 2 physical hosts with SLES 12 SP1 and installed the Xen hypervisor on both.
2. On each host we provided a few block devices via FC.
3. On each host we deployed a fully virtualized VM with a SLES 12 SP1 guest system.
4. The block devices are connected to each of the VMs as scsi or xendisk (virsh dumpxml above).
5. From one of the VMs, we created LVM on the disks (1 VG per 1 PV), fs=ext3 (roughly as sketched below).
6. After a rescan, the other VM can see the new VG.
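
Step 5 was roughly the following (a sketch - the PV device name is just a placeholder for one of the FC LUNs):
# create PV/VG/LV on one shared FC disk (placeholder device name)
pvcreate /dev/sdb
vgcreate vg1 /dev/sdb
lvcreate -n lv1 -l 100%FREE vg1
# create the ext3 filesystem
mkfs.ext3 /dev/vg1/lv1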

example:
1. on vm1:
mount /dev/vg1/lv1 /srv/lv1
echo 'test from 1st vm' > /srv/lv1/1test.txt
umount /srv/lv1
vgchange -an vg1 && vgexport vg1
2. on vm2:
vgimport vg1 && vgchange -ay vg1
mount /dev/vg1/lv1 /srv/lv1
cat /srv/lv1/1test.txt
>>> test from 1st vm
echo 'answer from 2nd vm' >> /srv/lv1/1test.txt
echo 'test from 2nd vm' > /srv/lv1/2test.txt
umount /srv/lv1
vgchange -an vg1 && vgexport vg1
3. on vm1:
mount /dev/vg1/lv1 /srv/lv1
cat /srv/lv1/1test.txt
>>> test from 1st vm
cat /srv/lv1/2test.txt
>>> no such file


Additional info:
1. No locking is configured;
2. If I do the same steps on the physical hosts, everything works fine;
3. Shared disk config:
<disk type='block' device='disk'>
<driver name='qemu' type='raw'/>
<source dev='/dev/mapper/360002ac0000000000000003200019bcc'/>
<target dev='sdd' bus='scsi'/>
<address type='drive' controller='0' bus='0' target='0' unit='1'/>
</disk>
I also tried different disk options, such as:
<driver name='qemu' type='raw' cache='none' io='native'/>

Any idea how to resolve this issue?

Thank You!

jmozdzen
22-Apr-2016, 11:34
Hi dsferz,

> (virsh dumpxml above)

Either you created a new account, or you originally posted this question somewhere else - you have a post count of "1" with this user, so there's no "above" ;)

I guess you're seeing the effects of block buffering on Dom0 - the writes from the DomU go to the I/O buffers in the DomU and are flushed there when you close the FS - but those writes go to the Dom0 block device, which will do its own buffering.

On the other Dom0, where you run vm1, you'll run into the same situation: Dom0 caches the block device again. So for a "clean test", you'll have to

- do your write action & vg deactivation on vm1
- sync & flush the block cache on Dom0(vm1)
- flush the block cache on Dom0(vm2)
- activate the vg, read, write, deactivate vg on vm2
- sync & flush the block cache on Dom0(vm2)
- flush the block cache on Dom0(vm1)
- activate the vg, read on vm1

Obviously this is no way to go for a production system.

Regards,
Jens

dsferz
22-Apr-2016, 12:09

Hi Jens,

Not above, but below - sorry, that was my mistake. And there I wrote that one of my attempts was to specify the cache mode with the value "none".
<driver name='qemu' type='raw' cache='none' io='native'/>

I'd be really happy to hear your opinion on how to provide storage resources for 2 VMs (the resources are cluster-managed and must move between the VMs, and iSCSI cannot be used because the SAN is on FC), because I have no more ideas.

Thank You for reply.

jmozdzen
22-Apr-2016, 14:04
Hi dsferz,

> how to provide storage resources for 2 VMs (the resources are cluster-managed and must move between the VMs, and iSCSI cannot be used because the SAN is on FC), because I have no more ideas

before walking down that route, have you tried explicitly de-caching on the Dom0s to verify that Dom0 caching really is the cause? Maybe it's actually something different that's preventing the update from being seen on the other DomU, especially since data written on vm1 seems to be visible on vm2?

Also, how about doing two rounds of "writing on vm1, then reading on vm2" to verify that it works reliably in that direction? It could help in better understanding the issue.

Regards,
Jens

dsferz
22-Apr-2016, 14:55

OK, but I am new to Xen virtualization - could you please explain how I can do this correctly: sync & flush the block cache on Dom0?

Thank You!

jmozdzen
25-Apr-2016, 12:34
Hi dsferz,

> could you please explain how I can do this correctly: sync & flush the block cache on Dom0

I was referring to "sync && echo 3 > /proc/sys/vm/drop_caches", which is the standard Linux sequence to first write out all dirty cache pages and then drop any (clean) cached pages.
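
For your test, the full round trip would then look roughly like this (a sketch, using the vg1/lv1 names and mount point from your example):
# on vm1 (DomU): release the volume group
umount /srv/lv1
vgchange -an vg1 && vgexport vg1
# on the Dom0 hosting vm1: write out dirty pages, then drop cached pages
sync && echo 3 > /proc/sys/vm/drop_caches
# on the Dom0 hosting vm2: drop any stale cached pages
sync && echo 3 > /proc/sys/vm/drop_caches
# on vm2 (DomU): take over the volume group
vgimport vg1 && vgchange -ay vg1
mount /dev/vg1/lv1 /srv/lv1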

Regards,
Jens

dsferz
25-Apr-2016, 21:45
Hi jmozdzen!

Looks like you were right. I tried to move the volume between the VMs following your instructions, and there was no data loss.

What about the next steps - how can I bypass this buffer and write directly to the SAN from the VMs? Any suggestions?

Thank You.

jmozdzen
27-Apr-2016, 12:45
Hi dsferz,

> What about the next steps - how can I bypass this buffer and write directly to the SAN from the VMs? Any suggestions?

My best bet would have been using "cache=none" (which you already tried) - this will not avoid the write cache, but in your sequence of operations, the guest should have issued the required flush I/O commands to purge that cache.

In other words, I have no idea how to make sure the caches will be fully avoided :( Probably you should consider an active/active approach using a cluster FS?

If you can spare a "service request", you might try to get an answer from the SUSE engineers on how to successfully de-configure caching.

Regards,
Jens

dsferz
27-Apr-2016, 13:16
Hi jmozdzen,

I already tried OCFS2 with the same options (cache=none), but I have the same issue. (The HA add-on is not installed on the hypervisors, only on the VMs.)

What I do now:
1. Create a task that syncs and drops cached data on the hypervisors every minute (roughly as sketched below);
2. Add the parameter start-delay=90 seconds to all filesystem resources on the VMs.
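
The task from point 1 is roughly a root crontab entry like this on each hypervisor (a sketch - adjust the interval to taste):
# run every minute: write out dirty pages, then drop cached pages
* * * * * sync && echo 3 > /proc/sys/vm/drop_caches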

This looks like it works, but I don't know the potential problems of this solution. Now we are considering moving the cluster from the VMs to the physical hosts.