Hi John,

> The end goal of these 2 servers and the reason for the 10GB network is combining LVM and DRBD where I can pass drive partitions directly to the VMs as harddrives. And have that replicated between the 2 xen servers allowing me to live migrate the VMs between the 2 xen servers.

I believe this will turn out to be too much of a hassle in the end.

How about setting up the disk images as files on a cluster file system (e.g. OCFS2) on top of replicated storage (e.g. via DRBD)?

Either way you'll have to solve some design issues:

1. What situation do you want to protect against? Disk failure, node failure, both?
2. How many resources can you spend? Would it be possible to split up storage and compute? Will you need to expand to three or more Xen nodes in the foreseeable future?
3. How do you want to distribute your compute load? I assume that you'd prefer to run the "Xen part" in active/active mode, that is running VMs on both nodes simultaneously.
4. How are you going to fail over? Manually or automatically?

I strongly recommend running Xen with VM locking, to prevent the same VM from being started on multiple nodes simultaneously. Been there, done that. You'll need a cluster FS to solve this issue.
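
To illustrate the idea behind VM locking, here is a minimal Python sketch of a lock file kept on the shared cluster file system. This is not Xen's own locking mechanism, just the general principle. The function names and the lock directory are made up for the example, and it assumes that the cluster FS makes an exclusive file create atomic across all nodes. It does not deal with stale locks left behind by a crashed node.

```python
import os
import socket


def acquire_vm_lock(lock_dir, vm_name):
    """Try to claim vm_name. Return the lock file path, or None if already held.

    lock_dir is assumed to live on the shared cluster file system
    (hypothetical layout, not something Xen creates for you).
    """
    path = os.path.join(lock_dir, vm_name + ".lock")
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # caller can create the lock file.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return None
    with os.fdopen(fd, "w") as f:
        # Record the owner so an admin can see which node holds the VM.
        f.write(socket.gethostname() + "\n")
    return path


def release_vm_lock(path):
    """Give up the lock after the VM has been shut down or migrated away."""
    os.remove(path)
```

A wrapper around the VM start command would call acquire_vm_lock() first and refuse to start the VM if it returns None, then call release_vm_lock() once the VM is down.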

Your "change process" will determine how to bring LVM into the picture. For example, if you think about setting up a shared disk via active/active DRBD and using LVM on top of that, you cannot simply change the VG on nodeA and have the changes active on nodeB. You'll need to run something like cLVM for that.

From personal experience, you'll likely get a more stable solution by running some HAE iSCSI solution (e.g. over active/passive DRBD) offering a common disk resource, which you can use on both (and optionally more, or all) Xen servers as an OCFS2 storage device. Putting both services (storage and compute) on the same nodes will, to some extent, complicate things, but as long as you keep that split clearly in mind when you think about your services, you should get along fine.

