Cluster



Bob Crandell
19-Mar-2012, 23:29
Hi,

One of my customers suffered a power outage when a branch knocked down a
power line and blew up a transformer. The surge took out lots of stuff.

Their server lost 2 hard drives and the RAID controller. I'm blaming it
on the branch even though I don't actually know what took them out.

They want a 3 node cluster to improve redundancy. This is a hardware
question. They don't want a single point of failure (RAID controller).
I believe the easy answer is 3 individual computers. Are there other
configurations that will satisfy their need? Blade maybe?

They said they lost almost $1,000.00 an hour. The server went down
Tuesday and was finally functional Friday. They are a 24/7 operation.

Thanks

Massimo Rosen
19-Mar-2012, 23:56
On 19.03.2012 23:29, Bob Crandell wrote:
> Hi,
>
> One of my customers suffered a power outage when a branch knocked down a
> power line and blew up a transformer. The surge took out lots of stuff.
>
> Their server lost 2 hard drives and the RAID controller. I'm blaming it
> on the branch even though I don't really know what really took them out.
>
> They want a 3 node cluster to improve redundancy. This is a hardware
> question. They don't want a single point of failure (RAID controller).

There is no such thing as *no* single point of failure. In a cluster, no
matter how many nodes, it's the shared storage. Of course, that can be
"mirrored" too, but still there's some SPOF, *somewhere*. Next time,
whatever Murphy finds will corrupt the data, which will mirror to all
online copies.
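
A minimal sketch of that point, as a toy Python model rather than any real storage API: the class and the `orders.db` key are hypothetical, invented only for illustration. A mirror repeats every write, including the bad one, while a separate point-in-time copy still holds the last good state.

```python
import copy


class MirroredVolume:
    """Toy model (hypothetical): every write lands on both copies at once."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.snapshots = []  # point-in-time copies, kept apart from the mirror

    def write(self, key, value):
        # A mirror faithfully replicates every write, good or bad.
        self.primary[key] = value
        self.mirror[key] = value

    def snapshot(self):
        # A snapshot freezes the state as it was at one moment.
        self.snapshots.append(copy.deepcopy(self.primary))


vol = MirroredVolume()
vol.write("orders.db", "valid data")
vol.snapshot()
# Corruption from a broken controller, an OS bug, or user error:
vol.write("orders.db", "\x00garbage")

print(vol.primary["orders.db"] == vol.mirror["orders.db"])  # True: both copies are bad
print(vol.snapshots[-1]["orders.db"])                       # valid data
```

The mirror protects against losing a disk, not against bad data being written; only the older copy helps with the latter.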

> They said they lost almost $1,000.00 an hour. The server went down
> Tuesday and was finally functional Friday. They are a 24/7 operation.

They don't have a single point of failure problem, but a *massive*
disaster recovery problem. Apparently, they have had no (working) DR
plan. Three days+ qualifies as "not working".

CU,
--
Massimo Rosen
Novell Knowledge Partner
No emails please!
http://www.cfc-it.de

Bob Crandell
20-Mar-2012, 01:44
On Mon, 19 Mar 2012 22:56:44 +0000, Massimo Rosen wrote:

> On 19.03.2012 23:29, Bob Crandell wrote:
>> Hi,
>>
>> One of my customers suffered a power outage when a branch knocked down
>> a power line and blew up a transformer. The surge took out lots of
>> stuff.
>>
>> Their server lost 2 hard drives and the RAID controller. I'm blaming
>> it on the branch even though I don't really know what really took them
>> out.
>>
>> They want a 3 node cluster to improve redundancy. This is a hardware
>> question. They don't want a single point of failure (RAID controller).
>
> There is no such thing as *no* single point of failure. In a cluster, no
> matter how many nodes, it's the shared storage. Of course, that can be
> "mirrored" too, but still there's some SPOF, *somewhere*. Next time,
> whatever Murphy finds will corrupt the data, which will mirror to all
> online copies.
So a cluster consists of 2 or more computers and shared storage?
(Teaching moment) I thought it could be done that way, or with each node
handling its own copy of the data. Well, that changes things.

>
>> They said they lost almost $1,000.00 an hour. The server went down
>> Tuesday and was finally functional Friday. They are a 24/7 operation.
>
> They don't have a single point of failure problem, but a *massive*
> disaster recovery problem. Apparently, they have had no (working) DR
> plan. Three days+ qualifies as "not working".
This is true.

>
> CU,

So if we were to start over from the beginning, it would be better to
build a server, clustered or not, take a snapshot once every 6 months
to a year, and keep replacement parts on hand in case of branches.
Yes? No?

Lance Haig
20-Mar-2012, 11:25
On 20/03/12 00:44, Bob Crandell wrote:
> On Mon, 19 Mar 2012 22:56:44 +0000, Massimo Rosen wrote:
>
>> On 19.03.2012 23:29, Bob Crandell wrote:
>>> Hi,
>>>
>>> One of my customers suffered a power outage when a branch knocked down
>>> a power line and blew up a transformer. The surge took out lots of
>>> stuff.
>>>
>>> Their server lost 2 hard drives and the RAID controller. I'm blaming
>>> it on the branch even though I don't really know what really took them
>>> out.
>>>
>>> They want a 3 node cluster to improve redundancy. This is a hardware
>>> question. They don't want a single point of failure (RAID controller).
>>
>> There is no such thing as *no* single point of failure. In a cluster, no
>> matter how many nodes, it's the shared storage. Of course, that can be
>> "mirrored" too, but still there's some SPOF, *somewhere*. Next time,
>> whatever Murphy finds will corrupt the data, which will mirror to all
>> online copies.
> So a cluster consists of 2 or more computers and shared storage?
> (Teaching moment) I thought it could be done that way or each node handle
> its own copy of the data. Well, that changes things.
>
>>
>>> They said they lost almost $1,000.00 an hour. The server went down
>>> Tuesday and was finally functional Friday. They are a 24/7 operation.
>>
>> They don't have a single point of failure problem, but a *massive*
>> disaster recovery problem. Apparently, they have had no (working) DR
>> plan. Three days+ qualifies as "not working".
> This is true.
>
>>
>> CU,
>
> So if we were to start over from the beginning then it would be better to
> build a server, clustered or not and take a snap shot once every 6 months
> to a year and keep replacement parts on hand in case of branches.
> Yes? No?
>

I would go for a solution with 2 clusters whose data is mirrored at the
SAN level, or kept in sync by log shipping if it is a DB.
Each cluster should be in a different rack with a separate power
supply if they share a datacenter; a different datacenter, if possible,
would be better.

You should have two different network paths to the different racks/
datacenters to allow for carrier, switch, or power failure.
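
One way to sanity-check a layout like this is to list what each cluster depends on and see what they have in common. The sketch below is only an illustration in Python; every component name (`rack1`, `pdu1`, `carrier1`, and so on) is made up, not taken from any real site.

```python
def shared_dependencies(sites):
    """Return the components that every site depends on (candidate SPOFs)."""
    dependency_sets = [set(deps) for deps in sites.values()]
    return set.intersection(*dependency_sets) if dependency_sets else set()


# Hypothetical layout 1: two clusters in the same rack, on the same power and uplink.
same_rack = {
    "cluster_a": {"rack1", "pdu1", "switch1", "carrier1", "san1"},
    "cluster_b": {"rack1", "pdu1", "switch1", "carrier1", "san2"},
}

# Hypothetical layout 2: separate racks, power, switches, and carriers.
separated = {
    "cluster_a": {"rack1", "pdu1", "switch1", "carrier1", "san1"},
    "cluster_b": {"rack2", "pdu2", "switch2", "carrier2", "san2"},
}

print(sorted(shared_dependencies(same_rack)))  # ['carrier1', 'pdu1', 'rack1', 'switch1']
print(sorted(shared_dependencies(separated)))  # []
```

An empty result only means no shared hardware was listed; it says nothing about bad data being replicated from one side to the other.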

I agree that they need to have a DR plan worked out.

Without more info it is difficult to make a call.

Lance

Bob Crandell
20-Mar-2012, 15:20
On Tue, 20 Mar 2012 10:25:40 +0000, Lance Haig wrote:

> On 20/03/12 00:44, Bob Crandell wrote:
>> On Mon, 19 Mar 2012 22:56:44 +0000, Massimo Rosen wrote:
>>
>>> On 19.03.2012 23:29, Bob Crandell wrote:
>>>> Hi,
> SNIP <
>>
>> So if we were to start over from the beginning then it would be better
>> to build a server, clustered or not and take a snap shot once every 6
>> months to a year and keep replacement parts on hand in case of
>> branches. Yes? No?
>>
>>
> I would have a solution where you have 2 clusters that have mirrored
> data at SAN level or log shipping if it is a DB. each of the clusters
> should be in different racks with separate power supplies if in the same
> datacenter, a different datacenter if possible would be better.
>
> you should have two different network paths to the different racks/
> datacenters to allow for carrier/ switch/ power failure
>
> I agree that they need to have DR plan worked out.
>
> Without more info it is difficult to make a call.
>
> Lance

At least I have a better understanding of what I think I know. Now I get
to see how much they really want to spend.

Thanks

Massimo Rosen
20-Mar-2012, 17:03
Hi.

On 20.03.2012 11:25, Lance Haig wrote:
> I would have a solution where you have 2 clusters that have mirrored
> data at SAN level or log shipping if it is a DB.

Still a (many!) SPOF. If the data gets corrupted for whatever reason
(broken RAID controller, OS running wild, user error), the data is still
lost on both sides.

CU,

GofBorg
20-Mar-2012, 18:32
> They said they lost almost $1,000.00 an hour. The server went down
> Tuesday and was finally functional Friday. They are a 24/7 operation.

Everyone cries the blues when the system goes down, but if you hand them
a bill for the services and hardware required to prevent it, they
usually quiet down a good bit and see it as more 'darned inconvenient'
than some catastrophic loss.
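
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. The hourly figure is the one quoted above; the outage length is an assumption (Tuesday to Friday taken as roughly three full days), and the prevention cost, outage frequency, and hours saved are purely hypothetical inputs.

```python
HOURLY_LOSS = 1_000    # "almost $1,000 an hour", as quoted in the thread
OUTAGE_HOURS = 3 * 24  # assumption: Tuesday to Friday is about three full days


def outage_cost(hours, hourly_loss=HOURLY_LOSS):
    """Rough cost of an outage lasting the given number of hours."""
    return hours * hourly_loss


def worth_it(prevention_cost_per_year, outages_per_year, hours_saved_per_outage):
    """True if the expected yearly saving exceeds the yearly cost of prevention."""
    expected_saving = outages_per_year * outage_cost(hours_saved_per_outage)
    return expected_saving > prevention_cost_per_year


print(outage_cost(OUTAGE_HOURS))   # 72000
# Hypothetical: one such outage every five years, recovery cut from 72 h to 4 h.
print(worth_it(20_000, 0.2, 68))   # False
print(worth_it(10_000, 0.2, 68))   # True
```

With these made-up inputs the bill has to stay under about $13,600 a year to pay for itself, which is the kind of figure that tends to quiet people down.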

Scott A. Campbell
20-Mar-2012, 21:23
Bob Crandell wrote:

> So if we were to start over from the beginning then it would be
> better to build a server, clustered or not and take a snap shot once
> every 6 months to a year and keep replacement parts on hand in case
> of branches. Yes? No?

It really depends on:
1). How quickly they need to be running
2). How much data they are prepared to lose
3). How much they are willing to spend
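
Those three questions can be written down as hard limits and checked against candidate setups. This is a sketch only, in Python: the option names, restore times, backup intervals, and prices are all hypothetical placeholders, not quotes for real products.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    max_downtime_hours: float   # 1) how quickly they need to be running
    max_data_loss_hours: float  # 2) how much data they are prepared to lose
    budget: float               # 3) how much they are willing to spend


@dataclass
class Option:
    name: str
    restore_hours: float          # time to get back up after a failure
    backup_interval_hours: float  # worst-case age of the newest good copy
    cost: float


def acceptable(options, req):
    """Names of the options that satisfy all three limits."""
    return [
        o.name
        for o in options
        if o.restore_hours <= req.max_downtime_hours
        and o.backup_interval_hours <= req.max_data_loss_hours
        and o.cost <= req.budget
    ]


# All figures below are invented for illustration.
options = [
    Option("snapshot every 6 months + spare parts", 72, 6 * 30 * 24, 2_000),
    Option("nightly backup + standby VM host", 4, 24, 15_000),
    Option("two clusters, mirrored SAN", 0.5, 0.1, 120_000),
]
req = Requirements(max_downtime_hours=8, max_data_loss_hours=24, budget=30_000)

print(acceptable(options, req))  # ['nightly backup + standby VM host']
```

A six-monthly snapshot fails on the second question long before cost comes into it: up to half a year of data would be gone.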

With virtualisation you can build up some pretty reasonable DR
solutions inexpensively to *reduce* the risk.

I'm just wrapping up a DR project and was frankly stunned at what we
could achieve for the money we spent.

Lance Haig
20-Mar-2012, 21:46
On 20/03/12 16:03, Massimo Rosen wrote:
> Hi.
>
> On 20.03.2012 11:25, Lance Haig wrote:
>> I would have a solution where you have 2 clusters that have mirrored
>> data at SAN level or log shipping if it is a DB.
>
> Still a (many!) SPOF. If the data gets corrupted for whatever reason
> (broken RAID controller, OS running wild, user error), the data is still
> lost on both sides.
>
> CU,

Agreed completely.

We used to have our DB logs shipped over to the second DB server but not
imported, so if we had a DB corruption we would be able to import logs
up to the corruption and then do the rest manually.
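
The idea can be sketched in a few lines of Python. This is a toy model, not how any particular database implements log shipping: the record format `(sequence, key, value)` and the function name are assumptions made up for the example.

```python
def replay_until(base, shipped_logs, corruption_seq):
    """Apply shipped log records to a copy of `base`, stopping before corruption_seq.

    Each record is a hypothetical (sequence_number, key, value) tuple.
    """
    state = dict(base)
    applied = []
    for seq, key, value in sorted(shipped_logs):
        if seq >= corruption_seq:
            break  # everything from here on is suspect; redo it by hand
        state[key] = value
        applied.append(seq)
    return state, applied


base = {"balance": 100}
logs = [
    (1, "balance", 120),
    (2, "balance", 90),
    (3, "balance", None),  # the corrupting write
    (4, "balance", 50),
]

state, applied = replay_until(base, logs, corruption_seq=3)
print(state, applied)  # {'balance': 90} [1, 2]
```

Because the standby had not imported record 3 yet, it can stop just short of it; a standby that applies logs as they arrive would have taken the corruption along with everything else.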

It saved my bacon twice.

Lance

leroyjjr
20-Mar-2012, 23:16
It depends on the hardware involved.
You could cluster two of the servers and use the third as a “snapshot
server” for routine “snaps” of the clustered volumes.
If you have a SAN in place, even better: create a virtual environment
as a pseudo “DR” site for the clustered servers. Of course you would need
some type of replication software and/or a backup solution in place to
replicate the clustered data over to the “DR” site.
You could use operating systems like SLES 10/11 or OES2/OES11 for
your virtual “DR” site on your SAN or, if a server is ‘powerful’ enough,
use it as the ‘muscle’ required to run multiple guests within
the host.


Leroy Joseph
Visual Click Software
(eDirectory Management and Reporting)
'eDirectory Management | DSRAZOR for eDirectory'
(http://www.visualclick.com/content/dsrazor-for-edirectory.htm)



Bob Crandell
21-Mar-2012, 15:54
On Tue, 20 Mar 2012 17:32:46 +0000, GofBorg wrote:

>> They said they lost almost $1,000.00 an hour. The server went down
>> Tuesday and was finally functional Friday. They are a 24/7 operation.
>
> Everyone cries the blues when the system goes down, but if you hand them
> a bill for the amount of services and hardware required to prevent it,
> they usually quiet down a good bit and find it more as 'darned
> inconvenient' rather than some catastrophic loss.

Yeah. I offered the advice I'm getting here and haven't heard a peep.
Maybe they are trying to make up for last week.