An Open Source EqualLogic Replacement: Part 1

I run a XenServer cluster that depends on an EqualLogic PS5000XV for shared storage.  While I hold the EqualLogic unit in high regard, I will be trying to replace it with open-source tools running on fairly standard server hardware.  I will be documenting the process here.

What’s Wrong with EqualLogic, anyway?

A solid array
The PS5000XV

There is nothing wrong with EqualLogic, provided you know what you are getting into when you buy it.  I didn’t.  My existing SAN solution was failing horribly and I needed to move to something faster, and a hardware vendor I use provided the EqualLogic on short notice at a superb price.  I jumped, and it’s performed admirably.

What a PS5000XV is:

  • Reliable.  The box shown above contains redundant power supplies, redundant controllers, and (of course) redundant drives.  This box is immune to the normal failure modes, as no single component in the box will bring the storage array offline if it fails.  It is possible that failover from one controller to another will take long enough that my XenServer cluster will time out and I will need to reattach storage, but this is a very fair price to pay for data integrity.
  • Fast.  Those 16 drives you see above are all 15,000 RPM SAS drives.  This unit is faster than any hard-drive based solution I’ve ever used.
  • Simple.  The administration console is very straightforward.  Monitoring is easy, as is creating new shared volumes.
  • Expandable.  I can add a new box to the array to expand it.  Management stays the same, and all units are configured using the standard tools I am already familiar with.  Data will be migrated for best use of resources, so if I were to add a PS5000E (which uses SATA drives for capacity) the system would watch the behavior of my virtual machines and keep the database servers on the XV for best performance while moving the file servers to the E for capacity.  It’s all automatic.
  • Smart.  I have had one drive fail in the time I have owned the array.  I received an e-mail letting me know a drive had failed overnight, and before I could log in to check the status of the array one of the two hot spares had been used to recreate the failed drive.  All I saw was this on the console:

[Image: failedHD]

I called the datacenter and had them replace the drive and I was good to go.

What the EqualLogic is not:

  • Supportable via Dell/EqualLogic.  The unit I purchased was still under the original warranty even though I bought it used, and Dell wouldn’t talk to me even after I provided proof of purchase.  Only the original purchaser is entitled to access to things like firmware updates, tool downloads, and especially replacement parts.
  • Supportable from my vendor.  I have a third party vendor under contract to provide next business day service and parts on the unit, but they have decided to leave this business segment.  That is worrisome when looking at an essential network component that is completely proprietary.
  • Inexpensive.  The price I paid for this array was wonderful, and there are new-old-stock units selling on eBay for far less than new units cost at retail.  But I will have to provide support myself, and replacement parts will become more and more scarce as time goes on.  This was inexpensive in the short term, but in the long term the value proposition on this box changes drastically.
  • Redundant.  I know that the box is full of completely redundant parts, but that doesn’t help me sleep well at night.  Someone called this the inverted pyramid of doom.  The author is biased, but his arguments have some merit: all of my critical business data is sitting in one box.  If something happens to that box, regardless of how internally redundant it is, it will be a bad day.
  • Fully Compatible.  XenServer doesn’t really integrate with EqualLogic all that well.  There is no clear way to integrate automated snapshots into a complete backup strategy, for instance.  I have two network connections on each XenServer host, but these are active/passive bonds that only offer half the bandwidth they should as per EqualLogic’s recommendations.

The Big Problem: It’s Hard Drive Based

In the olden days before SSD drives were available the thinking went something like this:

  • Database applications don’t care about drive transfer speed.  They care about IOPS (essentially transactions per second).
  • The way to get real performance for database applications is to use the fastest hard drives possible, and run the data on an array that contains as many hard drives as possible.  The goal was increased “spindle count” for performance.

The PS5000XV does this very well.  It is full of very fast drives, and the array was quickly configured as a 14-drive RAID-10 array with two hot spares by default.  Simple, and about as fast as you can get with hard drives.  When you need more performance you can add another identical box, and the array software will automatically stripe the two arrays together to double your performance, at least in theory.

The problem now is that we have SSDs that provide a couple of orders of magnitude more IOPS than these super-fast drives.  You can look at a Seagate Cheetah 15k SCSI review and see that the drive could get all of 338 IOPS in the file serving tests.  For a hard drive, that is exceptional.  But in 2013 even a low-end desktop-grade SSD can perform more than ten times better than that.
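To put rough numbers on the spindle-count argument, here is a back-of-the-envelope sketch.  It is an estimate under stated assumptions, not a benchmark: it takes the 338 IOPS per-drive figure quoted above, assumes reads can be served by any drive in the RAID-10 set, and assumes every write costs two physical writes (one per mirror).  The function name, the 70% read mix, and the SSD figure (simply ten times the hard drive number) are my own placeholders for illustration; controller cache and real workloads will change the picture.

```python
import math

HDD_IOPS = 338      # per-drive figure from the Cheetah 15k review cited above
DATA_DRIVES = 14    # 16 bays minus the two hot spares

def raid10_iops(drives, per_drive_iops, read_fraction):
    """Rough RAID-10 estimate (hypothetical helper, not a vendor formula).

    Raw IOPS is drives * per_drive_iops. Each logical write lands on both
    halves of a mirror, so writes are charged a penalty of 2.
    """
    if drives % 2 != 0:
        raise ValueError("RAID-10 needs an even number of drives")
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw = drives * per_drive_iops
    write_fraction = 1.0 - read_fraction
    return raw / (read_fraction + 2.0 * write_fraction)

# Assumed SSD figure: only the "ten times better" lower bound, not a measurement.
ASSUMED_SSD_IOPS = 10 * HDD_IOPS

if __name__ == "__main__":
    print("all reads: ", raid10_iops(DATA_DRIVES, HDD_IOPS, 1.0))
    print("all writes:", raid10_iops(DATA_DRIVES, HDD_IOPS, 0.0))
    print("70% reads: ", round(raid10_iops(DATA_DRIVES, HDD_IOPS, 0.7)))
    # How many 15k spindles it takes to match one assumed SSD on pure reads.
    print("spindles per SSD:", math.ceil(ASSUMED_SSD_IOPS / HDD_IOPS))
```

Under these assumptions the whole 14-drive array lands somewhere between roughly 2,400 and 4,700 IOPS depending on the read/write mix, which is the same ballpark as a single SSD at the assumed ten-times figure.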

I run applications that depend on database access.  When I build a storage solution, I want to build one that takes advantage of SSDs.  In 2013 building storage entirely from SSDs is prohibitively expensive and limits overall capacity, so I’m looking for a hybrid solution: one that uses hard drives for inexpensive storage, and uses SSDs for their speed.

Sounds like a pipe dream, right?  I intend to find out, but I think I can pull this off.
