After an email conversation with one of our distributors yesterday I realised that it had been quite a while since I wrote up anything about the product we sourced from his organisation, SIOS Protection Suite For Linux. As we have a new site, and a vastly simplified content submission system, I thought - why not knock something up?
Deep breath. Here goes….
Before I really get going, I just thought I would mention that, in the past, we always liked to write about the products we carried with 'top 10' articles. These quickly explained the top 10 selling points of a product and, needless to say, the vendors of said products loved them. Thinking back on these rather vendor-centric articles from the perspective of someone actually interested in buying such a solution, I think we could have done better. When was the last time someone called us up and said 'If you can tell me 10 great things about this product, I will buy it!'? Errr, not that often. So let’s simply discuss, informally, how we came to look at SIOS Protection Suite For Linux, and why it ended up being the ideal choice for a customer of ours. This should be enough information for you to work out whether it's something that might work for you too.
First of all, a quick introduction to the Linux product. It has two main components: LifeKeeper, the H/A component, and DataKeeper, the DR component. The two are highly integrated, and it was this tight integration of H/A and DR functionality that drew our attention when we were searching for a solution to a particular customer requirement.
The requirement in question was to upgrade a small-ish (14 server) hosting setup, which had grown organically over the years as multiple silos of 'standalone' application servers. From an O/S point of view they were to be considered standalone, but all servers booted off a SAN cluster, and there was spare (metal) capacity to swap failed hardware around if needed. So the servers were protected by 'cold spares' and the storage was clustered two-way, but the applications that ran on them were not clustered at all.
So the requirement was to unify all application storage into an H/A setup in order to enable load balancing of the app tier, and at the same time, DR everything to a remote site. In real time.
The SAN used by the customer had the ability to send snapshots over to a remote site, but only on a schedule, and not in real time. Also, although the SAN itself provided the capability to enable node clustering, this wasn't going to be easy, as the environment was fully Windows but without any Active Directory in place. The customer had no appetite to add this at the time, nor the physical space to do it either.
We're DoubleTake SPLA partners, and the first solution we investigated was DoubleTake Availability. This is a fantastic replication product, and it did tick all the boxes, providing both H/A and DR. However, we quickly came across a problem: there were literally millions of small files to replicate, and every time we performed a failover, DT would have to re-verify the mirrors, a task which would have taken days at the rate we experienced. The file count was projected to increase dramatically over the years too, so the problem would only get worse. We realised we needed to find a block-based solution, but one that was more continuous than SAN snapshots.
And so we came across SIOS Protection Suite For Linux. It was the perfect fit.
The product provided the H/A (LifeKeeper) and DR (DataKeeper) functionality we were looking for but, crucially, it went much further. There is not much point in having an H/A solution if you struggle to determine when the application you are running has failed and you need to fail over. Failed pings or filesystem heartbeats simply won't cut it. SPS for Linux has a number of tailored 'kits' which are designed to monitor and test deep into a particular application. Kits currently include Samba, NFS, WebSphere, MySQL, DB2, Oracle, Postfix, Apache, and even SAP, for which the solution is fully certified.
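To illustrate why a ping-style check is not enough, here is a minimal Python sketch of the general idea of a 'deep' health check. It is not how the SIOS kits are implemented; the names `HealthResult`, `check_service` and `hung_database_probe` are made up for this example, and the probes are plain functions standing in for a real ping and a real application query.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HealthResult:
    reachable: bool  # did the shallow (ping-style) check pass?
    app_ok: bool     # did the application-level probe pass?

    @property
    def should_fail_over(self) -> bool:
        # Fail over unless the host answers AND the application works.
        return not (self.reachable and self.app_ok)


def _passes(check: Callable[[], bool]) -> bool:
    """Run a check and treat any exception as a failure."""
    try:
        return bool(check())
    except Exception:
        return False


def check_service(is_reachable: Callable[[], bool],
                  app_probe: Callable[[], bool]) -> HealthResult:
    reachable = _passes(is_reachable)
    # Only probe the application if the host answers at all.
    app_ok = reachable and _passes(app_probe)
    return HealthResult(reachable, app_ok)


def hung_database_probe() -> bool:
    # Hypothetical probe: the server still answers pings,
    # but a test query never comes back.
    raise TimeoutError("query did not return")


result = check_service(lambda: True, hung_database_probe)
print(result.should_fail_over)  # True
```

A ping-only monitor would have reported this server as healthy. The application-level probe is what catches the hang, and that is the gap the kits are there to close.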
There are two main methods we considered for getting H/A going with SPS LifeKeeper. You can have two completely standalone systems, spec them up with as much crazy high-performance hardware as you like, such as NVMe SSDs, and create a SANless cluster with awesome IO performance. The applications running on your active server will run at warp speed, with no SAN backplane limitations, and if the active server were to fail (or be taken down for patching), the passive would quickly take over, and life would continue. So the pros of this setup are simplicity and, very much, performance. This is most likely the fastest and most cost-effective way to achieve an H/A SAP setup with full certification.
In our customer’s case, they already had a SAN, and wanted to benefit from the sorts of things that SANs provide, such as block-level snapshots. It was possible to present dual SAN LUNs to both active and passive servers on the SAN, but this would have taken up double the storage, and with an awful lot of files to put on the system, that wasn't viable. It would also have put a great deal more load on the SAN itself, so it was not a good option in this particular case.
So the solution is to use LifeKeeper's 'Shared Disk' mode. In this mode, you present your unique, iSCSI-connected SAN LUNs to both the active and passive LifeKeeper servers. Once you have the H/A setup on LifeKeeper, you mark the passive server with a lower priority. As a result, the active server locks and mounts the LUNs, and the passive simply waits to be called into play. In a failure event on the active server, the connected iSCSI LUNs are essentially 'unplugged' from that server and re-plugged on to the passive. When everything is mounted, the VIP moves over and service resumes. So this is a very clever system for getting O/S-level H/A from unique LUNs, without the need for 'true' clustering and all the complexity that comes with it.
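The order of events in that failover can be sketched as a toy Python model. This is only an illustration of the sequence described above, not LifeKeeper's actual logic or API: the `Node` and `SharedDiskCluster` classes are invented for the example, and it assumes that a smaller priority number means the preferred server.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    priority: int        # assumption: smaller number = preferred node
    healthy: bool = True


@dataclass
class SharedDiskCluster:
    nodes: List[Node]
    lun_owner: Optional[str] = None   # node holding the LUN lock
    vip_owner: Optional[str] = None   # node answering on the VIP
    log: List[str] = field(default_factory=list)

    def preferred(self) -> Optional[Node]:
        """Return the healthy node with the best priority, if any."""
        healthy = [n for n in self.nodes if n.healthy]
        return min(healthy, key=lambda n: n.priority) if healthy else None

    def reconcile(self) -> None:
        """Move the LUNs first, then the VIP, to the preferred node."""
        target = self.preferred()
        if target is None:
            self.log.append("no healthy node: service down")
            self.lun_owner = self.vip_owner = None
            return
        if self.lun_owner == target.name and self.vip_owner == target.name:
            return  # nothing to do
        if self.lun_owner is not None:
            self.log.append(f"release LUNs from {self.lun_owner}")
        self.lun_owner = target.name
        self.log.append(f"lock and mount LUNs on {target.name}")
        # The VIP only moves once the storage is mounted.
        self.vip_owner = target.name
        self.log.append(f"move VIP to {target.name}")


cluster = SharedDiskCluster([Node("active", 1), Node("passive", 2)])
cluster.reconcile()                # normal start-up on the active server
cluster.nodes[0].healthy = False   # the active server fails
cluster.reconcile()                # the passive server takes over
print("\n".join(cluster.log))
```

Note that this toy model moves everything back as soon as the preferred server is healthy again. Whether you want automatic failback in a real deployment is a separate decision.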
The DR requirement (DataKeeper) is set up in a very similar way to LifeKeeper with regard to priority levels: the replication is 'extended' past the passive server and over to the DR site. Simply pausing replication automatically mounts the disk in DR, and you are able to fully inspect and test it to ensure your DR site is OK. We would say there is no point in having a DR solution if it's not this easy to test: one click.
In the end, it was the ability of the replication solution to closely integrate with the application services, in this case Samba and NGINX, that made this a slam dunk for the customer. SIOS SPS For Linux enabled us to supply the client with what was essentially a pair of highly compact, low-cost, and super reliable H/A & DR 'appliances'. The production-side server count for the original solution (Active Directory, Clustering, Fileservers, NGINX servers) was reduced from a total of 8 servers down to just 2. The physical space saved by deploying this solution would prove extremely valuable when more SAN storage capacity was needed later on.
Finally, I must add a brief note about the distributor whose email exchange motivated me to write this blog today. Bipin and his team at Open Minds HAS were an essential part of helping us move from evaluating to finally obtaining full certification (a certification which I must remember to post the details of in our certification section!). We bang on about it quite a bit: we are interested not just in evaluating products, but also the vendor, their supply chain and the support network behind them. Adding SIOS SPS to our solution set was a great example of us living up to this ethic. Open Minds have undoubtedly played a big part in the success of all our SIOS SPS projects, so it's only right we thank them for the great support they have provided us. I don’t like to think of them as simply a 'VAD' or 'Value Added Distributor' (why is it that VADs always seem to 'add value' where you don’t want it?!?), so in my book, calling them a VAD would be kind of insulting. Perhaps a better term would be DOV: a Distributor Of Value, because they don’t add it, they are it.
That’s a much more appropriate TLA for them...
I wonder if it will catch on!?