Archive for the ‘replication’ Category

Clouds, Consistency, and Progress Bars

Monday, April 25th, 2011

I have the bad habit of staring at progress bars.

I was backing up my Mac laptop to a CloudArray volume.¹ With gigabit ethernet, a full backup to a CloudArray volume takes less time than a full backup to my local USB drive. Apple’s Time Machine actually only does a full backup once, followed by hourly incrementals that are rolled together into dailies, weeklies, and synthetic fulls. That’s a fantastic model for the cloud, since it saves a lot on bandwidth, but I usually tear down most of my volumes and run the full backup again. It’s a good way for me to keep an eye on a number of different variables that can affect CloudArray performance.

Anyway, I set up a backup volume and sat back to watch the progress bars. Here’s a good one:

Flush the first

At this point, the backup on my laptop was mostly done. You can see that my CloudArray cache still had 35 gigs of dirty data, and it was just starting to work on flushing 8 gigs out to the cloud. Also, I’d been staring too long, and popped off to do important CTO-type stuff.

A few minutes later, important CTO-type stuff being done, I checked back in on my progress bar:

flush2

The same flush was still in progress, and it was mostly done. But wait! The cache still reports 35 gigs of dirty data! (Actually, 35.1… the operating system hadn’t finished flushing its own cache the last time I checked.) But if an 8 gigabyte flush was mostly done, shouldn’t the cache be almost 8 gigs cleaner? Whatever can be going on?

The answer, of course, is a teachable moment.

I’ve been building storage arrays of one type or another for pretty much my whole career. The most important aspect of any array’s firmware is its consistency model, by which I mean: how does it ensure that the data that it stores accurately represents the data that the host applications wrote? If an application writes “AB” to the disk, how does the firmware ensure that the next time it reads from that disk, it gets back “AB”? That is absolutely the most fundamental requirement of a storage system: everything else is just icing.

That might not sound like that hard of a problem, but the nuances of storing data in a complex, shared, networked controller can be subtle. For example, if my application writes “A”, then “B”, then “C” to different locations, I always want to return A, B, C for those locations. But if you add in a cache to the controller, and assume that the cache will fail (you always assume that every hardware component will eventually fail), then it’s not enough to just store the data in the cache. If you are implementing a write-back cache, you have to store information about the order in which those writes occur, so that the underlying backing store (a physical disk, say) gets those writes in the same order. Otherwise, when that cache fails, your application might read back A and C, but not B.

Why is that a problem? What if your application is a database, (A, B) is a credit card transaction, and C is the database checkpoint? In that case, your database will correctly read A, read corrupted data in place of B, and C will tell it that the corrupted junk is just fine. That’s bad.

If your cache firmware is well implemented, though, and only gets the chance to write two blocks before the cache hardware fails, then it will write A and B. Now, when your database tries to reread the data, it’ll find (A, B), but without that crucial C, it’ll do a proper rollback of the transaction.
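The A, B, C example above can be played out in a few lines. This is only a toy sketch, not CloudArray firmware: the function names, the address layout (A and B at addresses 0 and 1, checkpoint C at address 2), and the way a failure is simulated are all assumptions made for illustration.

```python
def destage(pending, order, writes_before_failure):
    """Copy cached writes to a backing store until the cache 'fails'.

    pending: list of (address, data) in the order the host wrote them.
    order: the sequence of indices into `pending` that the cache chooses to destage.
    writes_before_failure: how many writes reach the store before the failure.
    """
    store = {}
    for index in order[:writes_before_failure]:
        address, data = pending[index]
        store[address] = data
    return store


def checkpoint_is_trustworthy(store, transaction=(0, 1), checkpoint=2):
    """A checkpoint may only be present if the whole transaction is present."""
    if checkpoint not in store:
        return True  # no checkpoint on disk: the database simply rolls back
    return all(address in store for address in transaction)


# Host writes A, B (the transaction), then C (the checkpoint).
pending = [(0, "A"), (1, "B"), (2, "C")]

in_order = destage(pending, order=[0, 1, 2], writes_before_failure=2)
reordered = destage(pending, order=[0, 2, 1], writes_before_failure=2)

print(in_order, checkpoint_is_trustworthy(in_order))
print(reordered, checkpoint_is_trustworthy(reordered))
```

With in-order destaging the store ends up holding A and B and no checkpoint, so a rollback is safe. With the reordered destage it holds A and C but not B, which is exactly the case where the checkpoint vouches for data that never arrived.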

In CloudArray, we’ve got an added complication: our backing store is not a local physical drive. It’s a massively scalable set of redundant data centers probably located a thousand miles away from our cache. The performance difference between our local devices and the cloud is several orders of magnitude. So how can we maintain consistency?

The answer lies in our rather complex representation of block devices as objects. First, we notice that strict write ordering is not an absolute necessity. We simply need to ensure that our data in the cloud represents some state that existed in our virtual volume, so that if C is present in the cloud, then (A, B) is there, too, but we don’t need to represent each of the intermediate states (A), (A, B), (A, B, C). Then, we have to partition our incoming data into sequences that can represent transitions between these states: these sequences are what we call a flush, and we try to design those partitions to maximize bandwidth utilization while also minimizing the temporal distance between state transitions. Finally, after we’ve transmitted a flush to the cloud, we have to perform an atomic commit on our representation, so that the new state of the cloud is entirely consistent.
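To make the three steps concrete, here is a minimal sketch of the idea: partition the write stream into flushes, upload each flush, and only then switch the committed state in one step. Everything in it is hypothetical. The `ToyCloudVolume` class, the manifest, the object naming, and the fixed-size partitioning are stand-ins I made up for illustration, not a description of CloudArray's actual object representation or its partitioning policy.

```python
class ToyCloudVolume:
    """Hypothetical cloud-side view: uploaded objects plus one committed manifest."""

    def __init__(self):
        self.objects = {}   # object name -> block data
        self.manifest = {}  # block address -> object name (the committed state)

    def upload(self, name, data):
        self.objects[name] = data  # uploaded, but not yet part of the committed state

    def commit(self, new_manifest):
        self.manifest = new_manifest  # one reference swap stands in for the atomic commit

    def read_committed(self):
        return {addr: self.objects[name] for addr, name in self.manifest.items()}


def partition_into_flushes(writes, max_writes_per_flush):
    """Split the ordered write stream into consecutive flushes (assumed fixed size)."""
    return [writes[i:i + max_writes_per_flush]
            for i in range(0, len(writes), max_writes_per_flush)]


def send_flush(cloud, flush_id, flush, fail_after=None):
    """Upload every write in a flush, then commit. Returns True only if committed.

    fail_after simulates the network dropping after that many uploads.
    """
    staged = dict(cloud.manifest)
    for count, (address, data) in enumerate(flush):
        if fail_after is not None and count >= fail_after:
            return False  # the committed manifest is untouched
        name = f"flush{flush_id}-block{address}"
        cloud.upload(name, data)
        staged[address] = name
    cloud.commit(staged)
    return True


writes = [(0, "A"), (1, "B"), (2, "C"), (3, "D")]
flushes = partition_into_flushes(writes, max_writes_per_flush=3)

cloud = ToyCloudVolume()
print(send_flush(cloud, 0, flushes[0]), cloud.read_committed())
print(send_flush(cloud, 1, flushes[1], fail_after=0), cloud.read_committed())
```

The point of the sketch is that the cloud never shows the intermediate states (A) or (A, B): a reader of the committed manifest sees either nothing or all of (A, B, C), and a flush that dies partway leaves the previous state in place.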

And we have to do that in a way that is mindful of the architecture of cloud storage systems, which are often designed around the (not at all scary and in fact quite cool in a nerdy way, in spite of what some people say) eventual consistency model.

What’s all that got to do with my progress bar?

Well, in order to make sure that our cloud data maintains consistency, especially in the presence of sometimes quite flaky networks, we can’t clean out our cache until we’ve successfully committed and verified the most recent state transition, i.e. the last flush. So my progress bar is not really indicating the amount of data that’s been emptied out of the cache: it only tells me how much of the most recent state has been transmitted to the cloud. The data can’t be marked clean in the cache until the actual, final commit has been completed.
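Here is the same bookkeeping as a toy model, using the numbers from my screenshots (35.1 gigs dirty, an 8 gig flush). The class and method names are invented for this sketch, and sizes are kept as whole megabytes so the arithmetic stays exact.

```python
class CacheAccounting:
    """Toy model: the dirty count only drops when a flush is committed."""

    def __init__(self, dirty_mb):
        self.dirty_mb = dirty_mb
        self.flush_mb = 0
        self.sent_mb = 0

    def start_flush(self, flush_mb):
        self.flush_mb = flush_mb
        self.sent_mb = 0

    def transmit(self, mb):
        # Transmission moves the progress bar but does not clean any cache pages.
        self.sent_mb = min(self.flush_mb, self.sent_mb + mb)

    def flush_percent(self):
        return self.sent_mb * 100 // self.flush_mb

    def commit(self):
        if self.sent_mb < self.flush_mb:
            raise RuntimeError("cannot commit a partially transmitted flush")
        # Only now may the flushed pages be marked clean.
        self.dirty_mb -= self.flush_mb
        self.flush_mb = 0
        self.sent_mb = 0


cache = CacheAccounting(dirty_mb=35_100)
cache.start_flush(8_000)

cache.transmit(7_200)
print(cache.flush_percent(), cache.dirty_mb)  # flush mostly done, dirty count unchanged

cache.transmit(800)
cache.commit()
print(cache.dirty_mb)  # dirty count drops by the whole flush at once
```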

So what happens when the flush completes? Let’s see:

flush3

Huh. There it is. The cache now has only 27.1 gigabytes of dirty pages left. Mission accomplished.

And if my CloudArray were to experience some kind of catastrophe right now, like some dastardly CTO yanking out a cache storage device, what would happen? Once I restored it to operation, then Time Machine would pull the nice, consistent image out of the cloud, notice the missing 27.1 gigs, and pick right up from there. Like I said, it’s a nice piece of software, but it does rely on consistent storage.

¹It’s pretty easy to set up a Time Machine backup using the Studio Network Solutions globalSAN iSCSI initiator for OS X: just install it, point it at a CloudArray, and voilà! Up pops whatever capacity I need. Launch Time Machine, set the CloudArray volume as the target disk, and I’ve got a whole bunch of progress bars to stare at.

Election Day Lunch & Learn Seminar – See the Results: How Your Business Can Leverage Cloud Backup Services

Tuesday, October 26th, 2010

For anyone local to the Waltham, MA area, TwinStrata will be participating in a complimentary Lunch & Learn Seminar hosted by BNMC, along with VMware, Vizioncore/Quest Software, and Hosted Solutions, about how your business can leverage Cloud Backup Services – affordably, reliably, and easily. The seminar will discuss solutions for Enterprise-class data protection and disaster recovery using your existing backup infrastructure, TwinStrata’s CloudArray, and the private and secure cloud infrastructure at Hosted Solutions. BNMC offers its Cloud Backup Service powered by the CloudArray technology, which delivers cost-effective and seamless managed storage services optimized for performance, multi-site availability, and elastic storage capacity at a substantial cost savings over traditional cloud storage solutions.

The BNMC Cloud Backup Service using CloudArray greatly enhances your business resiliency by providing seamless “zero-U” iSCSI storage that “stretches” your backup infrastructure or your VMware High Availability into the cloud. The solution plugs right into your existing VMware environment (as a virtual machine) or can be added as a separate appliance in a physical environment to work alongside your existing backup infrastructure to protect all assets.

Online Registration for the event:  http://www.regonline.com/builder/site/Default.aspx?EventId=903608

SMEs: Keep your head in the clouds, especially for off-site data protection

Monday, October 18th, 2010

Joining TwinStrata has given me a refreshing new perspective on the potential for SMEs to adopt and use storage in the cloud. It makes sense when you think about how these companies regularly deal with limited resources and technology challenges while trying to manage revenue growth and stay competitive. So how and where do they get started? What could drive more SMEs to the cloud is a solution that makes the decision to utilize cloud storage simpler (for some things, anyway) and that could yield immediate cost savings and efficiencies. Using storage in the cloud for off-site data protection is a good place to start. Some of the more obvious reasons include eliminating cumbersome tape operations, faster recovery times, reduced tier 2 storage purchases, lower storage management costs, and streamlined IT operations. But there are other reasons as well that may not be as obvious. An affordable and innovative solution that delivers enterprise-class off-site data protection with performance and reliability will further help SMEs consider cloud storage for these reasons as well:

  1. Two-Site DR: It’s expensive to have a second site for DR purposes, so having your data stored in the cloud can provide small and medium-sized companies with instant, optimized, and low-cost second-site disaster recovery capabilities. And if the solution is flexible enough, you can also select from alternative DR methods that help to improve overall DR readiness with a direct disk-to-cloud architecture.
  2. Security and Compliance: Any business that hands off its data to someone else will want to know how secure that data will be. However, it may not be good enough to just secure data at rest at the cloud storage provider’s site. An optimized solution will also secure the data while in flight, providing an added level of security.
  3. Investment protection:  Once on-site backup software is configured and regularly operating, the last thing IT wants is something that will cause a disruption to data protection operations or introduce another layer of complexity in order to extend the backup process off-site. Off-site data protection solutions that can enable transparency through seamless and non-intrusive interoperability with existing backup operations, work with different backup products, and automate off-site data protection and DR operations will be much more attractive to IT organizations already strapped by resource constraints.
  4. Choice of cloud storage provider: All cloud storage providers offer something a little different, and because they do, companies may need to write to provider-specific APIs in order to connect to them. This also has a tendency to “lock” you into the storage provider when you may want choice as part of your strategy. The solution that can provide you with a wide array of integration choices for server support, virtualization software, backup tools, and back-end cloud storage providers will prove best for companies reaching to the clouds for affordable, on-demand/pay-as-you-go capacity expansion and compute-anywhere accessibility.

Amazon S3 RRS Cloud Storage — Secondary storage tiers at 33% savings

Wednesday, May 26th, 2010

Amazon recently announced S3 RRS starting at $0.10/GB per month, a very palatable 33% less than S3 standard storage. What’s the catch? RRS means reduced redundancy storage, a tier of storage that maintains fewer copies of data than Amazon S3 standard service. According to Amazon, S3 standard storage provides “99.999999999% durability and 99.99% availability of objects over a given year” and can sustain “the concurrent loss of data in two facilities.” RRS storage provides “99.99% durability and 99.99% availability of objects over a given year” and can “sustain the loss of data in a single facility.”
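To put those percentages in perspective, here is a rough back-of-the-envelope sketch. It reads a durability figure as the expected fraction of objects that survive a year, so 99.99% means an average loss of 0.01% of objects annually. The $0.15/GB standard price is an assumption inferred from the 33% figure above, and the function names and the object and capacity counts are made up for the example.

```python
from fractions import Fraction


def expected_annual_losses(object_count, durability):
    """Average number of objects lost per year at the quoted durability."""
    return object_count * (1 - Fraction(durability))


def monthly_cost(gigabytes, price_per_gb):
    """Monthly storage bill in dollars at a flat per-GB price."""
    return gigabytes * Fraction(price_per_gb)


RRS_PRICE = "0.10"       # from the announcement
STANDARD_PRICE = "0.15"  # assumption: implied by "33% less"

objects = 10_000
print(expected_annual_losses(objects, "0.9999"))         # RRS
print(expected_annual_losses(objects, "0.99999999999"))  # standard

capacity_gb = 1_000
print(monthly_cost(capacity_gb, RRS_PRICE), monthly_cost(capacity_gb, STANDARD_PRICE))
print(1 - Fraction(RRS_PRICE) / Fraction(STANDARD_PRICE))  # fractional saving
```

Under these assumptions, 10,000 objects in RRS lose about one object per year on average, while the same objects in standard storage lose one in roughly ten million years. That gap is why RRS suits a secondary copy better than a sole copy.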

Why is this interesting to CloudArray customers? Because many CloudArray solutions involve secondary tiers of storage in the cloud with full copies of primary data on-premise. In these cases, RRS is a great cost-saving tradeoff for the secondary storage tier in the cloud with very little impact on overall data availability thanks to the full on-premise copy.

It is important to use this new tier of storage wisely. For those solutions using primary tiers of storage in the cloud without full onsite copies, it may make sense to stick with the standard S3 service.

Because CloudArray makes storage providers and policies flexible and transparent, our customers now have the benefit of a more cost-effective tier of storage for backup, data replication and business continuity solutions to Amazon.

Cloud Storage That Solves Business Problems — Customer Proof Points

Monday, May 24th, 2010

This weekend, I was glancing through a blog by George Crump entitled “When To Use Cloud Storage?” George concludes “Cloud storage providers and ISVs should focus on solving the business problem, not on the value of putting a bunch of data out in the internet.” We couldn’t agree more with that statement.

You may have seen our press announcement today regarding two recent customer wins through our partner TriCore Solutions. Let’s look closely at the customer use cases: Color Kinetics, a division of Philips, is using CloudArray to improve their Oracle RMAN backup process. Tape costs disappear, backups complete faster, and restore speeds improve. Tanya Creations, a large jewelry manufacturer, is using CloudArray to enable Microsoft Exchange 2010 off-site replication and recovery, providing enterprise-class availability for their email system, all without the capital and administrative burden of building and managing a secondary site.

The message from these customers goes a lot deeper than removing cloud storage adoption hurdles to create a new tier of data storage. At the end of the day, CloudArray enabled business solutions that improved the operation of each respective enterprise. The cost savings over traditional off-site storage solutions make the case even stronger.

It’s no secret that IT departments that were once technology-driven have become more and more business-driven. While cloud storage technology may or may not be a business priority today, compelling solutions for protecting and enhancing access to business data certainly are.

More to come…

Enable Oracle® Recovery Manager (RMAN) to store backup data to Cloud storage

Monday, December 7th, 2009

Together with one of our partners, TriCore Solutions, we are working with a number of Oracle customers to reduce cost and drive efficiency for database backups. TriCore is recognized as one of Oracle’s leading partners, delivering a suite of services for enterprise accounts.

One of the challenges we are focused on is protecting Oracle databases using Oracle Recovery Manager (RMAN) and Cloud storage. Most enterprise customers need to reduce cost and drive efficiency in protecting Oracle databases while improving the restore process. Companies using tape and off-site vaulting services to store Oracle database backup data are shouldering a mountain of costs that will continue to increase linearly with database growth: media (tape), tape transportation, and vaulting fees to house data off-site. And the process for restoring data is difficult and time-consuming, taking anywhere from hours to a week to retrieve data. In today’s business environment, expectations for information retrieval should be near real-time, if not real-time.

The solution we provide is an on-demand, expandable, low-cost storage tier integrated seamlessly with Oracle RMAN. It is made up of CloudArray software together with public Cloud storage and services performed by TriCore Solutions. CloudArray enables users to provision volumes on demand, with policies that store backup data locally (in cache) and keep a replica in the Cloud. CloudArray also encrypts and compresses data before delivering it to Cloud storage, thereby providing security and performance.

We provide an overview of the solution in the following presentation:

Benefits:
The solution lessens the need to store backup data to tape and eliminates the cost of having third-party tape vaulting companies manage tape off-site. Depending upon retention policies and the amount of data stored off-site, the cost of physical media is the largest cost component of the backup process. Data vaulting services may run as much as 15% of the media costs. For example, a company spending $40,000 per year on LTO could expect to spend as much as $6,000 per year to store the data off-site. CloudArray software and public Cloud storage combined are approximately 1/20th of the cost, resulting in significant savings to the customer with near-immediate ROI.

Via CloudArray software, Veeam now supports Cloud storage

Wednesday, December 2nd, 2009

CloudArray software is qualified with Veeam Backup & Replication software. Veeam, used by everyone from SMBs to large enterprises for fast recovery of VMware ESX and ESXi environments, can now leverage Cloud storage for backup and replication data. Companies we speak to continue to work on driving the cost out of backup, off-site tape management, and replication platforms. Cloud storage via CloudArray software addresses these challenges.

Learn more about CloudArray and Veeam working seamlessly together to protect virtual server environments by viewing the following presentation. The presentation also provides a 3-year TCO for backup and replication of a 5 TB environment, comparing traditional solutions with Cloud storage using CloudArray.

Enjoy.