Posts Tagged ‘online backup’

Clouds, Consistency, and Progress Bars

Monday, April 25th, 2011

I have the bad habit of staring at progress bars.

I was backing up my Mac laptop to a CloudArray volume.¹ With gigabit ethernet, a full backup to a CloudArray volume takes less time than a full backup to my local USB drive. Apple’s Time Machine actually only does a full backup once, followed by hourly incrementals that are rolled together into dailies, weeklies, and synthetic fulls. That’s a fantastic model for the cloud, since it saves a lot on bandwidth, but I usually tear down most of my volumes and run the full backup again. It’s a good way for me to keep an eye on a number of different variables that can affect CloudArray performance.

Anyway, I set up a backup volume and sat back to watch the progress bars. Here’s a good one:

[Screenshot: Flush the first]

At this point, the backup on my laptop was mostly done. You can see that my CloudArray cache still had 35 gigs of dirty data, and it was just starting to work on flushing 8 gigs out to the cloud. Also, I’d been staring too long, and popped off to do important CTO-type stuff.

A few minutes later, important CTO-type stuff being done, I checked back in on my progress bar:

[Screenshot: flush2]

The same flush was still in progress, and it was mostly done. But wait! The cache still reports 35 gigs of dirty data! (Actually, 35.1… the operating system hadn’t finished flushing its own cache the last time I checked.) But if an 8 gigabyte flush was mostly done, shouldn’t the cache be almost 8 gigs cleaner? What ever can be going on?

The answer, of course, is a teachable moment.

I’ve been building storage arrays of one type or another for pretty much my whole career. The most important aspect of any array’s firmware is its consistency model, by which I mean: how does it ensure that the data that it stores accurately represents the data that the host applications wrote? If an application writes “AB” to the disk, how does the firmware ensure that the next time it reads from that disk, it gets back “AB”? That is absolutely the most fundamental requirement of a storage system: everything else is just icing.

That might not sound like that hard of a problem, but the nuances of storing data in a complex, shared, networked controller can be subtle. For example, if my application writes “A”, then “B”, then “C” to different locations, I always want to return A, B, C for those locations. But if you add in a cache to the controller, and assume that the cache will fail (you always assume that every hardware component will eventually fail), then it’s not enough to just store the data in the cache. If you are implementing a write-back cache, you have to store information about the order in which those writes occur, so that the underlying backing store (a physical disk, say) gets those writes in the same order. Otherwise, when that cache fails, your application might read back A and C, but not B.

Why is that a problem? What if your application is a database, (A, B) is a credit card transaction, and C is the database checkpoint? In that case, your database will correctly read A, read corrupted data in place of B, and C will tell it that the corrupted junk is just fine. That’s bad.

If your cache firmware is well implemented, though, and only gets the chance to write two blocks before the cache hardware fails, then it will write A and B. Now, when your database tries to reread the data, it’ll find (A, B), but without that crucial C, it’ll do a proper rollback of the transaction.
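To make the A, B, C example concrete, here is a minimal Python sketch. It is a toy model, not CloudArray firmware: the class `OrderedWriteBackCache`, the `recover` rule, and the location names `loc_a`, `loc_b`, `loc_c` are all invented for illustration. It assumes the cache hardware dies after exactly two writes have reached the disk.

```python
class OrderedWriteBackCache:
    """Toy write-back cache that remembers the order in which writes arrived."""

    def __init__(self):
        self.pending = []  # (location, data) pairs, oldest first

    def write(self, location, data):
        self.pending.append((location, data))

    def destage(self, disk, budget):
        """Copy at most `budget` pending writes to `disk`, oldest first."""
        done = self.pending[:budget]
        for location, data in done:
            disk[location] = data
        self.pending = self.pending[budget:]
        return len(done)


def recover(disk):
    """Hypothetical database recovery: the checkpoint C vouches for (A, B)."""
    has_transaction = "loc_a" in disk and "loc_b" in disk
    if "loc_c" not in disk:
        return "rollback"  # no checkpoint, so the transaction is rolled back
    return "committed" if has_transaction else "corrupt"


cache = OrderedWriteBackCache()
for location, data in [("loc_a", "A"), ("loc_b", "B"), ("loc_c", "C")]:
    cache.write(location, data)

# Ordered destaging: the cache "fails" after two writes reach the disk.
disk = {}
cache.destage(disk, budget=2)
ordered_outcome = recover(disk)

# What an unordered cache could have left behind: A and C, but not B.
unordered_disk = {"loc_a": "A", "loc_c": "C"}
unordered_outcome = recover(unordered_disk)

print(ordered_outcome, unordered_outcome)
```

With ordering preserved, the disk holds A and B and the database rolls back cleanly. Without it, the checkpoint survives while part of the transaction does not, which is the bad case described above.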

In CloudArray, we’ve got an added complication: our backing store is not a local physical drive. It’s a massively scalable set of redundant data centers probably located a thousand miles away from our cache. The performance difference between our local devices and the cloud is several orders of magnitude. So how can we maintain consistency?

The answer lies in our rather complex representation of block devices as objects. First, we notice that strict write ordering is not an absolute necessity. We simply need to ensure that our data in the cloud represents some state that existed in our virtual volume, so that if C is present in the cloud, then (A, B) is there, too, but we don’t need to represent each of the intermediate states (A), (A, B), (A, B, C). Then, we have to partition our incoming data into sequences that can represent transitions between these states: these sequences are what we call a flush, and we try to design those partitions to maximize bandwidth utilization while also minimizing the temporal distance between state transitions. Finally, after we’ve transmitted a flush to the cloud, we have to perform an atomic commit on our representation, so that the new state of the cloud is entirely consistent.
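Here is a rough Python sketch of that idea, under heavy simplifying assumptions. `CloudVolume`, its object-key scheme, and the `fail_after` knob are invented for illustration and are not CloudArray's actual object layout. A dictionary stands in for the cloud object store, and eventual consistency is not modeled at all. The point is only that a flush becomes visible in one step, or not at all.

```python
class CloudVolume:
    """Toy stand-in for a block volume stored as objects in a cloud store."""

    def __init__(self):
        self.objects = {}    # object key -> data (the "cloud")
        self.committed = {}  # block address -> object key (last consistent state)
        self.generation = 0  # number of flushes committed so far

    def flush(self, writes, fail_after=None):
        """Upload one flush, then commit it atomically.

        `writes` is a list of (address, data) pairs in arrival order.
        `fail_after` simulates the network dying after that many uploads.
        """
        generation = self.generation + 1
        staged = dict(self.committed)  # build the next state off to the side
        for count, (address, data) in enumerate(writes):
            if fail_after is not None and count >= fail_after:
                return False  # nothing was committed; old state still stands
            key = f"gen{generation}/block{address}"
            self.objects[key] = data
            staged[address] = key
        # The commit is a single reference swap in this toy model.
        self.committed = staged
        self.generation = generation
        return True

    def read(self, address):
        key = self.committed.get(address)
        return None if key is None else self.objects[key]


volume = CloudVolume()
writes = [(0, "A"), (1, "B"), (2, "C")]

interrupted = volume.flush(writes, fail_after=2)
state_after_failure = [volume.read(address) for address in range(3)]

completed = volume.flush(writes)
state_after_commit = [volume.read(address) for address in range(3)]

print(state_after_failure, state_after_commit)
```

After the interrupted flush the visible state is still the empty volume, even though A and B were uploaded. After the completed flush all three blocks appear together, so C is never visible without (A, B).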

And we have to do that in a way that is mindful of the architecture of cloud storage systems, which are often designed around the (not at all scary and in fact quite cool in a nerdy way, in spite of what some people say) eventual consistency model.

What’s all that got to do with my progress bar?

Well, in order to make sure that our cloud data maintains consistency, especially in the presence of sometimes quite flaky networks, we can’t clean out our cache until we’ve successfully committed and verified the most recent state transition, i.e. the last flush. So my progress bar is not really indicating the amount of data that’s been emptied out of the cache: it only tells me how much of the most recent state has been transmitted to the cloud. The data can’t be marked clean in the cache until the actual, final commit has been completed.
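A small Python sketch of that bookkeeping, using the numbers from my screenshots (35.1 gigs dirty, an 8 gig flush). The class `CacheFlushStatus` and its methods are made up for illustration, sizes are in decimal megabytes to keep the arithmetic exact, and the real cache accounting is certainly more involved.

```python
class CacheFlushStatus:
    """Toy model: transmit progress and dirty-cache size are tracked separately."""

    def __init__(self, dirty_mb):
        self.dirty_mb = dirty_mb  # data in the cache not yet safe in the cloud
        self.flush_mb = 0         # size of the flush in progress
        self.sent_mb = 0          # how much of that flush has been transmitted

    def start_flush(self, flush_mb):
        self.flush_mb = flush_mb
        self.sent_mb = 0

    def transmit(self, mb):
        self.sent_mb = min(self.flush_mb, self.sent_mb + mb)

    def progress_percent(self):
        """What the progress bar shows: transmitted data, not cleaned cache."""
        return 100 * self.sent_mb // self.flush_mb

    def commit(self, verified):
        """Mark the flushed data clean only after a verified, complete commit."""
        if not verified or self.sent_mb < self.flush_mb:
            return False
        self.dirty_mb -= self.flush_mb
        self.flush_mb = 0
        self.sent_mb = 0
        return True


status = CacheFlushStatus(dirty_mb=35_100)
status.start_flush(8_000)

status.transmit(7_000)
mid_flush = (status.progress_percent(), status.dirty_mb)

status.transmit(1_000)
committed = status.commit(verified=True)
dirty_after_commit = status.dirty_mb

print(mid_flush, dirty_after_commit)
```

Mid-flush, the bar is most of the way across while the dirty count has not moved. Only the verified commit drops it by the full size of the flush.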

So what happens when the flush completes? Let’s see:

[Screenshot: flush3]

Huh. There it is. The cache now has only 27.1 gigabytes of dirty pages left. Mission accomplished.

And if my CloudArray were to experience some kind of catastrophe right now, like some dastardly CTO yanking out a cache storage device, what would happen? Once I restored it to operation, then Time Machine would pull the nice, consistent image out of the cloud, notice the missing 27.1 gigs, and pick right up from there. Like I said, it’s a nice piece of software, but it does rely on consistent storage.

¹It’s pretty easy to set up a Time Machine backup using the Studio Network Solutions globalSAN iSCSI initiator for OS X: just install it, point it at a CloudArray, and voilà! Up pops whatever capacity I need. Launch Time Machine, set the CloudArray volume as the target disk, and I’ve got a whole bunch of progress bars to stare at.

TwinStrata and PEER 1 Team Up to Deliver Enterprise Class Cloud Storage

Monday, January 31st, 2011

 

TwinStrata continues to broaden its partner ecosystem with leading cloud storage providers. Here is yet another example…

NATICK, Mass. and VANCOUVER, British Columbia, Jan. 31, 2011 /PRNewswire/ — TwinStrata, Inc., the leading innovator in iSCSI SAN, data protection and disaster recovery solutions using cloud storage, today announced it has achieved Bronze Partner Program status with international hosting provider, PEER 1 Hosting (TSX: PIX), further demonstrating TwinStrata’s continued commitment to supporting the industry’s key cloud storage providers along with their customers.

Read the full press release…

Cloud Storage Effect on Storage Management: Reduced Complexity, Maximized Resources, Improved Efficiency

Monday, January 24th, 2011

 

IT administrators continue to face the age-old challenges of storage management complexity and cost, while the burden of managing exponential data growth has businesses of all sizes considering the best ways to store, protect, and archive their files, Exchange, and SharePoint data. The need to maximize resources and infrastructure, optimize storage requirements, and improve efficiencies remains a top driver for most of these businesses today.

With all of these factors to consider, one of the most difficult skill sets for IT to find and retain is that of expert-level administrators for specific storage management disciplines, including storage administrators.

When you deploy an Enterprise or Mid-Range storage array, you generally need a team of people who are specialized in configuring, provisioning, and managing those storage arrays (let alone the compliance, disaster recovery, and other more advanced storage specializations).  Decisions made daily include RAID configuration, performance tuning, device management, storage pool provisioning, management of remote replication, management of consistency groups, and management of capacity and storage tiering. These are highly specialized and vendor specific skills. They will extend out to your application servers with CLI and API command sets which must be used to perform even simple client side tasks.

Most, if not all, of these technology skill demands will disappear once you deploy Cloud Storage. Of course, if you deploy a Private Cloud, you will merely be moving the skill pools to a different area, but they will still largely vanish from your day-to-day data center operations.  With Public Clouds, they will go away almost immediately and entirely.

As Cloud Storage gets provisioned through CloudArray, your administrators will largely be working at the level of an average system administrator skill set when it comes to provisioning and managing storage.  Configuration requirements will be reduced to basic volume count, volume size, encryption requirements, and page size requirements.  None of this requires advanced degrees, decades of storage management experience, or high level vendor certifications.

By deploying a Cloud Storage model – especially for routine use cases such as online backup, archive, and disaster recovery operations – you can begin to free up highly skilled administrators and other IT specialists to redeploy and focus on other critical areas of your IT operations. Cloud Storage doesn’t necessarily mean direct reductions in headcount. Efficiency is in part about resource re-deployment without having to incur additional costs for people or infrastructure. Conversely, Cloud Storage might even allow growth in areas you otherwise couldn’t hire into before.

Essentially, as more leading-edge technologies begin to creep into IT shops and data centers, Cloud Storage is a direct and immediate way to reduce management complexity and costs, affording IT the chance to spend more time on business applications, business continuity, and strategic IT planning and projects.

The best way to see this is to download and try it for yourself.  Visit www.TwinStrata.com for more information.

Why a Massachusetts High School Picked TwinStrata CloudArray over Competition

Tuesday, January 18th, 2011

 

Kyle Jones, technology manager, Essex Agricultural and Technical High School in Hathorne, MA tested cloud gateway products from Nasuni and TwinStrata to meet specific IT budget and operational objectives. The reasons TwinStrata won out are worth reading about, especially if you are part of a small to medium size business considering cloud storage for either offsite backup, archive, or disaster recovery and business continuity.

Mr. Jones was interviewed recently by TechTarget Senior Site Editor Andrew Burton. In the interview, he discussed his requirements, his offsite storage and data protection options, and why CloudArray was the better business and technology choice for the school’s backup-to-cloud storage needs.

You can read more about it here:  High School Deploys TwinStrata CloudArray Cloud Storage Gateway

TwinStrata CloudArray Picked as Finalist in Storage Magazine/SearchStorage 2010 Products of the Year Competition

Thursday, January 13th, 2011

 

Backup and Disaster Recovery (DR) Software and Services: 2010 Products of the Year Finalists Announced 

From nearly 200 entries, the judges of Storage magazine’s and SearchStorage.com’s 2010 Products of the Year awards have selected 43 products as finalists including TwinStrata CloudArray. CloudArray was selected as a finalist in the Backup and Disaster Recovery Software and Services category which covers backup, recovery, DR, snapshot, replication, electronic vaulting, and archives.

Read more… 

  

TwinStrata CloudArray Enables Westway to Easily Cut Storage Costs Without Compromising Data Protection

Thursday, January 13th, 2011

 

Westway is a company located in New Orleans, specializing in Bulk Chemical Storage and Liquid Feeds.  Their challenges included finding a way to add storage flexibly as they grew without breaking the bank, and reducing onsite infrastructure complexity and costs. Both were aimed at cost-effectively satisfying data storage and retention requirements while ensuring full data control and access and rapid recovery.

The accompanying case study discusses Westway’s experience using CloudArray for offsite data protection that reduced their overnight backups to just a few hours.

Enterprise class online backup and cloud storage just got easier in the Big Easy. Read more: 

Cloud Storage Performance: I/O Does Matter

Tuesday, January 11th, 2011

 

One of the first decisions you will need to make when tuning your environment for Cloud Storage I/O is what page size you will use to perform writes to your Cloud Storage Provider (CSP).  This is one of the configuration parameters you will enter when configuring a new volume in CloudArray. 

Page sizing is an important consideration, and represents the smallest unit of data that will be sent to your CSP from your CloudArray appliance, or read back when needed. Choose a size that is too small and you may have to do a lot more I/Os if you need to move a lot of data in bulk.  Choose a size that’s too big and you will move more data than you need to.

For example, if your application needs to read a lot of small chunks of data that don’t already reside in your CloudArray cache, then CloudArray will have to issue read requests to the CSP for the pages that hold those chunks.  If there were four chunks, each 64K in length, and they weren’t contiguous, then CloudArray might have to issue 4 separate read requests for that data.  If the data were contiguous and fell within a single page, then only 1 read request would need to be made.
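The request counts in that example can be checked with a few lines of Python. This is a sketch under stated assumptions, not CloudArray's read path: it assumes one read request per page touched, no cache hits, four 64K chunks, and a 512 KB page size. The function name `pages_touched` and the chunk offsets are invented for illustration.

```python
KB = 1024


def pages_touched(chunks, page_size):
    """Return the set of page indexes covered by a list of (offset, length) reads."""
    pages = set()
    for offset, length in chunks:
        first = offset // page_size
        last = (offset + length - 1) // page_size
        pages.update(range(first, last + 1))
    return pages


page_size = 512 * KB

# Four 64K chunks scattered 1 MB apart, versus the same four laid end to end.
scattered = [(i * 1024 * KB, 64 * KB) for i in range(4)]
contiguous = [(i * 64 * KB, 64 * KB) for i in range(4)]

scattered_requests = len(pages_touched(scattered, page_size))
contiguous_requests = len(pages_touched(contiguous, page_size))

print(scattered_requests, contiguous_requests)
```

Under these assumptions the scattered chunks land on four different pages and the contiguous ones share a single page, matching the 4-versus-1 figures above.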

Larger page sizes will result in more data being read than is needed for the current operation, but they may help performance if that data is needed at any point while it is still in cache. In CloudArray, you can choose page sizes from 128 KB (default) all the way to 2 MB.  512 KB is the recommended value for backups and in cases where there is a lot of sequential I/O.

If you do a lot of random small block I/O, you should choose the smaller default page size.  This will prevent having to move a lot of empty data between your CSP and the CloudArray appliance.  A large page size here will cause slower overall performance, since far more data than your application actually changed will have to be written to accommodate the data requirements. Likewise, if you are doing backups, you will want the 512 KB (or larger) page size.  This will result in fewer overall writes compared to a smaller page size and performance will increase.
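A back-of-the-envelope Python sketch of that trade-off follows. It assumes the worst case for random I/O, where every small write dirties a different page and the whole page must be sent, and it assumes a sequential backup fills pages completely. The function names and the 1,000-write and 1 GB workloads are illustrative choices, not measurements.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def random_write_traffic(num_writes, page_size):
    """Bytes sent to the CSP if each small write dirties its own page."""
    return num_writes * page_size


def sequential_page_writes(total_bytes, page_size):
    """Number of page writes needed to send a sequential stream (ceiling division)."""
    return -(-total_bytes // page_size)


# 1,000 scattered small writes: traffic grows with the page size.
small_page_traffic_mb = random_write_traffic(1000, 128 * KB) // MB
large_page_traffic_mb = random_write_traffic(1000, 2 * MB) // MB

# A 1 GB sequential backup: the write count shrinks as the page size grows.
backup_writes_128k = sequential_page_writes(1 * GB, 128 * KB)
backup_writes_512k = sequential_page_writes(1 * GB, 512 * KB)

print(small_page_traffic_mb, large_page_traffic_mb)
print(backup_writes_128k, backup_writes_512k)
```

In this model the 2 MB page moves sixteen times as much data as the 128 KB page for the same random writes, while the 512 KB page needs a quarter as many writes as the 128 KB page for the same backup.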

This can have an impact on your cost model as well, but nowhere near as much as some vendors selling file system-based appliances would have you believe.  Some CSPs will charge a small per-transaction fee for each read or write request you make.   For Amazon S3, for example, the charge is $0.00001 per write transaction.  And so for a 1TB backup, that amounts to 8,388,608 x 128KB transactions, or about $84.  If you used a 512KB page size, that would amount to 2,097,152 write transactions, or about $21 for the write transaction costs.  Reads are cheaper by a factor of ten.  That’s a far cry from the $1K+ figure that another vendor would charge you for a 100GB write.
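The arithmetic is easy to reproduce. The Python sketch below uses only the figures quoted in this post ($0.00001 per write transaction, a 1TB backup, 128KB and 512KB pages) and assumes binary units and one write transaction per page. The function names are made up for illustration, and current CSP pricing may differ.

```python
from decimal import Decimal

KB = 1024
TB = 1024 ** 4

# Per-write fee quoted in this post; treat it as an example, not a price list.
WRITE_FEE = Decimal("0.00001")


def write_transactions(total_bytes, page_size):
    """Number of page-sized writes needed to upload total_bytes (ceiling division)."""
    return -(-total_bytes // page_size)


def write_cost(total_bytes, page_size, fee=WRITE_FEE):
    """Transaction fees in dollars for uploading total_bytes in page-sized writes."""
    return write_transactions(total_bytes, page_size) * fee


for page_kb in (128, 512):
    count = write_transactions(TB, page_kb * KB)
    cost = write_cost(TB, page_kb * KB)
    print(f"{page_kb} KB pages: {count:,} writes, ${cost:.2f}")
```

Using `Decimal` keeps the tiny per-request fee exact instead of accumulating floating-point error across millions of transactions.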

Visit www.TwinStrata.com for more info about CloudArray.