Posts Tagged ‘performance’

Measuring Cloud Storage Performance

Wednesday, January 18th, 2012

There are many excellent reasons to use cloud storage, but fast and efficient transfer of large amounts of data isn’t usually listed as a benefit. That’s one of the reasons why people use cloud storage gateways: to speed up cloud storage access. Recently, I realized we’ve never published any details on the performance gains that one should expect when using the CloudArray storage gateway, so I decided to create a simple illustrative test. In this article, I describe the results and explain some cloud storage implementation details that contribute to performance differences.

I came up with a quick test: copy one gigabyte of fully random data to the cloud, broken up into 32768 32k files. The questions are, how long would it take for a user to copy that much data to a CloudArray volume, and how soon before all of that data is safely stored in the cloud?
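For anyone who wants to build a similar data set, here is a minimal sketch of how such files could be generated. It is not the script used for this test; the function name, file naming scheme, and directory layout are assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path

FILE_SIZE = 32 * 1024   # 32k per file, as in the test described above
FILE_COUNT = 32768      # 32768 x 32k = 1 GiB in total


def make_random_files(target_dir, count=FILE_COUNT, size=FILE_SIZE):
    """Write `count` files of `size` random bytes each; return total bytes written."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    total = 0
    for i in range(count):
        # os.urandom yields incompressible data, so compression cannot skew timings.
        data = os.urandom(size)
        (target / f"file_{i:05d}.bin").write_bytes(data)  # hypothetical naming scheme
        total += len(data)
    return total


if __name__ == "__main__":
    # Small demonstration run in a throwaway directory (8 files rather than 32768).
    with tempfile.TemporaryDirectory() as tmp:
        written = make_random_files(tmp, count=8)
        print(f"wrote {written} bytes in 8 files")
```

Calling `make_random_files` with the defaults against a mounted volume would produce the full one-gigabyte set.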

To set up the test, I used an old, slow laptop to run CloudArray. (That’s what I had handy in my office…) I used just the default basic configuration of a 25G cache to configure a single 100G encrypted volume, attached to Amazon S3, and mapped to a Linux client. I formatted the volume with an ext3 file system and used the ‘cp’ command to copy the files. As a basis for comparison, I used an open-source file transfer utility, CyberDuck, to transfer exactly the same files to exactly the same S3 account.

I’ve broken the CloudArray results down into two events: the data was “usable” when the ‘cp’ operation completed, and “complete” once all of the data had been copied to the cloud. At the usable point, any host attached to the CloudArray could access any of the files, as they were all stored in the local cache. In the background, the CloudArray busily pushed the data to the cloud, and when it was done, I considered the test complete. Of course, for the transfer utility, there’s no such split. The data was either copied to the cloud, or not.

The results of running the test were, well, dramatic:

Write performance comparison

For the record, that’s 6:22 (min:sec) for the CloudArray to reach the usable state and another 3:19 to reach the complete state, while it took the file transfer 110:05 to transfer exactly the same data.

What’s the reason for the huge difference? Well, I confess that I did set up this test to highlight one of the advantages of CloudArray; because we’re block-based, we’re not sensitive to the kinds of problems that plague file-based approaches.

The caching accounts for the rapid time to usability, but the more significant part of the equation is the aggregation that CloudArray performs due to the block-level IO. We send out all data in large, cloud-optimized chunks. Cloud storage provider systems store a single 1M object faster than they store the 32 separate 32k objects that hold the same data, so sending them 1024 1M objects is easier for them to handle. That same arithmetic applies to the number of requests, so that we sent 1024 PUT requests as opposed to the 32768 PUTs that CyberDuck (and, indeed, any file-based cloud utility) must send to handle this particular workload.
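The request arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the counting, not of how CloudArray aggregates data; the helper name is made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def put_requests(total_bytes, object_bytes):
    """Number of PUT requests needed to upload total_bytes in objects of object_bytes."""
    # Ceiling division: a partial final object still costs one request.
    return -(-total_bytes // object_bytes)


per_file = put_requests(GIB, 32 * KIB)   # one PUT per 32k file
aggregated = put_requests(GIB, MIB)      # one PUT per 1M chunk
print(per_file, aggregated, per_file // aggregated)  # 32768 1024 32
```

For this workload, aggregation cuts the number of requests by a factor of 32.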

In other words, regardless of the size or number of files that you store on a CloudArray volume, traffic to the cloud is optimized to give the best performance.

To test read performance, I just copied the same files back from the cloud. Using CloudArray, that once again gave two separate cases: either all of the files were already in local cache, which meant that it was roughly the same as a local file copy, or they weren’t, in which case the data had to be read back from the cloud. Again, the data was automatically read from the cloud in large chunks, giving the best performance. The copy was performed with the ‘cp’ command on the Linux host, and compared to CyberDuck transferring the files back.

The results are actually pretty symmetrical with respect to the original write performance:

Read performance comparison

Copying all the files from the local CloudArray cache took 6:05, while invalidating the cache and doing the same copy again took 11:42. On the other hand, the utility transfer took 112:24.

None of the numbers that I’ve given are all that useful for comparison with other environments. A large number of factors affect actual performance, e.g. WAN speed, LAN speed, and local disk speed, and I made no attempt to optimize any of them. A faster local disk or SSD, for example, would substantially reduce the time to usability. The importance of these results is the relative performance of CloudArray when compared to the raw transfer, and those results are impressive: 17.3x faster to usability, 11.4x faster to durability, and 9.6x faster to reload (18.5x if the data is in cache).

That’s minutes versus hours.

It’s a quick cup of coffee versus a three-martini lunch.

It’s renewing online versus waiting at the DMV.

It’s time saved.
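For readers who want to check the ratios, here is a short sketch that recomputes them from the timings quoted in this article. The helper names are made up for illustration; the only inputs are the min:sec values given above.

```python
def seconds(mmss):
    """Convert a 'min:sec' string such as '110:05' to seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


def speedup(baseline, improved_seconds):
    """How many times faster `improved_seconds` is than the 'min:sec' baseline."""
    return seconds(baseline) / improved_seconds


usable = seconds("6:22")                 # cp finished, data in local cache
complete = usable + seconds("3:19")      # all data also pushed to the cloud

print(f"write, usable:   {speedup('110:05', usable):.1f}x")
print(f"write, complete: {speedup('110:05', complete):.1f}x")
print(f"read, cold:      {speedup('112:24', seconds('11:42')):.1f}x")
print(f"read, cached:    {speedup('112:24', seconds('6:05')):.1f}x")
```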

6 Key Features of Cloud Storage Gateways (On-ramps or Enablers)

Monday, February 28th, 2011

Are you considering cloud storage for your business?  There are many reasons you should.  Using innovative cloud technology, IT is solving data storage problems in new ways. Whether it’s for off-site data protection, disaster recovery or just storage capacity expansion, the pay-as-you go model pioneered by a number of cloud storage providers can be very compelling.

Rather than use cloud storage directly by writing to custom APIs, building your own security policies and architecting a performance framework to meet application needs, you may find that on-premise cloud storage software or hardware (i.e. gateways, on-ramps, enablers) makes integration simpler. Purchasing a product that handles security, performance, data reduction and plug-and-play integration can significantly accelerate and simplify deployment.

With a handful of gateway products already on the market that can connect your on-premise environment to cloud storage, a natural question may (or should) be “what is the difference between these products?” above and beyond the aforementioned functionality.

To answer this, we’ve put together a list of 6 differentiating features you should consider when choosing a cloud storage gateway:

1) Dynamic caching policies to meet application needs: A monolithic cloud storage cache may not be able to handle the performance needs of all applications. A backup application may benefit from a cache consisting of low-cost storage optimized for large sequential access, while an NTFS file system may benefit more from an SSD-based cache, optimized for smaller, more randomized access. Each application may require more or less cache over time. Having application-specific caching policies that are dynamic means you can meet the needs of different applications using a single solution.

2) Option to replicate a local copy to the cloud: Some vendors argue that having a full local copy defeats the purpose of cloud storage – not at all true! Imagine replacing a real-time replicated secondary site requiring hardware, infrastructure and maintenance costs with a pay-as-you-go cloud! Or imagine not having a secondary site to begin with and now finding a 2-site replication solution within easy reach. This is a very compelling business proposition, particularly for transactional applications that require a full local copy for latency reasons.

3) In-cloud snapshots: Snapshots are rapidly becoming a key part of modernized backup and, when using the cloud, it is important to find out whether a gateway solution offers snapshots. If yes, are the snapshots copy-on-write and on-premise, meaning potential bandwidth thrashing between the local site and the cloud? Or are the snapshots in-cloud and redirect-on-write, meaning no bandwidth penalty or performance penalty and readily available in case of disaster? If you have the option of the latter, you may have gathered that it is far superior.

4) Block and file-level access: It’s amazing to hear arguments from vendors trying to convince users that file access is better than block access for cloud gateways. The reality is that there are advantages to file access and advantages to block access. Supporting both means supporting the widest variety of operating systems, file systems and applications, and there is no longer any argument. Hint: having native block access (like iSCSI) means you can support both.

5) “Zero-friction” entry point to cloud storage: Deploying cloud storage should not mean continuing to spend additional CapEx/OpEx associated with traditional storage infrastructure and incurring the same 3-yr upgrade cycles. Sure, there are advantages to optimized hardware appliances for accessing cloud storage, but only when needs and budget dictate. A choice of software, hardware and subscription models with upgrade paths between them is the ideal way to start using cloud storage with minimal risk/cost and the ability to grow.

6) In-cloud disaster recovery and Compute-Anywhere capability: Once your data is in the cloud, you can access it anywhere, but how about in the cloud? Why not be able to leverage unlimited pay-as-you-go cloud compute cycles for disaster recovery or test? Beyond disaster recovery, your data or snapshots of data can and should “work” for you in the cloud. You can even leverage Big Data without dedicated processing resources by using cloud compute. Think about a vision of a hybrid data center and how this capability can enhance IT.

In summary, not all cloud gateways, on-ramps, or enablers are equal, and it takes looking beyond the similarities in features to understand whether they will meet the needs specific to your environment and grow to meet your future needs. It pays to look under the covers before purchasing…

Perhaps you have found a cloud storage solution that has all of these features. If you haven’t, we suggest you consider a cloud storage solution that does….

Cloud Storage Effect on Storage Management: Reduced Complexity, Maximized Resources, Improved Efficiency

Monday, January 24th, 2011

 

IT administrators continue to face the age-old challenges of storage management complexity and cost, while the burden of managing exponential data growth has businesses of all sizes considering the best ways to store, protect, and archive their files, Exchange, and SharePoint data. The need to maximize resources and infrastructure, optimize storage requirements, and improve efficiencies remains a top driver for most of these businesses today.

With all of these factors to consider, one of the most difficult skill sets for IT to find and retain is that of expert-level administrators in specific storage management disciplines.

When you deploy an Enterprise or Mid-Range storage array, you generally need a team of people who are specialized in configuring, provisioning, and managing those storage arrays (let alone the compliance, disaster recovery, and other more advanced storage specializations). Decisions made daily include RAID configuration, performance tuning, device management, storage pool provisioning, management of remote replication, management of consistency groups, and management of capacity and storage tiering. These are highly specialized and vendor-specific skills. They extend out to your application servers, with CLI and API command sets that must be used to perform even simple client-side tasks.

Most, if not all, of these technology skill demands will disappear once you deploy Cloud Storage. Of course, if you deploy a Private Cloud, you will merely be moving the skill pools to a different area, but they will still largely vanish from your day-to-day data center operations. With Public Clouds, they will go away almost immediately and entirely.

As Cloud Storage gets provisioned through CloudArray, your administrators will largely be working at the level of an average system administrator skill set when it comes to provisioning and managing storage.  Configuration requirements will be reduced to basic volume count, volume size, encryption requirements, and page size requirements.  None of this requires advanced degrees, decades of storage management experience, or high level vendor certifications.

By deploying a Cloud Storage model – especially for routine use cases such as online backup, archive, and disaster recovery operations – you can begin to free up highly skilled administrators and other IT specialists to redeploy and focus on other critical areas of your IT operations. Cloud Storage doesn’t necessarily mean direct reductions in headcount. Efficiency is in part about resource re-deployment without having to incur additional costs for people or infrastructure. Conversely, Cloud Storage might even allow growth in areas you otherwise couldn’t hire into before.

Essentially, as more leading-edge technologies begin to creep into IT shops and data centers, Cloud Storage is a direct and immediate way to reduce management complexity and costs, affording IT the chance to spend more time on business applications, business continuity, and strategic IT planning and projects.

The best way to see this is to download and try it for yourself.  Visit www.TwinStrata.com for more information.

Why a Massachusetts High School Picked TwinStrata CloudArray over Competition

Tuesday, January 18th, 2011

 

Kyle Jones, technology manager at Essex Agricultural and Technical High School in Hathorne, MA, tested cloud gateway products from Nasuni and TwinStrata to meet specific IT budget and operational objectives. The reasons TwinStrata won out are worth reading about, especially if you are part of a small to medium-sized business considering cloud storage for offsite backup, archive, or disaster recovery and business continuity.

Mr. Jones was interviewed recently by TechTarget Senior Site Editor Andrew Burton. In the interview, he discussed his requirements, offsite storage/data protection options, and why CloudArray was a better business and technology choice to handle the school’s backup-to-cloud storage needs.

You can read more about it here:  High School Deploys TwinStrata CloudArray Cloud Storage Gateway

Cloud Storage Performance: I/O Does Matter

Tuesday, January 11th, 2011

 

One of the first decisions you will need to make when tuning your environment for Cloud Storage I/O is what page size you will use to perform writes to your Cloud Storage Provider (CSP).  This is one of the configuration parameters you will enter when configuring a new volume in CloudArray. 

Page size is an important consideration: it represents the smallest unit of data that will be sent to your CSP from your CloudArray appliance, or read back when needed. Choose a size that is too small and you may have to do a lot more I/Os if you need to move a lot of data in bulk. Choose a size that’s too big and you will move more data than you need to.

For example, if your application needs to read a lot of small chunks of data that don’t already reside in your CloudArray cache, then CloudArray will have to issue read requests to the CSP for each of those chunks. If your application needed four chunks, each 64K in length, and they weren’t contiguous, then CloudArray might have to issue 4 separate read requests for that data. If the chunks were contiguous and fell within a single page, then only 1 read request would need to be made.
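The example above can be sketched as a simple page-counting exercise. This is an illustration only, under two assumptions that are not taken from CloudArray itself: that one read request is issued per uncached page, and that the page size is 256K (a hypothetical value chosen so that four 64K chunks fit in one page).

```python
KIB = 1024


def pages_touched(reads, page_size):
    """Return the set of page indexes covered by (offset, length) byte ranges."""
    pages = set()
    for offset, length in reads:
        first = offset // page_size
        last = (offset + length - 1) // page_size
        pages.update(range(first, last + 1))
    return pages


PAGE = 256 * KIB  # hypothetical page size for this illustration

# Four contiguous 64K chunks, all within the first page.
contiguous = [(i * 64 * KIB, 64 * KIB) for i in range(4)]
# Four 64K chunks spaced 1 MiB apart, so each lands in a different page.
scattered = [(i * 1024 * KIB, 64 * KIB) for i in range(4)]

print(len(pages_touched(contiguous, PAGE)))  # 1
print(len(pages_touched(scattered, PAGE)))   # 4
```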

Larger page sizes will result in more data being read than is needed for the current operation, but this may help performance if that data is needed at any point while it is still in cache. In CloudArray, you can choose page sizes from 128 KB (default) all the way to 2 MB. 512 KB is the recommended value for backups and in cases where there is a lot of sequential I/O.

If you do a lot of random small block I/O, you should choose the smaller default page size. This will prevent having to move a lot of unneeded data between your CSP and the CloudArray appliance. A large page size here will cause slower overall performance, since each small I/O moves a full page and far more data will have to be transferred than the application actually needs. Likewise, if you are doing backups, you will want the 512 KB (or larger) page size. This will result in fewer overall writes compared to a smaller page size and performance will increase.
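To make the trade-off concrete, here is a rough sketch comparing page sizes for the two workloads. It assumes a worst case in which every small random I/O touches a different page, and a hypothetical 4K I/O size; neither assumption comes from CloudArray measurements.

```python
KIB = 1024
MIB = 1024 * KIB
IO_SIZE = 4 * KIB  # hypothetical size of one small random I/O


def amplification(io_size, page_size):
    """Bytes moved per byte requested when each I/O touches one distinct page."""
    return page_size / io_size


def pages_for_stream(total_bytes, page_size):
    """Pages written for a purely sequential stream of total_bytes."""
    return -(-total_bytes // page_size)  # ceiling division


for page in (128 * KIB, 512 * KIB, 2 * MIB):
    print(
        f"{page // KIB:>4} KB page: "
        f"{amplification(IO_SIZE, page):.0f}x data moved per random 4K I/O, "
        f"{pages_for_stream(1024 * MIB, page)} writes per sequential GiB"
    )
```

Smaller pages keep the overhead of random I/O down, while larger pages reduce the number of writes for sequential workloads such as backups.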

This can have an impact on your cost model as well, but nowhere near as much as some vendors selling file system-based appliances would have you believe. Some CSPs charge a small per-transaction fee for each read or write request you make. For Amazon S3, for example, the charge is $0.00001 per write transaction. So a 1TB backup with a 128KB page size amounts to 8,388,608 write transactions, or about $84. If you used a 512KB page size, that would amount to 2,097,152 write transactions, or about $21 in write transaction costs. Reads are cheaper by a factor of ten. That’s a far cry from the $1K+ figure that another vendor cites for a 100GB write.
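The transaction arithmetic can be checked with a short sketch. The per-write fee is the one quoted above; the function names are made up for illustration, and real provider pricing will differ over time.

```python
from decimal import Decimal

KIB = 1024
TIB = 1024 ** 4
WRITE_FEE = Decimal("0.00001")  # per write request, as quoted in the text


def write_requests(total_bytes, page_size):
    """Number of page-sized writes needed for total_bytes (ceiling division)."""
    return -(-total_bytes // page_size)


def write_cost(total_bytes, page_size, fee=WRITE_FEE):
    """Total write-transaction fee in dollars."""
    return write_requests(total_bytes, page_size) * fee


for page_kb in (128, 512):
    n = write_requests(TIB, page_kb * KIB)
    print(f"{page_kb} KB pages: {n:,} writes, ${write_cost(TIB, page_kb * KIB):.2f}")
# 128 KB pages: 8,388,608 writes, $83.89
# 512 KB pages: 2,097,152 writes, $20.97
```

The whole-dollar figures quoted above are these values expressed without cents.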

Visit www.TwinStrata.com for more info about CloudArray.

TwinStrata Delivers Newest Version of its Leading CloudArray® Offsite Data Protection and Disaster Recovery Solution

Thursday, December 16th, 2010

 

CloudArray® Version 2.0 is available in both virtual and physical appliance configurations.

The new CloudArray Version 2.0 includes a broad range of updates designed to further enhance the overall operation, reliability, and performance of CloudArray’s robust and proven feature set for offsite data protection, archive, and disaster recovery. Additionally, CloudArray 2.0 expands its reach to include direct integration with Mezeo, Peer1, and Scality cloud storage platforms, offering customers more choice and flexibility when implementing a cloud storage strategy. Other CloudArray 2.0 enhancements include support for HA clustering, Solaris interoperability, alert and portal improvements and performance improvements to the virtual appliance.

CloudArray V2.0 is available today from TwinStrata and through its network of partners. For more information, visit www.twinstrata.com/cloudarray, email sales@twinstrata.com or call 508-651-0199.  

You can also try CloudArray FREE for 30 days: www.twinstrata.com/cloudarray_evaluation

TwinStrata CloudArray is a proven solution that can significantly reduce capex/opex by enabling companies of all sizes to easily adopt offsite data protection and disaster recovery solutions in minutes without any changes to existing applications or need for new programming or APIs in order to connect with cloud storage providers. TwinStrata provides enterprise-class data protection solutions that are simple, affordable, and secure. These solutions leverage the scalability and efficiency of cloud storage while maintaining the availability, performance, and security of local storage. CloudArray software provides a substantial advantage over traditional off-site storage solutions, with a pay-as-you-go model, unlimited elastic capacity, local performance, in-cloud snapshots, AES 256-bit encryption, and on-site, off-site or in-the-cloud access to data.

I Have NOT Lost My Mind — I Have It Backed Up On Tape Somewhere

Monday, December 6th, 2010

 

The question is: If you can eliminate tape, then can you even eliminate backup? In a recent article, George Crump (http://www.networkcomputing.com/deduplication/you-can-eliminate-backups.php) discussed the implications of eliminating backup altogether. His argument is that with the capabilities of modern storage systems – snapshots, deduplication, compression and replication – you can preserve multiple restore points without the need for a separate backup operation.

He specifically argues: “Using a combination of snapshots, deduplication, compression and replication is a cost-effective way of storing redundant copies. Many primary storage systems support a high number of snapshots and/or unlimited copies of data by leveraging deduplication. Most can then have that data replicated to a remote site so you are covered for a single site disaster. With these features deployed, we now have point-in-time local recovery and total system recovery in case of a disaster covered, but there are some potential drawbacks.”

Using Cloud Storage as the remote replication target in this case will work very well, and will be more cost effective than using your expensive primary storage devices for backup.  

With CloudArray, you can create instantaneous snapshots of your data, allowing you to establish multiple remote restore points from a single copy of your data.   This doesn’t have to be your primary data store. Cloud Storage can actually become an economical cog in your tiered storage strategy.

But getting back to George’s article, he discusses several drawbacks of using primary storage as the source for your restore points. Basically, they come down to the risk, however small, of not having a separate copy of your data (both physically and logically). Even in cases where you are replicating your data to a remote facility, a logical corruption fault could affect both sites, especially if the fault were with the logic of the deduplication engine itself.

Some companies have eliminated separate backups very successfully, but it takes a great deal of planning in order to make sure that the restore points will be consistent across applications and data stores. It wouldn’t help you to have your accounts payable tables backed up at one point in time and your inventory shipments at a different point. 

But is it right for you? 

Maybe, but a safer approach is to still use backup software and write your backup to a data store that is physically separate from your primary storage. CloudArray can do this for you as well. If you use a backup product that can write to disk (D2D), then you can write to CloudArray and a copy of the backup images will be kept locally as well as in the Cloud. Restores will always come from the local CloudArray disk cache if you size it properly, and in the event of a total site disruption, a copy of your data will still be housed safely offsite and can be recovered from any site you choose.

Eliminating tape is a good first step. Eliminating backup entirely might be an option for you down the road (or not). Remember the cardinal rule: “To go forward, you must backup.” So you probably shouldn’t be in a rush to eliminate it. But if you have lost your mind because you’ve backed it up on tape somewhere, then without CloudArray, you may never get it back!