Posts Tagged ‘Cloud Storage’

Measuring Cloud Storage Performance: Blocks vs. Files

Monday, February 13th, 2012

What are some good reasons to adopt cloud storage? Cost, durability and flexibility.

So let me talk about performance, instead.

Look at this graph:

provider bandwidth measurements

As part of our daily testing, we do routine performance measurements across a broad swath of cloud storage providers. It gives us a check to ensure that the various CloudArray subsystems are performing as they should, and gives us the data to make optimization decisions. In this particular test, we measure transfer rates at various buffer sizes. We “fill the pipe” by queueing up multiple streams of data simultaneously, initiating one transfer as soon as the previous one finishes, so that latency doesn’t skew the data.
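A minimal sketch of this kind of measurement, for illustration only. It is not CloudArray's test harness: the `upload` callable stands in for a provider's PUT call, and `fake_upload`, the stream count and the transfer count are all assumptions. A fixed pool of streams each picks up the next transfer as soon as its previous one finishes, so several transfers are always in flight and the measured rate reflects bandwidth rather than per-request latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_bandwidth(upload, buffer_size, total_transfers=64, streams=8):
    """Return aggregate bytes/second for transfers of `buffer_size` bytes.

    `upload` is a hypothetical callable standing in for a provider PUT.
    """
    payload = bytes(buffer_size)

    start = time.perf_counter()
    # Each of the `streams` workers takes the next queued transfer as soon
    # as it finishes its previous one, so `streams` transfers stay in flight.
    with ThreadPoolExecutor(max_workers=streams) as pool:
        list(pool.map(lambda _: upload(payload), range(total_transfers)))
    elapsed = time.perf_counter() - start

    return buffer_size * total_transfers / elapsed


def fake_upload(data):
    """Stand-in provider: fixed 5 ms per request, no real network I/O."""
    time.sleep(0.005)
    return len(data)


if __name__ == "__main__":
    for size in (32 * 1024, 256 * 1024, 1024 * 1024):
        rate = measure_bandwidth(fake_upload, size)
        print(f"{size // 1024:>5}K  {rate / 1e6:8.1f} MB/s")
```

Swapping `fake_upload` for a real client call would turn this into a rough probe of one provider at several buffer sizes, which is the shape of the curve in the graph.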

This particular provider reaches its peak bandwidth at 256K transfer size, and is actually transferring at 50% of peak at 32K.

But look at this graph:

microsoft file distribution data

This data is from a Microsoft study published in FAST 2007. It describes the distribution of file sizes in a file system — interestingly, even though the mean file size does creep up over the years of the study, the distribution doesn’t change much. What we can draw from it is the fact that in a typical file system, roughly 80% of the files are less than 32k.

We can see that a naive system which just maps files on the file system directly to objects in the cloud is going to spend 80% of its file transfers at the bottom half of the bandwidth curve, achieving less than 50% of the peak available bandwidth. That’s ignoring latency, too: our hypothetical naive system is going to have to be streaming files out at a perfectly synchronized pace in order to achieve the theoretical maximum.

Our Provider A actually does pretty well with small transfers, rising to peak bandwidth relatively rapidly. What about a different provider?

provider bandwidth measurements -- two providers

What abysmal performance, right? Provider B’s bandwidth only rises to almost 20% of Provider A’s by 512K. Forget about issues with small writes: what reason would anybody have for picking Provider B? Could any cost or durability benefits be enough to suffer performance penalties this big?

But let’s zoom out and take a bigger picture view:

provider bandwidth measurements: large IO

Oops. This graph tells an entirely different story. There’s a real performance benefit to using Provider B, assuming that you are transferring large chunks of data.

Every cloud storage provider has different characteristics, even if the APIs are similar. The role of a cloud storage gateway is to smooth over the differences and provide a predictable solution for storing data in the cloud. That’s what CloudArray does: aggregates lots of small writes into large-block transfers, absorbs transient failures and network faults, and generally works to manage and optimize cloud storage utilization.
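The aggregation idea can be sketched in a few lines. This is an illustration under stated assumptions, not CloudArray's implementation: the `WriteAggregator` class, the 1 MiB chunk size and the `put_object` callback are hypothetical names chosen for the example.

```python
class WriteAggregator:
    """Buffer small writes and emit them as fixed-size chunks.

    Hypothetical sketch: `put_object` stands in for a provider upload call.
    """

    def __init__(self, put_object, chunk_size=1024 * 1024):
        self.put_object = put_object
        self.chunk_size = chunk_size
        self.buffer = bytearray()

    def write(self, data):
        """Accept a write of any size; upload only when a full chunk is ready."""
        self.buffer.extend(data)
        while len(self.buffer) >= self.chunk_size:
            self.put_object(bytes(self.buffer[:self.chunk_size]))
            del self.buffer[:self.chunk_size]

    def flush(self):
        """Upload whatever is left, even if it is smaller than a chunk."""
        if self.buffer:
            self.put_object(bytes(self.buffer))
            self.buffer.clear()


if __name__ == "__main__":
    uploads = []
    agg = WriteAggregator(uploads.append)
    for _ in range(64):  # 64 small writes of 32 KiB each, 2 MiB in total
        agg.write(bytes(32 * 1024))
    agg.flush()
    print(len(uploads), "uploads for 64 writes")
```

In this toy run the 64 small writes leave the buffer as two 1 MiB uploads. A real gateway also has to handle retries, ordering and durability, which this sketch leaves out.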

If a gateway vendor tells you that they won’t work with a particular cloud storage provider because its performance doesn’t meet their SLAs, then the simple fact is that their gateway isn’t doing its job. The more complicated fact is that naive file-to-object mappers will always be fundamentally flawed when dealing with real-world business data, because real-world business data is housed within file systems, and file systems are designed to talk to storage subsystems, and storage subsystems are designed to talk to disk controllers.

Anybody who’s worked in the enterprise storage array business can tell you about the small write problem in RAID: here it is again, written in the clouds. You’d pay a penalty for writing small pieces of data to a RAID volume, except for the years of work that have been spent developing storage systems that smooth out performance without sacrificing reliability. Odds are that many users have never heard of the small write problem, much less tuned their software to it or tried to plan out their file systems around optimizing their storage arrays.

But that’s exactly what they’ll need to be doing with their cloud storage, unless they use CloudArray. What’s our secret sauce? Actually, it’s no secret: we’re a block storage device, and we can tune and perfect our transfers to match your provider. That means we minimize time-to-durability and maximize the effectiveness of our cache. And that’s why we can make pictures like this:

We don’t make recommendations or publish our performance test results because ultimately, the choice of cloud storage provider should be a business decision, based on business factors like cost, durability, location, and a host of others. CloudArray’s architecture and capabilities make it possible for our customers to make those decisions for themselves, while being assured of getting the best performance.

– John Bates, CTO

Footnote/Mathematical aside: a careful reader will note that I discussed 80% of transfers, not transferred data. In fact, depending upon the total number of files and the distribution of sizes of the upper 20%, small files may be less than 1% of the total used capacity. And therefore, given the totally unrealistic model which disregards latency, a system with a low ratio of file count to total capacity and with a provider performance profile like A’s would pay only a minor small write penalty.

But that just serves to strengthen my point: why should any of this matter to you, the user? A system with a higher ratio, or with a high-bandwidth skewed provider, can wind up spending 60% of its time transferring 10% of its data. Why should it be up to you to calculate your file system distributions and match them to the right cloud storage?
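A toy model makes the time-versus-data distinction concrete. Like the footnote, it disregards latency, and every number in it is invented for illustration rather than measured: it simply assumes small files move at one bandwidth and everything else at another.

```python
def small_file_time_share(small_data_share, small_bw, large_bw):
    """Fraction of total transfer time spent on small files.

    Latency is ignored, as in the footnote. Bandwidths may be in any
    unit as long as both use the same one.
    """
    small_time = small_data_share / small_bw
    large_time = (1.0 - small_data_share) / large_bw
    return small_time / (small_time + large_time)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    # Small files are 1% of capacity and move at half of peak bandwidth.
    print(f"{small_file_time_share(0.01, 0.5, 1.0):.1%}")
    # Small files are 10% of capacity and move at 1/13.5 of peak bandwidth.
    print(f"{small_file_time_share(0.10, 1.0, 13.5):.1%}")
```

Under these assumptions the first case spends about 2% of its time on small files, while the second spends 60% of its time on 10% of its data. In this model, the 60% figure requires small transfers to run at roughly one-thirteenth of the large-transfer bandwidth.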

Cloud Storage Accelerates into 2012 IT Priorities

Tuesday, January 31st, 2012

Every year, our friends at ESG post results of their annual Spending Intentions Survey, indicating where many businesses are likely to spend their IT dollars over the coming year. Recently Steve Duplessie posted an article on his blog entitled Cloud – The Cost Containment Strategy that concludes cloud has finally “crossed the chasm” in IT. According to preliminary data, cloud represents the largest projected percentage spending increase for 2012 IT initiatives, a very exciting turn.

Truth is, cloud storage addresses long-standing IT priorities, with three of these priorities topping the list nearly every year:

  • Improving backup and recovery: Tapes and offsite backup continue to be a struggle and do not offer the reliability or recovery times that many businesses demand. Moving off-site backups to the cloud improves the recoverability of backups without tape or dedicated infrastructure. Moreover, it offers the choice of leveraging existing backup software or modernizing backup software and processes.
  • Managing data growth effectively: As capacity grows, storage array life cycles mandate adding, replacing and decommissioning storage arrays whenever they run out of capacity, all involving considerable capital expense and internal administration. Cloud storage can instead provide an unlimited pool of storage that is on-demand and pay-as-you-go. For the financially minded, this enables replacing capital and administrative expense with pure operating expense (opex) and near-zero administration.
  • Improving business continuity and disaster recovery: For many businesses, it has become standard practice to duplicate on-site storage equipment expenses to build out a disaster recovery facility. For others, it may mean skimping on disaster recovery because of the high costs. Cloud storage enables turning disaster recovery expenses into opex while maintaining recovery time objectives that rival dedicated solutions.

So what is the challenge for cloud becoming an initiative for businesses? Let’s think of the early days of server virtualization. Priorities such as server consolidation and reducing server hardware footprint existed well before server virtualization. However, it took some time before virtualization gained mainstream acceptance. Indeed we may be in the midst of the same type of transition, with cloud storage moving from an early-adopter solution to a mainstream initiative.

What’s even more exciting about cloud storage is that it’s extremely simple for businesses to get started. Whereas traditional storage infrastructure requires a forklift to deliver and install, a software download of CloudArray and 10-15 minutes of configuration nets you Terabytes or Petabytes of secure storage capacity that’s ready to use. If you haven’t tried cloud storage, consider adding it to your 2012 initiatives.

A New Cloud Storage Gateway Has Sprouted

Thursday, January 26th, 2012

You may have seen Amazon’s announcement of the AWS Storage Gateway beta. Truth be told, the cloud storage gateway market is starting to catch fire.

Amazon’s move validates the need for an iSCSI cloud storage gateway to easily deliver cloud storage into business environments and acknowledges that integration through APIs is not a process businesses will easily embrace. The deeper implication is that gateways facilitate the adoption of cloud storage as an alternative to on-premise or off-premise traditional storage, helping Amazon tap into a large multi-billion dollar data storage market. Does this mean more cloud providers may want to offer a gateway in the future? You bet.

There’s a lot of product differentiation between AWS Gateway and existing storage gateway products like CloudArray. Caching, performance, encryption, deduplication, compression, disaster recovery, high availability, ease of use, administration, and storage capacity are all points of comparison one should consider. I’m sure there will be plenty published by users comparing hands-on experiences, so I won’t dive into deeper details here beyond mentioning one fly in the ointment: the AWS Gateway supports only one cloud provider.

Whether it’s purchasing servers or storage, businesses naturally prefer to have choices. Our cloud storage experience has been similar, with customers preferring to have the flexibility to choose from a broad array of cloud providers including Amazon, AT&T, Nirvanix, Rackspace, HP Cloud and PEER1, to name a few. Additionally, we are seeing increased interest in private clouds like OpenStack, EMC Atmos, Mezeo, Nirvanix and Scality, and some businesses are even using existing storage as a starting point. If you plan to use multiple providers, are considering private cloud or at least want the flexibility to keep those options open in the future, a multi-provider gateway is a better solution.

Bottom line? Amazon’s announcement further positions cloud storage as a viable alternative to traditional on-premise and off-premise solutions. It presents one more way for businesses to easily connect to Amazon cloud storage and having more gateway choices is always a win for the customer. For customers seeking a robust, enterprise-class feature set and the industry’s broadest choice of public and private cloud providers, solutions like CloudArray continue to be the best option.

Measuring Cloud Storage Performance

Wednesday, January 18th, 2012

There are many excellent reasons to use cloud storage, but fast and efficient transfer of large amounts of data isn’t usually listed as a benefit. That’s one of the reasons why people use cloud storage gateways: to speed up cloud storage access. Recently, I realized we’ve never published any details on the performance gains that one should expect when using the CloudArray storage gateway, so I decided to create a simple illustrative test. In this article, I describe the results and explain some cloud storage implementation details that contribute to performance differences.

I came up with a quick test: copy one gigabyte of fully random data to the cloud, broken up into 32768 32k files. The questions are, how long would it take for a user to copy that much data to a CloudArray volume, and how soon before all of that data is safely stored in the cloud?

To set up the test, I used an old, slow laptop to run CloudArray. (That’s what I had handy in my office…) I used just the default basic configuration of a 25G cache to configure a single 100G encrypted volume, attached to Amazon S3, and mapped to a Linux client. I formatted the volume with an ext3 file system and used the ‘cp’ command to copy the files. As a basis for comparison, I used an open-source file transfer utility, CyberDuck, to transfer exactly the same files to exactly the same S3 account.

I’ve broken the CloudArray results down into two events: the data was “usable” when the ‘cp’ operation completed, and “complete” once all of the data had been copied to the cloud. At the usable point, any host attached to the CloudArray could access any of the files, as they were all stored in the local cache. In the background, the CloudArray busily pushed the data to the cloud, and when it was done, I considered the test complete. Of course, for the transfer utility, there’s no such split. The data was either copied to the cloud, or not.

The results of running the test were, well, dramatic:

Write performance comparison

For the record, that’s 6:22 (min:sec) for the CloudArray to reach the usable state and another 3:19 to reach the complete state, while it took the file transfer 110:05 to transfer exactly the same data.

What’s the reason for the huge difference? Well, I confess that I did set up this test to highlight one of the advantages of CloudArray; because we’re block-based, we’re not sensitive to the kinds of problems that plague file-based approaches.

The caching accounts for the rapid time to usability, but the more significant part of the equation is the aggregation that CloudArray performs due to the block-level IO. We send out all data in large, cloud-optimized chunks. Cloud storage provider systems store a single 1M object faster than they store the same data as 32 separate 32k objects, so sending them 1024 1M objects is easier for them to handle. That same arithmetic applies to the number of requests, so that we sent 1024 PUT requests as opposed to the 32768 PUTs that CyberDuck (and, indeed, any file-based cloud utility) must send to handle this particular workload.
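The request arithmetic is easy to check. The sketch below assumes binary units (32k = 32 KiB, 1M = 1 MiB, one gigabyte = 1 GiB), which is the reading under which the test's 32768 files add up to exactly one gigabyte. The function name is made up for the example.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def request_count(total_bytes, object_size):
    """Number of PUT requests needed, rounding up for a partial last object."""
    return -(-total_bytes // object_size)


if __name__ == "__main__":
    print(request_count(GIB, 32 * KIB))  # one object per 32k file -> 32768
    print(request_count(GIB, MIB))       # aggregated into 1M chunks -> 1024
```

That is a 32-fold reduction in requests for the same gigabyte of data.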

In other words, regardless of the size or number of files that you store on a CloudArray volume, traffic to the cloud is optimized to give the best performance.

To test read performance, I just copied the same files back from the cloud. Using CloudArray, that once again gave two separate cases: either all of the files were already in local cache, which meant that it was roughly the same as a local file copy, or they weren’t, in which case the data had to be read back from the cloud. Again, the data was automatically read from the cloud in large chunks, giving the best performance. The copy was performed with the ‘cp’ command on the Linux host, and compared to CyberDuck transferring the files back.

The results are actually pretty symmetrical with respect to the original write performance:

Read performance comparison

Copying all the files from the local CloudArray cache took 6:05, while invalidating the cache and doing the same copy again took 11:42. On the other hand, the utility transfer took 112:24.

None of the numbers that I’ve given are all that useful for comparison with other environments. A large number of factors affect actual performance, e.g. WAN speed, LAN speed, and local disk speed, and I made no attempt to optimize any of them. A faster local disk or SSD, for example, would substantially reduce the time to usability. The importance of these results is the relative performance of CloudArray when compared to the raw transfer, and those results are impressive: 17.3x faster to usability, 11.4x faster to durability, and 9.6x faster to reload (18.4x if the data is in cache).
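For anyone who wants to rerun the arithmetic, here is a short sketch that derives the ratios from the min:sec timings reported earlier in this article. The helper names are made up for the example. Time to durability is taken as the usable time plus the additional background upload time.

```python
def seconds(min_sec):
    """Convert a 'min:sec' string such as '110:05' to seconds."""
    minutes, secs = min_sec.split(":")
    return int(minutes) * 60 + int(secs)


def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`."""
    return seconds(baseline) / seconds(candidate)


if __name__ == "__main__":
    durable_s = seconds("6:22") + seconds("3:19")  # usable + background upload
    print(f"usability:  {speedup('110:05', '6:22'):.1f}x")
    print(f"durability: {seconds('110:05') / durable_s:.1f}x")
    print(f"reload:     {speedup('112:24', '11:42'):.1f}x")
    print(f"cached:     {speedup('112:24', '6:05'):.1f}x")
```

The first three ratios round to the figures quoted above. The in-cache ratio works out to roughly 18.48, so it rounds to 18.5x here. The 18.4x quoted above may have been truncated rather than rounded.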

That’s minutes versus hours.

It’s a quick cup of coffee versus a three-martini lunch.

It’s renewing online versus waiting at the DMV.

It’s time saved.

User at 35,000 feet gives new meaning to “Cloud Storage”

Wednesday, January 11th, 2012

This week, a CloudArray user sent us this email, and he gave me permission to share it. CloudArray has been installed in lots of countries, but I think this is the first airborne installation! We love getting product feedback (good and bad) to help us improve the CloudArray user experience. Thanks to Phil Flores and Magnus IT Solutions for sharing.

Good morning Ann and Phil,

How are you doing? I am groovy. I thought I would send over a couple of data points you can share with your team and/or prospective customers that might be useful.

On Friday night, I boarded a 5 hour flight that had WiFi access on board. As you know, I have had absolutely no formal training or experience with TwinStrata or the CloudArray portal at all.  However, I wanted to see if I could get your software up and running since I had plenty of time on my hands. From start to finish, and using your VM instance on my laptop, I had 400 GB provisioned in the Amazon S3 cloud (using the license information your team provided) within 35 minutes…and that was with no previous experience or knowledge about setting up your virtual appliance!


Last night, I put your AMI EC2 instance up within 5 minutes and provisioned a 17 GB data store, a 25 GB data store, and a 30 GB data store and attached them to my Windows box. Total time from start to finish for all of the connections to be made was 10 minutes. Formatting the drives took longer because they were so large…but that has nothing to do with the TwinStrata software.

Overall…a very impressive software package and kudos to your development team for making the rollout/deployment so easy and seamless. Please let me know if you have any questions or need more information. Take care and I hope you are having a great weekend!

Cheers,
Phil

5 Cloudy Resolutions for Your Data Storage

Wednesday, January 4th, 2012

With 2012 already upon us, the time has come to make resolutions for the New Year. In the world of IT, this means resolving to abandon bad habits, unreliable processes and cumbersome tasks that often get in the way of business priorities. With the emergence of cloud storage as a viable means to address growing data storage needs, IT administrators can abandon storage headaches of years past in favor of better, faster and easier processes for managing data.

Cloud storage, in combination with enterprise-class gateways like CloudArray, offers security, availability, and performance that rivals local storage with no fear of vendor lock-in. Moreover, it makes it possible to fulfill New Year’s IT resolutions that were once considered unattainable. With this in mind, here are 5 resolutions that you can live up to by augmenting existing storage infrastructure with the cloud.

  1. Never run out of storage capacity again: Cloud storage provides unlimited, elastic capacity that grows or shrinks with your business needs on a pay-as-you-go basis. Not only do you never run out of storage capacity, but more importantly, you can avoid the never-ending cycle of replacing storage arrays that includes complex migrations, costly capital and maintenance expenditures. Even better, on-premise hardware footprint never needs to increase.
  2. Kiss tape backup goodbye: While tape may always have a use for long term archives, many businesses can do away with regular daily tape backups by storing backups in the cloud. With cloud storage you can continue using existing backup software and policies without the hassle, unreliability and manual labor of tape.
  3. Stop purchasing dedicated hardware for disaster recovery: Nothing depletes an IT budget faster than buying data storage systems in pairs in order to have a second system ready for disaster recovery. Typically, dedicated secondary storage systems are only active during disaster test or disaster recovery. The advantage of cloud is that storage and compute infrastructure is available when you need it on a pay-as-you-go basis at a fraction of the cost.
  4. Centralize storage management across sites: It’s not easy managing storage in a decentralized environment where every site is a silo with a private storage footprint, as there is no easy way to reallocate capacity across sites or cobble together a unified disaster recovery strategy. Cloud storage using enterprise-class gateways centralizes storage management across remote offices, offering unlimited, elastic capacity that never requires upgrades/replacements along with built-in centralized disaster recovery to the cloud. Managing multiple sites from a central location is now a reality.
  5. Retain access to a broad ecosystem of solutions: Everyone wants an open ecosystem of solutions including cloud providers and solution providers. Whether you are looking for a private cloud solution, a public cloud solution, or a combination of both, an enterprise-class storage gateway offers you options and provides you the opportunity to choose best-of-breed solutions that meet your needs. Alternatively, if you are looking to leverage your existing storage infrastructure as a starting point, enterprise-class storage gateways can help you enjoy the attributes of cloud storage.

You should look closely at the recent advances in cloud storage and enterprise-class gateway technology.  2012 may just be the year to see your “cloudy” resolutions to fruition.

10 Hot Trends in Cloud Data for 2012

Wednesday, December 21st, 2011

As 2011 rolls to a close, it’s time to make a few predictions for 2012 in the cloud data space. 2011 was a year of adoption, during which many companies started to leverage the cloud, enjoying the economies of scale, security and ease in managing their growing data needs. Those successes promise even greater cloud adoption in 2012. With that in mind, here are 10  predictions for hot trends to watch for in the cloud data space:

  1. Hybrid data storage environments combining cloud storage with existing storage. For most companies, the notion of moving all of their data to the cloud is not fathomable. However, continuously expanding data storage needs are fueling a need for more capacity. What better way to address this need than with cloud storage? The benefits include access to a secure, limitless pool of storage capacity, no future need for upgrade or replacement and reduced capital expenses. Look for auto-tiering technologies to seamlessly combine hybrid cloud and on-premise environments in a way that operates with existing applications.
  2. Private cloud environments in enterprise companies. Enterprises looking to leverage the economies, efficiencies and scale of cloud providers are adopting cloud models in-house, such as OpenStack, for both compute and storage environments. These private clouds offer scale, agility and price/performance typically unmatched by traditional infrastructure solutions and can reside inside a company’s firewall. In the storage space, look for technologies that can combine existing SAN infrastructure and private cloud storage into a unified Cloud SAN.
  3. Disaster recovery to the cloud as a viable option. Traditionally, companies that need disaster recovery (DR) and business continuity (BC) have relied on dedicated replicated infrastructure at an offsite location to be able to recover from physical disaster. This means paying for idle hardware that’s waiting for a disaster. DR in the cloud, on the other hand, means not having to pay for this infrastructure except when it is needed. The tradeoff? While not necessarily a zero-downtime solution, look for cloud DR with recovery time objectives (RTOs) in a matter of hours.
  4. Disaster recovery from the cloud as a new need. What happens to business data stored by SaaS applications in the case of a disaster? The truth is most SaaS providers do have a DR strategy, but many businesses will demand a recovery strategy under their control. Look for emergence of solutions that back up SaaS data either locally or to an alternate provider as an extra level of protection.
  5. Simplified onboarding of applications to the cloud. Certain business applications can move entirely to the cloud, thereby saving the administrative and maintenance headaches of their hardware/software platforms onsite. Many IT-strapped businesses can benefit from tools to make this migration viable. Look for robust tool sets that can migrate applications to a choice of cloud providers – and also bring those applications back on-premise should the need arise.
  6. Non-relational databases for big data. NoSQL databases, like Apache CouchDB, enable tremendous scalability in order to meet the needs of Terabytes and Petabytes of data accessed by millions of users. Big data will force many companies to consider these alternatives to traditional databases and cloud deployment models will simplify the roll-out. Look for vendors providing supported NoSQL solutions.
  7. Use of the cloud for analytics. Analytics tend to require a scalable compute and storage environment as well as rather expensive software. Similar to idle hardware for disaster recovery purposes, analytics for many businesses may represent a seasonal need that only runs in short bursts and may not justify purchasing a dedicated software/hardware environment. Analytic environments in the cloud can turn the expense into a “pay-per-use” bill, meeting business goals at a far lower price point.
  8. SSD tiers of storage in the cloud. Moving higher performance applications into the cloud doesn’t always guarantee that they will get the level of performance they need from their data storage. By offering high-performance tiers of storage that are SSD-based (i.e. flash), cloud providers will be able to address the needs for predictable and faster application response times.
  9. Improvements in data reduction technology. With cloud storage commanding a per GB operating expense, deduplication and compression technologies have become rather ubiquitous in minimizing costs. While some may argue the capacity optimization game has played out, there is still the challenge of capacity optimization on a more global scale across multiple tenants and a challenge for rich media content which does not fare particularly well with today’s reduction technologies. Look for the introduction of new data reduction technologies that address both needs.
  10. “Cloud-envy” from cloud laggards. While many companies have already adopted the cloud and many more will adopt in 2012, others may still wait and ponder well past 2012. Regardless of which category a company falls into, the economics and efficiencies of the cloud have become irrefutable. As a result, some of the laggards will likely seek ways to leverage cloud methodologies that improve IT efficiency on-premise. Undoubtedly, some will fall prey to cloudwashing by purchasing traditional IT infrastructure named “cloud” in an attempt to satisfy their “cloud-envy.”

Bottom line? Cloud deployments are becoming simpler and more secure and the economics continue to improve. Which of these trends will your business follow in 2012?