Archive for the ‘Cloud Storage’ Category

Is BYOC the next cloud initiative?

Tuesday, February 21st, 2012

While the notion of Bring Your Own Cloud (BYOC) may seem a bit far-fetched, Shadow IT, where users essentially bring unauthorized cloud services into business environments, has become an increasing corporate concern as highlighted in a recent CFO.com article. The risk of Shadow IT is that it compromises IT’s ability to maintain a standardized infrastructure across employees and business units, opening the door to potential security and reliability issues. In spite of this, there is a benefit in analyzing Shadow IT to understand why users adopt non-standard processes and what benefits existing IT could not deliver. It often pays to examine how employees creatively accomplish their goals in constrained environments and figure out how to accommodate those employees in a supportable fashion.

Shadow IT is not just limited to cloud. In a close parallel, many IT organizations that once outlawed all but one “approved” mobile device to access corporate accounts and email have loosened restrictions on accommodating multiple devices, moving closer to a BYOD (Bring Your Own Device) model. While pundits argue BYOD might transform IT into the “Wild West,” many organizations are finding a middle ground that offers users a selection of approved mobile devices for corporate access. With a standard set of security and reliability policies, users and business units gain the flexibility to meet their needs and optimize costs without having to consider Shadow IT.

So will trends toward BYOD now spawn a movement toward BYOC? Well, cloud solutions naturally require policies to protect data security and integrity, and not all cloud solutions are suitable for enterprise use. In fact, it’s best to be wary of consumer-oriented solutions. On the other hand, the adoption of cloud solutions by many businesses emphasizes the immediate benefits they offer, including:

  1. Faster provisioning of resources. Traditional IT requests may take days or weeks to service due to competing priorities or delays in procurement and deployment.
  2. Overcoming capital spending limits. Subscription and pay-per-use models enable access to resources on an ongoing basis, avoiding capital spending limits and often falling under a separate opex budget.
  3. Meeting variable resource demands. Demand for IT resources may change weekly, monthly or seasonally, calling for dynamic and elastic resource management often out of the realm of IT’s static processes.

With these benefits in mind, how can businesses rationalize cloud solutions so that they meet corporate security, control and reliability requirements and avoid inhibiting the capabilities of their users? In the data storage realm, the answer can be found in cloud storage gateway products like CloudArray, offering choices across many supported public, private and hybrid cloud storage solutions with a set of overarching policies that meet corporate requirements for security, control, vendor lock-in prevention and reliability. With these solutions, users or business units have virtually all the cloud choices, options and policies they need to optimize their data storage deployments.

While BYOC is an unlikely IT initiative in the near term, businesses can benefit greatly today from standardizing on a set of efficient cloud options that deliver new levels of IT flexibility.

Measuring Cloud Storage Performance: Blocks vs. Files

Monday, February 13th, 2012

What are some good reasons to adopt cloud storage? Cost, durability and flexibility.

So let me talk about performance, instead.

Look at this graph:

provider bandwidth measurements

As part of our daily testing, we do routine performance measurements across a broad swath of cloud storage providers. It gives us a check to ensure that the various CloudArray subsystems are performing as they should, and gives us the data to make optimization decisions. In this particular test, we measure transfer rates at various buffer sizes. We “fill the pipe” by queueing up multiple streams of data simultaneously, initiating one transfer as soon as the previous one finishes, so that latency doesn’t skew the data.
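The "fill the pipe" approach can be sketched roughly as follows. This is a minimal illustration and not the actual test harness: `upload` is a hypothetical callable standing in for a single provider PUT, and the default of four streams is an assumed value.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_bandwidth(upload, buffer_size, total_bytes, streams=4):
    """Return observed bytes/second when sending total_bytes in buffer_size pieces.

    `upload` is a hypothetical callable that takes one bytes object (one PUT).
    A fixed pool of `streams` workers keeps several transfers in flight; each
    worker starts its next transfer as soon as its previous one finishes.
    """
    payload = bytes(buffer_size)          # zero-filled placeholder buffer
    count = total_bytes // buffer_size    # number of transfers to issue
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=streams) as pool:
        # list() waits for every transfer to finish before the clock stops
        list(pool.map(upload, (payload for _ in range(count))))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (count * buffer_size) / elapsed


def sweep(upload, sizes, total_bytes):
    """Measure bandwidth at each buffer size; returns {size: bytes_per_second}."""
    return {size: measure_bandwidth(upload, size, total_bytes) for size in sizes}
```

Keeping several transfers in flight means one transfer's round-trip wait overlaps with another's data movement, so the result reflects bandwidth rather than per-request latency.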

This particular provider reaches its peak bandwidth at a 256K transfer size, and is actually transferring at 50% of peak at 32K.

But look at this graph:

microsoft file distribution data

This data is from a Microsoft study published in FAST 2007. It describes the distribution of file sizes in a file system — interestingly, even though the mean file size does creep up over the years of the study, the distribution doesn’t change much. What we can draw from it is the fact that in a typical file system, roughly 80% of the files are less than 32k.

We can see that a naive system which just maps files on the file system directly to objects in the cloud is going to spend 80% of its file transfers at the bottom half of the bandwidth curve, achieving less than 50% of the peak available bandwidth. That’s ignoring latency, too: our hypothetical naive system is going to have to be streaming files out at a perfectly synchronized pace in order to achieve the theoretical maximum.
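To make that concrete, here is a toy model, not the provider's measured curve. It assumes each request pays a fixed overhead plus a transfer time proportional to its size, with the overhead chosen so that a 32K transfer reaches exactly 50% of peak. The function name and the model are assumptions made for illustration.

```python
def fraction_of_peak(size_kb, half_peak_kb=32):
    """Fraction of peak bandwidth reached by one transfer of size_kb.

    Assumed model: time = fixed_overhead + size / peak, where the overhead
    equals the time needed to move half_peak_kb at the peak rate. A transfer
    of exactly half_peak_kb therefore runs at 50% of peak.
    """
    return size_kb / (size_kb + half_peak_kb)


for size in (4, 32, 256, 1024):
    print(f"{size:>5}K -> {fraction_of_peak(size):.0%} of peak")
```

Under this model every file smaller than 32K lands below 50% of peak, which is the situation the naive file-to-object mapper finds itself in for most of its transfers. The model only approaches peak asymptotically, so it is a simplification of the measured curve, which tops out at 256K.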

Our Provider A actually does pretty well with small transfers, rising to peak bandwidth relatively rapidly. What about a different provider?

provider bandwidth measurements -- two providers

What abysmal performance, right? Provider B’s bandwidth only rises to almost 20% of Provider A’s by 512K. Forget about issues with small writes: what reason would anybody have for picking Provider B? Could any cost or durability benefits be enough to suffer performance penalties this big?

But let’s zoom out and take a bigger picture view:

provider bandwidth measurements: large IO

Oops. This graph tells an entirely different story. There’s a real performance benefit to using Provider B, assuming that you are transferring large chunks of data.

Every cloud storage provider has different characteristics, even if the APIs are similar. The role of a cloud storage gateway is to smooth over the differences and provide a predictable solution for storing data in the cloud. That’s what CloudArray does: aggregates lots of small writes into large-block transfers, absorbs transient failures and network faults, and generally works to manage and optimize cloud storage utilization.

If a gateway vendor tells you that they won’t work with a particular cloud storage provider because its performance doesn’t meet their SLAs, then the simple fact is that their gateway isn’t doing its job. The more complicated fact is that naive file-to-object mappers will always be fundamentally flawed when dealing with real-world business data, because real-world business data is housed within file systems, and file systems are designed to talk to storage subsystems, and storage subsystems are designed to talk to disk controllers.

Anybody who’s worked in the enterprise storage array business can tell you about the small write problem in RAID: here it is again, written in the clouds. You’d pay a penalty for writing small pieces of data to a RAID volume, except for the years of work that have been spent developing storage systems that smooth out performance without sacrificing reliability. Odds are that many users have never heard of the small write problem, much less tuned their software to it or tried to plan out their file systems around optimizing their storage arrays.

But that’s exactly what they’ll need to be doing with their cloud storage, unless they use CloudArray. What’s our secret sauce? Actually, it’s no secret: we’re a block storage device, and we can tune and perfect our transfers to match your provider. That means we minimize time-to-durability and maximize the effectiveness of our cache. And that’s why we can make pictures like this:

We don’t make recommendations or publish our performance test results because ultimately, the choice of cloud storage provider should be a business decision, based on business factors like cost, durability, location, and a host of others. CloudArray’s architecture and capabilities make it possible for our customers to make those decisions for themselves, while being assured of getting the best performance.

– John Bates, CTO

Footnote/Mathematical aside: a careful reader will note that I discussed 80% of transfers, not transferred data. In fact, depending upon the total number of files and the distribution of sizes of the upper 20%, small files may be less than 1% of the total used capacity. And therefore, given the totally unrealistic model which disregards latency, a system with a low ratio of file count to total capacity and with a provider performance profile like A’s would pay only a minor small write penalty.

But that just serves to strengthen my point: why should any of this matter to you, the user? A system with a higher ratio, or with a high-bandwidth skewed provider, can wind up spending 60% of its time transferring 10% of its data. Why should it be up to you to calculate your file system distributions and match them to the right cloud storage?
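The gap between "share of data" and "share of time" can be checked with a short calculation. This is a sketch under stated assumptions: the cost model (a fixed per-request overhead plus time proportional to size) and the file population (80 files of 8K and 20 files of 1024K) are hypothetical, as are the two overhead values.

```python
def shares(small_count, small_kb, large_count, large_kb, half_peak_kb=32):
    """Return (data_share, time_share) of the small files.

    Assumed cost model: transferring a file of s KB costs (s + half_peak_kb)
    time units, i.e. a fixed per-request overhead plus time proportional to
    size. half_peak_kb is the transfer size that reaches 50% of peak.
    """
    small_data = small_count * small_kb
    large_data = large_count * large_kb
    small_time = small_count * (small_kb + half_peak_kb)
    large_time = large_count * (large_kb + half_peak_kb)
    return (small_data / (small_data + large_data),
            small_time / (small_time + large_time))


for overhead_kb in (32, 512):
    data, time_ = shares(80, 8, 20, 1024, half_peak_kb=overhead_kb)
    print(f"half-peak at {overhead_kb}K: {data:.1%} of data, {time_:.1%} of time")
```

In this made-up population the small files hold about 3% of the data. With the smaller overhead they take a modest share of the transfer time; with the larger overhead, which stands in for a provider skewed toward large transfers, they take more than half of it.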

Cloud Storage Accelerates into 2012 IT Priorities

Tuesday, January 31st, 2012

Every year, our friends at ESG post results of their annual Spending Intentions Survey, indicating where many businesses are likely to spend their IT dollars over the coming year. Recently Steve Duplessie posted an article on his blog entitled Cloud – The Cost Containment Strategy that concludes cloud has finally “crossed the chasm” in IT. According to preliminary data, cloud represents the largest projected percentage spending increase among 2012 IT initiatives, a very exciting turn.

Truth is, cloud storage addresses long-standing IT priorities, with three of these priorities topping the list nearly every year:

  • Improving backup and recovery: Tapes and offsite backup continue to be a struggle and do not offer the reliability or recovery times that many businesses demand. Moving off-site backups to the cloud improves the recoverability of backups without tape or dedicated infrastructure. Moreover, it offers the choice of leveraging existing backup software or modernizing backup software and processes.
  • Managing data growth effectively: As capacity grows, storage array life cycles mandate adding, replacing and decommissioning storage arrays whenever they run out of capacity, all involving considerable capital expense and internal administration. Cloud storage can instead provide an unlimited pool of storage that is on-demand and pay-as-you-go. For the financially minded, this enables replacing capital and administrative expense with pure operating expense (opex) and near-zero administration.
  • Improving business continuity and disaster recovery: For many businesses, it has become standard practice to duplicate on-site storage equipment expenses to build out a disaster recovery facility. For others, it may mean skimping on disaster recovery because of the high costs. Cloud storage enables turning disaster recovery expenses into opex while maintaining recovery time objectives that rival dedicated solutions.

So what is the challenge for cloud becoming an initiative for businesses? Let’s think of the early days of server virtualization. Priorities such as server consolidation and reducing server hardware footprint existed well before server virtualization. However, it took some time before virtualization gained mainstream acceptance. Indeed, we may be in the midst of the same type of transition, with cloud storage moving from an early-adopter solution to a mainstream initiative.

What’s even more exciting about cloud storage is that it’s extremely simple for businesses to get started. Whereas traditional storage infrastructure requires a forklift to deliver and install, a software download of CloudArray and 10-15 minutes of configuration nets you Terabytes or Petabytes of secure storage capacity that’s ready to use. If you haven’t tried cloud storage, consider adding it to your 2012 initiatives.

A New Cloud Storage Gateway Has Sprouted

Thursday, January 26th, 2012

You may have seen Amazon’s announcement of the AWS Storage Gateway beta here, here, and here. Truth be told, the cloud storage gateway market is starting to catch fire.

Amazon’s move validates the need for an iSCSI cloud storage gateway to easily deliver cloud storage into business environments and acknowledges that integration through APIs is not a process businesses will easily embrace. The deeper implication is that gateways facilitate the adoption of cloud storage as an alternative to on-premise or off-premise traditional storage, helping Amazon tap into a large multi-billion dollar data storage market. Does this mean more cloud providers may want to offer a gateway in the future? You bet.

There’s a lot of product differentiation between AWS Gateway and existing storage gateway products like CloudArray. Caching, performance, encryption, deduplication, compression, disaster recovery, high availability, ease of use, administration and storage capacity are all points of comparison one should consider. I’m sure there will be plenty published by users comparing hands-on experiences, so I won’t dive into deeper details here beyond mentioning one fly in the ointment: the AWS Gateway supports only one cloud provider.

Whether it’s purchasing servers or storage, businesses naturally prefer to have choices. Our cloud storage experience has been similar, with customers preferring to have the flexibility to choose from a broad array of cloud providers including Amazon, AT&T, Nirvanix, Rackspace, HP Cloud and PEER1, to name a few. Additionally, we are seeing increased interest in private clouds like OpenStack, EMC Atmos, Mezeo, Nirvanix and Scality, and some businesses are even using existing storage as a starting point. If you plan to use multiple providers, are considering private cloud or at least want the flexibility to keep those options open in the future, a multi-provider gateway is a better solution.

Bottom line? Amazon’s announcement further positions cloud storage as a viable alternative to traditional on-premise and off-premise solutions. It presents one more way for businesses to easily connect to Amazon cloud storage, and having more gateway choices is always a win for the customer. For customers seeking a robust, enterprise-class feature set and the industry’s broadest choice of public and private cloud providers, solutions like CloudArray continue to be the best option.

Measuring Cloud Storage Performance

Wednesday, January 18th, 2012

There are many excellent reasons to use cloud storage, but fast and efficient transfer of large amounts of data isn’t usually listed as a benefit. That’s one of the reasons why people use cloud storage gateways: to speed up cloud storage access. Recently, I realized we’ve never published any details on the performance gains that one should expect when using the CloudArray storage gateway, so I decided to create a simple illustrative test. In this article, I describe the results and explain some cloud storage implementation details that contribute to performance differences.

I came up with a quick test: copy one gigabyte of fully random data to the cloud, broken up into 32768 32k files. The questions are, how long would it take for a user to copy that much data to a CloudArray volume, and how soon before all of that data is safely stored in the cloud?
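A workload of this shape is easy to generate. The post does not say how the test files were created, so the following is only one possible way to do it; the function name and file naming scheme are hypothetical.

```python
import os


def make_test_files(directory, count=32768, size=32 * 1024):
    """Write `count` files of `size` random bytes each; return total bytes written."""
    os.makedirs(directory, exist_ok=True)
    total = 0
    for i in range(count):
        path = os.path.join(directory, f"file_{i:05d}.bin")
        with open(path, "wb") as f:
            # os.urandom gives incompressible data, so compression cannot
            # flatter the transfer times
            total += f.write(os.urandom(size))
    return total
```

With the defaults, `make_test_files("testdata")` writes 32768 files of 32 KiB each, which is exactly one gibibyte.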

To set up the test, I used an old, slow laptop to run CloudArray. (That’s what I had handy in my office…) I used just the default basic configuration of a 25G cache to configure a single 100G encrypted volume, attached to Amazon S3, and mapped to a Linux client. I formatted the volume with an ext3 file system and used the ‘cp’ command to copy the files. As a basis for comparison, I used an open-source file transfer utility, CyberDuck, to transfer exactly the same files to exactly the same S3 account.

I’ve broken the CloudArray results down into two events: the data was “usable” when the ‘cp’ operation completed, and “complete” once all of the data had been copied to the cloud. At the usable point, any host attached to the CloudArray could access any of the files, as they were all stored in the local cache. In the background, the CloudArray busily pushed the data to the cloud, and when it was done, I considered the test complete. Of course, for the transfer utility, there’s no such split. The data was either copied to the cloud, or not.
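The usable/complete split is the general behavior of a write-back cache. The toy class below illustrates the idea only and says nothing about how CloudArray is actually built: `upload` is a hypothetical callable, a dict stands in for the local disk cache, and there is no error handling or retry.

```python
import queue
import threading


class WriteBackCache:
    """Toy write-back cache: writes are usable at once, complete after upload."""

    def __init__(self, upload):
        self._upload = upload            # hypothetical callable: upload(key, data)
        self._local = {}                 # stands in for the local disk cache
        self._pending = queue.Queue()    # keys waiting to be pushed to the cloud
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, key, data):
        """Store locally and return immediately; the upload happens later."""
        self._local[key] = data
        self._pending.put(key)

    def read(self, key):
        """Serve reads from the local cache."""
        return self._local[key]

    def wait_complete(self):
        """Block until every pending write has been uploaded."""
        self._pending.join()

    def _drain(self):
        # Background worker: push queued keys to the cloud one at a time
        while True:
            key = self._pending.get()
            try:
                self._upload(key, self._local[key])
            finally:
                self._pending.task_done()
```

In these terms, "usable" is the moment the last `write` returns, and "complete" is the moment `wait_complete` returns.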

The results of running the test were, well, dramatic:

Write performance comparison

For the record, that’s 6:22 (min:sec) for the CloudArray to reach the usable state and another 3:19 to reach the complete state, while it took the file transfer 110:05 to transfer exactly the same data.

What’s the reason for the huge difference? Well, I confess that I did set up this test to highlight one of the advantages of CloudArray; because we’re block-based, we’re not sensitive to the kinds of problems that plague file-based approaches.

The caching accounts for the rapid time to usability, but the more significant part of the equation is the aggregation that CloudArray performs due to the block-level IO. We send out all data in large, cloud-optimized chunks. Cloud storage provider systems store a single 1M object faster than they store a 32k object, so sending them 1024 1M objects is easier for them to handle. That same arithmetic applies to the number of requests, so that we sent 1024 PUT requests as opposed to the 32768 PUTs that CyberDuck (and, indeed, any file-based cloud utility) must send to handle this particular workload.
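The request-count arithmetic can be illustrated with a small aggregator. This is a sketch of the general technique, not CloudArray's implementation, which works at the block level beneath the file system; the class name and the `put` callable are hypothetical.

```python
class BlockAggregator:
    """Collect small writes and emit fixed-size chunks to a PUT callable."""

    def __init__(self, put, chunk_size=1024 * 1024):
        self._put = put                  # hypothetical callable: put(bytes)
        self._chunk_size = chunk_size
        self._buffer = bytearray()
        self.put_count = 0

    def write(self, data):
        """Append data; send a chunk each time a full one has accumulated."""
        self._buffer += data
        while len(self._buffer) >= self._chunk_size:
            self._send(self._chunk_size)

    def flush(self):
        """Send any partial chunk left in the buffer."""
        if self._buffer:
            self._send(len(self._buffer))

    def _send(self, n):
        self._put(bytes(self._buffer[:n]))
        del self._buffer[:n]
        self.put_count += 1


agg = BlockAggregator(put=lambda chunk: None)
small_write = bytes(32 * 1024)
for _ in range(32768):                   # the test workload: 32768 writes of 32k
    agg.write(small_write)
agg.flush()
print(agg.put_count)                     # 1024 PUTs of 1M each
```

Thirty-two 32k writes fill one 1M chunk, so the 32768 small writes collapse into 1024 requests.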

In other words, regardless of the size or number of files that you store on a CloudArray volume, traffic to the cloud is optimized to give the best performance.

To test read performance, I just copied the same files back from the cloud. Using CloudArray, that once again gave two separate cases: either all of the files were already in local cache, which meant that it was roughly the same as a local file copy, or they weren’t, in which case the data had to be read back from the cloud. Again, the data was automatically read from the cloud in large chunks, giving the best performance. The copy was performed with the ‘cp’ command on the Linux host, and compared to CyberDuck transferring the files back.

The results are actually pretty symmetrical with respect to the original write performance:

Read performance comparison

Copying all the files from the local CloudArray cache took 6:05, while invalidating the cache and doing the same copy again took 11:42. On the other hand, the utility transfer took 112:24.

None of the numbers that I’ve given are all that useful for comparison with other environments. A large number of factors affect actual performance, e.g. WAN speed, LAN speed, and local disk speed, and I made no attempt to optimize any of them. A faster local disk or SSD, for example, would substantially reduce the time to usability. The importance of these results is the relative performance of CloudArray when compared to the raw transfer, and those results are impressive: 17.3x faster to usability, 11.4x faster to durability, and 9.6x faster to reload (18.5x if the data is in cache).

That’s minutes versus hours.

It’s a quick cup of coffee versus a three-martini lunch.

It’s renewing online versus waiting at the DMV.

It’s time saved.
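The ratios quoted above follow directly from the measured times. The short calculation below reproduces them to within rounding; the variable names are only for illustration.

```python
def seconds(mmss):
    """Convert a 'min:sec' string such as '6:22' to seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


usable = seconds("6:22")                 # cp finished, data in local cache
complete = usable + seconds("3:19")      # all data pushed to the cloud
utility_write = seconds("110:05")        # file transfer utility, upload
cached_read = seconds("6:05")            # copy back from local cache
cloud_read = seconds("11:42")            # copy back after invalidating cache
utility_read = seconds("112:24")         # file transfer utility, download

speedups = {
    "usability": utility_write / usable,
    "durability": utility_write / complete,
    "reload from cloud": utility_read / cloud_read,
    "reload from cache": utility_read / cached_read,
}
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```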

User at 35,000 feet gives new meaning to “Cloud Storage”

Wednesday, January 11th, 2012

This week, a CloudArray user sent us this email, and he gave me permission to share it. CloudArray has been installed in lots of countries, but I think this is the first airborne installation! We love getting product feedback (good and bad) to help us improve the CloudArray user experience. Thanks to Phil Flores and Magnus IT Solutions for sharing.

Good morning Ann and Phil,

How are you doing? I am groovy. I thought I would send over a couple of data points you can share with your team and/or prospective customers that might be useful.

On Friday night, I boarded a 5 hour flight that had WiFi access on board. As you know, I have had absolutely no formal training or experience with TwinStrata or the CloudArray portal at all.  However, I wanted to see if I could get your software up and running since I had plenty of time on my hands. From start to finish, and using your VM instance on my laptop, I had 400 GB provisioned in the Amazon S3 cloud (using the license information your team provided) within 35 minutes…and that was with no previous experience or knowledge about setting up your virtual appliance!


Last night, I put your AMI EC2 instance up within 5 minutes and provisioned a 17 GB data store, a 25 GB data store, and a 30 GB data store and attached them to my Windows box. Total time from start to finish for all of the connections to be made was 10 minutes. Formatting the drives took longer because they were so large…but that has nothing to do with the TwinStrata software.

Overall…a very impressive software package and kudos to your development team for making the rollout/deployment so easy and seamless. Please let me know if you have any questions or need more information. Take care and I hope you are having a great weekend!

Cheers,
Phil

5 Cloudy Resolutions for Your Data Storage

Wednesday, January 4th, 2012

With 2012 already upon us, the time has come to make resolutions for the New Year. In the world of IT, this means resolving to abandon bad habits, unreliable processes and cumbersome tasks that often get in the way of business priorities. With the emergence of cloud storage as a viable means to address growing data storage needs, IT administrators can abandon storage headaches of years past in favor of better, faster and easier processes for managing data.

Cloud storage, in combination with enterprise-class gateways like CloudArray, offers security, availability, and performance that rivals local storage with no fear of vendor lock-in. Moreover, it makes it possible to fulfill New Year’s IT resolutions that were once considered unattainable. With this in mind, here are 5 resolutions that you can live up to by augmenting existing storage infrastructure with the cloud.

  1. Never run out of storage capacity again: Cloud storage provides unlimited, elastic capacity that grows or shrinks with your business needs on a pay-as-you-go basis. Not only do you never run out of storage capacity, but more importantly, you can avoid the never-ending cycle of replacing storage arrays, with its complex migrations and costly capital and maintenance expenditures. Even better, on-premise hardware footprint never needs to increase.
  2. Kiss tape backup goodbye: While tape may always have a use for long term archives, many businesses can do away with regular daily tape backups by storing backups in the cloud. With cloud storage you can continue using existing backup software and policies without the hassle, unreliability and manual labor of tape.
  3. Stop purchasing dedicated hardware for disaster recovery: Nothing depletes an IT budget faster than buying data storage systems in pairs in order to have a second system ready for disaster recovery. Typically, dedicated secondary storage systems are only active during disaster test or disaster recovery. The advantage of cloud is that storage and compute infrastructure is available when you need it on a pay-as-you-go basis at a fraction of the cost.
  4. Centralize storage management across sites: It’s not easy managing storage in a decentralized environment where every site is a silo with a private storage footprint, as there is no easy way to reallocate capacity across sites or cobble together a unified disaster recovery strategy. Cloud storage using enterprise-class gateways centralizes storage management across remote offices, offering unlimited, elastic capacity that never requires upgrades/replacements along with built-in centralized disaster recovery to the cloud. Managing multiple sites from a central location is now a reality.
  5. Retain access to a broad ecosystem of solutions: Everyone wants an open ecosystem of solutions including cloud providers and solution providers. Whether you are looking for a private cloud solution, a public cloud solution, or a combination of both, an enterprise-class storage gateway offers you options and provides you the opportunity to choose best-of-breed solutions that meet your needs. Alternatively, if you are looking to leverage your existing storage infrastructure as a starting point, enterprise-class storage gateways can help you enjoy the attributes of cloud storage.

You should look closely at the recent advances in cloud storage and enterprise-class gateway technology. 2012 may just be the year to see your “cloudy” resolutions to fruition.