ElasticVapor: Public Cloud Infrastructure Capacity Planning

In the course of a day I get a lot of calls from hosting companies and data centers looking to roll out public cloud infrastructure using Enomaly ECP. In these discussions there are a few questions that everyone seems to ask:

- How much is it going to cost?
- What minimum resources (capacity) are required to roll out a public cloud service?

Both questions are closely related. But to get an idea of how much your cloud infrastructure is going to cost, you first need to fully understand what your resource requirements are and how much capacity (minimum resources) will be required to maintain an acceptable level of service and, hopefully, turn a profit.

In a traditional dedicated or shared hosting environment, capacity planning is typically a fairly straightforward endeavor: a high allotment of bandwidth and a fairly static allotment of resources, namely a single server (or slice of a server) with a static amount of storage and RAM. If you run out of storage or get too many visitors, well, too bad. It is what it is. Some managed hosting providers offer more complex server deployment options, but generally, rather than one server, you're given a static stack of several; the concept of elasticity is not usually part of the equation.

Wikipedia gives a pretty good overview of the concept of capacity planning, which it describes as the process of determining the production capacity needed by an organization to meet changing demands for its products. Although this definition is applied in a traditional business context, I think it works very well when looking at public cloud infrastructure.

Capacity is defined as the maximum amount of work that an organization is capable of completing in a given period of time, and is calculated as Capacity = (number of machines or workers) × (number of shifts) × (utilization) × (efficiency). A discrepancy between the capacity of an organization and the demands of its customers results in inefficiency, either in under-utilized resources or unfulfilled customers.
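As a rough illustration, the formula can be applied to a cloud host pool. This is a minimal sketch under hypothetical assumptions: "machines" is mapped to VM slots across all hosts (50 hosts × 16 VMs each), "shifts" to the hours in a 30-day month, and the 75% utilization and 90% efficiency figures are invented. None of these numbers come from Enomaly ECP.

```python
def capacity(machines: float, shifts: float, utilization: float, efficiency: float) -> float:
    """Capacity = machines x shifts x utilization x efficiency."""
    # Utilization and efficiency are fractions, so reject anything outside [0, 1].
    if not (0.0 <= utilization <= 1.0 and 0.0 <= efficiency <= 1.0):
        raise ValueError("utilization and efficiency must be fractions in [0, 1]")
    return machines * shifts * utilization * efficiency


# Hypothetical pool: 50 hosts that can each run 16 VMs, measured over a 30-day month.
vm_slots = 50 * 16     # "machines": VM slots across all hosts
hours = 30 * 24        # "shifts": hours in the planning period
vm_hours = capacity(vm_slots, hours, utilization=0.75, efficiency=0.9)
print(f"{vm_hours:,.0f} usable VM-hours per month")
```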

The broad classes of capacity planning are lead strategy, lag strategy, and match strategy:

- Lead strategy is adding capacity in anticipation of an increase in demand. Lead strategy is an aggressive strategy with the goal of luring customers away from the company's competitors. The possible disadvantage to this strategy is that it often results in excess inventory, which is costly and often wasteful.
- Lag strategy refers to adding capacity only after the organization is running at full capacity or beyond due to an increase in demand (North Carolina State University, 2006). This is a more conservative strategy. It decreases the risk of waste, but it may result in the loss of possible customers.
- Match strategy is adding capacity in small amounts in response to changing demand in the market. This is a more moderate strategy.
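To make the trade-off among the three strategies concrete, here is a toy simulation. It is a sketch under invented assumptions: demand is counted in VMs per planning period, capacity bought at the end of one period is available in the next, lead buys 100-VM blocks against a perfect one-period forecast, lag buys 100-VM blocks only once demand has reached capacity, and match buys 20-VM steps to cover current demand plus its most recent growth. The demand series and step sizes are hypothetical.

```python
from typing import Callable, List, Tuple

# (current capacity, previous demand, current demand, forecast demand) -> new capacity
Strategy = Callable[[int, int, int, int], int]


def lead(cap: int, prev: int, now: int, forecast: int) -> int:
    # Lead: buy large blocks ahead of forecast demand (assumes a perfect forecast).
    while cap < forecast:
        cap += 100
    return cap


def lag(cap: int, prev: int, now: int, forecast: int) -> int:
    # Lag: buy a large block only once demand has reached or exceeded capacity.
    while now >= cap:
        cap += 100
    return cap


def match(cap: int, prev: int, now: int, forecast: int) -> int:
    # Match: small steps that track current demand plus its latest growth.
    target = now + max(0, now - prev)
    while cap < target:
        cap += 20
    return cap


def simulate(strategy: Strategy, demand: List[int], start: int) -> Tuple[int, int]:
    """Return (unmet VM-periods, idle VM-periods) over the demand series."""
    cap, unmet, idle = start, 0, 0
    for t, now in enumerate(demand):
        served = min(now, cap)
        unmet += now - served
        idle += cap - served
        prev = demand[t - 1] if t > 0 else now
        forecast = demand[t + 1] if t + 1 < len(demand) else now
        cap = strategy(cap, prev, now, forecast)  # takes effect next period
    return unmet, idle


demand = [120, 180, 250, 330, 400, 480]  # hypothetical VMs requested per period
for name, strategy in [("lead", lead), ("lag", lag), ("match", match)]:
    unmet, idle = simulate(strategy, demand, start=200)
    print(f"{name:5} unmet={unmet:4} idle={idle:4}")
```

With these invented numbers, lead never leaves demand unserved but carries twice the idle capacity of the other two (240 versus 120 VM-periods), lag leaves the most demand unserved (80), and match loses only 20 at the same idle cost as lag. Real demand is noisier and forecasts are never perfect, so treat this only as a picture of the trade-off.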

Compounding the challenge of cloud capacity planning is elasticity. Now you are not only planning for typical usage; you must also try to forecast sudden increases in demand across many customers on a shared, multi-tenant infrastructure. In ECP we use the notion of capacity quotas, where new customers are given a maximum amount of server capacity, say 20 VMs or 1 TB of storage. Customers who require more then make a request to the cloud provider. The problem with this approach is that it gives customers only a limited amount of elasticity: you can stretch, but only so far. Another strategy we sometimes suggest is a flexible quota system (match strategy), where after a period of time you trust the customer and either automatically give them additional capacity or monitor their usage patterns and offer it to them before it becomes a problem. This is similar to how you seem to magically get more credit on your credit card for being a good customer, or get a call when you buy an unexpected big-ticket item.
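A flexible quota along those lines might look like the following minimal sketch. The class name `FlexibleQuota`, the 90-day trust period, the 80% trigger, the 10-VM raise step and the 100-VM ceiling are all hypothetical choices for illustration, not Enomaly ECP behavior or settings.

```python
from dataclasses import dataclass


@dataclass
class FlexibleQuota:
    """Hypothetical per-customer VM quota with automatic raises for trusted accounts."""
    limit: int = 20          # default hard cap, e.g. 20 VMs
    trust_days: int = 90     # hypothetical account age after which a customer is trusted
    trigger_pct: int = 80    # raise when projected usage reaches 80% of the limit
    raise_step: int = 10     # hypothetical VMs added per automatic raise
    ceiling: int = 100       # hypothetical upper bound on automatic raises

    def allow(self, in_use: int, wanted: int, account_age_days: int) -> bool:
        """Return True if the customer may start `wanted` more VMs."""
        projected = in_use + wanted
        trusted = account_age_days >= self.trust_days
        # Match-style elasticity: raise a trusted customer's limit before they hit it.
        # Integer arithmetic avoids floating-point edge cases at the trigger point.
        if trusted and projected * 100 >= self.trigger_pct * self.limit:
            self.limit = min(self.limit + self.raise_step, self.ceiling)
        # Requests that still exceed the (possibly raised) limit are refused;
        # the customer must ask the provider for more.
        return projected <= self.limit


new_customer = FlexibleQuota()
print(new_customer.allow(in_use=18, wanted=4, account_age_days=10))
trusted_customer = FlexibleQuota()
print(trusted_customer.allow(in_use=18, wanted=4, account_age_days=200), trusted_customer.limit)
```

A 10-day-old account running 18 VMs and asking for 4 more is refused by the hard cap of 20; a 200-day-old account making the same request has its limit raised to 30 and is allowed. The ceiling keeps automatic raises bounded, so the provider still knows each customer's worst-case demand.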

The use of a quota system is an extremely important aspect of any capacity / resource planning you will do when either launching or running your cloud service. A quota system gives you a predetermined bound on how far demand can deviate across a real or hypothetical pool of customers; without it, it is practically impossible to adequately run a public cloud service.

Next you must think about overselling your infrastructure. Let's say your default customer quota is 20 virtual servers: what percentage of those customers are going to use 100% of their allotment? 50%, 30%, 10%? Again, this differs tremendously depending on the nature of your customers' deployments and your comfort level. At the end of the day, to stay competitive you're going to need to oversell your capacity. Overselling provides you the capital to continue to grow your infrastructure, hopefully slightly faster than your customers' capacity requirements increase. The chance of 100% of your customers using 100% of their quota is probably slim; the question you need to ask is what happens when 40% of your customers are using 60% of their quota. Does this mean 100% of the available capacity is being used? Cloud capacity planning also directly affects things like your SLAs and QoS. Regardless of your platform, it's never a good idea to use 100% of your available capacity, nor should you. So determining the optimal capacity and having a way to monitor it is going to be a crucial aspect of managing your cloud infrastructure.
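The 40%/60% question can be answered with simple arithmetic once you fix a pool size. The sketch below assumes a hypothetical pool of 100 customers with the 20-VM default quota, 500 physical VM slots (a 4:1 oversell), an 80% safety ceiling, and that customers outside the active 40% use nothing at all. `OversellPlan` is an illustrative name, not part of any product.

```python
from dataclasses import dataclass


@dataclass
class OversellPlan:
    """Hypothetical public cloud pool; every figure is an illustrative assumption."""
    customers: int                 # number of customer accounts
    quota_vms: int                 # per-customer quota (e.g. 20 VMs)
    physical_vms: int              # VM slots the hardware can actually run
    max_utilization: float = 0.8   # assumed safety ceiling; never plan for 100%

    @property
    def sold_vms(self) -> int:
        # The quota bounds the worst case: every customer uses their entire quota.
        return self.customers * self.quota_vms

    @property
    def oversell_ratio(self) -> float:
        return self.sold_vms / self.physical_vms

    def utilization(self, active_share: float, usage_share: float) -> float:
        """Fraction of physical capacity in use when `active_share` of customers
        each use `usage_share` of their quota and everyone else uses nothing."""
        demand = self.customers * active_share * self.quota_vms * usage_share
        return demand / self.physical_vms


plan = OversellPlan(customers=100, quota_vms=20, physical_vms=500)
u = plan.utilization(active_share=0.4, usage_share=0.6)
print(f"sold {plan.sold_vms} VMs on {plan.physical_vms} slots "
      f"({plan.oversell_ratio:.0f}:1 oversell)")
print(f"40% of customers at 60% of quota -> {u:.0%} of physical capacity")
if u > plan.max_utilization:
    print("over the safety ceiling: add hardware or tighten quotas")
```

Under these assumptions, 40% of customers at 60% of quota consume 480 VMs, only 24% of what was sold but 96% of the physical slots, well past the 80% ceiling. Changing the oversell ratio or the ceiling shows how quickly the answer moves.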

I believe that to fully answer the capacity question you must first determine your ideal customer. Determine where your sweet spot is and who you're going after (the low end, high end, commodity or niche markets). This will greatly help you determine your customers' capacity requirements. I'm also realistic: there is no one-size-fits-all approach. For the most part, cloud computing is a best-guess game; there are no best practices, architectural guidelines or practical references for you to base your deployment on. What it comes down to is experience. The more of these we do, the better we can plan. This is the value that companies such as Enomaly and the new crop of cloud computing consultants bring. What I find interesting is that the more the cloud-computing-as-a-service model is adopted by hosting firms, the more these hosters come to us not only for our cloud infrastructure platform, but also for help navigating the scary new world of cloud capacity planning.

Monday, September 21, 2009

Reuven Cohen ~ @ruv