Friday, May 29, 2009

Redux: The Universal Compute Unit & Compute Cycle

Recently I posed a question: is there an opportunity to create a common or standard Cloud Performance Rating System? And if so, how might it work? The feedback has been staggering. With all the interest in a Standardized Cloud Performance Rating System, I thought it was time to reintroduce the Universal Compute Unit & Compute Cycle concept.

Last year I proposed the need for a "standard unit of measurement" for cloud computing, similar to the International System of Units, better known as the metric system. This unit of cloud capacity is needed to ensure a level playing field as cloud computing becomes commoditized.

Nick Carr famously pointed out that before the creation of a standardized electrical grid, large-scale sharing of electricity was nearly impossible. Cities and regions had their own power plants, limited to their particular area, and the energy itself was not reliable (especially during peak times). Then came the "universal system", which provided a common set of electrical standards under which electricity could be interchanged or shared. Generating stations and electrical loads using different frequencies could now be interconnected using this universal system.

Recently several companies have attempted to define cloud capacity. Notably, Amazon's Elastic Compute Cloud service uses an EC2 Compute Unit. Amazon states that they use a variety of measurements to provide each EC2 instance with a consistent and predictable amount of CPU capacity. The amount of CPU allocated to a particular instance is expressed in terms of EC2 Compute Units. Amazon explains that they use several benchmarks and tests to manage the consistency and predictability of the performance of an EC2 Compute Unit. One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. They claim this is equivalent to an early-2006 1.7 GHz Xeon processor. Amazon makes no mention of how they achieve their benchmark, and users of the EC2 system are given no insight into how Amazon came to its conclusion. Currently there are no standards for cloud capacity, and therefore there is no effective way for users to compare cloud providers in order to make the best decision for their application demands.

There have been attempts to do this type of benchmarking, in the grid and high performance computing space in particular, but these standards pose serious problems for non-scientific usage such as web applications. One of the more common methods has been the use of FLOPS (or flops or flop/s), an acronym for FLoating point Operations Per Second. FLOPS is a measure of a computer's performance, especially in fields of scientific computing that make heavy use of floating point calculations, but it doesn't say much about applications outside the scientific realm, which rarely depend on floating point. Measuring floating point operation speed, therefore, does not accurately predict how the processor will perform on just any problem. For ordinary (non-scientific) applications, integer operations (measured in MIPS) are a far more effective measure and could form the basis for the Universal Compute Unit.

By basing the Universal Compute Unit on integer operations, it can form an (approximate) indicator of the likely performance of a given virtual machine within a given cloud such as Amazon EC2, or even a virtualized data center. One potential point of analysis may be a standard rate derived by multiplying the instructions per cycle by the clock speed (measured in cycles per second, or hertz), which gives instructions per second. It can be more accurately defined within the context of both a virtual machine kernel and standard single and multicore processor types.
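To make the arithmetic in the paragraph above concrete, here is a minimal Python sketch. It is only an illustration of "instructions per cycle times clock speed", not a proposed benchmark: the function names and the `REFERENCE_MIPS` baseline of 1,000 MIPS per UcU are assumptions made up for this example and are not part of any standard.

```python
# Hypothetical baseline: how many MIPS count as "1 UcU".
# This number is an assumption for illustration only.
REFERENCE_MIPS = 1000.0


def instructions_per_second(ipc: float, clock_hz: float) -> float:
    """Instructions per cycle multiplied by cycles per second."""
    return ipc * clock_hz


def ucu_rating(ipc: float, clock_hz: float,
               reference_mips: float = REFERENCE_MIPS) -> float:
    """Express integer throughput as a multiple of the reference baseline."""
    mips = instructions_per_second(ipc, clock_hz) / 1e6
    return mips / reference_mips


if __name__ == "__main__":
    # A virtual CPU retiring 2 instructions per cycle at 2.5 GHz.
    print(ucu_rating(ipc=2.0, clock_hz=2.5e9))
```

In practice the instructions-per-cycle figure would have to come from an agreed benchmark run inside the virtual machine, which is exactly the part an open standard would need to define.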

The Universal Compute Cycle (UCC) is the inverse of the Universal Compute Unit. The UCC would be used when direct access to the system or operating system in the cloud is not available. One such example is Google's App Engine. The UCC could be based on clock cycles per instruction, that is, the number of clock cycles that elapse while an instruction is being executed. This allows an inverse calculation to be performed to determine the UcU value, as well as providing a secondary level of performance evaluation.
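The inverse calculation can be sketched the same way. The snippet below is a rough illustration under stated assumptions: the function names are hypothetical, the cycle and instruction counts are taken as given (how a platform would report them is left open), and the baseline of 1,000 MIPS per UcU is invented for the example.

```python
def cycles_per_instruction(clock_cycles: int, instructions: int) -> float:
    """Clock cycles consumed divided by instructions executed (CPI)."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return clock_cycles / instructions


def ucu_from_cpi(cpi: float, clock_hz: float,
                 reference_mips: float = 1000.0) -> float:
    """Invert CPI to instructions per cycle, then rate against a baseline.

    reference_mips is a hypothetical baseline, not a defined standard.
    """
    if cpi <= 0:
        raise ValueError("cpi must be positive")
    ipc = 1.0 / cpi                   # instructions per cycle
    mips = ipc * clock_hz / 1e6       # millions of instructions per second
    return mips / reference_mips


if __name__ == "__main__":
    # A workload that used 4 million cycles to execute 2 million instructions.
    cpi = cycles_per_instruction(4_000_000, 2_000_000)
    print(cpi, ucu_from_cpi(cpi, clock_hz=2.0e9))
```

A lower cycles-per-instruction figure means a higher UcU value at the same clock speed, which is why the two measures can cross-check each other.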

I am the first to admit there are a variety of ways to solve this problem, and by no means am I claiming to have solved all the issues at hand. My goal at this point is to engage in an open dialog and work toward this common goal.

To this end I propose the development of an open standard for cloud computing capacity called the Universal Compute Unit (UcU) and its inverse, the Universal Compute Cycle (UCC). An open standard unit of measurement (with benchmarking tools) will allow providers, enablers and consumers to easily, quickly and efficiently access auditable compute capacity with the knowledge that 1 UcU is the same regardless of the cloud provider.

The cloud isn't about any one single VM or process but about how many VMs or processes work together. Consider, for example, AMD's PR (Performance Rating) system, which was used to compare their (underperforming) processors to those of the leader, Intel. The problem was that it applied to a very particular use case, but generally it gave you the idea. (Anyone technical knew Intel was better at floating point, but most consumers didn't care or weren't technical enough to know the difference.)

Similarly, cloud providers may want to use some aggregate performance metrics as a basis for comparing themselves to other providers. For example, Cloud A (High End) has 1,000 servers and fibre channel, while Cloud B (Commodity) has 50,000 servers but uses direct attached storage. Both are useful, but for different reasons. If I want performance I pick Cloud A; if I want massive scale I pick Cloud B. Think of it like the food guide on the back of your cereal box.
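As a rough sketch of what such a "food guide" label might look like in code, the Python example below compares the two clouds on a per-server figure and on an aggregate figure. The server counts and storage types come from the example above; the `CloudLabel` type and the per-server UcU ratings (8.0 and 1.0) are purely hypothetical values chosen for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudLabel:
    """A hypothetical 'nutrition label' for a cloud provider."""
    name: str
    servers: int
    ucu_per_server: float  # assumed rating, not a real measurement
    storage: str

    @property
    def aggregate_ucu(self) -> float:
        """Total capacity across all servers."""
        return self.servers * self.ucu_per_server


# Server counts and storage follow the example in the text;
# the per-server ratings are invented for illustration.
cloud_a = CloudLabel("Cloud A", 1_000, 8.0, "fibre channel")
cloud_b = CloudLabel("Cloud B", 50_000, 1.0, "direct attached storage")
labels = [cloud_a, cloud_b]

if __name__ == "__main__":
    fastest = max(labels, key=lambda c: c.ucu_per_server)
    largest = max(labels, key=lambda c: c.aggregate_ucu)
    print("Best per-server performance:", fastest.name)
    print("Largest aggregate capacity:", largest.name)
```

The point of the two separate rankings is that neither number alone says which cloud is "better"; the label only makes the trade-off visible.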
