Sunday, August 9, 2009

Waiting in the Cloud Queue

Which would you rather have: a compute job that finishes in 12 hours on a supercomputer, with the catch that you wait 7 days before it actually starts? Or a job that takes 60 hours on lower-performance public cloud infrastructure but can start immediately?

In a recent post, Ian Foster asked just this, saying: "what if I don't care how fast my programs run, I simply want to run them as soon as possible? In that case, the relevant metric is not execution time but elapsed time from submission to the completion of execution. (In other words, the time that we must wait before execution starts becomes significant.)"
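To make Foster's metric concrete, here is a minimal sketch that applies it to the two options from the opening question. The function name is mine, and it assumes the cloud job really does start with zero queue wait.

```python
# Turnaround = time waiting in the queue + time actually executing.
# The function name is illustrative; the figures come from the opening question.
def turnaround_hours(queue_wait_hours, run_hours):
    """Elapsed time from submission to completion of execution."""
    return queue_wait_hours + run_hours

# Supercomputer: 7 days in the queue, then a 12 hour run.
hpc = turnaround_hours(queue_wait_hours=7 * 24, run_hours=12)

# Public cloud: assumed to start immediately, then a 60 hour run.
cloud = turnaround_hours(queue_wait_hours=0, run_hours=60)

print(f"HPC: {hpc} h, cloud: {cloud} h, ratio: {hpc / cloud:.1f}x")
```

By this measure the "slower" cloud job finishes in 60 hours against 180 for the supercomputer, even though its execution time is five times longer.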

What I can't help wondering is whether cloud computing may be shifting the focus of high performance computing away from optimizing peak utilization for a few very specific tasks and toward lower-performance cloud platforms that can run a much broader, more diverse set of parallel tasks.

Or to put it another way, in those seven days while I wait for my traditional HPC job to get scheduled and completed, I could have been running dozens of similar jobs on lower-performance public cloud infrastructure capable of running multiple variations of the original task in parallel.
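As a rough back-of-the-envelope check on "dozens", the sketch below counts how many 60 hour cloud jobs fit into the window the HPC job needs to be scheduled and completed. The instance counts are hypothetical, and it assumes every job takes the full 60 hours and that instances are available on demand.

```python
# How many cloud jobs finish before the queued HPC job does?
# Instance counts are hypothetical; job times come from the opening question.
def jobs_completed(window_hours, run_hours, parallel_instances):
    """Jobs finished in the window if each instance runs jobs back to back."""
    batches = window_hours // run_hours  # whole jobs per instance
    return batches * parallel_instances

window = 7 * 24 + 12  # 7 days in the queue plus the 12 hour run = 180 hours

for instances in (1, 4, 8):
    done = jobs_completed(window, run_hours=60, parallel_instances=instances)
    print(f"{instances} instance(s): {done} jobs")
```

Under these assumptions a single instance gets through 3 jobs in the window, and 8 instances running variations in parallel get through 24.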

In a sense, this question perfectly illustrates the potential economies of scale that cloud computing enables (a long-run concept referring to reductions in unit cost as the size, or scale, of a facility increases). Taken individually, my job will take significantly longer to execute. On the other hand, a public cloud puts significantly more capacity at my disposal, so I can do significantly more, at a much lower cost per compute cycle, in roughly the same time my original job would have spent in the queue.