Saturday, August 30, 2008

VMware's Cloud Ambitions

It appears that VMware's cloud ambitions are starting to take shape. I have received several reports indicating that VMware is actively recruiting people for its new cloud initiative based in Palo Alto.

In a recent job posting, VMware states it is looking for a Sr. Software Engineer who can deliver products and features related to how VMware can participate in the hosting/cloud computing/SaaS space. This person will participate in all stages of development, from proof of concept to production, documentation, and support.

Sounds like an early-stage project. I'll keep you posted.

Friday, August 29, 2008

Cloud Computing Guide (Contributor Sign-up)

Over the last few weeks we've been busy finalizing the cloud computing book, and I am now happy to announce the deal is done: it will be published by a major book publisher (with an Irish name). We are now ready for potential contributors to sign up to help us create the book.

If you are interested in contributing to the "Cloud Computing Guide" we have created an online submission form to help in the selection process. In order to be considered, you will need to fill out the form located at http://cloudcomputing.wufoo.com/forms/book-contributor-form-cloud-computing-guide/

If you have any questions, please feel free to get in touch.

We've also created a discussion group at http://groups.google.com/group/cloudguide

Wednesday, August 27, 2008

Layoffs: Future Hazy for Bungee Labs

It's been a very busy couple of days for cloud-related news. Yesterday, Bungee Labs laid off 15 employees, which it blamed on actual versus anticipated rates of adoption of its platform-as-a-service offering. I've known the team at Bungee for a few years and admire them all. I can only imagine this must have been a very tough decision for them to make.

That said, I've always felt that a hosted cloud IDE is a level of vendor lock-in most customers are not prepared to accept. For Bungee Labs, or any proprietary cloud platform, to be successful, they need massive and broad adoption using common programming languages and tools. I feel the only way this will happen, especially in the platform-as-a-service space, is for PaaS providers to GO OPEN SOURCE!

So what's next for Bungee? Well, they're going open source.

Here's what they said.

We have begun the process of opening our own source code via the publication and “solicitation for comments” on the Bungee Labs Community Source License (BCSL) which will provide no-fee source code access to Bungee’s full stack when we emerge from Beta.

More Details: http://blogs.bungeeconnect.com/2008/08/27/changes/

Offline Cloud: Google says sorry for outage

It appears to be the summer of the cloud outage. After a significant Gmail outage earlier this month, Google has come out with a number of improvements to its customer service and SLA. In an interesting turn of events, they seem to be taking a page from the Amazon Web Services playbook by offering a cloud dashboard to provide users with up-to-the-minute system status information. It's nice to see Google starting to pay attention to their "paying customer" base.

I should also note Microsoft has done a particularly good job with their new cloud dashboard.

Here is the email Google sent to "paying" Google Apps users.
We're committed to making Google Apps Premier Edition a service on which your organization can depend. During the first half of August, we didn't do this as well as we should have. We had three outages - on August 6, August 11, and August 15. The August 11 outage was experienced by nearly all Google Apps Premier users while the August 6 and 15 outages were minor and affected a very small number of Google Apps Premier users. As is typical of things associated with Google, these outages were the subject of much public commentary.

Through this note, we want to assure you that system reliability is a top priority at Google. When outages occur, Google engineers around the world are immediately mobilized to resolve the issue. We made mistakes in August, and we're sorry. While we're passionate about excellence, we can't promise you a future that's completely free of system interruptions. Instead, we promise you rapid resolution of any production problem; and more importantly, we promise you focused discipline on preventing recurrence of the same problem.


Given the production incidents that occurred in August, we'll be extending the full SLA credit to all Google Apps Premier customers for the month of August, which represents a 15-day extension of your service. SLA credits will be applied to the new service term for accounts with a renewal order pending. This credit will be applied to your account automatically so there's no action needed on your part.


We've also heard your guidance around the need for better communication when outages occur. Here are three things that we're doing to make things better:

  1. We're building a dashboard to provide you with system status information. This dashboard, which we aim to make available in a few months, will enable us to share the following information during an outage:

    1. A description of the problem, with emphasis on user impact. Our belief is during the course of an outage, we should be singularly focused on solving the problem. Solving production problems involves an investigative process that's iterative. Until the problem is solved, we don't have accurate information around root cause, much less corrective action, that will be particularly useful to you. Given this practical reality, we believe that informing you that a problem exists and assuring you that we're working on resolving it is the useful thing to do.
    2. A continuously updated estimated time-to-resolution. Many of you have told us that it's important to let you know when the problem will be solved. Once again, the answer is not always immediately known. In this case, we'll provide regular updates to you as we progress through the troubleshooting process.

  2. In cases where your business requires more detailed information, we'll provide a formal incident report within 48 hours of problem resolution. This incident report will contain the following information:

    a. business description of the problem, with emphasis on user impact;
    b. technical description of the problem, with emphasis on root cause;
    c. actions taken to solve the problem;
    d. actions taken or to be taken to prevent recurrence of the problem; and
    e. time line of the outage.

  3. In cases where your business requires an in-depth dialogue about the outage, we'll support your internal communication process through participation in post-mortem calls with you and your management team.

Once again, thanks for you continued support and understanding.

Sincerely,
The Google Apps Team

A Cloud Haiku

Today, Sam Charrington posted an entertaining cloud poem on the Google cloud group. So I thought I'd take a stab at doing one as well, but mine is a haiku (俳句), based on some recent cloud computing events :)

A white cloud rises
Cascading systems run
Breaks in the night

Major Storage issues at Flexiscale

I've been holding off on reporting this (sorry Tony), but now that it has become public knowledge I feel it's appropriate to post. Over the last 24 hours Flexiscale has been having some serious problems with its storage systems. Typically these types of problems relate to some runaway process, as in Amazon S3's outage last month where their gossip protocol was to blame. But in Flexiscale's case the problem appears to be human error, made worse by a poor disaster recovery process. It seems that one of their administrators mistakenly deleted one of the main storage volumes. Now, more than 12 hours later, Flexiscale users have read-only access to the storage platform but no read-write. Simply put, they have to rebuild their arrays, but don't have the space to do so.

Here's what Tony Lucas of FlexiScale had to say.

As some of you are aware, we have been having issues with I/O (disk speed) in recent weeks. We identified short term and long term measures to eliminate these problems. The short team measures involved reorganising how data was stored across our storage network in a more efficient manner, and the long term measure was to increase the overall I/O capacity of the platform.

As a preparatory step to adding additional capacity one of our engineers was reorganising the data structure on the storage network and whilst cleaning up the snapshots we use as our backup process accidentally deleted one of the main storage volumes. This caused an immediate outage to a large amount of our customers

We immediately took action to take the entire disk structure offline (which caused the remaining customers to be taken offline) as it was the only way to preserve the integrity of the data on the system. Work then commenced with our storage vendor to restore this data.

Although we have now successfully gained read-only access to everyones data, a bug in the storage platforms operating system has prevented us from providing read-write access to it. This was discovered at 11pm last night, just when we thought we were about to bring the entire disk structure back online.

After consulting with our storage vendor it was agreed the most sensible option would be to copy the entire volume to a new disk structure (still maintaining it's integrity and structure), from where we could re-mount it correctly. Unfortunately due to it's size we didn't have spare capacity on the platform to create a complete duplicate of it.

An investigation of other ways of restoring the data then was undertaken but all options were considered too risky, and although downtime is a major problem for everyone, we felt the integrity of the data was the most important factor.

The decision was then taken to get additional capacity in from the storage vendor as soon as possible so that we could then increase the capacity to a sufficient level to allow us to copy the volume and successfully restore it. We originally thought we would be able to get this today, but unfortunately it will not arrive until mid-morning tomorrow, although we have done (and will continue to do) everything we can to speed this up.

At this time we are assisting customers who need access to specific files to get this, and we will continue this as long as we can into the night as resources allow.

Tomorrow morning once the storage arrives and is online, we will copy the data across and then begin to restart the entire platform as quickly as possible, but as the system wasn't designed to restart everything at once, this will take time.

We will be offering credits against our SLA, which will be determined once everyone is back up and running, as I'm sure you can appreciate all resources are being focused on that at this moment.

I, and all my staff are well aware of the potential impact this will be causing to you our customers, and we are doing everything we can to help in that respect. We will also be undertaking an investigation to ensure additional safeguards are put in place to prevent this happening again.

Sincerely,
Tony Lucas
Chief Executive Officer
XCalibre/FlexiScale

Evolutionary Computing

As an impractical futurist, the concept of the singularity (the theoretical future point of technological advancement at which software gains the ability to improve itself using artificial intelligence) is an idea that has been of great interest to me for a long time. In order to create a self-improving cloud computing system (autonomic computing), you first need to look at what "life" is and how it can be applied to computing.

Life doesn't necessarily have to be self-aware in order to be alive. A single-celled bacterium is arguably just as alive as my dog Winston, and Winston just as alive as a human. Whether an organism is simple or complex isn't important either; the common thread among all life forms is the ability to reproduce and adapt. The more important aspect is the life cycle: birth and death, mutation and evolution. In order to enable this kind of life-cycle computing (evolutionary computing), we need to create a software system capable of creating its own source code, applying patches to itself, and then repeating the process over and over. The system should be capable of measuring any quantitative changes, for better or worse, over time in each iterative version. These improvements could form a kind of artificial evolutionary process where certain branches result in dead ends and other branches evolve into improved versions of the software. It should also be able to examine other source code as a basis of comparison and apply certain aspects when and if needed. (As a developer, it's easier to modify someone else's code than to create it from scratch.)

To provide some background, Seed AI theory refers to the concept of recursive self-enhancement and is a key aspect of superintelligence (intelligence superior to that of a human). But in my opinion, intelligence is not as important as the ability to be performance-aware. I'd rather have a system capable of understanding that a core component isn't running in an optimal way and then attempting to apply a series of patches until it finds a better, more efficient approach. As humans, we tend to find solutions to problems based on trial and error, so why not give our software the same freedom? The software should also be able to understand past failures and determine that certain directions may not have worked. But it should also be able to recognize that certain aspects of a failed branch could still be useful in other, successful branches.
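
To make the trial-and-error idea above a little more concrete, here is a tiny, hypothetical hill-climbing loop in Python. It is not an implementation of Seed AI; it is just a sketch of "keep a candidate, try a random change, keep the change only if it scores better", which is the essence of the evolutionary process described here.

    import random

    def evolve(candidate, score, mutate, generations=1000):
        """Keep proposing random mutations and keep only the ones
        that measurably improve the candidate's score."""
        best, best_score = candidate, score(candidate)
        for _ in range(generations):
            trial = mutate(best)
            trial_score = score(trial)
            if trial_score > best_score:   # only improvements survive
                best, best_score = trial, trial_score
        return best

    # Toy usage: evolve a list of numbers toward a target sum of 100.
    target = 100
    result = evolve(
        candidate=[0] * 10,
        score=lambda xs: -abs(target - sum(xs)),
        mutate=lambda xs: [x + random.choice((-1, 0, 1)) for x in xs],
    )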

The biggest issue, other than the obvious "how", is security. This is where the story starts to sound a little bit like science fiction. Hypothetically these types of systems could become incredibly powerful, and the biggest threat they will face will be human. Embedded rules of conduct such as Isaac Asimov's "Three Laws of Robotics" could easily be removed because of the evolutionary nature of the system, so controlling the system will start to look more like a partnership. This type of evolutionary, self-improving, self-adapting, and self-replicating technology could improve almost all aspects of technology, but with great power comes great responsibility. Once the cat has been let out of the bag, it will be impossible to ever go back.

So will it ever happen? Arthur C. Clarke formulated the following three "laws" of prediction:

1. When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2. The only way of discovering the limits of the possible is to venture a little way past them into the impossible.
3. Any sufficiently advanced technology is indistinguishable from magic.

Will we achieve the "singularity" some day? Certainly. Will we be able to control it? I doubt it.

Tuesday, August 26, 2008

Linkedin Adding Group Discussions

As a long-time LinkedIn user, I was quick to create a number of LinkedIn user groups, including my Toronto Entrepreneur, cloud storage and CloudCamp groups. But for the most part LinkedIn's group functionality has been, umm, nonexistent, forcing me to look for alternatives such as Google Groups. But that's about to change.

This Friday, LinkedIn will be adding several much-requested features to their group functionality:

* Discussion forums: Simple discussion spaces for members.
* Enhanced roster: Searchable list of group members.
* Digest emails: Daily or weekly digests of new discussion topics which members may choose to receive.
* Group home page: A private space for members on LinkedIn.

Expect a notice from me on LinkedIn, assuming you've joined one of my groups.

Is MapReduce going mainstream?

It's been an interesting summer for Google's MapReduce software paradigm. I'm not going to get into the finer details of MapReduce; the general idea is that it's Google's magic sauce, basically what lets them process their massively distributed data sets. So any company that wants to be like Google, or needs to compete with Google, should pay attention to MapReduce.

Last month Intel, HP and Yahoo announced a joint research program to examine its usage, and today Greenplum, a provider of database software for what they describe as the "next generation of data warehousing and analytics", announced support for MapReduce within its massively parallel database engine.

Greenplum's announcement that it will integrate MapReduce functionality into its enterprise-focused database is an important step toward taking MapReduce out of academic research labs and bringing it to lucrative corporate users.

To give you some background, currently the two most popular implementations of MapReduce are the open source Apache Hadoop project and the unfortunately named Pig project. For those of you who don't know about Hadoop, it is an open source platform for distributing, managing and then collecting computing work throughout a large computing cloud using MapReduce. Pig, a Yahoo Research project currently being incubated at Apache, is a language designed to make the Hadoop infrastructure easier to use effectively. It has been described as SQL for MapReduce, allowing queries to be written and then parallelised and run on the Hadoop platform.
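
For readers who haven't seen the model before, here is a toy word count in plain Python that mimics the two MapReduce phases. Real Hadoop or Pig jobs distribute this work across many machines; this sketch only illustrates the shape of the computation.

    def map_phase(documents):
        """Emit (word, 1) pairs, one per word, across all documents."""
        for doc in documents:
            for word in doc.split():
                yield (word.lower(), 1)

    def reduce_phase(pairs):
        """Sum the emitted values for each key."""
        counts = {}
        for word, n in pairs:
            counts[word] = counts.get(word, 0) + n
        return counts

    docs = ["the cloud is the new grid", "the grid is dead"]
    print(reduce_phase(map_phase(docs)))
    # {'the': 3, 'cloud': 1, 'is': 2, 'new': 1, 'grid': 2, 'dead': 1}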

I found this quote from Greenplum's press release interesting.

"Greenplum has seamlessly integrated MapReduce into its database, making it possible for us to access our massive dataset with standard SQL queries in combination with MapReduce programs," said Roger Magoulas, Research Director, O'Reilly Media. "We are finding this to be incredibly efficient because complex SQL queries can be expressed in a few lines of Perl or Python code.

Also interesting to note that earlier this year IBM released an Eclipse plug-in that simplifies the creation and deployment of MapReduce programs. This plug-in was developed by the team at IBM Software Group's High Performance On Demand Solutions Unit at the IBM Silicon Valley Laboratory. So it may be a matter of time before we see MapReduce commercially offered by IBM.

So what's next? Will we see a Microsoft implementation or an Oracle MapReduce? For now, MapReduce appears to be the new "coolness" and with all the industry attention it seems to be getting I think we may be on the verge of finally seeing MapReduce enter the mainstream consciousness.

As a side note, my favorite MapReduce implementation is called Skynet. The name says it all.

Monday, August 25, 2008

Ruv's Law: Less is Less, More is More

The point is, ladies and gentlemen, that more -- for lack of a better word -- is good.

More is right. More works.

Although I am not a fictional Wall Street tycoon, I do have a hard time buying the continuing view that server consolidation is the ultimate use case for virtualization. Call me a wide-eyed optimist, but I believe my company will continue to grow, and as it grows our demand for additional computing capacity will grow too. For that reason I believe more is better.

If I were a minimalist, I might believe that less is more, but unlike Ludwig Mies van der Rohe I do not believe that anything spare or stripped to its essentials is better. It is not. I'd rather have more: more family, more employees and yes, even more money.

Now you know.

Dell's CloudBox (Containerized Data Centers)

Taking a tip from Google, Sun, Rackable and HP, the word on the street is that Dell is preparing to enter the new and exciting world of containerized data centers. Helping to further these suspicions, a job posted earlier today on LinkedIn suggests that Dell is finally ready to take the plunge. They describe the job role as "The primary purpose of the Enclosure Product Management position is to develop a strategy around Dell's entry into the Datacenter enclosure market place, implement and then sustain the execution of that strategy." Combined with the rumors about joint Facebook / Dell cloud services, the pieces seem to be falling into place.

I find this move particularly interesting given Dell's interest in cloud computing. I think Dell may envision using these types of containerized data centers as a method of "rapid physical cloud provisioning". (Facebook suddenly has an increase in demand from Japan? No problem, we'll have a container on the next ship to Tokyo.)

In yet another interesting twist of fate, Google has received a patent from the USPTO for the concept of a "mobile datacenter" stored in a standard shipping container and equipped with multiple racks of high-powered servers and its own internal cooling system.

Also interesting to note that HP announced its cloud-enabling data center infrastructure, the POD (Performance-Optimized Datacenter) program, last month.

I'll keep you posted as I learn more.

Sunday, August 24, 2008

MetaCDN, Cloud based content delivery networks

James Broberg at The University of Melbourne has written a very interesting paper called MetaCDN: Harnessing 'Storage Clouds' for High Performance Content Delivery. The idea of a low-cost "CloudCDN" has been of particular interest to me and is an ideal use of cloud-based storage services such as CloudFS and Nirvanix. Cloud CDNs look like a potentially very important part of the emerging global cloud computing environment. It's nice to see someone putting some thought into this area.

Abstract:
"Content Delivery Networks (CDNs) such as Akamai and Mirror Image place web server clusters in numerous geographical locations to improve the responsiveness and locality of the content it hosts for end-users. However, their services are priced out of reach for all but the largest enterprise customers. An alternative approach to content delivery could be achieved by leveraging existing infrastructure provided by 'Storage Cloud' providers, at a fraction of the cost. In this paper, we introduce MetaCDN, a system that exploits 'Storage Cloud' resources, creating an integrated overlay network that provides a low cost, high performance CDN for content creators. MetaCDN removes the complexity of dealing with multiple storage providers, by intelligently matching and placing users' content onto one or many storage providers based on their quality of service, coverage and budget preferences. We then demonstrate the utility of this new approach to content delivery by showing that the participating 'Storage Clouds' used by MetaCDN provide high performance (in terms of throughput and response time) and reliable content delivery for content consumers."

Download the paper here > http://www.gridbus.org/reports/meta-cdn2008.pdf

Friday, August 22, 2008

IBM says don't use tape, use the Cloud

IBM is preparing to tell its customers to stop using tape and back up their data to cloud-based IBM Business Resilience data centers. It's spending $300m to build 13 such centers (about $23 million per center) around the globe.

Customers will have a Service Delivery Platform in their data centers which automatically backs up data to an offsite IBM Business Resilience data center located in their part of the globe such as Hong Kong; Tokyo, Japan; Paris, France; London, UK; Beijing and Shanghai, China; Izmir, Turkey; Warsaw, Poland; Milan, Italy; Metro Park, New Jersey; Cologne, Germany; Rio de Janeiro, Brazil; Mumbai, India; South Africa; and Brussels, Belgium.

It is calling its new backup to cloud disk offering IBM Information Protection Services. The combination of the IBM storage appliances and Arsenal's software is called a Data Protection Vault.

More details here.

Enomalism Google Group / Discussion List

In an attempt to better organize our Enomalism open source project, we've created a Google group for the discussion of the open source Enomalism Cloud Computing Platform.

Feel free to sign up here:
http://groups.google.com/group/enomalism

Enomalism's RESTful API

For those of you interested in RESTful cloud computing, we've just published our Enomalism REST API for review.

Each Enomalism REST API call returns its result in JSON format. JSON is a lightweight format that virtually every programming language can easily parse.

Most of the URIs that comprise the Enomalism REST API have variable paths. A path takes the following form:

/segment/segment/segment/

Each segment in a given path can be dynamically interchangeable. That is, in a given URI, one segment is typically a variable that is processed by the REST controller while the remaining segments are static. This is only the typical case; there could potentially be any number of variable segments in a given URI. REST controllers can also accept HTTP parameters. These look like typical GET parameters, or they can be submitted via an HTML form, depending on which HTTP methods the controller supports. For example, the end result of submitting an HTML form is issuing an HTTP POST request. If the controller we are submitting to supports POST, great. If not, we will not get the desired result.
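
As a quick illustration of consuming such an API, here is a short Python sketch. The host name, path segments and the shape of the returned JSON are hypothetical; see the API document linked below for the real resources.

    import json
    import urllib2   # on Python 3, use urllib.request instead

    BASE = "https://cloud.example.com/rest"   # hypothetical endpoint

    def get_resource(*segments):
        """Build a /segment/segment/ style URI and parse the JSON reply."""
        url = BASE + "/" + "/".join(segments) + "/"
        return json.loads(urllib2.urlopen(url).read())

    # e.g. a hypothetical call listing virtual machines:
    # machines = get_resource("machines", "list")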

Please feel free to download a copy of our API doc at https://enomalism.svn.sourceforge.net/svnroot/enomalism/enomalism_rest_api.pdf

CloudVirt

So it would seem my cloud standardization post has hit a nerve with a few people. I'd like to post some follow-on ideas I've had since yesterday.

First of all, I'm not totally sold on whether or not cloud computing is ready for a cloud standard just yet. What I do think we need is a reference implementation (platform & infrastructure) and a common, extensible API: "CloudVirt". This API may someday form the basis for a standard, but in the meantime it gives us a uniform API to work against. So whether you're using Google App Engine or Force.com, GoGrid or EC2, Nirvanix or S3, you'll have a central point of programmatic contact. I personally don't want to have to rewrite my platform for every new cloud provider's API, which is exactly what we're doing now.
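
To show the kind of thing I mean, here is a rough Python sketch of a provider-neutral interface. The class and method names are invented for illustration; the point is simply that application code programs against one contract while thin adapters translate to each provider's own API.

    class CloudProvider(object):
        """Minimal contract every provider adapter would implement."""
        def launch(self, image, count=1):
            raise NotImplementedError
        def terminate(self, instance_id):
            raise NotImplementedError

    class EC2Provider(CloudProvider):
        def launch(self, image, count=1):
            # translate to Amazon EC2 API calls here
            return ["ec2-instance-%d" % n for n in range(count)]

    class GoGridProvider(CloudProvider):
        def launch(self, image, count=1):
            # translate to GoGrid API calls here
            return ["gogrid-server-%d" % n for n in range(count)]

    def provision(provider, image, count=1):
        """Application code never needs to know which cloud is underneath."""
        return provider.launch(image, count)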

A few people have also pointed out that CIM (the Common Information Model) could be an ideal starting point for a cloud API. For those who are unfamiliar, CIM is an open standard that defines how managed elements in an IT environment are represented as a common set of objects and the relationships between them. This is intended to allow consistent management of these managed elements, independent of their manufacturer or provider.

For those interested in joining the discussion, please sign up for our cloud computing group.

More ideas to follow.

Thursday, August 21, 2008

The Standardized Cloud

Over the last few weeks I've been engaged in several conversations about the need for a common, interoperable and open set of cloud computing standards. During these conversations a recurring theme has started to emerge: a need for cloud interoperability, the ability for diverse cloud systems and organizations to work together in a common way. In my discussion yesterday with Rich Wolski of the Eucalyptus project, he described the need for a "CloudVirt" API similar to that of the libvirt project for virtualization. For those of you who don't know about libvirt, it's an open source toolkit which provides a common API for interacting with the virtualization capabilities of recent versions of Linux (and other OSes).

I would like to take this opportunity to share my ideas, as well as get some feedback, on some of the key points I see for the creation of a common cloud computing reference API or standard.

* Cloud Resource Description
The ability to describe resources is (in my opinion) the most important aspect of any standardization effort. One potential avenue might be to use the Resource Description Framework proposed by the W3C. The Resource Description Framework (RDF) is a family of specifications, originally designed as a metadata data model, which has come to be used as a general method of modeling information through a variety of syntax formats. The RDF metadata model is based upon the idea of making statements about Web resources (or cloud resources) in the form of subject-predicate-object expressions, called triples in RDF lingo. This standardized approach could be adapted as a primary mechanism for describing cloud resources both locally and remotely.
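
As a rough illustration of the triple idea applied to cloud resources, here is a small Python sketch. The vocabulary (the "cloud:" and "rdf:" prefixes and property names) is made up for the example rather than taken from any published schema.

    # Each statement is a (subject, predicate, object) triple.
    triples = [
        ("cloud:vm/i-0001",   "rdf:type",         "cloud:VirtualMachine"),
        ("cloud:vm/i-0001",   "cloud:memoryMB",   "1024"),
        ("cloud:vm/i-0001",   "cloud:locatedIn",  "cloud:region/us-east"),
        ("cloud:vol/backup1", "rdf:type",         "cloud:StorageVolume"),
        ("cloud:vol/backup1", "cloud:attachedTo", "cloud:vm/i-0001"),
    ]

    def describe(subject):
        """Return every statement made about a given resource."""
        return [(p, o) for s, p, o in triples if s == subject]

    print(describe("cloud:vm/i-0001"))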

* Cloud Federation (Cloud 2 Cloud)
The holy grail of cloud computing may very well be the ability to seamlessly bridge both private clouds (data centers) and remote cloud resources such as EC2 in a secure and efficient manner. To accomplish this, a federation standard must be established. One of the biggest hurdles to overcome in federation is the lack of a clear definition of what federation is.

So let me take a stab at defining it.

Cloud federation manages consistency and access controls when two or more independent, geographically distinct clouds share authentication, files, computing resources, command and control, or access to storage resources. Cloud federations can be classified into three categories: peer-to-peer, replication, and hierarchical. Peer-to-peer seems to be the most logical first step in creating a federation spec. Protocols like XMPP, P4P and Virtual Distributed Ethernet may make for good starting points.

* Distributed Network Management
The need for a distributed and optimized virtual network is an important aspect of any multi-cloud deployment. One potential direction could be to explore the use of VPN or VDE technologies. My preference would be VDE (Virtual Distributed Ethernet). A quick refresher: a VPN is a way to connect one or more remote computers to a protected network, generally tunnelling the traffic through another network, while VDE implements a virtual Ethernet in all its aspects: virtual switches, virtual cables. A VDE can also be used to create a VPN.

VDE interconnects real computers (through a tap interface), virtual machines, and other networking interfaces through a common open framework. VDE supports heterogeneous virtual machines running on different hosting computers and could be the ideal starting point. Network shaping and optimization may also play an important role in the ability to bridge two or more cloud resources.

Some network optimization aspects may include (a toy rate-limiting sketch follows this list);
  • Compression - Relies on data patterns that can be represented more efficiently.
  • Caching/Proxy - Relies on human behavior, accessing the same data over and over.
  • Protocol Spoofing - Bundles multiple requests from chatty applications into one.
  • Application Shaping - Controls data usage based on spotting specific patterns in the data and allowing or disallowing specific traffic.
  • Equalizing - Makes assumptions on what needs immediate priority based on the data usage.
  • Connection Limits - Prevents access gridlock in routers and access points due to denial of service or peer-to-peer traffic.
  • Simple Rate Limits - Prevents one user from getting more than a fixed amount of data.
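
To make the last item a little more concrete, here is a minimal, hypothetical token-bucket rate limiter in Python. It isn't tied to any particular cloud or networking product; it just illustrates the idea of capping how much data (or how many requests) a single user can push through per second.

    import time

    class TokenBucket:
        """Toy rate limiter: allow roughly `rate` units per second,
        with short bursts up to `capacity`."""

        def __init__(self, rate, capacity):
            self.rate = float(rate)          # tokens refilled per second
            self.capacity = float(capacity)  # maximum burst size
            self.tokens = float(capacity)
            self.last = time.time()

        def allow(self, amount=1):
            now = time.time()
            # refill tokens based on elapsed time, never exceeding capacity
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= amount:
                self.tokens -= amount
                return True   # within the limit
            return False      # throttle this request

    # e.g. limiter = TokenBucket(rate=100, capacity=200)  # ~100 requests/sec
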
* Memory Management
When looking at the creation of a compute cloud, memory tends to be a major factor in the performance of a given virtual environment, whether a virtual machine or some other application component. Cloud memory management will need to involve ways to allocate portions of virtual memory to programs at their request, and to free it for reuse when no longer needed. This is particularly important in "platform as a service" cloud deployments.

Several key memory management aspects may include;
  • Provide memory space to enable several processes to be executed at the same time
  • Provide a satisfactory level of performance for the system users
  • Protect each program's resources
  • Share (if desired) memory space between processes
  • Make the addressing of memory space as transparent as possible for the programmer.

* Distributed Storage
I've been working on creating a cloud abstraction layer called "cloud raid" as part of our ElasticDrive platform and have been looking at different approaches for our implementation. My initial idea is to connect multiple remote cloud storage services (S3, Nirvanix, CloudFS) for a variety of purposes. During my research the XAM specification began to look like the most suitable candidate. XAM addresses storage interoperability, information assurance (security), storage transparency, long-term records retention and automation for Information Lifecycle Management (ILM)-based practices.

XAM looks to solve key cloud storage problem spots including;
  • Interoperability: Applications can work with any XAM conformant storage system; information can be migrated and shared
  • Compliance: Integrated record retention and disposition metadata
  • ILM Practices: Framework for classification, policy, and implementation
  • Migration: Ability to automate migration process to maintain long-term readability
  • Discovery: Application-independent structured discovery avoids application obsolescence
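
To make the "cloud raid" idea above a little more concrete, here is a toy RAID-0 style striping sketch in Python. The backends stand in for services such as S3 or Nirvanix and are assumed to expose simple put/get methods; a real implementation (as in ElasticDrive) also has to worry about metadata, failures and consistency.

    class CloudStripe(object):
        """Toy striping of an object across several storage backends."""

        def __init__(self, backends, chunk_size=1024 * 1024):
            self.backends = backends          # e.g. [s3_store, nirvanix_store]
            self.chunk_size = chunk_size

        def put(self, name, data):
            # data is assumed to be a byte string
            for n, start in enumerate(range(0, len(data), self.chunk_size)):
                chunk = data[start:start + self.chunk_size]
                backend = self.backends[n % len(self.backends)]
                backend.put("%s.%d" % (name, n), chunk)

        def get(self, name, size):
            chunks = []
            total = (size + self.chunk_size - 1) // self.chunk_size
            for n in range(total):
                backend = self.backends[n % len(self.backends)]
                chunks.append(backend.get("%s.%d" % (name, n)))
            return "".join(chunks)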


Potential Future Additions to the API

* I/o
The virtualization of I/O resources is a critical part of enabling a set of emerging cloud deployment models. In large-scale cloud deployments a recurring issue has been the ability to effectively manage I/O resources, whether at the machine or network level. One of the problems a lot of users are encountering is the "nasty neighbor": a user who has taken all available system I/O resources.

A common I/O API for sharing, security, performance, and scalability will need to be addressed to help resolve these issues. I've been speaking with several hardware vendors about how we might be able to address this problem. This will most likely have to be done at a later point, after a first draft has been released.

* Monitoring and System Metrics
One of the best aspects of using cloud technology is the ability to scale applications in tandem with the underlying infrastructure and the demands placed on it. Rather than just scaling on system load, users should have the ability to selectively scale on other metrics such as response time, network throughput, or anything else made available. Having a uniform way to interact with system metrics will give cloud providers and consumers a common way to scale applications.
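
Here is a small, hypothetical sketch of what scaling on a metric other than raw load might look like; the thresholds and the provisioning hooks are placeholders, but it shows why a uniform metrics interface matters.

    def desired_capacity(avg_response_ms, instances,
                         target_ms=250, min_instances=2, max_instances=20):
        """Decide how many instances we want based on response time."""
        if avg_response_ms > target_ms * 1.2 and instances < max_instances:
            return instances + 1      # too slow: add capacity
        if avg_response_ms < target_ms * 0.5 and instances > min_instances:
            return instances - 1      # lots of headroom: shed capacity
        return instances

    # A monitoring loop would feed this from whatever metrics API the
    # cloud exposes, then call the provider's launch/terminate methods.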

Security & Auditability.
In my conversations with several wall street CIO's the questions of both security and cloud transparency with regards to external audits has come up frequently.
---
My list of requirements is by no means complete. Cloud computing encompasses a wide variety of technologies, architectures and deployment models. What I am attempting to do is address the initial pain points, whether you are deploying a cloud or just using it. A lot of what I've outlined may be better suited to a reference implementation than a standard, but nonetheless I thought I'd put these ideas out for discussion.

-- Updates --
1. It looks like I've forgotten an obvious yet important aspect of my cloud standards: authentication. Maybe something like OAuth or OpenID could form the basis for this as well. I'll need to do some more thinking on this one.

Tuesday, August 19, 2008

Microsoft's Cloud Thickens

Microsoft's cloud computing plot thickens. A new white paper, sponsored by Microsoft and written by David Chappell (not the comedian), clears some of the haze around what Microsoft is planning to unveil at this October's Professional Developers Conference.

I found the last paragraph of the document the most telling.
Cloud platforms aren’t yet at the center of most people’s attention. The odds are good, though, that this won’t be true five years from now. The attractions of cloud-based computing, including scalability and lower costs, are very real. If you work in application development, whether for a software vendor or an end user, expect the cloud to play an increasing role in your future. The next generation of application platforms is here. (And Microsoft knows it.)
Read the report here
http://www.davidchappell.com/CloudPlatforms--Chappell.pdf

New Xen Features Outlined

New Xen features outlined for 3.3 release.

CPU Portability
Xen 3.3 now enables administrators to move active virtual machines from one server to another independent of various CPU virtualization support. This new feature offers greater flexibility in heterogeneous server farms with various CPU virtualization versions.

Green Computing
Xen 3.3 takes advantage of the latest hardware support for power consumption monitoring and reduction by intelligently powering down components within an individual processor. Power savings are also gained by offering virtualization solutions built on Xen the ability to manage servers and server farms for greater power savings.

Security
Xen 3.3 delivers new solutions to better secure virtual machine start-up as well reduce possible hacking opportunities by moving critical management processing out of global space into separate virtual sessions.

Performance & Scalability
Xen 3.3 significantly improves the already impressive Xen performance by offering new memory access algorithms to reduce system wait time during critical memory requests and new scanning technology to optimize framebuffer searches. Several scalability enhancements were also implemented including 2MB page support for EPT/NPT.

Also interesting is the updated list of the major Xen contributors: Intel, AMD, HP, Dell, IBM, Novell, Red Hat, Sun, Fujitsu, Samsung, and Oracle.

More details
http://blog.xen.org/wp-content/uploads/2008/08/xen_33-datasheet.pdf

Chief Uptime Officer

I love these kinds of press releases. Rackspace has created a new executive position called the Chief Uptime Officer.

According to an email to the WHIR, Mosso says confidence in the reliability of cloud computing is a critical issue, especially given some of the recent high-profile outages, like Amazon's web services outage earlier this year. By having a Chief Uptime Officer (a role the company believes every cloud provider should have), Mosso says it is ensuring that it can meet its 100 percent uptime service level agreement, 100 percent of the time.

The position was given to Bruce Runyan, who served as VP of operations and customer care at open source solutions provider Fonality. While at Fonality, Runyan scaled service and manufacturing capacity while simultaneously reducing operating costs. It's interesting to note that another Fonality alumnus, Adrian Otto, is also a key architect over at Mosso.

More details here > http://www.thewhir.com/marketwatch/081908_Mosso_Appoints_Chief_Uptime_Officer.cfm

Friday, August 15, 2008

Microsoft to allow VMs migration with one Windows license?

NetworkWorld is reporting some interesting news: on August 19 Microsoft will relax the licensing requirements for virtual machine migration.

Under current Microsoft rules, software running on a virtual machine is licensed based on the physical server. This can be problematic because of technologies such as VMware's VMotion, which can move virtual machines from one physical server to another without causing downtime. A customer that wants to migrate a Windows guest operating system from one physical host to another (for example through VMware VMotion) has to have two licenses of the operating system, one for each location. The current policy also says that a customer needs to wait 90 days before migrating a license from one physical server to another.

"As server virtualization becomes more mainstream, Microsoft will be announcing new licensing and support policies to help customers make their data centers and enterprise IT more dynamic on August 19," Microsoft said.

If this is true, the ability to migrate not only between machines but also between Microsoft-centric clouds will enable a world of opportunities for companies looking to take advantage of cloud computing.

Thursday, August 14, 2008

The Next Big Thing: Cloud Attached Storage

Following up on my previous post about Amazon's elastic storage service for EC2, there appears to be growing interest in what I'm calling "Cloud Attached Storage". Earlier this week Nirvanix announced a deal with Silicon Mountain Holdings (a company I've never heard of) to embed its CloudNAS™ in SMH's storage hardware focused on the SMB market.

Although on the surface the deal appears to be more of a PR gimmick than any real business opportunity, it does highlight a growing shift toward a hybrid cloud model where network attached storage hardware becomes a kind of gateway to a variety of offsite cloud services.

This quote from Gartner caught my eye.

"The confluence of cloud storage and device storage in the home and office is a definite trend," said Adam Couture, principal research analyst at Gartner. "Endpoint devices can be lost or stolen. Storage arrays and local backup devices are susceptible to data corruption or catastrophic events such as fire or flood. Makers of these devices are beginning to understand the need to protect customer data off site. We think that is a trend that will become increasingly common."

I've been a big proponent of this type of "cloud attached storage" for a while; it's one of the primary reasons we created our ElasticDrive platform. (A large portion of our interest is coming from OEMs.) Contrary to some popular cloud advocates, the shift toward cloud computing won't be a "big switch" but a gradual migration. Combining traditional NAS hardware with the option to automatically and securely replicate to the cloud presents a huge opportunity to hardware and software vendors alike.

Wednesday, August 13, 2008

Google Insights: Cloud computing vs Grid computing

Alistair Croll over at Bitcurrent has pointed out a new service from Google called Insights. With Google Insights you can compare search volume patterns across specific regions, categories, and time frames. So I thought I'd take the opportunity to compare cloud computing vs grid computing, and I found some interesting trends.

First of all, it seems that grid computing has been on a general downward trend for four years.

  • Four years ago grid computing had a comparative ranking of 100 vs cloud's 0.
  • Two years ago: 33 vs 0.
  • One year ago: 26 vs 1.

October 2007 seems to have been the tipping point.

  • In October 2007 the comparison was 25 vs 10.

In April 2008, cloud computing surpassed grid in search interest for the first time.

  • In April 2008 the comparison was 27 vs 28.

The increase from April to July has been staggering.

  • In July 2008 it was 21 vs 100.

In less than two years, cloud computing has managed to surpass the highest levels of interest ever recorded for grid computing.

Take a look at http://www.google.com/insights/search/#cat=&q=grid%20computing%2Ccloud%20computing&geo=&date=&clp=&cmpt=q

Tuesday, August 12, 2008

Amazon Block Storage coming very soon?

Last week I received an email from Amazon indicating that their new persistent storage offering may be released to the public in a matter of days.

Several blogs are also reporting that Amazon's Elastic Block Store (EBS) is about to go live. From the various posts, the service seems to be aimed at simple VM-attached storage volumes. These volumes can be thought of as raw, unformatted disk drives which can be formatted and then used as desired (or even used as raw storage if you'd like). Volumes can range in size from 1 GB up to 1 TB; you can create and attach several of them to each EC2 instance. They are designed for low latency, high throughput access from Amazon EC2. Needless to say, you can use these volumes to host a relational database.
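
For the curious, here is roughly what using such a volume might look like from Python, assuming the boto library's EC2 interface and using made-up instance, zone and device names. Until the service is public, the exact calls may of course differ.

    import boto

    conn = boto.connect_ec2()   # AWS credentials read from the environment

    # create a 50 GB raw volume in the same zone as the target instance
    volume = conn.create_volume(50, "us-east-1a")

    # attach it to a (hypothetical) running instance as a block device
    conn.attach_volume(volume.id, "i-12345678", "/dev/sdh")

    # inside the instance the raw device can then be formatted and
    # mounted (e.g. mkfs and mount /dev/sdh), or handed to a database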

I can't wait to try this out. For some unknown reason, we haven't been invited into the beta, so like everyone else we will be eagerly awaiting access.

Here's the original email.
--------
This is a notice to let you know that Amazon EC2 instance(s) associated with your account are operating on an older software version that will not be able to take advantage of some upcoming new features. The affected instances are listed below.

In the coming weeks, Amazon EC2 will be launching a new persistent storage offering. This is an advanced alert that the instances listed will not be able to take advantage of this new feature. Other instances that are not listed will be able to take advantage of it.

For more information about the persistent storage offering, please see the following blog posts:
http://www.allthingsdistributed.com/2008/04/persistent_storage_for_amazon.html
http://aws.typepad.com/aws/2008/04/block-to-the-fu.html

Back that *aaS up.

With Google, MobileMe and other cloud services going down recently, it would seem my post last month, "the offline cloud", is now more relevant than ever. For the first time in as long as I can remember, Google's Gmail went down. For the hour or so it was not available, neither was I. As I sat and stared blankly at my screen, I realized that I hadn't downloaded my Google IMAP mail to my desktop in several weeks. I'm wondering if there are services that can let me download my Google Apps data to a secondary location, like S3 or Nirvanix.

For Gmail, a simple way I've managed to back up my data is by installing Zimbra on an EC2 instance and syncing my Gmail IMAP account. That way I always have a secondary copy of my email available through a snazzy Ajax interface. I'm still trying to figure out a way to back up my blog, Google Docs, and other assorted cloud services. Guess I'll need to read the Google APIs.
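
For a do-it-yourself version, Python's standard imaplib module is enough to pull raw copies of every message down to local (or cloud-backed) storage. A minimal sketch, assuming IMAP access is enabled on the account and that the "[Gmail]/All Mail" folder name is correct for your setup:

    import imaplib

    conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
    conn.login("you@example.com", "your-password")       # hypothetical account
    conn.select('"[Gmail]/All Mail"', readonly=True)

    typ, data = conn.search(None, "ALL")
    for num in data[0].split():
        typ, msg_data = conn.fetch(num, "(RFC822)")
        raw_message = msg_data[0][1]                     # full RFC822 source
        open("gmail-backup-%d.eml" % int(num), "wb").write(raw_message)

    conn.logout()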

What are others doing?

reuven

Sunday, August 10, 2008

Why Dell / Rackspace is a bad idea

I just received this from an anonymous source. It appears I'm not the first to consider the idea.
--------------
The Dell / Rackspace tie up could be very difficult for many reasons:
- The two cultures could not be more opposite. Dell is very structured as any multi billion company must be and focused on financial performance. Rackspace is focused on fanatical support at all costs. The financial focus and the fanatical support focus would likely clash hard.

- Rackspace excels at providing services and thinks and breathes from a services perspective everyday. Dell which provides many services, is probably still best at providing products which produce revenue today. Dell revenue expectations would likely reward product revenue which is recognized today, not services revenue which is recognized over time. A product driven organization like Dell may not have the patience it takes for a services organization like Rackspace to ramp revenue over time.

- Will Dell be more interested in figuring out how a hosting business can help them drive more box sales, or will they see it as a new revenue source to grow. If Dell takes the former position, they will likely suboptimize the hosting business in order to drive box sales, as opposed to driving the hosting business as a new revenue source in its own right.

- Keep in mind that Dell had a hosting business before called Dell Hosting which for whatever reason was sold. Based on that experience, Dell may not have much interest in covering that ground again.

- Rackspace is very capital intensive and according to internet reports is looking to spend $335 million this year on equipment, data centers, and office space - far more than the $187 million they just raised in the IPO by the way, but that is a different issue. As they continue to grow, that is likely just a drop in the bucket compared to future capex needs for expanded data center and equipment needs globally. Dell is really more of an assembler and is not very capital intensive. By all accounts, Dell is very careful when it does spend capital and does so sparingly. This difference in capital spending philosophy could drive lots of friction.

- Despite the current vendor/customer relationship between the two, how good is the relationship? HP, NetApp and others have made joint announcements with Rackspace in the recent past. Maybe this means nothing, but if the Dell relationship was great, seems like those announcements would have been with Dell, not other vendors.

Seems like a good idea on the surface, but could be full of trouble.

DellSpace: Should Dell buy Rackspace?

After a relatively lackluster Rackspace IPO on Friday, it would seem that Rackspace is in a rather difficult spot. RAX stock was trading at a fairly inexpensive $10.01 a share as of the end of Friday, with a market capitalization of $1.02 billion. There has been a growing buzz in the cloud computing community that Rackspace, with its large corporate hosting customer base, may make an ideal acquisition for a hardware vendor looking to adjust to the coming shift toward internet-centric computing, aka cloud computing.

So who? One potential suitor may be Dell, located just down the road in Austin. A Dell + Rackspace ("Dellspace") merger or acquisition would be a perfect match for a number of reasons. There is a long history: Dell is already a Rackspace hardware partner, prominently displayed on the Rackspace homepage, so an existing relationship is in place. On the Dell side, they recently stated they plan on getting into cloud computing in a big way, which for the most part hasn't resulted in anything tangible other than a professional services play and a trademark fiasco.

There are a variety of competitive pressures on Dell. Companies such as Apple, HP and IBM are all in the midst of rolling out their own cloud computing offerings. There is a rumor that IBM is looking to open up its internal Research Compute Cloud to the public as an Amazon EC2 clone, possibly by the end of the year. Dell is now under pressure to compete in areas such as hosted cloud infrastructure, platform services and professional services, all areas where Apple, HP and IBM have done a good job of establishing themselves, or at the very least have a visible head start.

In order for Dell to remain competitive, it will need to adjust to a future where it may not be able to count only on selling hardware, but will also need to provide computing resources through a variety of hosted "as a service" offerings. Rackspace is the perfect enabler of this network-enabled future. Simply put, Dell needs to look and act more like Apple, and I think they are beginning to realize this fact of modern IT life.

I should also note that other obvious Rackspace suitors may include Cisco or even Sun, but that's a conversation for another day.

(In full disclosure, I own neither Dell nor Rackspace stock, but am friendly with both firms.)

Cloud Wars: Russia's Cyber Botnet Army

Slashdot as well as several other sources are reporting that Russia has unleashed a full cyber attack on the Government of Georgia using its newly created, government-sponsored compute cloud (botnet). What's more, word is that this botnet is supposedly no more than 10,000 - 20,000 nodes. I'm also told this private "botnet" is composed primarily of AMD machines and uses an architecture similar to another military botnet reportedly built in a shady Middle Eastern country, which leads me to believe they may have been developed by the same unknown source. If this is true, it's amazing that so few machines can have such dramatic results.

I received the following note from a few contacts last night and I have had confirmation from some eastern European programmers I know.

"Many of Georgia’s internet servers were under external control from late Thursday, Russia’s invasion of Georgia commenced on Friday. It is further requested of any blog reader the information below is further relayed to the International Press and Community to ensure awareness of this situation. Also as much of Georgia’s cyberspace is now under unauthorized external control the following official press statement is circulated without modification."

It seems Russia may be becoming the arms dealer of choice for the creation of military botnets. This may be just one of many more attacks to come in this new world of network-centric warfare. Maybe it's time for the UN to stop talking about network-centric peacekeeping and start doing something about it. (United Nations Network Centric Operations, UNNCO)

More details:
http://rbnexploit.blogspot.com/2008/08/rbn-georgia-cyberwarfare.html

Friday, August 8, 2008

Hacking Xen

I just got this link from Chris Sears on the Google Cloud Computing Group. There is an upcoming presentation at the Black Hat conference on how to hack Xen. Let's hope Amazon reviews the presentation before any hackers do.

The presentation claims to demonstrate the following Xen vulnerabilities/exploits:

- practical ways to stealthly use DMA to control all physical memory
- Xen loadable backdoor modules framework - description of a set of tools allowing to easily load compiled C code into Xen hypervisor (similarly to how Linux kernel modules work)
- implementation of a backdoor residing in hypervisor space (so, invisible from the hosted operating system), allowing for remote commands execution
- implementation of a backdoor residing in a hidden, unprivileged domain, allowing for remote commands execution in dom0

https://www.blackhat.com/html/bh-usa-08/bh-usa-08-speakers.html#Wojtczuk

Thursday, August 7, 2008

Grid is Dead

I've been in New York City for an astounding 16 meetings in two days. My meetings ranged from show-and-tell with a few cloud startups to VIP passes to the Kanye West concert last night at Madison Square Garden. (Yes, cloud computing geeks count as VIPs now.) It certainly has been an interesting few days.

A particularly interesting discussion earlier today was with the director of grid infrastructure for a major Wall Street bank. The conversation ranged from network optimization and the pros and cons of map/reduce to the importance of utilization. During our discussion I couldn't help but think that the traditional single-tenant grid infrastructure was dead and that the future lay in the use of flexible and adaptive compute clouds.

Why? It's all about utilization rates; it seems they don't actually paint an accurate picture of a grid's computational performance. Typically when a bank attempts to justify why virtualization isn't useful in their grid deployments, they point to their utilization numbers. The common perception is that if their grid is running at 95% utilization, then virtualization isn't going to improve their overall performance, so why bother. But it would seem that utilization numbers don't effectively show the overall system efficiencies or, more importantly, inefficiencies. What they do show is the utilization of CPU resources; they do little to address areas such as network shaping and I/O optimization, which appear to have a dramatic impact on overall grid performance.

One of the more exciting aspects of cloud / virtualized grid deployments is the way you can consolidate or cluster workloads into parallel per-machine processes. Sometimes it makes more sense to put 4 VMs on a quad-core machine than it does to spread them onto 4 physical machines. This can be particularly important when rendering many smaller jobs that relate to one another. Think risk analysis, where a big limitation may be how quickly you can reassemble the completed jobs. Orders of improvement may only be a few milliseconds or less, but the savings provided by consolidating the job onto multiple VMs on a single server could be a really big deal multiplied across a grid of 35,000 machines. This type of optimization could mean seconds off the overall risk analysis time and potentially millions of dollars in new investment opportunities.
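
A quick back-of-envelope illustration (the numbers are made up, but the shape of the argument is what matters): even a few milliseconds saved per job adds up to whole seconds per run once jobs are chained on each node.

    # Illustrative figures only.
    jobs_per_node_per_run = 200    # sequential sub-jobs each node works through
    saving_per_job_ms     = 5      # reassembly overhead removed by consolidation

    wall_clock_saving = jobs_per_node_per_run * saving_per_job_ms / 1000.0
    print("about %.1f seconds saved per node, per risk run" % wall_clock_saving)
    # -> about 1.0 seconds saved per node, per risk run, across 35,000 nodes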

The problem with traditional grid workload schedulers is that they don't make a distinction between a physical and a virtual machine. The big opportunity for grid & cloud computing is not just the ability to optimize for scale, but to adaptively optimize for system metrics you never knew you had, until now.

Tuesday, August 5, 2008

AT&T's Vapor

Following the lead of HP, Yahoo, Intel, Sun and others last week, AT&T announced today that they too are going to be entering the "cloud computing" market with a cloud of their very own. The press release was light on details other than their intent to enter the market. Verizon has also stated they plan on offering cloud services and have partnered with Desktone to offer cloud desktops to their customers. BT, FT, and DT in Europe all have cloud projects in the works, as does China Telecom. It seems the telecom biz is quickly becoming a poster child for cloud computing, which is great given their regionalized positions.

Regardless of whether or not AT&T offers anything of substance, the release can only help maintain the level of cloud hype already in full effect. Having the ability to geographically scale will be made a lot easier if every country's telecom provider offers localized cloud services. For this reason alone, I'm very excited by the various announcements.

More details:
http://www.reuters.com/article/marketsNews/idUSBNG2305620080805

Monday, August 4, 2008

The Cisco Cloud

With a variety of rumors flying around about an EMC acquisition by Cisco, I thought I'd pose a question. Why isn't Cisco doing more in the cloud infrastructure space?

Cloud computing can be simply defined as a form of network computing. That being said, Cisco is arguably one of the most technically advanced companies in the networking space, and EMC in storage & infrastructure. Cisco + EMC would seem to be a match made in heaven.

Douglas Gourlay - Senior Director of Marketing and Product Management of the Cisco Data Center Business Unit, recently posted some cloud computing ideas on his blog.

Some of his more interesting points included:
  • "Enterprises will build mini-clouds."
  • "Service Providers will move into higher revenue cloud models."
  • "Hypervisors will become THE way of defining the abstraction between physical and virtual within a server."
  • "Service Providers will scale their cloud managed application/hosting/hypervisor offerings out initially by taking ‘low hanging fruit’ applications."
  • "IP Addressing will move to IPv6."
  • "Workload portability between Enterprise Clouds and Service Provider Clouds."
  • "The Value of Virtualization is compounded by the number of devices virtualized."
  • "Someone will write a DNS or a DNS Coupled Workload exchange."
  • "Skynet becomes self aware."
Complete post: http://blogs.cisco.com/datacenter/comments/a_cloudy_day/

Here are a few unsolicited ideas for The Cisco Cloud.

1. Cloud VLAN - Adaptive virtual in cloud networking
2. Cloud Federation - Combine a router/switch with a distributed federated command and control bot for in data center cloud management
3. Wide Area Cloud - VPN services for globally dispersed cloud partitioning, management, migration and security.
4. Network / WAN optimization - The network is the biggest limitation in most cloud environments, own the network, own the cloud.

It would seem inevitable that Cisco will jump into cloud computing, I think the question is when, not if.

Friday, August 1, 2008

Enomalism - Sourceforge project of the month

I'm happy to announce that the Enomalism Elastic Computing Platform is SourceForge's Project of the Month for August.

For anyone interested in learning more about our project, please check out my interview at > http://sourceforge.net/community/index.php/potm-200808/

Don't forget the Sun Screen

Sun appears to want to be included in the cloud buzz this week. They have announced, (umm) leaked, that their Network.com will officially be spun out into an official "cloud unit". I'm not really sure why, or if this is even newsworthy, but I thought I'd post it.

The Register has a pretty good overview; my favorite tidbit:
"Sun executives have been in meetings with both Douglas and Schwartz to try and pin down exactly what the new business will do and how it'll operate, as there's no single, clear definition of what "cloud" really is."

Good luck creating a business when you don't know what your product is. (Give me a call and I'll give you my two cents on clouds.) On planet Earth, the Sun is a crucial part of cloud formation; problem is, if you stare too long you may get blinded.

Some more details here > http://www.theregister.co.uk/2008/07/31/sun_utility_computing_spin_out/

Dell trademarks "Cloud Computing" TM

I just received an interesting piece of news from our cloud computing group. It seems that Dell has trademarked the term "cloud computing", more than a year after it first became commonly used.

More details here >
http://groups.google.com/group/cloud-computing/browse_thread/thread/1e14463d678a38f5
