Tuesday, August 26, 2008

Is MapReduce going mainstream?

It's been an interesting summer for Google's MapReduce software paradigm. I'm not going to get into the finer details of MapReduce; the general idea is that it's Google's magic sauce. Basically, it's what lets them process their massive data sets across large clusters of machines. So any company that wants to be like Google, or needs to compete with Google, should pay attention to MapReduce.

Last month Intel, HP and Yahoo announced a joint research program to examine its usage, and today Greenplum, a provider of database software for what it describes as the "next generation of data warehousing and analytics", announced support for MapReduce within its massively parallel database engine.

Greenplum's announcement that it will integrate MapReduce functionality into its enterprise-focused database is an important step toward taking MapReduce out of academic research labs and moving it to lucrative corporate users.

To give you some background, currently the two most popular implementations of MapReduce are the open source Apache Hadoop project and the unfortunately named Pig project. For those of you who don't know about Hadoop, it is an open source platform for distributing, managing and then collecting computing work throughout a large computing cloud using MapReduce. Pig, a Yahoo Research project currently being incubated at Apache, is a language designed to make it easier to use the Hadoop infrastructure effectively. It has been described as SQL for MapReduce, allowing queries to be written and then parallelised and run on the Hadoop platform.
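For anyone who wants a feel for the distribute-then-collect idea without reading the finer details, here is a minimal single-process sketch in Python of the classic word count. It is not Hadoop's or Pig's API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real framework would run the map and reduce steps on many machines rather than in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the job a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different machine; here we just loop.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "The dog sat"]))
```

The point of the split is that map_phase only ever looks at one document and reduce_phase only ever looks at one key, which is what makes the work easy to spread across a cluster.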

I found this quote from Greenplum's press release interesting.

"Greenplum has seamlessly integrated MapReduce into its database, making it possible for us to access our massive dataset with standard SQL queries in combination with MapReduce programs," said Roger Magoulas, Research Director, O'Reilly Media. "We are finding this to be incredibly efficient because complex SQL queries can be expressed in a few lines of Perl or Python code."
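To make the "few lines of Python" remark a bit more concrete, here is a rough sketch of how a simple aggregate query (something like SELECT region, SUM(amount) FROM sales GROUP BY region) might look as a map step and a reduce step. This is plain Python, not Greenplum's actual MapReduce interface; the sales rows, the column names and the function names are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def map_sale(row):
    """Map step: pick out the grouping key and the value to aggregate."""
    return row["region"], row["amount"]


def reduce_region(region, amounts):
    """Reduce step: sum every amount that was emitted for one region."""
    return region, sum(amounts)


def total_by_region(rows):
    # Sorting by key stands in for the shuffle a real engine would perform.
    pairs = sorted((map_sale(row) for row in rows), key=itemgetter(0))
    return dict(
        reduce_region(region, (amount for _, amount in group))
        for region, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    sales = [
        {"region": "east", "amount": 10},
        {"region": "west", "amount": 5},
        {"region": "east", "amount": 7},
    ]
    print(total_by_region(sales))
```

A query this simple is easier to write in SQL; the appeal in the quote is presumably for logic that is awkward to express as a query but straightforward as a small function.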

It's also interesting to note that earlier this year IBM released an Eclipse plug-in that simplifies the creation and deployment of MapReduce programs. The plug-in was developed by the High Performance On Demand Solutions unit of IBM Software Group at the IBM Silicon Valley Laboratory. So it may only be a matter of time before we see MapReduce offered commercially by IBM.

So what's next? Will we see a Microsoft implementation or an Oracle MapReduce? For now, MapReduce appears to be the new "coolness", and with all the industry attention it seems to be getting, I think we may be on the verge of finally seeing MapReduce enter the mainstream consciousness.

As a side note, my favorite MapReduce implementation is called Skynet. The name says it all.
