Thursday, November 27, 2008

The Industrial Revolution of Data

As I watch the reports from Mumbai come in across my various social feeds (blogs, Twitter, Facebook, Flickr and so on), I can't help but think that one of the biggest opportunities for the next generation of news providers is mining the massive amount of information being fed through the Internet. When a big news story breaks, it is now much more likely that the information will be delivered by an army of citizen journalists using mobile phones and social media services than by traditional means.

In a recent post on O'Reilly Radar, "The Commoditization of Massive Data Analysis", Joe Hellerstein described what he called "The Industrial Revolution of Data". His post went on to say: "we are starting to see the rise of automatic data generation 'factories' such as software logs, UPC scanners, RFID, GPS transceivers, video and audio feeds. These automated processes can stamp out data at volumes that will quickly dwarf the collective productivity of content authors worldwide. The last step of the revolution is the commoditization of data analysis software, to serve a broad class of users."

Although Joe seems to get caught up in the finer technical details for most of the post, I think he is onto a general trend toward the large-scale commoditization of data analysis. In some ways, though, I think he misses one of the bigger opportunities for large-scale data analysis: the social cloud.

Traditionally, business analysts have mined large data sets such as credit card transactions to determine things like risk, fraud and credit scores. More recently, live Internet data feeds (Facebook, Twitter, FriendFeed, etc.) have become much more commonplace, making large-scale, real-time knowledge discovery possible. For me this is a remarkable opportunity. Think about how Google Maps revolutionized the geospatial industry by putting satellite imagery and geographic information into the hands of everyday people. We now have the ability to do the same within several other industry verticals.

As the volume of social data increases, I feel we will soon see the emergence of "social knowledge discovery services". These services will give anyone the ability to spot trends, breaking news and threats (economic, physical or otherwise) in real time, or even preemptively.

One such example is Google Trends, a service that has taken this concept to the next level by aggregating statistical trends that actually matter. Google's Flu Trends service tracks certain search terms that are indicators of flu activity in a particular area of the US. It uses aggregated Google search data to estimate flu activity in "your" region up to two weeks faster than traditional systems. What's more, this data is freely available for you to download and remix. Think about it: a small local pharmacy could use this sort of data to stock up on flu-related products two weeks before a major outbreak occurs.
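To make the pharmacy idea a little more concrete, here is a rough sketch of how a weekly flu-activity series might be turned into a "time to restock" signal. This is not how Google's service works internally. The function name, the four-week baseline, the 1.5x threshold and the sample numbers are all my own assumptions for illustration.

```python
def weeks_to_restock(activity, baseline_weeks=4, ratio=1.5):
    """Return the indices of weeks whose activity exceeds `ratio` times
    the average of the preceding `baseline_weeks` weeks.

    Hypothetical helper: the baseline window and ratio are illustrative
    assumptions, not values taken from any real service.
    """
    alerts = []
    for i in range(baseline_weeks, len(activity)):
        # Average of the weeks immediately before week i.
        baseline = sum(activity[i - baseline_weeks:i]) / baseline_weeks
        if baseline > 0 and activity[i] > ratio * baseline:
            alerts.append(i)
    return alerts


# Made-up weekly flu-activity index for one region (illustration only).
weekly_index = [10, 11, 9, 10, 12, 18, 27, 40]
print(weeks_to_restock(weekly_index))
```

With the made-up series above, the first flagged week is the one where activity jumps well above the recent average, which is the point at which a shop might want to order extra stock.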

The next big opportunity will be in "data mining" all the unstructured data found within the "social web" of the Internet. The emergence of social knowledge discovery services will enable anyone to identify trends within publicly available social data sets. Through large-scale processing techniques such as Google's MapReduce, we now have the opportunity to identify key social patterns and preemptively act on them.
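For readers who haven't seen the map/reduce idea before, here is a toy, single-machine sketch of it applied to spotting common terms in a handful of posts. It only mimics the map, group and reduce phases in plain Python. It is not Google's implementation, and the function names and sample posts are invented for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_post(post):
    """Map step: emit a (term, 1) pair for each distinct word in one post."""
    return [(word, 1) for word in set(post.lower().split())]


def shuffle(pairs):
    """Group emitted values by term, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each term."""
    return {term: sum(values) for term, values in grouped.items()}


def term_counts(posts):
    """Count how many posts mention each term."""
    pairs = chain.from_iterable(map_post(p) for p in posts)
    return reduce_counts(shuffle(pairs))


# Hypothetical sample posts, invented for illustration only.
posts = [
    "explosion reported near the station",
    "station closed after explosion",
    "traffic is slow near the bridge",
]
counts = term_counts(posts)
# Most-mentioned terms first, ties broken alphabetically.
top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
print(top)
```

The appeal of this shape is that the map and reduce steps are independent per post and per term, so in principle the same logic can be spread across many machines once the data no longer fits on one.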
