Real-time seems to be the next big thing in big data. Map-Reduced has shown how to perform big analyses on huge data sets in parallel, and the next challenge seems to be to find a similar kind of approach to real-time.

When you look around the web, there are two major approaches out there which try to building something which can scale to deal with Twitter-firehose-scale amounts of data. One is starting with a MapReduce framework like Hadoop and somehow finagle real-time or at least streaming capabilities on it. The other approach starts with some event-driven “streaming” computing architecture and makes it scale on cluster.

These are interesting and very cool projects, however from our own experience with retweet analysis at TWIMPACT, I get the feeling that both approaches fall short of providing a definitive answer.

In short: One does not simply scale into real-time.

Read the whole post on Mikio’s blog

  1. twimpact posted this

Development blog for TWIMPACT
beta.twimpact.com

Members
Mikio Braun
Leo Jugel

twitter.com/twimpact

view archive