Real-time seems to be the next big thing in big data. Map-Reduced has shown how to perform big analyses on huge data sets in parallel, and the next challenge seems to be to find a similar kind of approach to real-time.
When you look around the web, there are two major approaches out there which try to building something which can scale to deal with Twitter-firehose-scale amounts of data. One is starting with a MapReduce framework like Hadoop and somehow finagle real-time or at least streaming capabilities on it. The other approach starts with some event-driven “streaming” computing architecture and makes it scale on cluster.
These are interesting and very cool projects, however from our own experience with retweet analysis at TWIMPACT, I get the feeling that both approaches fall short of providing a definitive answer.
In short: One does not simply scale into real-time.
-
celebrateswe liked this
-
christinaert liked this
-
determinesek6 liked this
-
resorts69 liked this
-
emileeyou89 liked this
-
twimpact posted this