February 2012
"Data Science" stack on delicious →
Mikio has created a data science stack on delicious (basically a link collection). We’ll try to add data-science-related articles there.
Feb 14th
January 2012
Not the real TWIMPACT...
You might have stumbled upon the paper “Can Tweets Predict Citations? Metrics of Social Impact Based on Twitter and Correlation with Traditional Metrics of Scientific Impact” by Gunther Eysenbach, in which a measure called “twimpact” is proposed. Unfortunately, this work has nothing to do with us, and it seems his paper also contains some methodological flaws. One can...
Jan 10th
December 2011
Dec 13th
Dec 8th
Some numbers on our NIPS demo
So here is a bit of background information on the data processing we’re doing for our NIPS demo. We’re currently reanalyzing the retweet trends for all of 2010. We cannot afford the firehose (but really, who can?), but the normal stream API gives more than enough data. It seems to be capped at about 50 tweets per second, but this still gives about 4.3 million tweets per day. The...
Dec 1st
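The rate quoted in the NIPS demo post can be checked with a quick calculation. The sketch below assumes a constant 50 tweets per second (the approximate cap mentioned in the post) and a 365-day year; the function names are hypothetical and only serve the arithmetic.

```python
# Back-of-the-envelope throughput estimate, assuming the streaming API
# delivers a steady 50 tweets per second (the rough cap mentioned above).
TWEETS_PER_SECOND = 50
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tweets_per_day(rate: int) -> int:
    """Number of tweets received in one day at a constant rate (tweets/s)."""
    return rate * SECONDS_PER_DAY


def tweets_per_year(rate: int, days: int = 365) -> int:
    """Number of tweets received over `days` days at a constant rate."""
    return tweets_per_day(rate) * days


if __name__ == "__main__":
    print(f"{tweets_per_day(TWEETS_PER_SECOND):,} tweets/day")
    print(f"{tweets_per_year(TWEETS_PER_SECOND):,} tweets/year")
```

Under that constant-rate assumption, a full year of the stream API tops out at roughly 1.58 billion tweets; real throughput fluctuates, so treat this as a rough ceiling rather than a measurement.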
November 2011
The first stage of our analysis of 2011 is coming to an end shortly. So far, we’ve analyzed about 1.3 billion tweets.
Nov 30th
Nov 26th
Some insights from hunting for memory leaks
One of the main design decisions in our current approach to analyzing retweet activity in Twitter data is to keep all the “hot” data in memory while simultaneously bounding the amount of data we are willing to keep. This makes sense because only a tiny fraction of tweets are ever retweeted more than once, and you somehow have to bound the amount of “live” data to ensure that...
Nov 25th
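The memory-leak post describes keeping “hot” retweet data in memory under a fixed bound. A minimal sketch of such a bounded structure follows, assuming least-recently-used (LRU) eviction; the post does not say which eviction policy twimpact actually uses, and `BoundedRetweetCounter` is a hypothetical name for illustration.

```python
from collections import OrderedDict
from typing import Optional


class BoundedRetweetCounter:
    """Keeps retweet counts for at most `capacity` tweets in memory.

    Hypothetical illustration: when the bound is exceeded, the least
    recently retweeted entry is evicted (LRU). This is an assumed policy,
    not necessarily the one twimpact uses.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Insertion order doubles as recency order: oldest entry first.
        self._counts: "OrderedDict[str, int]" = OrderedDict()

    def record_retweet(self, tweet_id: str) -> int:
        """Count one retweet of `tweet_id` and return its updated count."""
        count = self._counts.pop(tweet_id, 0) + 1
        self._counts[tweet_id] = count  # re-insert as most recently used
        if len(self._counts) > self.capacity:
            self._counts.popitem(last=False)  # evict least recently used
        return count

    def count(self, tweet_id: str) -> Optional[int]:
        """Return the live count, or None if the tweet is not (or no longer) in memory."""
        return self._counts.get(tweet_id)

    def __len__(self) -> int:
        return len(self._counts)
```

LRU fits the observation that most tweets are never retweeted again: entries that stop receiving retweets drift to the front of the ordering and are the first to be dropped. A tweet evicted this way starts again from a count of one if it resurfaces, which is the price of bounding memory.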
Three weeks to go for our NIPS demo
We’re preparing a reanalysis of all of our data from 2011 to bring to Granada. The reanalysis works in two phases: First retweets are analyzed sequentially for the whole year. This cannot be parallelized well as you need to know what happened so far to match retweets correctly (we’re also matching retweets which are not generated by Twitter but by people using the “RT”...
Nov 24th
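The NIPS demo post explains that retweet matching has to run sequentially because a retweet can only be matched to an original that has already been seen, including manual retweets written with the “RT” convention. The sketch below illustrates that dependency under simplifying assumptions: the regular expression covers only the plain `RT @user: text` form, and `RetweetMatcher` is a hypothetical helper, not twimpact's actual matcher.

```python
import re
from typing import Dict, Optional, Tuple

# Matches the manual-retweet convention "RT @user: original text".
# Hypothetical pattern for illustration; variants such as "via @user",
# nested RTs, or comments placed before "RT" are not handled here.
MANUAL_RT = re.compile(r"^RT @(\w+):?\s+(.*)$", re.DOTALL)


class RetweetMatcher:
    """Matches manual retweets against tweets seen earlier in the stream.

    Because a retweet can only be matched to an original that has already
    been seen, tweets must be fed in chronological order.
    """

    def __init__(self) -> None:
        # (lower-cased author, text) -> id of the original tweet
        self._seen: Dict[Tuple[str, str], str] = {}

    def process(self, tweet_id: str, author: str, text: str) -> Optional[str]:
        """If `text` looks like a manual retweet, return the id of the matching
        original seen earlier (or None if there is none). Otherwise remember the
        tweet as a potential original and return None."""
        m = MANUAL_RT.match(text)
        if m:
            key = (m.group(1).lower(), m.group(2).strip())
            return self._seen.get(key)
        self._seen[(author.lower(), text.strip())] = tweet_id
        return None
```

Feeding the same two tweets in reverse order leaves the retweet unmatched, which is exactly why this phase does not parallelize well: each shard would need the history of everything before it.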
October 2011
Oct 11th
We have been accepted for the Demonstration track at NIPS 2011. We plan to do a real-time demo as well as bring a whole year’s worth of history with us so that you can go back in time and relive the key events of last year on Twitter. Looking forward to seeing you in Granada!
Oct 4th
September 2011
bit.ly has an interesting post on link lifetimes... →
Sep 7th
August 2011
Aug 26th
Yesterday’s Virginia earthquake in the hashtag cloud
Yesterday, there was a minor earthquake in Virginia at about 1:51pm local time. Within minutes, the earthquake became the dominant topic on Twitter. In the following, we track the development of this topic based on a real-time analysis of hashtag activity on Twitter. The size of each node represents the hashtag’s activity in retweets, and two hashtags are linked if they occur in the same tweet. The hundred most...
Aug 24th
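The hashtag-cloud post sizes nodes by a hashtag's activity in retweets and links hashtags that occur in the same tweet. A sketch of computing those node and edge weights from a batch of retweet texts follows; the hashtag regex and the function `hashtag_graph` are assumptions for illustration, and layout and rendering of the graph are left out.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, Tuple

# Assumed hashtag syntax: "#" followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def hashtag_graph(retweets: Iterable[str]) -> Tuple[Counter, Counter]:
    """Build node and edge weights for a hashtag co-occurrence graph.

    Node weight: number of retweets mentioning the hashtag (drives node size).
    Edge weight: number of retweets in which both hashtags occur together.
    """
    nodes: Counter = Counter()
    edges: Counter = Counter()
    for text in retweets:
        # Deduplicate per tweet and normalize case; sorting makes each
        # unordered pair appear in a single canonical order.
        tags = sorted({t.lower() for t in HASHTAG.findall(text)})
        nodes.update(tags)
        edges.update(combinations(tags, 2))
    return nodes, edges
```

Counting each hashtag at most once per retweet keeps a single tweet that repeats a tag from inflating its node.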
Aug 24th
July 2011
Hey, Google+ is not world peace.
(a.k.a. “the second inevitable Google+ post”) Some people have started behaving as if Google+ were the second coming of Christ, a cure for cancer and world peace all rolled into one. Mike Elgan has gone on a “Google+ diet” and is redirecting all his communication (including email!) to Google+. Other bloggers (e.g. Kevin Rose) have shut down their blogs completely, redirecting...
Jul 19th
The Inevitable Google+ Post
Google has really pulled it off this time. Paul Allen, the founder of Ancestry.com, has estimated that Google+ had already surpassed ten million users by July 10. Google has played the “closed beta” game very well, letting in only a small number of people who nevertheless started to flood the internet with posts about Google+, comparing it to Facebook and Twitter, evaluating it, sometimes dismissing it,...
Jul 13th
Twitter acquires Backtype
Twitter recently acquired BackType, a social media analysis company. BackType provided all kinds of analysis and metrics to help companies analyze their performance on Twitter. With this acquisition, Twitter continues to incorporate services and know-how that have so far been provided by third parties. Previous examples were Summize for the real-time search back in 2008, TweetDeck for its...
Jul 7th
Google suspends real-time search
A lot happened last week. Google opened its new service Google+ to a small set of people (don’t bother trying to get an invite now; they seem to have stopped admitting new users for the time being), and also changed the layout of its search page and calendar. As a “side effect”, Google real-time search is apparently gone. Mashable confirms this in an article and gets...
Jul 6th
Some Twitter visualizations
Twitter is always good for some nice visualizations. Here are some examples: National Geographic has a visualization of tweets during the revolution in Egypt, which shows the effect of the Internet shutdown during that period. Social Flow shows how the news of Bin Laden’s death spread throughout Twitter. Twitter has an example of the spread of information during the earthquake in...
Jul 5th
May 2011
May 25th
Slides: Cassandra - An Introduction →
Last Friday, I gave a talk at LinuxTag Berlin on our experiences with Cassandra. I put up a post with links to the German and English versions of the slides.
May 16th
May 6th
April 2011
Apr 12th
Apr 5th
Apr 5th
March 2011
Some Links to Posts on Twimpact and Cassandra
Here are some links from our blogs to get you started:
- What’s happening over at twimpact
- Some tips on using Cassandra
- Cassandra Garbage Collection Tuning
- Twitter Changes Its API Rules, Makes Sure Monetization Strategy Works Out
- Following #egypt on Twitter
- Spike on arrest of Al-Jazeera Journalists
- How to use egypt.twimpact.com
- Updated Arab revolt trends with location detection
Mar 23rd