
Nice. But 10 million tweets? That's only a few days' worth, so what's the point?


The point is to be a sort of Google algorithm for Twitter. This is plenty of data to at least get a very solid idea of who the top Tweeters are based on their connections, influence, and popularity.
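
For a rough idea of what a "Google algorithm for Twitter" could look like, here is a minimal PageRank-style sketch over a follow graph. The user names, the `follows` mapping, and the damping factor of 0.85 are illustrative assumptions; nothing here is taken from the dataset or from how the author actually ranks users.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Rank users in a follow graph with a basic PageRank iteration.

    `follows` maps each user to the list of users they follow
    (a hypothetical input format chosen for this sketch).
    """
    # Collect every user that appears as a follower or as someone followed.
    users = set(follows)
    for followed in follows.values():
        users.update(followed)
    n = len(users)

    # Start from a uniform distribution.
    rank = {u: 1.0 / n for u in users}

    for _ in range(iterations):
        # Every user gets a small baseline share each round.
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            out = follows.get(u, [])
            if out:
                # A user passes their rank on to the accounts they follow.
                share = damping * rank[u] / len(out)
                for v in out:
                    new_rank[v] += share
            else:
                # Users who follow nobody spread their rank evenly.
                share = damping * rank[u] / n
                for v in users:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Toy graph: alice and bob both follow carol, carol follows alice.
follows = {
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["alice"],
}
ranks = pagerank(follows)
top = sorted(ranks, key=ranks.get, reverse=True)
```

In this toy graph carol comes out on top because two accounts follow her, and alice outranks bob because she is followed by the highest-ranked account. That is the sense in which the graph structure, rather than the tweet text, can identify influential users.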

Also, keep in mind he scraped only the TOP Twitter users (those with X+ followers). A lot of Twitter's tweets likely come from users under that threshold, so skipping them saves time, storage space, and effort.


There's another batch of tweets coming off the data mining feed. But yeah: the focus here was on the graph structure more than the text. We're also hoping someone pipes up with "oh gee I have 750m tweets archived do you think anyone else wants to look at them?"


I downloaded it, and the tweets date back to 2006. I guess users who only posted a few times have all their old tweets indexed, whereas those with many tweets only have the latest ones in there (i.e. at most X tweets collected per user).



