Authorization Required
This server could not verify that you are authorized to access the document requested. Either you supplied the
wrong credentials (e.g., bad password), or your browser doesn't understand how to supply the credentials required.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.
Something doesn't understand how to supply the credentials... it's probably me.
Excellent! I've met with Philip (I'm redesigning his division's website here at the University of Texas), and he told me about this project. I didn't expect it to go live this soon.
Awesome. I've been playing with CouchDB, and since the raw data is in JSON, I'm gonna try loading it into CouchDB and running some experimental map/reduce views on it.
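For anyone who hasn't used CouchDB views: here is a minimal local sketch of what a map/reduce "tweets per user" view computes. This is plain Python, not CouchDB's API. It assumes each tweet document has a `user` field (hypothetical; the real dataset's field names may differ). The map step mirrors a view's `emit(doc.user, 1)`, and the reduce step mirrors CouchDB's built-in `_sum` when the view is queried grouped by key.

```python
import json
from collections import defaultdict

# Hypothetical tweet documents; the real dataset's field names may differ.
RAW = """[
  {"user": "alice", "text": "hello"},
  {"user": "bob", "text": "hi"},
  {"user": "alice", "text": "again"}
]"""


def map_tweet(doc):
    """Map step: emit (key, value) pairs for one document, like emit(doc.user, 1)."""
    if "user" in doc:
        yield doc["user"], 1


def reduce_sum(values):
    """Reduce step: add up the emitted values, similar to CouchDB's built-in _sum."""
    return sum(values)


def run_view(docs):
    """Group emitted values by key, then reduce each group (like a grouped view query)."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_tweet(doc):
            groups[key].append(value)
    return {key: reduce_sum(vals) for key, vals in sorted(groups.items())}


if __name__ == "__main__":
    print(run_view(json.loads(RAW)))  # {'alice': 2, 'bob': 1}
```

In CouchDB itself the map function would be JavaScript stored in a design document; the point here is only the shape of the computation.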
Thanks!
The point is to be a sort of Google-style ranking algorithm for Twitter. This is plenty of data to get at least a very solid idea of who the top Tweeters are, based on their connections, influence, and popularity.
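As a rough illustration of ranking users by their connections, here is a small PageRank-style power iteration over a follower graph. This is an assumed, swapped-in technique, not the project's actual method. It treats "A follows B" as a link from A to B, and it uses the conventional damping factor of 0.85. Users who follow nobody spread their rank evenly across everyone. The usernames are made up.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """PageRank-style scores; `follows` maps each user to the users they follow."""
    nodes = set(follows)
    for targets in follows.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Teleport share every user gets regardless of links.
        new = {u: (1.0 - damping) / n for u in nodes}
        # Rank held by users who follow nobody is spread evenly over everyone.
        dangling = sum(rank[u] for u in nodes if not follows.get(u))
        for u in nodes:
            new[u] += damping * dangling / n
        # Each user passes rank equally to everyone they follow.
        for u, targets in follows.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        rank = new
    return rank


if __name__ == "__main__":
    # Hypothetical graph: three users follow dave, and dave follows alice back.
    graph = {"alice": ["dave"], "bob": ["dave"], "carol": ["dave"], "dave": ["alice"]}
    scores = pagerank(graph)
    print(sorted(scores, key=lambda u: (-scores[u], u)))  # ['dave', 'alice', 'bob', 'carol']
```

Raw follower counts would also put dave first here. The difference is that alice outranks bob and carol because she is followed by a highly ranked user, which is the "influence" part rather than plain popularity.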
Also, keep in mind he scraped only the TOP Twitter users (those with X+ followers). A lot of Twitter's tweets likely come from users under that threshold, so skipping them saves time, storage space, and effort.
Another batch of tweets from the data mining feed is on its way. But yeah, the focus here was on the graph structure more than the text. We're also hoping someone pipes up with "oh gee, I have 750m tweets archived, do you think anyone else wants to look at them?"
I downloaded it, and the tweets date back to 2006. I guess users who only posted a few times have all their old tweets indexed, whereas those with many tweets only have their latest ones in there (i.e., at most X tweets were collected per user).
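A quick sketch of why a per-user cap would produce that pattern. It assumes the collector kept each user's newest tweets up to a fixed limit. `cap` here is a hypothetical stand-in for the unspecified X, and the (user, year, text) tuples are made-up data.

```python
from collections import defaultdict


def cap_per_user(tweets, cap):
    """Keep at most `cap` of each user's newest tweets.

    `tweets` is a list of (user, year, text) tuples; `cap` is a hypothetical
    stand-in for the unspecified per-user limit X.
    """
    by_user = defaultdict(list)
    for tweet in tweets:
        by_user[tweet[0]].append(tweet)
    kept = []
    for user in sorted(by_user):
        # Newest first, then truncate; users under the cap keep everything.
        newest_first = sorted(by_user[user], key=lambda t: t[1], reverse=True)
        kept.extend(newest_first[:cap])
    return kept


def oldest_year_per_user(tweets):
    """Earliest year that survives for each user."""
    oldest = {}
    for user, year, _ in tweets:
        oldest[user] = min(year, oldest.get(user, year))
    return oldest


if __name__ == "__main__":
    sample = [
        ("quiet", 2006, "first post"),
        ("quiet", 2007, "second post"),
        ("busy", 2006, "a"),
        ("busy", 2007, "b"),
        ("busy", 2008, "c"),
        ("busy", 2009, "d"),
    ]
    print(oldest_year_per_user(cap_per_user(sample, cap=2)))
    # {'busy': 2008, 'quiet': 2006}
```

The quiet user's 2006 tweet survives because they never hit the cap, while the busy user's early tweets get pushed out by newer ones.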
Anyway, it's 1729.