In an article about the difficulties of archiving online activity, Greg Elmer, director of the Centre for the Study of Social Media at Ryerson University in Toronto, made a surprising claim:
The Library of Congress in the United States boldly announced in 2010 that, as Twitter was becoming an important political tool, the national library would endeavour to archive every tweet since 2006. However, Elmer said Twitter has since adopted a much more “for-profit” business model and is starting to sell some of its data, rather than making it widely available for download to anyone with the hard drive capacity.
He said the Library of Congress has quietly backed away from the commitment.
The Library of Congress’ announcement was heralded, rightly, as a pretty big deal. Tweets are a part of history, and there isn’t really a good way to access old ones right now. The added neutrality and security of the LOC, versus an in-house archive, was just icing on the cake. For the LOC to “back away” from the project would be a shame, and to do it so quietly would certainly raise questions.
I asked the LOC if the project was still on track, to which a rep responded: “[T]he assertion about the Library was not the case. And [the author] should have touched base with us before it was published.”
What we don’t really know is where the project stands now. A few months ago, when I asked about the LOC’s progress, a rep told me to sit tight:
We are still working through technical issues and the material is coming in; the process of how to serve it out to researchers while still maintaining the parameters set by our agreement with Twitter is still being worked out.
Long story short, it’s going to be awhile — we can’t really put a timeline on it — before people will be able to come to the Library and start doing research using the archive.
This, as far as I can tell, is still the case. The Library of Congress has a mountain of old tweets to process, and something like 400m new ones to worry about every day. (When the project was announced, that number was just 50m.)
Largely for technical reasons, the LOC’s goal was never to create a full, on-demand user-facing Twitter archive; instead, the goal was to build a more modest access tool. “The general concept is to have the material available for research, like our other collections,” the LOC told me in February. “But individuals can do research here — always have.”
Twitter declined to comment on the record about the LOC’s project. In any case, the company has handed over the data and granted continued access, which is where its responsibility stops.
But it’s easy to imagine that there’s a technological bottleneck here, and that Elmer wasn’t completely wrong, in that the Library has bitten off a little more than it can chew. Serving up billions upon billions of tweets in even the most basic way is a hard job for a technology company, much less for a government agency whose requested 2013 budget for “Digital Initiatives” (all of them, including web archiving, historic newspapers, the online American history archive, the Veterans History Project and early sound recordings) is under $50m, and actually lower than it was in 2011.
A full and accessible Twitter archive would be a cornerstone of modern internet history, and an invaluable tool for researchers and regular citizens alike. It’s something we should all want. In this case, though, the Library gleefully took on a task that Twitter itself elected not to, and which netted it a quick and substantial PR boost. Maybe our expectations should reflect that.