Archiving Twitter

Nearly seven years ago, the Library of Congress announced an ambitious partnership: it would be working with Twitter to archive each 140-character tweet. Some things have changed in the intervening years. Twitter has become a venue for the President of the United States to speak to his constituents. This year tweets doubled in size. And, according to an article in The Atlantic last year, the Library of Congress’ tweet archive is still largely theoretical. While tweets get harvested, they are not processed in any way. The current protocol is, in the words of Andrew McGill for The Atlantic, “the digital equivalent of throwing a bunch of paperclipped manuscripts into a chest and giving it a good shake.” Not great.

Photo by Zeyi Fan, used under CC BY-NC 2.0.

The Twitter archive must be some kind of beat over at The Atlantic because two weeks ago they published another insightful article on the topic, “Future Historians Probably Won’t Understand Our Internet, and That’s Okay.” Besides again engaging with the problems of scale that a Twitter archive faces, Alexis C. Madrigal makes another interesting point about archiving the internet. Capturing data is one thing, but preserving algorithms and user interfaces, the things that govern the way we use Twitter or Facebook, is a much harder task, in part because they’re moving targets. Here’s the question Madrigal hit upon:

If you want to understand how WordPerfect, an old word processor, functioned, then you just need that software and some way of running it. But if you want to document the experience of using Facebook five years ago or even two weeks ago … how do you do it?

This is a fascinating challenge, and not one that archivists will likely be able to tackle, let alone overcome, in the near future. Archival work has much to do with planning for the future, and internet culture is moving at a speed that defies such planning. It will be interesting to see what tools archivists adopt and create in order to reckon with this era. I suspect that we may come to a point where something like digital archaeology is required to try to make sense of the caches of old pages left on servers. It frightens me a bit to think about how difficult that could be.

A tangential bit of sad news from this week is that Storify is going away. This platform provided a streamlined way to gather tweets into a narrative form, and I found it to be an excellent tool for curating tweets after conferences or making sense of breaking news. Luckily, it has not vanished into the ether yet. Users have until May 2018 to archive old Storify stories. I have a couple (like the one I made about PubComm 2017) that I need to make sure to keep.

I found Storify to be the best way to capture the chaos of Twitter. I’m sure some other product will pop up to provide similar services, but if the Library of Congress can’t make sense of the murmuration of hot takes in 280 characters, who can?

