Want to Archive Twitter? Good Luck With That

The platform’s meltdown has shed light on the steep challenge of preserving social media data. But not everything is worth saving.
Illustration: Rosie Struve

From the moment Elon Musk closed his Twitter deal, the network’s diehard users have taken steps to eulogize it. Some have downloaded their own archives from Twitter. Others have started threads with screenshots of their all-time favorite tweets. And there’s an ongoing Google doc cataloging Twitter trends and memes, a guide that could one day serve to decode the hieroglyphics of the app.

Whether Twitter goes bankrupt (as Musk himself has said is a possibility) or becomes an unnavigable stream of hate speech and deceptive parody accounts, the network’s future is unknown. But there’s fear that Twitter’s troves of content, important for their historical and political impact (as well as a good laugh), could be lost. Twitter’s founding premise, the 140-character (now 280) quip, doesn’t lend itself well to archiving. That’s in part because capturing a stream of content that grows by the thousands each minute is a technical nightmare, but it’s also due to the ethical concern that not all tweets are created equal. Some are fired off by world leaders who incite violence and others by individuals who would be unknown private citizens, if not for their affinity for the bird app. Both types of tweets can go viral and have lasting consequences.

“I think it’s really important to be thoughtful about the data you collect,” says Miles McCain of PolitiTweet, a service that archives tweets from public figures and influential institutions. “When you try to archive anything and everything, you end up with a whole lot of information that doesn’t really matter.”

The United States Library of Congress began documenting every public tweet in 2010, and in 2012 it said it was archiving half a billion tweets each day. But the attempt failed. Tweets evolved from short bits of text to regularly include photos, videos, and live links. The library ended the Sisyphean project seven years after it began and said it would archive only select accounts. A spokesperson for the library did not provide a comment to WIRED before this story was published.

Elisabeth Fondren, a journalism professor at St. John’s University in New York City, says the failure of that archiving project proved a huge missed opportunity for preserving a rich data set of political discourse and communication trends. The present moment has cast a spotlight on the need to archive social media and exposed the precarity of hosting a public square on the servers of a private company.

“If it had been successful, we would now have it,” says Fondren. “It really undermines researchers’ attempts to assess the social impact of media on society.”

Smaller, third-party services have sought for years to archive more specific content. ProPublica keeps a list of politicians’ deleted tweets on its Politwoops database. PolitiTweet has a database tracking 1,500 accounts. These keep records of statements and news stories from significant people in government and politics, but the projects don’t intend to capture the mass discourse of online communication.

Twitter was designed to capture the moment, and in its early days finding or viewing older tweets wasn’t easy and didn’t seem important. But by 2014, Twitter had improved its search tool for public tweets. The move helped researchers, but it also breathed new life into long-forgotten tweets that had moved down the timeline without a second thought. The change proved problematic for some tweeters, like those who began punching out 140-character musings as teens but had since become college students or young professionals. Their tweets didn’t always age well, particularly as an era of cancel culture began.

Automated tweet deletion services have sprung up in response. These tools clear large swaths of tweets from an account, and they can let users sort tweets by age and level of engagement and select which ones to delete. Semiphemeral is one such service, allowing people to auto-delete likes and direct messages, in addition to their own tweets.

“As you watch in horror/delight as Elon burns this site to the ground you might be pondering your privacy,” Semiphemeral tweeted Friday. “Do you have YEARS of tweets, likes, and DMs? Gather ’round, friends, while I show you how to DELETE THEM ALL (or as much as Twitter’s API allows).”

Not everyone is ready to leave behind their tweets. As of Monday, downloading a personal Twitter archive was getting trickier. Doing so requires a verification code from Twitter; codes sent by text message were not arriving, but those sent to email addresses still appeared to be going through.

If Twitter does go dark, it would be perhaps the largest wipeout of social data to date. There’s little precedent for this in the age of the centralized web: AOL Instant Messenger had a quiet death years after users fled the platform, and its primary content wasn’t public in the first place, so it could not be archived. Myspace lost years of photos and songs in a poorly managed server migration. Vine, Twitter’s long-mourned short-form video service, has been archived in part by enthusiasts who created compilations of the platform’s best content and reposted them to YouTube, and its videos remain accessible via direct URLs.

There’s no consensus that Twitter will go down in flames. It might break slowly, crushed by the weight of activity with fewer engineers to work out the bugs. Musk might declare bankruptcy and restructure the massive debt he took on to buy the service. But the drama has exposed the danger of trusting private companies with what we’ve come to consider public records.

“I think what these past two weeks have shown us is Twitter is a private company,” says St. John’s Fondren, “and, first and foremost, is interested in making money and not so much in providing this digital heritage.”