The Media Today

The organization that safeguards the internet’s history is under attack  

October 17, 2024
 


In recent weeks, the digital library dedicated to preserving the internet’s history has been under attack by the internet itself. The Internet Archive, a nonprofit library based in California, was founded in 1996 to archive and preserve the World Wide Web. Today, it saves roughly twenty thousand URLs every second, or about a billion URLs daily. Last week, it was revealed that hackers had accessed sensitive information from millions of the archive’s users. Shortly after, a distributed denial-of-service (DDoS) attack took the site offline. As of this publication, many of the Internet Archive’s services remain unavailable while staff examine and upgrade its internal systems (though the Wayback Machine, a popular initiative of the Internet Archive, is back online). The attacks come after the Internet Archive lost a major legal battle surrounding copyright violations last month. In short, the Internet Archive can’t seem to catch a break. “@internetarchive team spirits high, but tired,” Brewster Kahle, the archive’s founder, tweeted Tuesday.

When the internet was in its infancy, few worried about archiving it, partly because there wasn’t nearly as much Web content to preserve. Today, it’s a common misconception that publishing text on the Web is like carving letters into stone, protected in a digital cloud and immune from the fires and other disasters that threaten physical books. In reality, the internet has been disappearing since its inception. A quarter of all webpages that existed between 2013 and 2023 are no longer accessible, according to the Pew Research Center. To preserve our collective digital history, the Internet Archive deploys digital spiders to capture snapshots from across the Web. The pages are stored in the Internet Archive’s free repository, the Wayback Machine, which allows users to see what a website used to look like (if you’re curious, here’s CJR’s first archival snapshot, from 1996). The archive also enables users to track changes to government websites, revisit defunct media sites like Gawker and The Messenger, and browse preserved cookbooks. “The idea is to build the Library of Alexandria Two,” Kahle told The New Yorker in 2015.

While an accidental fire, bruited to have been started by Julius Caesar’s men, destroyed parts of Alexandria One, intentional cyberattacks have made some serious, albeit nonlethal, stabs at Alexandria Two (a/k/a the Internet Archive). In late September, bad actors stole a user authentication database containing thirty-one million unique records of users’ email addresses, usernames, and encrypted passwords. (The leak of passwords is especially sensitive, as people tend to reuse passwords across many platforms.) The hackers left an ominous JavaScript message on the archive’s webpage: “Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering from a catastrophic security breach? It just happened. See 31 million of you on HIBP!”

The text “HIBP” refers to “Have I Been Pwned”—a website that lets internet users check if their personal data has been compromised. The hackers sent word of the security breach to Troy Hunt, the creator of HIBP, on September 30. Hunt, however, was traveling and didn’t realize the significance of the breach until almost a week later, per his X feed. Hunt eventually notified the Internet Archive and gave the organization a seventy-two-hour window before publicizing the data breach. While that crisis was being dealt with, another began: the DDoS attack—a cybercrime in which the attacker floods a service with internet traffic that results in a digital jam—knocked the library offline. It is not clear if the two attacks are related. “DDOS on a tuesday? Last time it was a monday. Geez,” tweeted Kahle, who has been providing frequent updates on X. 

Social media has swirled with questions surrounding the attack: Who would go after a digital nonprofit library? One group called BlackMeta has claimed responsibility for the DDoS attack, citing pro-Palestinian motives. “We believe that highlighting the plight of innocent Palestinian people is essential, and targeting a significant digital resource like the Internet Archive serves to underscore the importance of their story and experiences,” the group tweeted over the weekend. But some say the pro-Palestinian motives are a false flag for two main reasons: first, the archive contains many valuable resources about Palestine that are now inaccessible due to the attack; second, the library is a 501(c)(3) nonprofit, public charity, and nongovernmental organization, with no direct affiliation to the US government, Israel or Mossad, or counterterrorism, as X’s Community Notes pointed out. 

At a time when an onslaught of falsehoods and disinformation is swirling around the election, it’s crucial to maintain a record of what’s been said in its original form. In other words, we need to keep our digital receipts in a safe drawer. According to the Financial Times, this need became increasingly clear to the Internet Archive after the 2016 election, when the conversation around fake news intensified. In response, the organization launched several initiatives, including archiving Donald Trump’s television appearances and cataloguing his tweets. “It’s not about trying to archive the stuff that’s true, but archive the conversation. All of that is what people are experiencing,” Kahle told the paper at the time.

I spoke with Maria Bustillos, a writer and information activist, about the attack. She noted that the timing of the incidents, so close to the election, was “striking.” Shutting down the Internet Archive, she said, stops people from “finding stuff out.” As Bustillos previously wrote for CJR, the Internet Archive is behind Democracy’s Library, which collects government publications worldwide and makes them available to journalists, researchers, and the general public. “It’s a very fundamental form of journalism,” Bustillos wrote. 


For now, Democracy’s Library remains inaccessible, as do many of the other services provided by the Internet Archive. While this isn’t the first time bad actors have attacked the archive, the incidents are a reminder of how little stands between the library and a digital abyss. Other digital archives exist, but none started capturing the Web as early as the Internet Archive did, making it almost impossible to replace. “The archive lacks powerful defenders,” Bustillos told me. “It’s all falling on one nonprofit.”

Sarah Grevy Gotfredsen is a computational investigative fellow at the Tow Center for Digital Journalism at Columbia University. She works on a range of computational projects on the digital media landscape, including influence operations conducted through news media and the information ecosystem. She graduated from Columbia University in 2022 with an MS degree in data journalism.