This fall, with the midterms in view, reporters across the country are focused on campaigns; data journalists, in particular, have heaped attention on voter behavior, trends, and turnout. But instead of tracking elections, Jeremy Singer-Vine, a data journalist and computer programmer in New York, is interested in what the government is actually doing. In September, he started the Data Liberation Project, which aims to identify and retrieve bureaucratic data not otherwise easily accessible to the public, then clean it up and publish it for the benefit of reporters. He’s concerned, he said, with information “that can be broadly useful regardless of what is happening right now in the news cycle.”
To start, he’s submitting about five Freedom of Information Act requests per month. “That may increase or decrease depending on the sort of the experience I get, and through trial and error,” he said. “As I submit more, they will accumulate a sort of workload in terms of tracking them—and, you know, fighting them, potentially, in court.” At the same time, he said, “I’m developing Web-scraping projects to generate data sets from government websites that should be useful for journalists.” The work has involved scouring documents that no one else has managed to look at and “learning a lot of the finer details of the government bureaucracy.”
Singer-Vine, who is thirty-six, began his career as a photojournalist. Lean and soft-spoken, with round-frame glasses, he comes across as an introvert. (“If you met me in person, you would be surprised based on my body type that I played one year of high school football,” he said.) Originally from Berkeley, California, he got a college internship at the East Bay Express; later, he landed at the Wall Street Journal and Slate. “This was 2010, and the idea of something like data journalism had been around for a long time, but it was just the beginning of ‘the big wave’ that occurred kind of in tandem with the heightening profile of Nate Silver,” he recalled. “I started to learn computer programming and data analysis.”
He got the idea to start the Data Liberation Project several years later, while he was a data editor at BuzzFeed. It’s common practice, he observed, for journalists to look for data to complement a narrative. He thought it would be better to flip that around. Plus, accessing government data can be agonizing; FOIA requests can remain pending for longer than a reporter keeps a job. “I know some, through my previous work, that have taken more than five years to get resolved,” Singer-Vine said. “Anything from an agency that has a law enforcement component has been, historically, very difficult to obtain. There are exemptions in the Freedom of Information Act specifically relating to law enforcement and ongoing investigations and things like that. It is an exemption that, generally speaking, has been interpreted favorably to the agencies and not to journalists.”
To get around that challenge now, Singer-Vine said, he’s “trying to craft FOIAs that are as persuasive as possible that address and preempt the typical pushback that you’d get from an agency.” He’s received help from the Cornell Law School First Amendment Clinic (so far just in drafting language), though he hopes that the project will be able to recruit lawyers for other kinds of assistance. No one is getting paid, at least not yet. “I’m doing it on a volunteer basis,” he said.
The work has already turned up some interesting tidbits, including in a report from the Drug Enforcement Administration that mentioned lost and stolen substances. “The DEA publishes very little raw data,” Singer-Vine said. “Basically no raw data about these losses and thefts. But in the annual report that they publish are typically a few paragraphs about the general scans. And in one report I read, there were just two states that accounted for some huge number of thefts—far disproportionate to the size of the states.” He hopes a reporter will pick up the story. “I would want to know why: Why are these states the site of so many thefts?”
He’s steering clear of data related to the midterms. “Election data, to some extent, has been perhaps the most used data in journalism for a long time now,” Singer-Vine said. No need to add to the pile. For the Data Liberation Project, he explained, “I’m most excited and I think most focused on data sets that have some direct bearing on people’s lives.”