Setting Data Free
The government launched its massive data set trove Data.gov in 2009 with a clear mission: to put information the government was gathering anyway into the hands of private sector and nonprofit Web and mobile app developers.
Once that data was out, the White House imagined, developers would set about turning it into useful products—optimizing Census Bureau statistics for marketers; Commerce Department data for exporters; and Housing and Urban Development Department information for building contractors, mortgage brokers and insurance adjusters.
When necessary, the government also would be able to prime the pump with agency-sponsored code-a-thons and app development competitions run through Challenge.gov, a White House initiative that paid out $38 million to prize-winning developers during its first year, which ended in September.
But turning government data into private sector products has proved more complicated in practice.
Some agencies, such as the Environmental Protection Agency, are posting new data sets regularly and rapidly in machine-readable form, but other agencies have shown little interest in devoting dwindling resources to making data more accessible. Agency data publication schedules also are often too slow for the go-go world of mobile apps.
“The theory behind Data.gov was, let’s move forward when it comes to sharing data,” says Josh Green, chief executive officer of Panjiva, a company that crunches customs data for U.S. businesses that import some of their raw materials. “I think that’s right in terms of what would be good for entrepreneurship, but realistically I don’t think that has filtered down to the agency level.” While Panjiva relies on some Census data, which it downloads directly from the Census Bureau, the company uses mostly Customs and Border Protection data on CD-ROMs that it pays to have delivered every day by FedEx.
Panjiva analyzes that data to determine which overseas suppliers have the strongest track records with U.S. partners and then sells the results to companies looking for new suppliers.
Green jokes about the incongruity of snail mailing digital customs logs, but notes that Panjiva’s upfront work to capture imports data is a big reason the company was able to corner the market.
Another company, Brighter Planet, uses EPA and Energy Department data to build Web and mobile apps that auto-calculate the environmental impact of various activities. The for-profit company created a product for corporate MasterCard users to determine the environmental impact of their transactions, and it won a recent EPA Challenge.gov competition by developing a Web app that compares the carbon footprint of flying, driving or taking a train or bus to different destinations.
Brighter Planet grabs data straight from government agencies rather than from Data.gov, says Andy Rossmeissl, the company’s co-founder and product design director.
Data.gov is laudable, Rossmeissl says, but developers’ biggest hurdle with government data isn’t finding it. It is getting the data quickly and in a form they can use. “That wasn’t the focus of Data.gov and, in general, it isn’t the focus of agencies producing data,” he says. “That’s not because their intentions aren’t great, but they have a history of producing data in a very specific way that goes back to the Federal Register and quarterly releases.”
In some cases, agencies also publish data in difficult-to-manipulate forms such as PDFs, significantly increasing the upfront work for developers who then have to create and organize spreadsheets. And agencies often release new information in a different place—or in a different file format—from historic information, Rossmeissl says.
There is a human side to Data.gov that tries to expand the project’s mission to address formatting and publishing frequency. Data.gov director Jeanne Holm lists open data “evangelist” as a job description on her LinkedIn résumé. Holm and her five-member staff spend much of their time working as liaisons for developers seeking government data: figuring out whether it exists at all in the federal sector, which agency produced it and whether there’s a convenient way to release it. To make the job easier, they’ve begun standing up communities of data, organized into topical categories such as energy, health, law and, most recently, oceans.
The Data.gov team also meets regularly with about 400 agency “data stewards” to change the way government data is initially created so that it requires less translation and reformatting on the back end.
Even for agencies committed to issuing quality data in a timely way, there are procedural delays that can take a month or more, Holm says. Part of that delay comes from ensuring scientific and technical data meet standard quality requirements. In some cases, agencies also have to aggregate data at a higher level so snoopers can’t figure out sensitive information. Departments want to avoid publishing so many details that it’s possible to determine, for instance, which student in a small school district is receiving lunch subsidies or which resident of a small town has a sexually transmitted disease.
Requirements that data be accessible to people with disabilities can slow the publication of data with embedded maps and graphics. And everything posted to Data.gov must go through a National Security Council check to ensure it doesn’t disclose sensitive security information when combined with other data.
NSC has flagged fewer than 25 of the roughly 400,000 data sets submitted to Data.gov, Holm says, but the process still is time-consuming.
While Brighter Planet doesn’t pull data directly from Data.gov, Rossmeissl calls the project important as a symbol of the Obama administration’s commitment to digital transparency—visible in other initiatives such as programs to store more government records bound for the National Archives in electronic form and to manage more Freedom of Information Act requests online.
The government also is partnering with India to release Data.gov-in-a-Box, an open source version of the open data site that other nations, states or cities can adopt. “Across the board, this administration has really made open data a priority,” Rossmeissl says. “When you see the effort they’ve put into Data.gov, you can’t argue with that.”