Taking a Flier on Big Data

TSA wants data aggregators to screen airline passengers, but digital records can be unreliable.

May 28, 2013

Senior Correspondent

Airline passengers might soon be subjected to probes more controversial than body scans if the Transportation Security Administration pursues plans to profile passengers based on commercial analytics.

TSA is considering letting private data brokers calculate the threat level of fliers. Agency officials say they expect to finish exploring this approach by the end of this year. Passengers whose digital footprints check out clean wouldn’t have to strip off shoes, overcoats and belts, or unpack laptops and liquids.

Set aside privacy fears about TSA peering into citizens’ gun shopping receipts, pharmacy purchases or online dating activities. Are commercial data aggregations even accurate? No one knows. Not the Federal Trade Commission, nor the Justice Department. Not the data brokers. And not the people being tracked—who typically can’t even see their own records.

Personal information gathered in commercial forums could prove valuable for public safety, authorities and privacy advocates agree. But they also say judging citizens based on outdated or inaccurate underlying data could do more harm than good for society.

TSA’s thinking is that a company would aggregate biographic and biometric “nongovernmental data elements to generate an assessment of the risk to the aviation transportation system that may be posed by a specific individual,” states a Jan. 8 request for strategy suggestions.

The system would have to provide a “reliable method that effectively identifies known travelers, based on a sound analysis and the application of an algorithm that produces dependable results,” the work requirements state.

Fliers and most TSA officials would be in the dark about the data those algorithms are munching on. “The specific sources and types of information employed for pre-screening purposes under this initiative may not be publicly disclosed,” agency documents state, adding that the data will not be disclosed to TSA except during audits. The quality requirements are vague: The vendor must use “specific sources of current, accurate and complete nongovernmental data.”

Spotty Track Records

Increasingly, big data taps the same kinds of digital evidence for authorities as it does for marketers: social media posts, voter registrations, credit reports and clickstreams—which are Web browsing histories—to name a few.

The FTC in December 2012 ordered nine data brokers to report whether their company “monitors, audits, or evaluates the accuracy of personal data” used to target advertising. Commission officials, however, say they are not inquiring about the accuracy of personal data used to track criminals. “Our focus is on consumer privacy and commercial data practices—rather than the use of commercial data for law enforcement purposes,” says Peder Magee, senior attorney for FTC’s Division of Privacy and Identity Protection.

It’s too late anyway. Justice’s Bureau of Alcohol, Tobacco, Firearms and Explosives already uses big data to predict gun violence. Justice officials would not comment on how they measure the integrity of this information.

The consequences of relying on dubious statistics and computations can vary. Some researchers suggest that a few mistakes won’t affect results because the scope of these analyses is so huge. “We can accept some messiness in return for scale,” Viktor Mayer-Schonberger and Kenneth Cukier write in their book, Big Data (Eamon Dolan/Houghton Mifflin Harcourt, March 2013). “We’re willing to sacrifice a bit of accuracy in return for knowing the general trend. Big data transforms figures into something more probabilistic than precise.”

However, the level of precision that satisfies marketers is very different from the exactitude required by government agencies, says Jennifer Granick, director of civil liberties at Stanford University’s Center for Internet and Society. “You can have 15 percent accuracy for advertising,” which might be better than other forms of behavioral analyses, “but if you are getting 85 percent of it wrong when you are denying people government benefits or sending out police to interview them, that would be completely wasteful and dangerous,” she says.

One major concern among some law enforcement experts is that most data warehouses store obsolete records. “The biggest problem is they don’t update,” says Paul Wormeli of the Integrated Justice Information Systems Institute, a federally funded organization. A citizen’s profile is not automatically adjusted if a credit report or human resources form turns out to have been mistyped.

The Data Police

There’s no easy answer to the potential accuracy problem with big data.

Directing a government agency, or even a bunch of agencies, to regulate data quality would be nearly impossible and futile, information management experts say. Plus, the private sector has a financial incentive to tidy up a person’s entry: the aggregator market competes on the sharpness of its databases. People should have the ability “to correct it and to remove it if the info is sensitive,” says Craig Wills, a computer science professor at Worcester Polytechnic Institute. Still, fixes made to one database don’t always carry over to other systems relying on the same information, Granick notes. “There’s no right to access the profile that whatever advertisers of the world have compiled on me,” she says. “Amazon has a profile on me and what they think I like, and I can refine it, but I can’t get a copy.”

Marketing firms argue new industry guidelines that let Internet users opt out of online tracking address many of these problems. Principles adopted by the Digital Advertising Alliance, whose members include Datalogix, Acxiom and other data wholesalers, prohibit browsing histories from being used to determine eligibility for employment, health care treatments and insurance coverage. “To date it’s proven to work. We have very broad reach and people are following it,” says Stuart Ingis, counsel for the alliance.

Unlike the data mining industry, credit bureaus are required by law to correct commercial data.

Credit information is updated every 30 days, or each payment cycle, according to the Consumer Data Industry Association. Citizens are responsible for communicating name and address changes to lenders, who furnish those modifications to the bureaus. “The furnisher may have more up-to-date address information than the post office,” says Norm Magnuson, the association’s vice president of public affairs.

So for TSA and other agencies, vetting the accuracy of big data will be nothing short of a big challenge.
