mycteria/Shutterstock.com

How OPM Can Find Its Missing Data on the Dark Web

The best way to recover from breaches is to assume that they’re inevitable—and start looking for your data before you know it’s gone.

The data breach announced last week—more than 4 million federal employee records stolen from the Office of Personnel Management, allegedly by hackers linked to the Chinese government—was “the most significant” theft of government data ever, according to the chairman of the House Homeland Security Committee. But experts say it’s not too late to reduce the harm done by tracking the stolen information as it moves around the Internet.

Stealing data is different from stealing a Ming vase, in that the original remains behind. If cyber detectives can find a copy of this data “in the wild,” they can limit its value as a tool for fraud, help build a case attributing the hack to the Chinese government, and develop insight into how the data will be used.

How would you do that? First, make sure you can recognize the data when you come across it, using a technique called cryptographic hashing. “It’s not code that’s embedded in the data so much as a computation done on the data itself,” said Danny Rogers, one of the co-founders of Terbium Labs, a data intelligence company that tracks stolen data. By running chunks of the data through a mathematical function, you generate a hash: a number that is, for practical purposes, unique to each specific chunk. You can then crawl the web in search of data whose hash values match those of your original.
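As a rough illustration of what hashing chunks of data can look like, here is a minimal Python sketch. It is not Terbium's method: the choice of SHA-256, the 16-character window, the normalization step and the `fingerprint` function name are all assumptions made for the example, and the sample record is invented.

```python
import hashlib


def fingerprint(text, chunk_size=16):
    """Hash every overlapping chunk_size-character window of the text.

    Hypothetical helper: the window size and SHA-256 are illustrative choices.
    """
    # Normalize case and whitespace so trivial reformatting does not change the hashes.
    normalized = " ".join(text.lower().split())
    last_start = max(len(normalized) - chunk_size, 0)
    return {
        hashlib.sha256(normalized[i:i + chunk_size].encode("utf-8")).hexdigest()
        for i in range(last_start + 1)
    }


# Made-up record used purely for illustration.
sample = "Jane Q. Example, GS-13, badge 0000000"
hashes = fingerprint(sample)
print(len(hashes), "chunk hashes, e.g.", sorted(hashes)[0][:16])
```

Because the same chunk always produces the same hash, anyone holding only the hashes can later recognize the chunk without holding the record itself.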

Last week, Terbium released a product called MatchLight that hunts for stolen data. “We compute a whole bunch of these hash values on little pieces of data, both on behalf of our clients and as we crawl. We simply compare the results of those hash functions to each other to tell which data had a similar input,” Rogers said.
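The comparison Rogers describes can be sketched in a few lines of Python. This is an assumption-laden toy, not MatchLight itself: the `chunk_hashes` and `overlap` functions, the 12-character window and the sample strings are all hypothetical, and a real system would have to run at web-crawl scale.

```python
import hashlib


def chunk_hashes(text, size=12):
    """Return SHA-256 digests of every overlapping window of normalized text."""
    cleaned = " ".join(text.lower().split())
    last_start = max(len(cleaned) - size, 0)
    return {
        hashlib.sha256(cleaned[i:i + size].encode("utf-8")).hexdigest()
        for i in range(last_start + 1)
    }


def overlap(protected_hashes, crawled_text):
    """Fraction of the protected hashes that also appear in a crawled page."""
    if not protected_hashes:
        return 0.0
    found = chunk_hashes(crawled_text)
    return len(protected_hashes & found) / len(protected_hashes)


# Invented data: one "protected" record and two pages a crawler might see.
record = "Jane Q. Example, SSN 000-00-0000, clearance level 3"
client_hashes = chunk_hashes(record)
leaked_page = "for sale: " + record + " and more"
unrelated_page = "Weather forecast for Tuesday: sunny with light winds"

print("leaked page overlap:", overlap(client_hashes, leaked_page))
print("unrelated page overlap:", overlap(client_hashes, unrelated_page))
```

Only the hash sets are compared, so the side doing the crawling needs the fingerprints rather than the original records. A high overlap score flags a page for closer inspection.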

Hashing won’t prevent a breach from happening. The point is to drastically cut the time it takes to discover that data has been stolen, by constantly crawling the web in search of matching hashes, even before you know the data is gone. Early discovery can make the stolen data worth less to the people who stole it.

“You really can’t prevent every breach. With advanced enough actors, you have to assume that your organization is going to be breached at some point,” Rogers said. “The most important element of your security posture is how quickly you can detect where the data is and respond, initiating whatever remediation plan you have in place.”

The OPM breach was detected in April by experts at the Department of Homeland Security, or DHS, using a system called Einstein 3, which looks for malware on federal computer networks. The system, designed to predict and prevent major cyber breaches, did neither of those things here. But it was, at least partially, useful in detecting the breach after the fact.

How long after the fact? DHS hasn’t said. But the average time between a breach and its discovery by the plundered organization is 200 to 230 days, according to Rogers. Moreover, it’s often third-party security firms like Kaspersky Lab or IOActive that make the discovery.

Of course, there’s more than one way to detect major intrusions after they occur. That’s what MatchLight is all about. “We can bring that down to 30 seconds to 15 minutes,” said Rogers. He says that MatchLight, though hardly the only product that can do cryptographic hashing, is the only one that can do it on the scale relevant to an organization like OPM. “We focus on the large-scale automation of that process,” Rogers said.

Data fingerprinting is only useful, however, if the stolen data hits the Dark Web, a portion of the web unreachable through “normal” search engines like Google. Typically accessed anonymously through onion routing services like Tor, the Dark Web is often associated with illegal exchanges, but it is also used by activists and journalists looking to exchange information beyond the gaze of authoritarian regimes.

Is the arrival of the stolen records on the Dark Web a sure bet? Not necessarily. The chief value of much of the stolen OPM data could lie in the narrow targeting of very particular military or national security workers, possibly via blackmail or elaborate phishing scams.
