
NSA program broader than previously described

Warrantless eavesdropping involves analyzing far-reaching web of communications data.

March 17, 2006

By Shane Harris

The Bush administration has assiduously avoided any talk about the actual workings of its program to intercept the phone calls and e-mails of people in the United States who are suspected of having links to terrorists abroad. Officials' unwavering script goes like this: Present the legal justifications for the president to authorize domestic electronic surveillance without warrants, but say nothing about how the National Security Agency actually does it -- or about what else the agency might be doing.

But when Attorney General Alberto Gonzales appeared before the Senate Judiciary Committee on February 6 to answer questions about the program, what he didn't say pulled back the curtain on how the NSA decides which calls and e-mails to monitor. The agency bases those decisions on a broader and less focused surveillance than officials have publicly described, a surveillance that may, or may not, be legal.

In a hearing that lasted more than eight hours, Gonzales, who didn't testify under oath, dutifully batted away senators' inquiries about "operational details" and stayed silent, under determined questioning by some Democrats, about other warrantless programs that the president might have secretly authorized. When the hearing finally ended, so did Gonzales's comments on the program.

Until 22 days later. On February 28, Gonzales sent committee Chairman Arlen Specter, R-Pa., a six-page letter, partly to respond to questions he was unprepared to answer at the hearing, but also "to clarify certain of my responses" in the earlier testimony. In the letter, Gonzales took pains to correct any "misimpressions" that he might have created about whether the Justice Department had assessed the legality of intercepting purely domestic communications, for example, as opposed to those covered by the NSA program, in which one party is outside the United States. The attorney general didn't say that Justice had contemplated the legality of purely domestic eavesdropping without a warrant, but he also didn't say it hadn't.

Gonzales's letter was intriguing for what else it didn't say, especially on one point: With exacting language, he narrowed the scope of his comments to address only "questions relating to the specific NSA activities that have been publicly confirmed by the president." Then, as if to avoid any confusion, Gonzales added, "Those activities involve the interception by the NSA of the contents of communications" involving suspected terrorists and people in the United States.

Slightly, and with a single word, Gonzales was tipping his hand. The content of electronic communications is usually considered to be the spoken words of a phone call or the written words in an electronic message. The term does not include the wealth of so-called transactional data that accompany every communication: a phone number, and what calls were placed to and from that number; the time a call was placed; whether the call was answered and how long it lasted, down to the second; the time and date that an e-mail message was sent, as well as its unique address and routing path, which reveals the location of the computer that sent it and, presumably, the author.

Considering that terrorists often talk and write in code, the transactional data of a communication, properly exploited, could yield more valuable intelligence than the content itself.

"You will get a very full picture of a person's associations and their patterns of activity," said Jim Dempsey, the policy director of the Center for Democracy and Technology, an electronic-privacy advocacy group. "You'll know who they're talking to, when they're talking, how long, how frequently.... It's a lot [of information]. I mean, a lot."

According to sources who are familiar with the details of what the White House calls the "terrorist surveillance program," and who asked to remain anonymous because the program is still classified, analyzing transactional data is one of the first and most important steps the agency takes in deciding which phone calls to listen to and which electronic messages to read.

Far from the limited or targeted surveillance that Gonzales, President Bush, and intelligence officials have described, this traffic analysis examines thousands, perhaps hundreds of thousands, of individuals, because nearly every phone number and nearly every e-mail address is connected to a person.

Patterns in the Sea

Analysis of telephone traffic patterns helps analysts and investigators spot relationships among people that aren't always obvious. For instance, imagine that a man in Portland, Ore., receives a call from someone at a pay phone in Brooklyn, N.Y., every Tuesday at 9 a.m. Also every Tuesday, but minutes earlier, the pay phone caller rings up a man in Miami.

An investigator might look at that pattern and suspect that the men in Portland and Miami are communicating through the Brooklyn caller, who's acting as a kind of courier, to mask their relationship. Patterns like this have led criminal investigators into the inner workings of drug cartels and have proved vital in breaking these cartels up.
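The Brooklyn pay phone example can be sketched in a few lines of code. This is only an illustration of the idea, not a description of any tool investigators or the NSA actually use: the call records, the 15-minute window, and the three-repeat threshold are all invented for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical call records: (caller, callee, start time). All values are invented.
CALLS = [
    ("brooklyn_payphone", "miami", datetime(2006, 1, 3, 8, 52)),
    ("brooklyn_payphone", "portland", datetime(2006, 1, 3, 9, 0)),
    ("brooklyn_payphone", "pizzeria", datetime(2006, 1, 5, 19, 30)),
    ("brooklyn_payphone", "miami", datetime(2006, 1, 10, 8, 52)),
    ("brooklyn_payphone", "portland", datetime(2006, 1, 10, 9, 0)),
    ("office", "portland", datetime(2006, 1, 12, 14, 5)),
    ("brooklyn_payphone", "miami", datetime(2006, 1, 17, 8, 52)),
    ("brooklyn_payphone", "portland", datetime(2006, 1, 17, 9, 0)),
]


def find_relays(calls, window=timedelta(minutes=15), min_repeats=3):
    """Count how often one caller phones two different numbers within `window`.

    Returns {(caller, first_callee, second_callee): count} for every pairing
    seen at least `min_repeats` times. Both thresholds are assumptions.
    """
    by_caller = defaultdict(list)
    for caller, callee, start in calls:
        by_caller[caller].append((start, callee))

    counts = defaultdict(int)
    for caller, placed in by_caller.items():
        placed.sort()  # chronological order
        for i, (t1, first) in enumerate(placed):
            for t2, second in placed[i + 1:]:
                if t2 - t1 > window:
                    break  # later calls are even further away in time
                if second != first:
                    counts[(caller, first, second)] += 1
    return {key: n for key, n in counts.items() if n >= min_repeats}


print(find_relays(CALLS))
# {('brooklyn_payphone', 'miami', 'portland'): 3}
```

Note that the sketch never looks at what was said on any call. It needs only who called whom, and when.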

Terrorists employ similar masking techniques. They use go-betweens to circuitously route calls, and they change cell phones often to avoid detection. Transactional data, however, capture those behaviors. If NSA analysts -- or their computers -- can find these patterns or signatures, then they might find the terrorists, or at least know which ones they should monitor.

Just after 9/11, according to knowledgeable sources, the NSA began intercepting the communications of specific foreign persons and groups named on a list. The sources didn't specify whether persons inside the United States were monitored as part of that list. But a former government official who is knowledgeable about NSA activities and the warrantless surveillance program said that this original list of people and groups, or others like it, could have formed the base of the NSA's surveillance of transactional data, the parts of a communication that aren't considered content.

If the agency started with a list of phone numbers, it could find all the numbers dialed from those phones. The NSA could then learn what numbers were called from that second list of numbers, and what calls that list received, and so on, "pushing out" the lists until the agency had identified a vast network of callers and their transactional data, the former official said.

The agency might eavesdrop on only a few conversations or e-mails. But starting with even an initial target list of, say, 10 phone numbers quickly yields a web of hundreds of thousands of communications, because the volume increases exponentially with every new layer of callers.
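The "pushing out" the former official describes resembles what computer scientists call a breadth-first search of a call graph. The sketch below is a guess at the general shape of such a process, not the NSA's method. The phone numbers are placeholders, and the figure of 40 new contacts per number is an assumption chosen only to show how fast the web grows.

```python
from collections import defaultdict


def build_contacts(call_records):
    """Map each number to every number it called or was called by."""
    contacts = defaultdict(set)
    for caller, callee in call_records:
        contacts[caller].add(callee)
        contacts[callee].add(caller)
    return contacts


def push_out(contacts, seeds, hops):
    """Expand a seed list one layer of contacts at a time.

    Returns a list of sets: layer 0 is the seed list, and layer k holds the
    numbers first reached k hops away from it.
    """
    seen = set(seeds)
    layers = [set(seeds)]
    for _ in range(hops):
        frontier = set()
        for number in layers[-1]:
            frontier |= contacts.get(number, set()) - seen
        seen |= frontier
        layers.append(frontier)
    return layers


def upper_bound(seed_count, contacts_per_number, hops):
    """Largest possible size of the outermost layer, ignoring overlap."""
    return seed_count * contacts_per_number ** hops


# Placeholder records: (caller, callee).
RECORDS = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E"), ("F", "A")]
layers = push_out(build_contacts(RECORDS), {"A"}, hops=2)
print([sorted(layer) for layer in layers])  # [['A'], ['B', 'C', 'F'], ['D']]

# Ten seed numbers, an assumed 40 new contacts each, three layers out.
print(upper_bound(10, 40, 3))  # 640000
```

Real call networks overlap, so the true count would be lower than the upper bound, but the growth is still multiplicative with each layer.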

To find meaningful patterns in transactional data, analysts need a lot of it. They must set baselines about what constitutes "normal" behavior versus "suspicious" activity. Administration officials have said that the NSA doesn't intercept the contents of a communication unless officials have a "reasonable" basis to conclude that at least one party is linked to a terrorist organization.

To make any reasonable determination like that, the agency needs hundreds of thousands, or even millions, of call records, preferably as soon as they are created, said a senior person in the defense industry who is familiar with the NSA program and is an expert in the analytical tools used to find patterns and connections. Asked if this means that the NSA program is much broader and less targeted than administration officials have described, the expert replied, "I think that's correct."

In theory, finding reasonable connections in data is a straightforward and largely automated process. Analysts use computer programs based on algorithms -- mathematical procedures for solving a particular problem -- much the same way that meteorologists use data models to forecast the weather. Counter-terrorism algorithms look for the transactional indicators that match what analysts recognize as signs of a plot.

Of course, those algorithms must be sophisticated enough to spot many not-so-obvious patterns in a mass of data that are mostly uninteresting, and they work best when the data come from many sources. Algorithms have proven useful for detecting frequent criminal activity, such as credit card fraud.

"Historical data clearly indicate that if a credit card turns up in two cities on two continents on the same day, that's a useful pattern," says Jeff Jonas, a computer scientist who invented a technology to connect known scam artists who are on casinos' watch lists with new potential grifters, and is now the chief scientist of IBM Entity Analytics.

"The challenge of predicting terrorism is that unlike fraud, we don't have the same volume of historical data to learn from," Jonas said. "Compounding this is the fact that terrorists are constantly changing their methods and do their best to avoid leaving any digital footprints in the first place."

The obvious solution would be to write an algorithm that is flexible and fast enough to weigh millions of pieces of evidence, including exculpatory ones, against each other. But according to technology experts, and even the NSA's own stated research accomplishments, that technology has not been perfected.
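By contrast, the fraud rule Jonas describes is simple enough to write down directly. The toy version below assumes each transaction arrives already tagged with a card ID, a date, and a continent; those fields and the sample records are invented for illustration.

```python
from collections import defaultdict
from datetime import date


def flag_cards(transactions):
    """Return the sorted IDs of cards seen on more than one continent in a day."""
    continents = defaultdict(set)
    for card, day, continent in transactions:
        continents[(card, day)].add(continent)
    return sorted({card for (card, _), seen in continents.items() if len(seen) > 1})


# Hypothetical transactions: (card ID, date, continent).
TRANSACTIONS = [
    ("card-1", date(2006, 3, 1), "Europe"),
    ("card-1", date(2006, 3, 1), "North America"),
    ("card-2", date(2006, 3, 1), "Asia"),
    ("card-2", date(2006, 3, 2), "Europe"),
]

print(flag_cards(TRANSACTIONS))  # ['card-1']
```

A rule like this works because years of fraud records show what to look for. The difficulty the experts describe is that no comparable history exists for terrorist plots.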

The Bleeding Edge

The NSA began soon after the 9/11 terrorist attacks to collect transactional data from telecommunications companies. Several telecom executives said in press accounts that their companies gave the NSA access to their switches, the terminals that handle most of the country's electronic traffic. One executive told National Journal that NSA officials urged him to hand over his company's call logs. When he resisted, the officials implied that most of his competitors had acceded to the agency's request.

Not long after the surveillance program started, in October 2001, the NSA began looking for new tools to mine the telecom data. The agency, the industry expert said, considered some that the Defense Department's Total Information Awareness program was developing. TIA was an ambitious and controversial experiment to find patterns of terrorist activity in a much broader range of transactions than just telephone data.

But NSA officials rejected the TIA tools because they were "too brittle," the expert said, meaning that they failed to manage the torrent of data that the NSA wanted to analyze. He noted the irony of rejecting the TIA technologies -- which privacy advocates had characterized as huge, all-seeing, digital dragnets -- because they couldn't handle the size of the NSA's load.

In the fall of 2002, a federal research-and-development agency that builds technologies primarily for the NSA launched another search for pattern-detection solutions. The Advanced Research and Development Activity, ARDA, issued $64 million in contracts for the Novel Intelligence for Massive Data, or NIMD, program. Its goal was "to help analysts deal with information overload, detect early indicators of strategic surprise, and avoid analytic errors," according to ARDA's public call for proposals released last year.

In essence, NIMD is an early-warning system, which is how the administration has described the terrorist surveillance program. In 2003, ARDA also took over research of the tools being developed under TIA.

While the NSA was searching for the next generation of data-sifters, it continued to rely on less sophisticated tools. As an example, the former government official who spoke to NJ cited applications that organize data into broad categories, allowing analysts to see some relationships but obscuring some of the nuance in the underlying information. The results of this kind of category analysis can be displayed on a graph.

But the graph might reveal only how many times a particular word appears in a conversation, not necessarily the significance of the word or how it relates to other words. Technologists sarcastically call these diagrams BAGs -- big-ass graphs.

Such was the state of affairs when the NSA started looking for terrorist patterns in a telephonic ocean. So, instead of looking for a tool that could cull through the data, the agency decided to "reverse" the process, starting with the data set and working backward, looking for algorithms that could work with it.

The NSA has made some breakthroughs, the industry expert said, but its solution relies in part on a technological "trick," which he wouldn't disclose. Another data-mining expert, who also asked not to be identified because the NSA's work is classified, said that computer engineers probably started with the telecom companies' call data, looked for patterns, and then wrote algorithms to detect them as they went along, tweaking the algorithms as needed.

Such an ad hoc approach is brittle in its own right. For starters, if analysts are working with algorithms designed to detect only certain patterns, they could be missing others, the technology expert said. At the same time, the more dependent the algorithms are on identifying very specific patterns of behavior, the more vulnerable the NSA's monitoring is to being foiled if terrorists discover what the agency is watching for, or if they change their behavior. A more complex algorithm that considers thousands, or even millions, of patterns is harder to defeat.

The industry expert added that NSA officials have worried that "if you knew what the technical trick was they were doing [to make the surveillance program function], you wouldn't have to know what specific algorithms" the agency was using. This reliance on a "trick" makes the program very vulnerable to defeat and helps explain why the Bush administration is so keen on cloaking its inner workings.

"It's pretty bleeding-edge," the expert said, referring to a technology that's unperfected and therefore prone to instability. "We're talking about dumping hundreds of thousands or millions of records" into a system. In an unsophisticated system, connections among people can emerge that look suspicious but are actually meaningless. A book agent who represents a journalist who once interviewed Osama bin Laden, for example, doesn't herself necessarily know bin Laden. But she might turn up in an NSA search of transactional data. "False positives will happen," the expert said.

Gonzales and former NSA Director Michael V. Hayden have said that career agency employees decide to eavesdrop only if they have a "reasonable" basis to believe one party to a communication is a terrorist or connected to a terrorist organization.

But what determines reasonableness? In a January speech at the National Press Club, Hayden drew a distinction between the Fourth Amendment's requirement that "no warrants shall issue, but upon probable cause," and its protection against "unreasonable searches and seizures."

When a journalist in the crowd questioned his logic, Hayden heatedly replied, "If there's any amendment to the Constitution that employees of the National Security Agency are familiar with, it's the Fourth. And it is a reasonableness standard in the Fourth Amendment.... I am convinced that we are lawful, because what it is we're doing [intercepting content] is reasonable."

He said that the terrorist attacks fundamentally altered the NSA's thinking. "The standard of what [information] was relevant and valuable, and therefore, what was reasonable, would understandably change, I think, as smoke billowed from two American cities and a Pennsylvania farm field. And we acted accordingly."

Aside from the question of whether NSA employees, rather than federal judges, are qualified to determine what constitutes a reasonable search, that determination provides much of the basis for deciding whose communications will be intercepted without a warrant. If the technology the NSA is using to determine what constitutes a reasonable search is unsophisticated, the industry expert said, "you're talking about tapping a phone based on a statistical correlation."

A New Legal Battle?

Gonzales's narrowly tailored letter to Sen. Specter raised more questions than it answered. Democrats were outraged by what they saw as the attorney general's attempt to alter his testimony and to obstruct senators' attempts to fully assess the program's legal basis.

"Much of your letter is devoted to not providing answers to the questions of a number of us regarding legal justifications for activities beyond those narrowly conceded by you to have already been confirmed by the president," Sen. Patrick Leahy of Vermont, the Judiciary Committee's ranking Democrat, wrote to the attorney general in a follow-up letter.

Leahy also raised the question of what else Gonzales hadn't told lawmakers. The attorney general's letter contained "disturbing suggestions ... that there are other secret programs," Leahy wrote.

In Gonzales's letter to Specter, the attorney general had referred to "other intelligence activities" and to his inability to discuss them; he left open the possibility that the president may not have authorized these activities. Gonzales wrote, "When I testified in response to questions from Sen. Leahy, 'Sir, I have tried to outline ... what the president has authorized, and that is all that he has authorized,' I was confining my remarks to the Terrorist Surveillance Program as described by the president."

Gonzales's testimony was meant to defend the program's legality. But as more about the NSA's operations becomes known, new legal questions arise, including one that goes to the heart of how officials reasonably identify suspected terrorists.

Under normal criminal law, content is defined as "any information concerning the substance, purport, or meaning of [a] communication," but the definition of content under the law that governs electronic eavesdropping on U.S. persons for intelligence purposes is different and is potentially in conflict with normal jurisprudence. That law, the Foreign Intelligence Surveillance Act, states that content "includes any information concerning the identity of the parties ... or the existence, substance, purport, or meaning of [their] communication."

A phone number can be used to identify a person, said Dempsey of the Center for Democracy and Technology, who for nine years was assistant counsel to the House Judiciary Subcommittee on Civil and Constitutional Rights. Does that mean that a phone number is "content" under the law?

FISA, enacted in 1978, didn't envision today's technology, when anyone with an Internet connection can use a phone number to find someone's name, address, and even an aerial photograph of his house, Dempsey said.

"I just cannot read [FISA] and figure out what it means in the context of analysis of [transactional] data," he added. "Presumably somebody in the administration thinks they understand it.... Whether that's providing any clear guidance" to the people working on the NSA program, "that's not clear."
