Report calls data-mining expensive, ineffective

Government-sponsored mining of databases for information is costly, ineffective at catching terrorists and threatens citizens' privacy rights, a new Cato Institute study claims.

The statistical likelihood of false positives is so high that the practice "will inevitably waste resources and threaten civil liberties," according to authors Jim Harper, Cato's information policy studies director, and IBM researcher Jeff Jonas.
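The false-positive concern comes down to base-rate arithmetic, which can be sketched in a few lines. The figures below are purely hypothetical assumptions chosen for illustration and do not come from the Cato study: a screened population of 300 million, 1,000 actual targets, and a screening method that catches 99 percent of targets while wrongly flagging 1 percent of everyone else. The function name and its parameters are likewise illustrative.

```python
def false_alarms_per_true_hit(population, actual_targets,
                              sensitivity, false_positive_rate):
    """Return how many innocent people are flagged for each real target flagged.

    All inputs are hypothetical; none are taken from the study.
    """
    # Real targets the screen correctly flags.
    true_hits = actual_targets * sensitivity
    # Innocent people the screen wrongly flags.
    false_alarms = (population - actual_targets) * false_positive_rate
    return false_alarms / true_hits


# Assumed, illustrative inputs only.
ratio = false_alarms_per_true_hit(
    population=300_000_000,
    actual_targets=1_000,
    sensitivity=0.99,
    false_positive_rate=0.01,
)
print(f"False alarms per real target flagged: {ratio:,.0f}")
```

With these made-up inputs, the screen flags roughly 3,000 innocent people for every real target it flags, even though it is "99 percent accurate" in both directions. The ratio is driven almost entirely by how rare real targets are in the screened population.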

Data mining was dubbed a key tool in the war on terror after al Qaeda's attacks on Sept. 11, 2001, with federal agencies arguing that comprehensive monitoring of personal data would assist in tracking terrorists. The method relies on "pattern-based analysis" of private data from large numbers of people, but the technology needed to obtain precise results does not exist, the study found.

"Better interagency information-sharing, investigatory legwork, in pursuit of genuine leads, and better training are what the 9/11 story most clearly calls for," Harper and Jonas argued.

COMMENTS

  • As a person who, in the context of data mining for marketing and advertising purposes, is atypical for my age, gender, and household income, I find data mining for the purposes of catching terrorists a queasy proposition. Being bombarded with useless marketing messages is one thing; being suspected of being a terrorist is something else entirely. Plus, unlike the normal judicial system that we are all familiar with (which occasionally does convict or execute the wrong person), the system that has evolved to prove or disprove that a person is a terrorist seems to have a very weak checks-and-balances mechanism. I personally don't think the science of data mining has evolved enough yet to compensate for the potential harm. With the power to up-end or destroy an innocent person's life and livelihood, and the chances of false positives being high, I'm not convinced that the risk doesn't outweigh the reward. I wish that the article had given more concrete numbers so that we could have some idea of the potential risk versus reward. Are we talking about numbers like: 10 false accusations per real terrorist apprehended? Or: 1,000 false accusations per real terrorist apprehended? Or: 500,000 false accusations per real terrorist apprehended?
  • “Wise Old Owl” was correct. Data mining is a new concept and, as with all new concepts and operating systems, functionality is dependent on implementation. As much as I worry about “Big Brother,” I’ve no doubt that, in the macro view, data mining has immediate and definite results. As for the micro, therein lies the crux of the matter. Implementation will depend on a number of things. The first is the mere collection of data. You can’t mine what you don’t have. You will have to develop disparate sources and discrete pools of eclectic, comprehensive information on the macro and micro level. Simply put, to work, the government will have to have access to the minutiae of every field, business, and individual within its scope. And, ladies and gentlemen, for the War on Terrorism, that scope is the world. Second, due to the sheer size of this undertaking, the correlation of the various data pools into meaningful information will be a function of software and logic. Our rate of technological evolution and my experience and training tell me that “It’s just a matter of time.” For immediate broad results, demographic trends are easily discerned and have a direct benefit to business. For specific incident interpretation, much will lie with the weight and significance given to variables: what is important to a particular search, and what does this type of information or activity mean? Volume and scope dictate that the initial screening will be done by computers. Flags can be set for items of interest; weight and meaning changed for variables. Bottom line? To even attempt getting any value from data mining, “Big Brother” will have to know all. Third is the action. What is to be done with the knowledge learned? The current administration says it will use all this information only to look for terrorism. I have confidence that is a primary reason for this need, but this student of history knows that knowledge is power, and what is that saying? “Power corrupts, and absolute power corrupts absolutely.” Sooner or later, in a changing political environment, there will occur an abuse, or at the minimum a loss of data, by this omnipotence. It still comes down to which is more important: security or freedom. Personally, I’m still weighing my response. What do you think? Tip off.
  • Data mining is a new concept and of course it is not perfect. But it does let us narrow the field to concentrate limited resources on high-risk actors. It is not a substitute for common sense. I fear the little brothers, the many invisible people who have access to my data. Jealous neighbors, in-laws, cousins, etc. can use this for grinding personal axes.