Promising Practices Promising PracticesPromising Practices
A forum for government's best ideas and most innovative leaders.

Got Big Data? First Define Your Big Question


Big Data is big. Really BIG. Indeed, the definition from the McKinsey Global Institute, which coined the phrase “big data,” is “data sets whose size is beyond the ability of typical database software tools to capture, manage and analyze.” Big Data is so big that your organization (almost by definition) cannot cope with it.

If, however, your organization does have big software, it might be able to mine some big data for some analytical nuggets. Such data mining, to again quote from McKinsey, is “a set of techniques to extract patterns from large data sets by combining methods from statistics and machine learning with database management.”

But what kind of patterns might your organization seek to extract? If you are looking for crime patterns in your city, you don’t start with sophisticated software. For policing, as CompStat illustrated, an excellent first-order analytical tool is dots on a map. When the data are presented this way, you don’t need a degree in statistics to interpret them.

About a decade ago, I was at a party with a bunch of young quants. They were getting (or had already gotten) their Ph.D.s from MIT or Harvard in some quantitative discipline. One of these Ph.D.s had deserted his intellectual field to work for a supermarket chain. He was charged with mining all of the chain’s data on sales and product placement to determine where in its stores to display which products. For example, which ones should be given those priority spaces at the end of which aisle? To answer this question, the chain had lots of data and lots of computers.

I confess that I thought this analytical task had a very low meaning quotient. I long ago figured out that every grocery store puts the milk at the very back. Everyone needs milk. Indeed, some people come into the store for the single purpose of buying milk. And, if in doing so, they walk past cookies or soup they might make an impulse purchase.

But notice: For this chain’s effort to mine its big data, it had already defined its big question.

But how do we go mining for something that we don’t know is there? For something that we may not know exists? Before people go data mining, they have to do serious data thinking.

During World War II, the allies were analyzing the bullet-hole data from bombers returning from missions over continental Europe. The analysts were not, however, randomly mining the data. They were trying to answer a specific question: How could they improve these planes’ survivability? What parts of the aircraft should they reinforce with armor?

All of the analysts observed where the planes had been hit: primarily on the wings and the tail. So they recommended reinforcing these sections. Like Sherlock Holmes’ Watson, they could see, but they did not observe.

One statistician, however, dissented. Abraham Wald observed that the data came only from the planes that returned. These were not, however, the only planes that took off. Some had failed to return. Why?

Wald was the Sherlock Holmes of this analytical team. He noted that the returning planes did not have many bullet holes in the engines or core fuselage. Assuming that the axis artillery wasn’t very accurate—that their hits on allied airplanes were essentially random—Wald reasoned that the planes that failed to return were the ones that had been hit in the fuselage and engines.

Yes. Wald was “mining” the data. But to do that intelligently, he first had to think. And once he had done his thinking, he didn’t need a big computer to mine big data. For the important data were not the locations of the holes that were captured in some big data set. The key data were where the holes “that didn’t bark.”

As is almost always the case: Data thinking is much more important than data mining. And such thinking always starts with purpose: What are we trying to accomplish? Sell cookies and soup? Save planes and pilots?

Often, data thinking starts with small data. What patterns do we observe in a few data points? What patterns might we observe if we add more data? What did we learn from the few data points? What might we learn if we looked at different data?

What is a big number? A small number? Some short division with a few data points may be revealing. Simple, yet analytical, data thinking can reveal the size of the problem. Or the nature of the problem. Simple, yet analytical, thinking can suggest in what mine to look for what ore.

The supermarket chains are lucky. They know precisely what they want to accomplish. They have been pursuing this objective for a long time. They have accumulated lots of data. And they have people who have been thinking about these data. Thus, they know what questions their mining of their big data might answer.

Before you go mining big data, you have to think analytically with some small data. It’s data thinking that can prove to be really big.

Robert D. Behn, a lecturer at Harvard University’s John F. Kennedy School of Government, chairs the executive education program Driving Government Performance: Leadership Strategies That Produce Results. His book The PerformanceStat Potential will be published by Brookings in 2014. (Copyright 2014 Robert D. Behn)

(Image via Sur/

Close [ x ] More from GovExec

Thank you for subscribing to newsletters from
We think these reports might interest you:

  • Forecasting Cloud's Future

    Conversations with Federal, State, and Local Technology Leaders on Cloud-Driven Digital Transformation

  • The Big Data Campaign Trail

    With everyone so focused on security following recent breaches at federal, state and local government and education institutions, there has been little emphasis on the need for better operations. This report breaks down some of the biggest operational challenges in IT management and provides insight into how agencies and leaders can successfully solve some of the biggest lingering government IT issues.

  • Communicating Innovation in Federal Government

    Federal Government spending on ‘obsolete technology’ continues to increase. Supporting the twin pillars of improved digital service delivery for citizens on the one hand, and the increasingly optimized and flexible working practices for federal employees on the other, are neither easy nor inexpensive tasks. This whitepaper explores how federal agencies can leverage the value of existing agency technology assets while offering IT leaders the ability to implement the kind of employee productivity, citizen service improvements and security demanded by federal oversight.

  • IT Transformation Trends: Flash Storage as a Strategic IT Asset

    MIT Technology Review: Flash Storage As a Strategic IT Asset For the first time in decades, IT leaders now consider all-flash storage as a strategic IT asset. IT has become a new operating model that enables self-service with high performance, density and resiliency. It also offers the self-service agility of the public cloud combined with the security, performance, and cost-effectiveness of a private cloud. Download this MIT Technology Review paper to learn more about how all-flash storage is transforming the data center.

  • Ongoing Efforts in Veterans Health Care Modernization

    This report discusses the current state of veterans health care


When you download a report, your information may be shared with the underwriters of that document.