Sur/Shutterstock.com

Got Big Data? First Define Your Big Question

Simple, yet analytical, data thinking can reveal the size of the problem.

Big Data is big. Really BIG. Indeed, the definition from the McKinsey Global Institute, which coined the phrase “big data,” is “data sets whose size is beyond the ability of typical database software tools to capture, manage and analyze.” Big Data is so big that your organization (almost by definition) cannot cope with it.

If, however, your organization does have big software, it might be able to mine some big data for some analytical nuggets. Such data mining, to again quote from McKinsey, is “a set of techniques to extract patterns from large data sets by combining methods from statistics and machine learning with database management.”

But what kind of patterns might your organization seek to extract? If you are looking for crime patterns in your city, you don’t start with sophisticated software. For policing, as CompStat illustrated, an excellent first-order analytical tool is dots on a map. When the data are presented this way, you don’t need a degree in statistics to interpret them.

About a decade ago, I was at a party with a bunch of young quants. They were getting (or had already gotten) their Ph.D.s from MIT or Harvard in some quantitative discipline. One of these Ph.D.s had deserted his intellectual field to work for a supermarket chain. He was charged with mining all of the chain’s data on sales and product placement to determine where in its stores to display which products. For example, which ones should be given those priority spaces at the end of which aisle? To answer this question, the chain had lots of data and lots of computers.

I confess that I thought this analytical task had a very low meaning quotient. I long ago figured out that every grocery store puts the milk at the very back. Everyone needs milk. Indeed, some people come into the store for the single purpose of buying milk. And, if in doing so, they walk past cookies or soup they might make an impulse purchase.

But notice: For this chain’s effort to mine its big data, it had already defined its big question.

But how do we go mining for something that we don’t know is there? For something that we may not know exists? Before people go data mining, they have to do serious data thinking.

During World War II, the allies were analyzing the bullet-hole data from bombers returning from missions over continental Europe. The analysts were not, however, randomly mining the data. They were trying to answer a specific question: How could they improve these planes’ survivability? What parts of the aircraft should they reinforce with armor?

All of the analysts observed where the planes had been hit: primarily on the wings and the tail. So they recommended reinforcing these sections. Like Sherlock Holmes’ Watson, they could see, but they did not observe.

One statistician, however, dissented. Abraham Wald observed that the data came only from the planes that returned. These were not, however, the only planes that took off. Some had failed to return. Why?

Wald was the Sherlock Holmes of this analytical team. He noted that the returning planes did not have many bullet holes in the engines or core fuselage. Assuming that the axis artillery wasn’t very accurate—that their hits on allied airplanes were essentially random—Wald reasoned that the planes that failed to return were the ones that had been hit in the fuselage and engines.

Yes. Wald was “mining” the data. But to do that intelligently, he first had to think. And once he had done his thinking, he didn’t need a big computer to mine big data. For the important data were not the locations of the holes that were captured in some big data set. The key data were where the holes “that didn’t bark.”

As is almost always the case: Data thinking is much more important than data mining. And such thinking always starts with purpose: What are we trying to accomplish? Sell cookies and soup? Save planes and pilots?

Often, data thinking starts with small data. What patterns do we observe in a few data points? What patterns might we observe if we add more data? What did we learn from the few data points? What might we learn if we looked at different data?

What is a big number? A small number? Some short division with a few data points may be revealing. Simple, yet analytical, data thinking can reveal the size of the problem. Or the nature of the problem. Simple, yet analytical, thinking can suggest in what mine to look for what ore.

The supermarket chains are lucky. They know precisely what they want to accomplish. They have been pursuing this objective for a long time. They have accumulated lots of data. And they have people who have been thinking about these data. Thus, they know what questions their mining of their big data might answer.

Before you go mining big data, you have to think analytically with some small data. It’s data thinking that can prove to be really big.

Robert D. Behn, a lecturer at Harvard University’s John F. Kennedy School of Government, chairs the executive education program Driving Government Performance: Leadership Strategies That Produce Results. His book The PerformanceStat Potential will be published by Brookings in 2014. (Copyright 2014 Robert D. Behn)

(Image via Sur/Shutterstock.com)

X
This website uses cookies to enhance user experience and to analyze performance and traffic on our website. We also share information about your use of our site with our social media, advertising and analytics partners. Learn More / Do Not Sell My Personal Information
Accept Cookies
X
Cookie Preferences Cookie List

Do Not Sell My Personal Information

When you visit our website, we store cookies on your browser to collect information. The information collected might relate to you, your preferences or your device, and is mostly used to make the site work as you expect it to and to provide a more personalized web experience. However, you can choose not to allow certain types of cookies, which may impact your experience of the site and the services we are able to offer. Click on the different category headings to find out more and change our default settings according to your preference. You cannot opt-out of our First Party Strictly Necessary Cookies as they are deployed in order to ensure the proper functioning of our website (such as prompting the cookie banner and remembering your settings, to log into your account, to redirect you when you log out, etc.). For more information about the First and Third Party Cookies used please follow this link.

Allow All Cookies

Manage Consent Preferences

Strictly Necessary Cookies - Always Active

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Sale of Personal Data, Targeting & Social Media Cookies

Under the California Consumer Privacy Act, you have the right to opt-out of the sale of your personal information to third parties. These cookies collect information for analytics and to personalize your experience with targeted ads. You may exercise your right to opt out of the sale of personal information by using this toggle switch. If you opt out we will not be able to offer you personalised ads and will not hand over your personal information to any third parties. Additionally, you may contact our legal department for further clarification about your rights as a California consumer by using this Exercise My Rights link

If you have enabled privacy controls on your browser (such as a plugin), we have to take that as a valid request to opt-out. Therefore we would not be able to track your activity through the web. This may affect our ability to personalize ads according to your preferences.

Targeting cookies may be set through our site by our advertising partners. They may be used by those companies to build a profile of your interests and show you relevant adverts on other sites. They do not store directly personal information, but are based on uniquely identifying your browser and internet device. If you do not allow these cookies, you will experience less targeted advertising.

Social media cookies are set by a range of social media services that we have added to the site to enable you to share our content with your friends and networks. They are capable of tracking your browser across other sites and building up a profile of your interests. This may impact the content and messages you see on other websites you visit. If you do not allow these cookies you may not be able to use or see these sharing tools.

If you want to opt out of all of our lead reports and lists, please submit a privacy request at our Do Not Sell page.

Save Settings
Cookie Preferences Cookie List

Cookie List

A cookie is a small piece of data (text file) that a website – when visited by a user – asks your browser to store on your device in order to remember information about you, such as your language preference or login information. Those cookies are set by us and called first-party cookies. We also use third-party cookies – which are cookies from a domain different than the domain of the website you are visiting – for our advertising and marketing efforts. More specifically, we use cookies and other tracking technologies for the following purposes:

Strictly Necessary Cookies

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Functional Cookies

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Performance Cookies

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Sale of Personal Data

We also use cookies to personalize your experience on our websites, including by determining the most relevant content and advertisements to show you, and to monitor site traffic and performance, so that we may improve our websites and your experience. You may opt out of our use of such cookies (and the associated “sale” of your Personal Information) by using this toggle switch. You will still see some advertising, regardless of your selection. Because we do not track you across different devices, browsers and GEMG properties, your selection will take effect only on this browser, this device and this website.

Social Media Cookies

We also use cookies to personalize your experience on our websites, including by determining the most relevant content and advertisements to show you, and to monitor site traffic and performance, so that we may improve our websites and your experience. You may opt out of our use of such cookies (and the associated “sale” of your Personal Information) by using this toggle switch. You will still see some advertising, regardless of your selection. Because we do not track you across different devices, browsers and GEMG properties, your selection will take effect only on this browser, this device and this website.

Targeting Cookies

We also use cookies to personalize your experience on our websites, including by determining the most relevant content and advertisements to show you, and to monitor site traffic and performance, so that we may improve our websites and your experience. You may opt out of our use of such cookies (and the associated “sale” of your Personal Information) by using this toggle switch. You will still see some advertising, regardless of your selection. Because we do not track you across different devices, browsers and GEMG properties, your selection will take effect only on this browser, this device and this website.