Getting predictions wrong can be costly, and it’s not just weather or loan defaults that get predicted. In 2002, the intelligence community officially concluded that Iraq had weapons of mass destruction, a finding that became part of the rationale for going to war. That conclusion was wrong. How can similar misjudgments be avoided in the future?
A recent article in Harvard Business Review, by Paul Schoemaker and Philip Tetlock, describes how organizations—and you—can become better at judging the likelihood of uncertain events. While they write about commercial companies’ use of prediction tools, their insights apply to government as well.
In the wake of the Iraq intelligence failure, the Intelligence Advanced Research Projects Activity (IARPA) set out in 2011 to determine whether it was possible to improve predictions of uncertain events. It selected five academic research teams to compete in a multi-year prediction tournament. The initiative ran from 2011 to 2015 and “recruited more than 25,000 forecasters who make well over a million predictions.” Forecasters were challenged to respond to such questions as: Would Greece exit the Eurozone? What’s the likelihood of a financial panic in China?
Some teams used computer algorithms. Others focused on the use of expert knowledge, and still others turned to the “wisdom of crowds.” They knew they were competing against each other. They didn’t know they were also competing against intelligence analysts with access to classified information. Who was better? What did the winners do differently?
One team, the Good Judgment Project, consistently beat the others, offering “a 50 plus percent reduction in error compared to the current state-of-the-art,” according to IARPA. This was judged to be “the largest improvement in judgmental forecasting accuracy” ever observed in the public policy literature. In fact, the team’s forecasters were on average about 30 percent more accurate than intelligence analysts with access to secret data. The results were so stark that after two years, IARPA cancelled its contracts with the other teams and focused on the approach the Good Judgment Project was using.
The Good Judgment Project team was led by several academics from the Wharton School at the University of Pennsylvania, including Tetlock and Schoemaker. “The goal was to determine whether some people are naturally better than others at prediction and whether prediction performance could be enhanced,” they said. They demonstrated, over the four-year span of the project, that yes, it could be done by applying several techniques.
In their article, they write that their approach focuses “on improving individuals’ forecasting ability through training; using teams to boost accuracy; and tracking prediction performance and providing rapid feedback.”
According to Schoemaker and Tetlock, “Training in reasoning and debiasing can reliably strengthen a firm’s forecasting competence. The Good Judgment Project demonstrated that as little as one hour of training improved forecasting accuracy by about 14 percent over the course of a year.” They found that basic probability concepts, such as regression to the mean, Bayesian revision, and statistical probability, were important tools for forecasters. Good forecasting also required that each forecast include a precise definition of what is to be predicted as well as the timeframe in question.
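To make “Bayesian revision” concrete, here is a minimal sketch in Python of how a forecaster might update a probability estimate after seeing new evidence. It is an illustration of Bayes’ rule, not the Good Judgment Project’s training material; the function name and the numbers in the usage note are hypothetical.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the revised probability of an event after observing evidence.

    prior: probability assigned to the event before the evidence (0 to 1).
    p_evidence_if_true: chance of seeing this evidence if the event will happen.
    p_evidence_if_false: chance of seeing this evidence if it will not.
    All names and inputs here are illustrative assumptions.
    """
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    if denominator == 0:
        raise ValueError("evidence is impossible under both hypotheses")
    return numerator / denominator


# Hypothetical example: a forecaster starts at 20 percent, then sees a
# report that is twice as likely to appear if the event is coming.
revised = bayes_update(prior=0.2, p_evidence_if_true=0.6, p_evidence_if_false=0.3)
print(round(revised, 3))
```

With these made-up inputs the estimate moves from 20 percent to about 33 percent: the evidence shifts the forecast, but it does not overturn the starting estimate.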
Successful forecasting requires an understanding of the role of cognitive biases, such as looking for information that confirms existing views. They noted that “It’s a tall order to debias human judgment.” Beginners must be trained to “watch out for confirmation bias that can create false confidence, and to give due weight to evidence that challenges their conclusions,” they said. They also caution against looking at problems in isolation and stress the need to understand psychological factors that may influence whether someone sees patterns in data that have no statistical basis.
How can you develop these skills? They offer one example of a financial trading company that trains new employees about the basics of statistical reasoning and cognitive traps by requiring them to play a lot of poker.
In the Good Judgment Project, several hundred forecasters were randomly assigned to work alone and several hundred to work collaboratively. Schoemaker and Tetlock found that the teams outperformed the solo forecasters. But having the right composition of talent on the teams was important. Natural forecasters are “cautious, humble, open-minded, analytical and good with numbers” as well as intellectually diverse, they said. In the course of their project, “When the best forecasters were culled, over successive rounds, into an elite group of superforecasters, their predictions were nearly twice as accurate as those made by untrained forecasters.”
They also note that team members “must trust one another and trust that leadership will defend their work and protect their jobs and reputation. Few things chill a forecasting team faster than a sense that its conclusions could threaten the team itself.”
Finally, Schoemaker and Tetlock say measuring, reporting, and making adjustments are key elements of successful forecasting. They write: “Bridge players, internal auditors, and oil geologists . . . shine at prediction thanks in part to robust feedback and incentives for improvement.” But the statistics aren’t enough. You can calculate a Brier score to measure the accuracy of your predictions and track it over time, but knowing your Brier score won’t necessarily improve your performance. The authors recommend that organizations “should systematically collect real-time accounts of how their top teams made judgments, keeping records of assumptions made, data used, experts consulted, external events, and so on.”
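For readers unfamiliar with the Brier score mentioned above, the following is a minimal sketch in Python. It assumes the common yes/no form of the score: the average squared gap between the probability you gave and what actually happened (1 if the event occurred, 0 if not), so 0 is perfect and lower is better. The article does not say which variant the authors use, and the function name and sample forecasts are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and outcomes.

    forecasts: probabilities (0 to 1) assigned to each event happening.
    outcomes: 1 if the corresponding event happened, 0 if it did not.
    Assumes the binary form of the score; 0 is a perfect record.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical track record: three forecasts and what actually happened.
score = brier_score([0.9, 0.2, 0.7], [1, 0, 0])
print(round(score, 2))
```

In this made-up record, the confident but wrong 70 percent call contributes most of the error. That is the kind of pattern a score alone can reveal but not explain, which is why the authors also recommend keeping records of how judgments were made.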
These three techniques won’t work on their own, however. Organizations “will capture this advantage only if respected leaders champion the effort,” Schoemaker and Tetlock note. But what about improving the abilities of individuals to become forecasters? The authors have created a public tournament, Good Judgment Open, so you can take a test to see if you have the traits shared by the best-performing forecasters. Give it a try.