
Measuring Success: Will It Play in Peoria?

Small-scale testing of government performance has large-scale payoffs.

Well-designed small-scale testing can help government achieve greater impact not only at a lower cost but also in fairer, more understandable ways. Government should embrace “test marketing” as a business-as-usual management practice, not an exceptional event.

When I was a kid, for reasons that now escape me, you often heard people ask, “Will it play in Peoria?” Peoria, Illinois, once seen as the prototypical American city, was used to “test market” new theater acts and new products before rolling them out across America. If a show, product or marketing pitch did not sell well in Peoria, a company knew not to waste money investing on a large scale.

Peoria no longer reigns supreme as the prototypical test market, and many companies have grown increasingly sophisticated, testing different products and marketing pitches in different media for different demographic groups. They run small-scale experiments in which randomly selected consumers are exposed to different products and marketing pitches to see, for example, how similar groups of people react to one message versus alternative ones, or to receiving no message at all. The Internet has multiplied opportunities for this sort of comparison; off-the-shelf software now enables any website to concurrently test and compare different offers, messaging and content sequences on targeted subsets of site visitors.

Government, too, has now and then embraced small-scale tests of this sort. For example, after developing, testing, and refining a campaign to reduce drivers’ use of cellphones in a few small communities and then in two larger communities, the National Highway Traffic Safety Administration rolled out a nationwide campaign in 2014. On other occasions, government has undertaken randomized controlled trials to inform policy formulation, such as the very interesting “Moving to Opportunity” study that continues to yield valuable and surprising findings about the effect of relocating low-income families to higher-income neighborhoods.

Few such measured trials are used, however, to inform policy implementation. That is beginning to change, as Binyamin Appelbaum’s story in today's New York Times suggests. I recently ran across three other examples that make me hopeful that momentum for this badly needed change is picking up.

Reducing recruitment discrimination. Government agencies generally try to recruit a diverse group of highly qualified employees. Concerned about performance gaps between black and minority ethnic applicants and others on recruitment tests, police in the U.K. enlisted the Behavioural Insights Team to identify the causes of the gap and find ways to close it. Prior research suggested that the way people are introduced to a test (priming) and perceptions of stereotype threats can affect test scores. Working with the police, BIT designed a more welcoming, purpose-focused introduction. The police administered it and the historic test to two randomly chosen groups of applicants and found that the new introductory language nearly eliminated the performance difference between minority and other candidates. A remarkable result.

Designing information to avoid biasing decisions. Government often provides information to aid individual choices. How that information is presented can have a big effect. Researchers at Duke, the University of Stirling, and Columbia found, for example, that switching the gold and bronze labels on Affordable Care Act plans led many applicants to choose the plan labeled gold regardless of its actual characteristics. The finding underscores the need for government to “test market” informational material to make sure it is interpreted correctly before releasing it.

More effective, fair third-party regulatory audits. Many financial, environmental, and other regulatory programs use private sector auditors paid by regulated parties to monitor compliance. This system has the advantage of enabling regulatory oversight to grow as the number of regulated parties grows. It also has obvious problems, however, because regulated parties hire and pay the third-party auditors, making the regulated party the auditors’ customer. Researchers at Harvard and MIT worked with the environmental agency in Gujarat, India, to test a set of changes that included changing the way auditors were assigned, paying auditors out of a central fund rather than directly by regulated parties, and auditing the auditors. The changes reduced not only false reporting by the auditors but also pollution emissions.

These examples suggest the potential value of small-scale testing for making government not only more effective but also more productive, fair and understandable. All government program managers should start incorporating these sorts of well-designed measured tests into their operations, tapping in-house staff, agency evaluation and performance shops, and academic partners if needed. Not doing so risks a serious waste of government resources.

Shelley H. Metzenbaum is senior adviser to the Volcker Alliance. She is a former associate director for performance and personnel management for the Office of Management and Budget. 

(Image via Andrei Tudoran/