
The federal government has a major blind spot in how it evaluates programs
COMMENTARY | Decades of design flaws have left agencies struggling to measure the impact of their major initiatives. It is time to move beyond compliance and build real, durable capacity.
Every year, the federal government spends trillions of dollars on programs meant to improve lives. It spends remarkably little figuring out whether they work — and even less making sure those findings reach anyone who can act on them. That needs to change.
The federal government has been doing serious program evaluation, imperfectly but deliberately, since the Johnson administration. Agencies like HHS, Labor, Education, HUD and SSA have a long track record of large-scale experiments to identify what works in major social programs. Congress expanded that tradition in 2018 with the Foundations for Evidence-Based Policymaking Act — signed by President Trump in his first term — establishing government-wide evaluation requirements for the first time.
That history matters. So does an honest accounting of where things stand today.
What we got wrong — and what it cost us
The current moment has exposed three design failures that were always present in our national evaluation infrastructure.
I examine these and other structural failures in detail in a recent article in New Directions for Evaluation, but the short version is this: we built the wrong kind of national evaluation infrastructure.
The first is that we built dependency instead of capacity. When evaluation lives primarily in external contracts rather than in people, data systems and institutional culture, it is inherently fragile. The Institute of Education Sciences cut roughly 85% of its contracts last year, and evaluation capacity at one of the government's longest-standing research offices nearly disappeared overnight. IES has recently begun re-hiring and releasing long-stalled funds, but the episode revealed how little durable internal capability we actually built.
The second is that we created compliance requirements without the ecosystem to meet them. The Evidence Act required agencies to develop learning agendas, which are effectively plans to address priority research questions, but did not secure the resources, staff or connections to external researchers and practitioners needed to actually answer them. The Labor Department has seen evaluation funding reduced, including continued non-use of set-aside resources authorized under law. HHS's evaluation office at the Administration for Children and Families has been reorganized. HUD's research office is shifting. Across agencies, the pattern is the same: plans existed, capacity did not.
The third is that we never asked the hardest questions. We generally do not evaluate large, economically significant regulations. Some major programs have long been treated as effectively off-limits for rigorous assessment, even when evaluation could tell us how to implement them better and serve beneficiaries more effectively. Evaluation firms, researchers and practitioners built real capacity to study these questions, and that capacity benefits government and the public alike. The failure was not in their work; the failure was in the infrastructure designed to use the evaluations.
These are not new problems. They're structural ones, which means the current disruption, handled well, is also an opportunity to fix them.
A reform worth building on
One of the most consequential current developments in federal evaluation is happening with little fanfare. The current administration is connecting federal programs to a standardized list of outcomes they're designed to achieve, systematically asking for the first time what each program is actually supposed to accomplish. That work, expected to be made public later in 2026, would bring a kind of program-level visibility that hasn't been broadly available since the Program Assessment Rating Tool asked agencies about evaluation during the George W. Bush administration. When we can see what outcomes a program is designed to achieve, we can all ask better questions. Our questions also become more sophisticated: not just whether a program "works," but which components drive results, for whom, and at what cost.
That shift from program verdicts to component-level analysis is one of the most practical improvements evaluation policy could make for decision-makers operating on tight timelines and real budget constraints.
What needs to happen next
Progress on the Federal Program Inventory is a foundation and a productive starting point. Several other changes would make it even more meaningful.
Evaluate more of what matters. Significant regulations and major programs should not be off-limits for serious evaluation. This is not about finding justifications to cut programs, though that may sometimes happen; it's about understanding how to implement them better and direct resources more effectively. Not every program will need a rigorous multi-year impact evaluation, and not every program will be ready for one. The right first question is whether an evaluation can improve a decision, given what's already known and what decision-makers need. If not, other analytical approaches may serve the program and decision-makers better.
Build for durability, not just compliance. Building durable capacity to evaluate programs means investing in staff capability and data infrastructure inside the government, while also strengthening the connections to the external researchers, evaluators and practitioners who do much of this work — and ensuring agencies have the resources to actually use what they learn.
Align cost, timeline and usefulness. Evaluations that arrive years after a program has been redesigned don't serve anyone well. Rigorous methods matter, but fitness for the decision at hand has to be part of what "quality" means at the policy level. Agentic AI tools, administrative data linkages and rapid-cycle methods now make better, faster and cheaper evaluation genuinely feasible at federal scale in ways that weren't possible when the Evidence Act was designed.
The opportunity ahead
Six decades after the Johnson administration introduced rigorous program evaluation as a federal priority and seven years after the Evidence Act became law, we have the opportunity to build something genuinely better: faster, cheaper, more durable and more honest about what evaluation can and cannot do. That's a goal worth the effort, regardless of who holds power, and one that ultimately benefits every American taxpayer.
Nick Hart is president and CEO of the Data Foundation. He previously chaired the American Evaluation Association's Evaluation Policy Task Force and served as Policy and Research Director of the U.S. Commission on Evidence-Based Policymaking.




