Under the Microscope

Before doling out money, budget officials are taking a closer look at how well federal programs are working.

Mary Cassell sat at the table patiently answering a barrage of questions about her proposal to zero out funding for the Education Department's Even Start literacy program in the president's fiscal 2004 budget. The Office of Management and Budget examiner had determined that the program was ineffective. Education officials, policy wonks and budget crunchers wanted to know how she reached that conclusion.

Cassell's review suggested that Even Start has had virtually no impact. She cited recent national evaluations indicating that children enrolled in the program were no more likely to become fluent readers than those not in the program.

Under Even Start, created in 1989, grant money flows from the federal government to state and local outreach programs. It started off with a modest $15 million annual budget that has since grown to $250 million. But for the past two years, the program's budget has remained flat, largely because policy-makers have had trouble determining how much impact it has had on ending illiteracy.

In her written analysis, Cassell said that Even Start is "duplicative of several other programs that perform as well, and often better, than Even Start." Her assumptions, according to sources who attended the presentation, were based on the national evaluations of Even Start as well as evaluations of other department programs. The panel of experts questioned why Cassell chose to focus on child literacy, when the program also seeks to help adults. According to participants at the review session, Cassell acknowledged that the assessment was tilted toward children, but suggested that childhood literacy ought to be the program's top priority.

Some panelists wondered whether the review was sufficiently comprehensive, noting that OMB was relying on national studies of only 300 families in five states. Other evaluations have shown that Even Start does make a difference in the fight against illiteracy. In 2000, the Education Department studied 122 separate evaluations of the program. While the results varied, the evaluations found that "Even Start children were generally seen as scoring within age-appropriate norms on standardized tests." Other panelists questioned whether it was fair for OMB to pass judgment on the program when states only recently have been held accountable for meeting goals and collecting data.

As the questions kept coming, Christopher Wye, director of the National Academy of Public Administration's Center for Improving Government Performance, leaned over to an OMB official sitting next to him and asked, "Is this what it's like with the director?"

"Yes, only it's more grueling with the director," the OMB official responded. The director, of course, is Mitch Daniels, the Bush administration's top budget executive. Cassell was a few months away from making the same presentation to her boss. On this June day, she was going through a dress rehearsal, presenting her analysis to a panel of experts pulled together by the academy. It was the first public showing for a new initiative OMB is using to gauge the effectiveness of federal programs.

Called the Program Assessment Rating Tool, or PART, the initiative is key to the administration's budget-making process. OMB put 20 percent of federal programs under this microscope for effectiveness while developing the fiscal 2004 budget. An additional 20 percent will go through the process in each of the next few years. While the PART is not the only determinant in setting funding levels in the president's budget, OMB is relying heavily on the initiative to shape spending decisions.

The PART helps examiners gain insight into how well programs are managed and performing, but it is limited by the fact that few agencies have reliable performance data. A decade after the 1993 Government Performance and Results Act was enacted, agencies still are struggling to collect information about the overall impact of their programs. Few agencies have the capability or resources to thoroughly evaluate their efforts.

The PART approach is relatively simple, designed largely to force agencies to take performance measurement seriously. It requires agencies to answer a series of questions about their programs, including how well they are achieving certain goals. And, for the first time, this is being done as part of the budget-making process. "We are trying to breathe some life into GPRA," says Marcus Peacock, OMB's associate director for natural resources programs. Peacock is the budget office's point person for the PART initiative and is responsible for getting agencies to do a better job of integrating budgets with performance.

SNAPSHOT

The June dress rehearsal sponsored by the National Academy of Public Administration had two purposes: to demonstrate how the PART is being used, and to provide OMB guidance on how it can be improved. Interviews with several panelists at the session revealed a healthy dose of skepticism mixed with cautious optimism.

"What I like about it is it adds some rigor to what is routinely done by budget examiners," says Barry White, director of government performance projects at the Council for Excellence in Government, a Washington-based nonprofit group. White spent 25 years at OMB. "It creates consistency across the board, so each examiner is looking at the same thing. And they are going to publish the results. That is amazing."

Still, there is ample concern that the PART does not delve deeply into program performance. Even Start, for instance, must be judged partly on how it interacts with other federal, state and local education programs. But the PART analysis of the program poses just one question on this issue: "Does the program collaborate and coordinate effectively with related programs that share similar goals and objectives?"

Critics also note that, for now, the PART only offers a snapshot of performance data. It does not track trends to show effectiveness over time. Peacock acknowledges that the initiative is not good at measuring progress yet. He speculates it will take at least five years before the PART can be used to show trends.

YES OR NO

"I'm skeptical but positive," says Mortimer Downey, a principal consultant with pbConsult Inc., which provides consulting services to leaders of infrastructure projects. Downey chairs an advisory board OMB put together to monitor how the PART is implemented. "I really think they are trying to improve the budgeting process . . . but the quality of the data has to be improved. OMB has to link this to the Results Act and show that it is not just another paperwork exercise," Downey says.

The assessment tool covers four areas, each largely derived from the Results Act: purpose of the program, strategic planning, management and results. Each section contains questions that can be answered with a simple "yes" or "no." Overall, there are 36 questions. While there is some give and take between OMB and the agencies in reaching the answers, the final decision rests with OMB. Budget examiners interviewed for this story say decisions are based on conversations with agency officials as well as their own experience with previous budgets.

Each answer carries a numerical weight, resulting in a final score that will be published in documents released with the budget. During the grading, which takes place while budget requests are moving through the approval process, agencies can appeal their scores to an interagency task force set up by OMB. About 40 percent of scores were appealed, according to Peacock. Some agencies submitted blanket appeals, questioning every score, while others were more selective. At the Interior Department, where 15 programs were evaluated, officials appealed only three of the reviews, for instance.
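The mechanics can be pictured as a simple weighted checklist. The sketch below, in Python, is purely illustrative: the four section names come from the article, but the weights, question counts, example answers and scoring math are assumptions for the sake of the example, not OMB's actual formula.

```python
# Illustrative sketch of a weighted yes/no assessment in the spirit of the PART.
# Section names are drawn from the article; the weights and example answers
# below are hypothetical, not OMB's actual methodology.

SECTION_WEIGHTS = {
    "purpose": 0.20,
    "strategic_planning": 0.20,
    "management": 0.20,
    "results": 0.40,
}

def section_score(answers):
    """Fraction of 'yes' (True) answers within one section."""
    return sum(answers) / len(answers) if answers else 0.0

def overall_score(assessment):
    """Weighted total across the four sections, scaled to 0-100."""
    return round(100 * sum(
        weight * section_score(assessment.get(section, []))
        for section, weight in SECTION_WEIGHTS.items()
    ), 1)

# A hypothetical program that answers most questions 'yes' except in results.
example = {
    "purpose": [True, True, True],
    "strategic_planning": [True, True, False, True],
    "management": [True, True, True, True],
    "results": [False, False, True, False],
}
print(overall_score(example))  # prints 65.0
```

In this framing, a weak section total points directly to where the weaknesses lie, which is the diagnostic value Peacock describes below.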

Critics say that the "yes" or "no" responses are too simplistic, but Peacock is quick to point out that every answer must be supported by hard evidence. For example, to answer the question, "Has the program demonstrated adequate progress in achieving its long-term outcome goal(s)?" agencies must supply data detailing program performance. Such data can include customer satisfaction reports, independent evaluations or information collected to comply with the Results Act.

"The power of all of this is in the diagnostics," Peacock says. "It's in the evidence that one presents to get the 'yes' or 'no.' If you get a low score in one of the sections, it gives you a sense of where the weaknesses are. If there is a low score in management, then you go to that section and see where you can strengthen program management."

Nonetheless, Downey worries that the final score is what will gain headlines. Some members of the advisory panel are pushing for OMB to separate the results section from the other three categories.

The fear among many agency officials is that programs will be shortchanged based on their PART scores. This is especially true at agencies that already feel threatened by Republican administrations. Several sources at the Environmental Protection Agency, for instance, worry that their programs will be hit hard. Speaking on the condition of anonymity, a senior EPA official says a number of programs are bound to get low scores largely because they don't have adequate data to satisfy OMB's requests. EPA relies heavily on states to implement its rules and collect data; nearly 40 percent of its budget is distributed in grants. Yet data collection at the state level varies considerably.

An OMB examiner familiar with EPA's work says it's the agency's responsibility to hold states accountable for spending grant money wisely. If states are not collecting data, or not collecting the right data, then EPA should find out why.

INFORMATION VOID

OMB's goal with the PART initiative is to incorporate performance data into the budget process. Historically, few agencies have made it a priority to collect and analyze such information, because it has not come into play during the budget process. While agencies have complied with Results Act mandates to set goals and develop performance measures, little of that information gets at the core of how well a program is serving its constituents, says Harry Hatry, director of the public management program at the Urban Institute, a Washington-based think tank.

"Part of the dilemma for OMB is that the current state of measurements is weak," says Hatry. "Most programs don't have great data, so the PART will have trouble getting at results."

Hatry and other evaluation experts see OMB's effort as a mixed blessing. On the one hand, they are concerned that the initiative is being touted as a way to measure effectiveness, when, in fact, it is largely focused on management issues and lacks the sophisticated analysis needed to truly assess complicated federal programs. On the other hand, they say, if the initiative is successful, it could create a groundswell for more thorough evaluations.

Should that happen, agencies would be playing an intense game of catch-up. For the past 20 years, program evaluation divisions in the federal government have all but vanished, though they were prominent in the 1960s and 1970s. Nearly every department had an office dedicated to conducting large-scale evaluations. Those offices were hit hard by budget cuts during the 1980s. Since their studies took years to complete, agency heads saw little value in funding exercises that wouldn't be finished until long after they left office. Additionally, evaluation professionals isolated themselves.

"We committed every sin," says NAPA's Wye, who conducted evaluations at the Department of Housing and Urban Development. "We were too academic. We wanted to be deliberate in our work, so we took too long to conduct studies. And we didn't have data to answer the question of the day. We were looking further down the road. Political appointees needed answers for the here and now."

At some agencies, the decline in evaluations has been dramatic. At the Agency for International Development, for example, the number of evaluations dropped from 425 a year in 1993 to 136 in 1999. This was partly due to a 1995 policy change stipulating that only senior managers could order that a program be evaluated. Before, nearly every AID program went through some form of evaluation.

EPA's experience provides another example of the decline in evaluation capacity. During the 1970s, the agency had an evaluation office with more than 30 employees. But by the early 1990s, it was down to 15 people. In 1995, the evaluation division was dissolved and remaining staff, by then fewer than 10 employees, were sent to various offices throughout the agency.

For the past couple of years, EPA has been trying to rebuild its evaluation capacity. In 2000, the agency's Office of Policy, Economics and Innovation was given responsibility for evaluation efforts. Katherine Dawes is now EPA's chief evaluator. But with a budget of only $250,000, Dawes' efforts are severely limited.

Dawes would like to embark on long-term evaluations, but says her division doesn't have the time or money to do so. So, the office is left to conduct so-called "process evaluations." These are more management-oriented. They look at how well a program is being run, determine whether rules are being followed and search for bottlenecks in the system. Results or outcomes are generally not factored in. For example, a recent study for the EPA's water office looked at why there was a backlog of permits for various regulations. The study was aimed at streamlining the process, not at determining whether the regulatory approach was effective.

Conducting long-term evaluations of how programs are actually working "is something that people universally say they want and say is important," Dawes says. "But do you want to set aside money and resources to do it?"

Dawes is not the only one at EPA struggling with that dilemma. Two years ago, the agency's inspector general also created an evaluation office. Distinct from auditing work that focuses on implementation of programs, this division tries to get at performance. "In an audit, you are looking at the delivery of services," says Kwai-Cheung Chan, assistant inspector general for program evaluation. "On the evaluation side, we are asking the question, 'Did it make any difference?' "

The Education Department, while way ahead of most agencies in terms of the number of studies it conducts, also is entering a new era. The department spends nearly $100 million a year on program evaluation and data collection. Yet policy-makers don't feel that the data is useful. Nor do local educators looking for ways to improve their schools.

To a large degree, Education's evaluations have focused on compliance (how well states are meeting federal requirements), according to Michael Petrilli, special assistant for policy and planning at the department. Last April, the department issued policy guidance suggesting a change in direction. Now the bulk of evaluations the department plans to fund will look at performance and results.

By looking at scientific evidence, the department hopes to gain a better understanding of how its programs are affecting education across the nation. Armed with that data, officials in Washington can refine policy. More importantly, says Petrilli, the department can help state and local educators target resources where they would do the most good.

Efforts to revive program evaluation will go only as far as the data takes them. If evaluations follow the same route they took in the 1970s, they'll get the same lackluster response.

"Evaluations have to serve the decision makers," says Jonathan Breul, a consultant with IBM Global Services. Breul recently retired from OMB, where he was instrumental in implementing the Results Act. The challenge, he says, is getting appropriators and political appointees to care about this type of information.

Performance measures tell you whether your program is a winner or a loser, Breul says. "Evaluations are the story behind that-why did you win or lose? What can you do to improve so you play better the next day?"

