Features

I'm OK, You're Outstanding

July 1, 2006

Under new performance-management systems, not everyone gets to be a star.

The e-mail is of dubious origin and has likely circled the Internet multiple times. It claims to contain "actual quotes" from federal employee performance evaluations. Among the zingers: "This young lady has delusions of adequacy," "He brings a lot of joy whenever he leaves the room" and "Works well under constant supervision and cornered like a rat in a trap."

The comments are amusing, but far from typical. In past years, supervisors have rated more than 99 percent of federal employees fully successful or better, according to the Merit Systems Protection Board. Experts say performance ratings in certain agencies are so inflated that they no longer carry much meaning. "It's like Lake Wobegon: everybody's above average," says Hannah Sistare, vice president for academy affairs at the National Academy of Public Administration.

Why? Human nature, for one thing, as well as a lack of support from above. "The easy thing to do is to make folks happy, so you give them a higher rating," says John Palguta, vice president for public policy and research at the Partnership for Public Service, a Washington-based nonprofit. "There are no consequences, so why not give them a good rating?" says Howard Risher, a consultant and author on performance-based pay issues.

That's starting to change, according to Doris Hausser, senior policy adviser at the Office of Personnel Management. OPM has taken aim at performance evaluations that fail to distinguish between adequate workers and real stars, including pass/fail ratings and multilevel rating systems in which most employees are clustered at the top of the scale. Says Palguta: "Managers are being told, 'You've got to take this seriously now.' "

Meaningful Distinctions

Accurate appraisals are key to pay for performance. If an agency can't identify its high achievers, then it can't reward their superior work. And if it gives too many top marks, it either has to spread rewards thin or swallow a large payroll increase. Such problems have plagued earlier attempts at instituting performance pay, including a merit pay plan for General Schedule grades 13 to 15, which was created by the 1978 Civil Service Reform Act. The majority of employees under that system received ratings of "outstanding" or "exceeds fully successful." By the mid-1990s, ratings had become so skewed that agencies were allowed to use a pass/fail evaluation, with a third category of "minimally successful" for the Senior Executive Service.

Current regulations still allow such simplified evaluations, but the Bush administration is using the traffic-light-style President's Management Agenda score card to push for greater differentiation. "If you want to be green, you can't have a pass/fail system," Hausser says. And in early May, Sen. George Voinovich, R-Ohio, announced that he would introduce legislation requiring more thorough appraisals for all employees. His bill, which would ban pass/fail reviews, is intended to beef up evaluations in preparation for governmentwide pay for performance.

Senior executives already have pay for performance, and OPM must certify each agency's evaluation system before they receive performance-based raises. To get certified, the agency must show that it makes "meaningful distinctions" among levels of performance. The share of career SES members receiving the highest available rating (in some agencies, "fully successful") fell from 83.7 percent in fiscal 2001 to 59.4 percent in fiscal 2004, according to OPM. "Agencies are getting much more disciplined about letting 'outstanding' mean something," Hausser says.

The impression that OPM is forcing agencies to lower their ratings has raised some concerns. Agencies shouldn't assume that a disproportionate number of senior executives getting high scores means the evaluations are inaccurate, argues Carol A. Bonosaro, president of the Senior Executives Association, a professional advocacy group. "Do you expect a normal distribution of height among the basketball team? We don't think this is a randomly selected group," she says.

OPM is looking for accurate ratings, not a bell curve, says Hausser. If an agency has a large proportion of top ratings, "We will then look and see, does that reflect the performance of the agency," she says. High scores aren't automatic grounds for denying certification. The National Science Foundation gave "exceptional" ratings to 66 percent of its senior executives for fiscal 2005. When OPM and Office of Management and Budget officials compared the ratings to NSF's results on the President's Management Agenda and the Program Assessment Rating Tool, they found the ratings justified. "If you've got evidence that you've had an outstanding performance year, that's OK," Hausser says.

Falling Ratings

Enforcing accuracy "requires a tough culture change," Palguta says. "Many managers are finding it's not as easy as it used to be, and they're getting more push back from their employees." At the Energy Department, Chief Human Capital Officer Claudia Cross has emphasized that only those who "have done some extraordinary, next-to-impossible thing" should earn the top rating in Energy's five-tier system. "The people who had the hardest time with this were not the executives themselves, but the people who had to rate the executives," Cross says. "We had a culture where we didn't want to offend anybody by saying they were less than spectacular."

The Government Accountability Office, which has a pay-for-performance system, had "very serious rating inflation" before Comptroller General David M. Walker arrived and made accurate appraisals a priority, says Susan Kladiva, special assistant to the comptroller general for performance management systems. On a five-point scale, the agency average peaked at 4.62 in 1998. GAO began enforcing performance standards as they were written while it developed a system that gives employees scores in multiple areas. The average of those scores is weighted to adjust for harder or easier raters, and then converted into a rating.

In 2002, the first year under the system, average ratings fell to 2.19. "We were faced with the fact that it's very difficult for people who had a 4.62 rating to get 'meets expectations,' " Kladiva says. Walker met with staff and held chats that were broadcast to all employees' computer monitors to convey the message that "If you're meeting expectations, you're doing just fine," Kladiva says.

Putting all the Labor Department's agencies on the same five-tier system has allowed for better oversight, says Suzy Barker, director of human resources, policy and accountability. The department tied standards for employees at each level to overall organizational goals. "The message is really clear . . . we're here for a reason, and here's where you fit in," Barker says. "It's easier for managers to say, 'Look, here was what was expected and here is how you performed, and so here is your rating.' "

Frequent Tinkering

Agencies that pioneered pay for performance tried a number of approaches. The Federal Deposit Insurance Corporation, which went to pay for performance in 1999, used to give each employee a summary score that translated directly to pay, but that caused employees to fixate on numbers. "That's really not what you want to be focused on," says Chris Aiello, FDIC's associate director for human resources. "Employees and managers should be identifying areas for possible improvement." Now FDIC grades employees in five areas. Employees are grouped according to how well they did relative to their peers, and those groupings determine pay raises.

The National Credit Union Administration, which instituted pay for performance in the early 1990s, took the opposite approach when it revamped its performance management system in 2001: NCUA tied pay decisions more directly to ratings. Since then, the number of employees scoring in the "exceptional" category has hovered around 15 percent. "We don't focus on inflation," says Sherry D. Turpenoff, NCUA's director of human resources. "We focus on documenting performance." NCUA continually updates its standards so "exceptional" employees will have to do even better the following year.

The Internal Revenue Service, which launched a paybanding system for managers in 2001, emphasizes concrete, measurable standards, but also takes a more direct approach to preventing inflated ratings. It assigns each of its 10 business units a total number of points based on past rating patterns and on how successful the unit was that year. The units then distribute those points, and no more, to managers. This system holds ratings to the distribution pattern from previous years (an average of "exceeds expectations") unless the business unit's overall results justify higher scores.
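For readers who want to see the arithmetic, the point-pool idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the IRS's actual formula: the 1-to-5 scale (with 4 standing in for "exceeds expectations"), the function names and the `results_adjustment` parameter are all hypothetical.

```python
from typing import Dict


def unit_budget(num_managers: int, baseline_avg: float,
                results_adjustment: float = 0.0) -> int:
    """Total points a business unit may hand out (hypothetical formula).

    baseline_avg: average rating from past years (assumed 1-5 scale).
    results_adjustment: assumed bump or cut for the unit's results this year.
    """
    return round(num_managers * (baseline_avg + results_adjustment))


def remaining_points(ratings: Dict[str, int], budget: int) -> int:
    """Return unspent points, or raise if the ratings exceed the budget."""
    total = sum(ratings.values())
    if total > budget:
        raise ValueError(f"ratings total {total} exceeds budget {budget}")
    return budget - total


# Five managers, historical average of 4 ("exceeds expectations", assumed).
budget = unit_budget(5, 4.0)
ratings = {"A": 5, "B": 4, "C": 4, "D": 4, "E": 3}
print(remaining_points(ratings, budget))
```

Under these assumptions, one manager can receive a 5 only if another receives a 3, which is the "hard distinction" the cap is meant to force.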

"You do have to draw those very hard distinctions in performance," says IRS Chief Human Capital Officer Beverly Ortega Babers. "It causes people to have to take an honest look at themselves and the performance of their employees." But, Babers notes, the process hasn't been popular with managers. "We're looking to see what [other agencies] are doing in that area now," she says.

In that regard, the IRS is not alone. Human resources executives across government say performance management systems need frequent tinkering to keep appraisal inflation in check and ensure that performance standards match up with organization goals. Many are keeping an eye out to see what works at other agencies. "I think we have morphed our system one way or another every year," says Energy's Cross. But, she adds, technical details are less important than training supervisors to communicate expectations, provide ongoing feedback and give grades that match the standards. "If we focus on it as a system or a form, then we're missing most of it," Cross says.
