In the summer of 2006, Todd Carlisle, a Google analyst with a doctorate in organizational psychology, designed a 300-question survey for every Google employee to fill out, The New York Times later reported.
Some questions were straightforward: Have you ever set a world record? Other queries had employees plot themselves on a spectrum: Please indicate your working style preference on a scale of 1 (work alone) to 5 (work in a team). Other questions were frivolous: What kind of pets do you own?
Carlisle crunched the data and compared it to measures of employee performance. He was looking for patterns to understand what attributes made a good Google worker. This was closely related to another question that interested his boss, Laszlo Bock, vice president of People Operations: What attributes could predict the perfect Google hire?
Ten years ago, Google was infamous for its complex application process and brain-teasers, which Bock recently admitted were “a complete waste of time.” Google was essentially trying to Google the human-resources process: It wanted a search algorithm that could sift through tens of thousands of people (Google’s acceptance rate is about 0.2 percent, or 1/25th that of Harvard University) and return a list of the top candidates. But after a great deal of question-asking and number-crunching, it turned out that the best performance predictor wasn’t grade-point average, or type of pets, or an answer to the question, “How many times a day do a clock’s hands overlap?” The single best predictor was: absolutely nothing.
Hiring is hard. General managers know it. Startup founders know it. School principals and casting directors know it. But for readers who are none of those things, consider America’s most public hiring processes (aside from presidential elections, perhaps): sports drafts.
Every year, millions of Americans watch professional talent evaluators try to predict who will be the best future athletes in the NBA and NFL drafts. Again and again, audiences get valuable lessons in the inability of experts to divine future talent. Scouts aren’t dumb. Overall, the first pick tends to be better than the tenth pick, and the tenth pick tends to be better than the 100th pick. But years after the draft, at least one squad almost always looks foolish. For every top-five team that can’t believe it picked a Darko Milicic or Ryan Leaf, there is a top-five team that can’t believe it missed a Stephen Curry or Tom Brady.
Hiring is hard for the same reason that dating is hard: Both sides are in the dark. “The fundamental economic problem in hiring is one of matching with costly search and bilateral asymmetric information,” Paul Oyer and Scott Schaefer write in “Personnel Economics.” In English, that means hiring is expensive, time-consuming, and inherently uncertain, because the hirer doesn’t know which workers are the right fit, and the workers don’t know which hirers are the right fit.
But hiring is not hopeless. Like any consequential business decision, it has been exhaustively studied. The companies with the most successful hiring practices are the ones that learn which metrics and processes reliably help to predict performance and quickly identify the metrics and processes that are complete wastes of everybody’s time.
Here again, sports are instructive. One of the most important jobs of a scout is to isolate skills (which are inherent) from outcomes (which depend on a zillion variables). In baseball, for example, it’s impressive for any young pitcher to have far more wins than losses. But wins rely on factors like the team’s fielding and runs scored, which have little or nothing to do with pitching quality. It’s far better for scouts to focus on metrics that the pitcher controls exclusively, like strikeouts and walks. In basketball, scouts have historically been seduced by high scorers on successful college teams. But work by David Berri has shown that raw points and Final Four appearances aren’t good predictors of NBA talent at all. Instead, subtler statistics like rebounding, assists, and field-goal percentage are much more consistent metrics from year to year.
This is a fundamental challenge in hiring: identifying the metrics that actually predict employee success, rather than relying on the most available pieces of information. The gap between these two groups is quite clear in one of the most popular settings for studying the science of hiring: American schools.
A great teacher is hard to identify and hard to measure. You can observe a star athlete’s quality in his or her own performance. But a teacher’s quality is reflected in the performance of other people—and (this part doesn’t make the evaluations any easier) those other people are very often children.
Even so, the academic literature suggests that principals value the wrong metrics. They hire for credentials (e.g. teacher certificates) and proximity (e.g. does the applicant live in the same state as the school?) while discounting the factors that make the best instructors.
A straightforward yet under-appreciated fact about instruction is that smart people make good teachers. College GPA and SAT scores correlate very highly with teaching performance in a range of studies. But several papers in the last 20 years have shown that teaching applicants with higher GPAs don’t receive more offers; instead, applicants in the same state as the school do. Meanwhile, basic credentials like graduate degrees and certifications are valued in applications, but they “have little or no power to explain variation in performance across teachers,” according to one recent study published in March by researchers at the University of Michigan, Columbia University, and Harvard Graduate School of Education.
This study used unique access to applicants, hires, and teacher performance in Washington, D.C., Public Schools. Like many similar papers, it found that undergraduate performance and scores on a teacher screening test both strongly predicted teacher effectiveness. (“Effectiveness,” in this case, was measured using D.C.'s IMPACT system, which evaluates teachers based on their year, subject, classroom observation grades, and reports from principals and assistant principals.)
The researchers found that D.C. public school principals consistently missed the best teaching prospects. For example, they hired more people who went to college in D.C. (which had nothing to do with better teaching) but ignored SAT scores and GPA (both of which were “significantly positively related to performance”). Some principals might assume that high-achieving workers will feel overqualified teaching in public school and quit after a year. But the study found no correlation between academic credentials and attrition. Instead, the researchers’ basic conclusion was a stinging indictment of teacher hiring: The attributes of the best teaching prospects were “only weakly, if at all, associated with the likelihood of being hired.”
Teaching puts one individual in charge of a classroom. That’s different from being a member of a large product team within a larger company. Hiring team members requires filtering for different hard and soft skills, so that new employees can slip into established patterns of company behavior. Here, many companies lean on referrals, in effect asking their employees to double as HR recruiters.
Researchers have long known that referrals surface better job candidates. Referred candidates are more likely to get callbacks, more likely to be hired, and more likely to stay at the company. Furthermore, researchers had a pretty good theory about why referrals work. Most hiring is a blind date, and referrals are an introduction. They give both sides a little bit more certainty and information about fit. But academics couldn’t figure out why referred candidates were actually better. A May 2013 paper suggests a simple answer: Company referrals work not because they yield smarter workers, but because they yield better fits.
The study found that referrals yield “substantially higher profits per worker,” and that referred workers are “less likely to quit” and “more innovative” and “have fewer accidents.” All of this held even after controlling for factors like college, SAT scores, and IQ. Team-based companies require openness, compatibility, and a willingness to cooperate. Referral programs work because great employees pass along workers who similarly match the company culture.
Although they account for only six percent of total applications, referrals now result in more than a quarter of all hires at large companies, according to a recent paper from the Federal Reserve Bank of New York and MIT. But while referrals are extremely useful, they can create their own problems. Many industries (tech and media, for starters) are infamous for disproportionately hiring white, upper-middle-class young men who went to elite colleges. Relying exclusively on referrals could deepen workplace homogeneity.
What’s more, referrals help winnow the applicant pool, but that’s not nearly enough. As the New York Fed study showed, the majority of jobs are still filled without referrals and the majority of referred candidates are still rejected. More important than a strong referral program is a strong interview process. How does a hiring manager distinguish between a merely acceptable candidate and a great one, without devoting thousands of hours to learning the secret talents, hobbies, and motivations of every single applicant?
Google, which depends on referrals, once administered up to 25 interviews for each job candidate. Todd Carlisle, the analyst with a doctorate in organizational psychology who administered the company’s surveys in 2006, thought this might be overkill. He tested exactly how many interviews were necessary to be confident about a new hire. The right number of interviews per candidate, he discovered, was four. This new policy, which Google calls the Rule of Four, “shaved median time to hire to 47 days, compared to 90 to 180 days,” Laszlo Bock wrote in his book, Work Rules!
But Carlisle’s research revealed something deeper about the hiring process, which has resonance for every industry: No one manager at Google was very good, alone, at predicting who would make a good worker.
Four meticulously orchestrated Google interviews could identify successful hires with 86 percent confidence, and nobody at the company, no matter how long they had worked there or how many candidates they had interviewed, could do any better than the aggregated wisdom of four interviewers. (Okay, technically, one employee could: a data center worker named Nelson Abramson, who interviewed exclusively for a “very distinctive skill set.”)
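A toy simulation can make the logic of aggregation concrete. The sketch below is not Google's method or data. It assumes, purely for illustration, that each candidate has a hidden true quality and that every interviewer's score equals that quality plus independent random noise. The function name `simulate_panel` and all of its parameters (a panel of four, the noise level, the number of candidates) are hypothetical.

```python
import random
import statistics


def simulate_panel(num_candidates=2000, panel_size=4, noise_sd=1.0, seed=0):
    """Return (mean error of one interviewer, mean error of the panel average).

    Illustrative assumption: each score = true quality + independent noise.
    """
    rng = random.Random(seed)
    single_errors = []
    panel_errors = []
    for _ in range(num_candidates):
        true_quality = rng.gauss(0.0, 1.0)  # hidden and unobservable in real hiring
        scores = [true_quality + rng.gauss(0.0, noise_sd)
                  for _ in range(panel_size)]
        # How far off is the first interviewer alone?
        single_errors.append(abs(scores[0] - true_quality))
        # How far off is the average of the whole panel?
        panel_errors.append(abs(statistics.fmean(scores) - true_quality))
    return statistics.fmean(single_errors), statistics.fmean(panel_errors)


single, panel = simulate_panel()
print(f"one interviewer, mean error: {single:.2f}")
print(f"panel of four, mean error:   {panel:.2f}")
```

Under these assumptions, the average of four independent scores carries half the noise of a single score, so the panel lands closer to the truth on average. The benefit depends on the errors being independent: if every interviewer leans the same way, averaging does not cancel that shared lean.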
There are several reasons to aggregate interview scores. First, everybody is a little bit biased in one direction or another: toward older or younger, toward extroverts or introverts. Combining scores mitigates that inevitable bias. Second, sometimes people just have bad interviews, and it’s unreasonable to base an entire hiring decision exclusively on one 30-minute performance.
Third, Google’s finding suggests there are no magical hirers in the world. There are no performance oracles who just know a good candidate when they see one.
This is perhaps the most interesting and important conclusion. In a November 2015 study, researchers looked at 15 firms that used a job test, which placed applicants into three buckets: green (positive), yellow (tentative), or red (negative). These job tests accurately predicted worker performance and retention. The greens stayed longer than the yellows. The yellows stayed longer than the reds.
But sometimes, the human-resources managers ignored the data and went with their gut. Why? Perhaps the managers thought they knew better than a cold piece of data. But these “gut” picks were busts, according to Mitchell Hoffman at the University of Toronto, Lisa Kahn at Yale University School of Management, and Danielle Li at Harvard Business School. When managers thought they were smarter than the hiring systems they set up, it was the systems that ended up looking smart.
It will always be difficult to predict fit and performance, because humans are complex and humans interacting in human systems are even more complex. The right lesson is more subtle: Hiring is hard, and nobody is very good at doing it alone, whether a Google boss, a high-school principal, or a sports general manager. Everyone needs help, sometimes in the form of standardized tests, and sometimes in the form of aggregated interview reports. When it comes to identifying the best future talent, groups are better than individuals, data-plus-groups is better than groups alone, and nearly anything is better than brainteasers.