Healthcare and the Moral Hazard Problem
The demand curve isn’t simple when lives are on the line.
Chris Gash
Many companies are searching for tools to help them hire diverse, productive workforces. Even if diversity is not the main hiring goal, they may want to ensure they’re not overlooking talented individuals because of systemic discrimination, says Chicago Booth’s Rad Niazadeh.
Automated, data-driven algorithms, in conjunction with AI and machine learning, can support organizations in these efforts, suggests research he conducted with Booth PhD student Mohammad Reza Aminian and Yale’s Vahideh Manshadi. The increasing use of algorithms in hiring has raised concerns that they may reinforce human biases because of implicit bias in the data they use. (For more, read “AI is only human.”) However, the researchers demonstrate that algorithms designed with fairness-and-diversity constraints can guide companies to interview a more diverse set of candidates and extend employment offers to a broader range of people, at minimal cost.
Aminian, Manshadi, and Niazadeh propose an algorithmic framework for screening and hiring that includes a number of such constraints. Their framework is for sequential processes, meaning those in which candidates are evaluated one after another rather than all at once.
The researchers started by analyzing well-established “candidate priority indices”—also known as Weitzman indices, thanks to the pioneering work of the late Martin L. Weitzman in 1979. According to classical economics, hiring managers should use the Weitzman indices to devise an optimal strategy for interviewing and hiring candidates. Through an in-depth theoretical analysis, Aminian, Manshadi, and Niazadeh find that to make hiring outcomes fair and diverse, managers need to adjust these indices by increasing the priority of candidates from disadvantaged populations in a specific way and modulating the priority of other candidates.
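For readers who want to see what such an index looks like, here is a minimal sketch in Python. It assumes the textbook form of Weitzman's search problem, in which interviewing a candidate costs a fixed amount and reveals a value drawn from a known discrete distribution; the index is the threshold at which the expected gain above that threshold equals the interview cost. The function names, the example numbers, and the additive `boost` are illustrative assumptions, not the researchers' actual adjustment, which the article describes only in general terms.

```python
def weitzman_index(values, probs, cost, tol=1e-9):
    """Return sigma solving E[max(X - sigma, 0)] = cost, found by bisection.

    values/probs describe a discrete distribution of a candidate's value;
    cost is the (positive) cost of interviewing that candidate.
    """
    def expected_gain(sigma):
        return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))

    # expected_gain is non-increasing in sigma: at least `cost` at lo, zero at hi.
    lo, hi = min(values) - cost, max(values)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_gain(mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def adjusted_index(index, disadvantaged, boost):
    """Hypothetical adjustment: add a fixed boost for disadvantaged candidates."""
    return index + boost if disadvantaged else index


# A candidate worth 0 or 10 with equal probability, costing 1 to interview.
base = weitzman_index([0, 10], [0.5, 0.5], cost=1.0)
print(round(base, 6))                                   # 8.0
print(round(adjusted_index(base, True, boost=0.5), 6))  # 8.5
```

In the classical strategy, a manager interviews candidates in decreasing order of index. Raising one group's index moves its members earlier in that order, which is the general intuition behind the adjustment described above.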
An organization’s exact goals will drive the specific constraints and adjustments, the researchers argue. For example, if an engineering company wants to hire more high-quality female candidates, it’s not enough for it to simply interview more women. A more refined diversity constraint would prompt the company to include more high-quality female candidates in its interview pool. This would avoid tokenism, which is a way of gaming systems that require diversity and inclusion.
To demonstrate the applicability of their framework beyond theory, the researchers ran simulations of the algorithm, imposing various fairness and diversity constraints. In these simulations, hypothetical job candidates were marked as members of either a disadvantaged or a privileged demographic group.
The researchers estimated the candidates’ “quality” by assigning each person a short-term and a long-term score. The short-term score reflected formal qualifications (such as educational background). These scores were unequally distributed across all the candidates to reflect the impact of privilege on access to high-quality education and other resources. In contrast, the long-term score estimated the true quality that a person provides over time, due to characteristics—including intelligence, work ethic, and ambition—that the researchers assumed to be equally distributed across demographic groups.
The algorithm could see only the short-term scores, mirroring how real-world recruiters assess candidates on the basis of their résumés or interviews. But the researchers then used each candidate’s true-quality score to measure which of the hiring practices yielded more benefit in the long run. Which approach, by these simulations, truly led organizations to hire candidates who had the highest long-term scores?
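The two-score design can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the researchers' sequential algorithm or their data: true quality is drawn from the same normal distribution for both groups, the observed score adds noise, and a hypothetical `penalty` lowers the observed score of the disadvantaged group. The hiring rules here are simple one-shot selections, and all parameter values are invented for illustration.

```python
import random


def simulate(n=2000, hires=100, penalty=1.0, seed=0):
    """Compare demographics-blind hiring with an equal-split quota."""
    rng = random.Random(seed)
    candidates = []
    for i in range(n):
        group = "disadvantaged" if i % 2 == 0 else "privileged"
        true_quality = rng.gauss(0, 1)               # long-term score, same distribution for both groups
        observed = true_quality + rng.gauss(0, 0.5)  # short-term score is a noisy signal
        if group == "disadvantaged":
            observed -= penalty                      # assumed skew in formal qualifications
        candidates.append((group, observed, true_quality))

    by_observed = sorted(candidates, key=lambda c: c[1], reverse=True)

    # Blind policy: take the top `hires` observed scores overall.
    blind = by_observed[:hires]

    # Quota policy: take the top hires/2 observed scores within each group.
    half = hires // 2
    quota = ([c for c in by_observed if c[0] == "disadvantaged"][:half]
             + [c for c in by_observed if c[0] == "privileged"][:half])

    def mean_true(hired):
        return sum(c[2] for c in hired) / len(hired)

    return blind, quota, mean_true(blind), mean_true(quota)


blind, quota, blind_quality, quota_quality = simulate()
print(f"blind hires, mean true quality: {blind_quality:.3f}")
print(f"quota hires, mean true quality: {quota_quality:.3f}")
```

Both policies see only the observed score; the true-quality column is used solely to grade the outcome afterward, as in the researchers' setup. The printed averages depend entirely on the assumed penalty and noise levels.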
The research indicates that by countering biases, quotas for minority candidates can lead companies to hire people who would benefit the organization in the long run and might otherwise be overlooked.
The findings suggest that automated, data-driven algorithms incorporating fairness and diversity constraints can lead companies to hire people who appear, on paper, to be less qualified than candidates brought on through a process that ignores demographics. But even in terms of employee quality on paper, the cost to a company of achieving a fairer outcome is likely minimal, according to the research.
“If you force a company to hire, on average, 10 women for every 10 men, you might reduce the number of top candidates they hire, such as those with the highest GPA or a degree from an Ivy League school, simply because you added an extra constraint to the search,” says Niazadeh. “But, in reality, you might not hurt the utility of the search by much.” He explains that there may be several optimal ways of hiring people, and while a demographics-blind policy yields the best results in terms of short-term scores, other methods are still reasonable.
The simulations also suggest that this kind of nondiscriminatory practice benefits organizations in the longer run: imposing quotas, even when one group boasts stronger qualifications than another, produces a better workforce (as measured by the hypothetical candidates’ true quality) than hiring on the basis of short-term scores alone. An organization will find better employees if it recruits, say, a 50/50 male-female team in which 16 out of 20 boast Ivy League degrees than if it hires 20 Ivy League graduates, the majority of whom are men. “Imposing socially aware constraints such as demographic parity or [a] quota can even make the search more efficient in terms of true unobserved qualities,” write the researchers.
The exception comes when extreme constraints are imposed in settings where systemic discrimination has created vastly disparate groups in terms of formal qualifications—for example, if the demand were that 10 Black STEM PhDs be hired for every 10 white STEM PhDs, despite the fact that, according to a report commissioned by the Alfred P. Sloan Foundation, only 5 percent of PhD holders in the science, technology, engineering, and math fields were Black as of 2021. Under circumstances such as these, the simulations reveal, positions often go unfilled, reducing the long-term utility of a team because the team itself is smaller than it should be.
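The arithmetic behind unfilled positions is simple to sketch. The example below assumes a strict one-for-one quota and a hard formal requirement, with hypothetical numbers chosen to echo the 5 percent figure; the function name and inputs are illustrative and not taken from the researchers' simulations.

```python
def hire_with_parity(qualified_a, qualified_b, openings):
    """Fill `openings` seats under a strict one-for-one quota.

    qualified_a and qualified_b are the numbers of candidates in each group
    who meet the formal requirement. Returns (hired_a, hired_b, unfilled).
    """
    per_group = openings // 2
    # Under a strict quota, neither group's hires can exceed the other's.
    matched = min(per_group, qualified_a, qualified_b)
    return matched, matched, openings - 2 * matched


# 20 openings and 100 qualified candidates, 5 of them from the smaller group.
print(hire_with_parity(qualified_a=5, qualified_b=95, openings=20))  # (5, 5, 10)
```

Half the seats stay empty in this toy case because the smaller pool runs out, which is the mechanism the simulations point to.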
Many people have thought about algorithmic fairness in decision-making, says Niazadeh. “When it comes to designing machine-learning algorithms for high-stakes applications such as loan decisions, computer scientists and economists have studied algorithms that favor disadvantaged groups. This is in response to evidence that demographics-blind ML algorithms discriminate due to skewed data,” he says. But the “fair” ML algorithms have tended to make straightforward choices based on one-time signals—for example, deciding whether a loan application gets approved on the basis of a potential borrower’s credit history.
Hiring decisions are often more complex in nature. Here, it takes time and resources beyond scanning a résumé to find out if a candidate is any good. Markers of quality are dynamic, since a hiring manager’s opinion of each candidate may change after a first interview, a second interview, and a site visit.
“That’s the technical challenge,” says Niazadeh. “Hiring a person is more complicated than opening Door 1, 2, or 3 and seeing what you get.” The researchers argue that the complexity calls for a Markovian scheduling framework. (A Markovian model, named for its creator, the late Andrey Markov, describes a sequence of events in which the probability of the next depends on the outcome of the previous one.) This framework goes beyond static ML problems and even Weitzman’s indices.
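A small example may help make the Markovian idea concrete. The sketch below models a single candidate's path through evaluation stages, where the chance of reaching the next stage depends only on the current one. The stage names and probabilities are hypothetical, and the code shows only the underlying chain, not the researchers' scheduling framework built on top of it.

```python
# Hypothetical stages and transition probabilities for one candidate.
# Each entry gives the chance of moving from a stage to each possible next stage.
TRANSITIONS = {
    "resume":           {"first_interview": 0.4, "rejected": 0.6},
    "first_interview":  {"second_interview": 0.5, "rejected": 0.5},
    "second_interview": {"site_visit": 0.5, "rejected": 0.5},
    "site_visit":       {"offer": 0.6, "rejected": 0.4},
    "offer":            {"offer": 1.0},        # absorbing state
    "rejected":         {"rejected": 1.0},     # absorbing state
}


def step(distribution):
    """Advance a probability distribution over stages by one evaluation."""
    nxt = {}
    for state, prob in distribution.items():
        for target, p in TRANSITIONS[state].items():
            nxt[target] = nxt.get(target, 0.0) + prob * p
    return nxt


def offer_probability(start="resume", steps=4):
    """Probability that a candidate starting at `start` holds an offer after `steps` evaluations."""
    dist = {start: 1.0}
    for _ in range(steps):
        dist = step(dist)
    return dist.get("offer", 0.0)


print(round(offer_probability(), 4))  # 0.06
```

The manager's problem, in this picture, is deciding which candidate's chain to advance next, given that each step costs time and money.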
While the researchers’ algorithmic approach has the potential to influence hiring in many countries, especially when it involves a sequential search process such as in executive recruiting, Niazadeh predicts that US organizations might balk, given the political and legal questions around diversity and inclusion. Even those open to quotas may find the inner workings of the tools uncomfortable, he adds, because they rely on a degree of randomness: when two candidates appear to be equally qualified, the algorithm essentially flips a coin.
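The coin flip can be sketched in a few lines. This is an illustrative tie-breaking rule with invented candidate names and scores, assuming ties are exact and broken uniformly at random; the article does not specify how the researchers' algorithm randomizes.

```python
import random


def pick_candidate(scores, rng=random):
    """Return the top-scoring candidate, breaking exact ties uniformly at random."""
    best = max(scores.values())
    tied = sorted(name for name, s in scores.items() if s == best)
    return rng.choice(tied)


scores = {"Candidate A": 8.5, "Candidate B": 8.5, "Candidate C": 7.0}
print(pick_candidate(scores, random.Random(0)))  # Candidate A or Candidate B
```

Over many such decisions, each tied candidate is chosen about equally often, so neither is systematically favored.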
But he says that some policymakers have agreed to use randomization in selecting citizens for assemblies or juries and in distributing legislative seats. This approach, he says, helps achieve the optimum outcome under the fairest conditions, on average.