The history of the social sciences has included a succession of advances in the ability to make observations and carefully test hypotheses. The compilation of massive data sets, for example, and the huge leaps in computing power that make analyzing big data possible have opened up new avenues of scientific inquiry. But when it comes to generating hypotheses to test, things haven’t evolved as much.
Machine learning may change that, according to University of Chicago Harris School of Public Policy’s Jens Ludwig and Chicago Booth’s Sendhil Mullainathan. Predictive algorithms find patterns in complex data that could lead to testable hypotheses, they suggest.
Imagine, for instance, an algorithm trained to predict whether high-school students will drop out, using a text analysis of their academic essays. The algorithm may detect a correlation between some subtle feature of student writing and dropout risk, allowing it to predict with some accuracy which students will leave school early. The relationship it observes could then form the basis of a hypothesis that social scientists could test empirically to determine whether it is causal.
The problem is that the algorithms that pick up these patterns are a “black box” to humans. We can see the predictions but not the connections in the data that led to them. What we need, the researchers say, is a way to translate the opaque signals detected by algorithms into ideas that humans can understand and test.
Ludwig and Mullainathan suggest a method for doing that, outlining a process for creating a “communication vehicle” between the algorithm and human scientists. This vehicle is a second algorithm whose function is to create synthetic data similar to those used as the input for the first algorithm. The communication algorithm generates sample input data and then tweaks—or morphs, as the researchers put it—these synthetic data in ways it expects would change the first algorithm’s predicted outcome.
The two sets of data that the communication algorithm produces—the sample data and the morphed version of them—are, therefore, identical along dimensions that don’t matter for the first algorithm’s prediction, and divergent in the areas that the first algorithm finds relevant. Scientists can then show these paired data sets to people and ask them to name what’s different about them. Humans may lack algorithms’ incredible ability to spot patterns, the researchers say, but they’re adept at spotting differences between two things.
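For readers who want a concrete sense of what this morphing step might look like in code, here is a minimal Python sketch. It assumes a differentiable generative model and a trained prediction model; the function names, the gradient-ascent loop, and the penalty weight are illustrative assumptions, not the researchers' actual pipeline.

```python
# Minimal sketch of latent-space "morphing": nudge a synthetic input so the
# prediction algorithm's output changes, while keeping everything else similar.
# Assumes a differentiable generator and risk model (illustrative assumptions,
# not the authors' actual code).
import torch

def make_morphed_pair(generator, risk_model, latent_dim=512, steps=50, lr=0.05):
    z = torch.randn(1, latent_dim)            # sample a synthetic "defendant"
    z_morph = z.clone().requires_grad_(True)  # copy to push along the risk gradient

    for _ in range(steps):
        image = generator(z_morph)            # synthetic input (e.g., a face)
        risk = risk_model(image).mean()       # first algorithm's predicted outcome
        # Penalize drift from the original latent so that only
        # prediction-relevant features change
        penalty = ((z_morph - z) ** 2).mean()
        loss = -risk + 10.0 * penalty         # ascend risk, stay close to original
        loss.backward()
        with torch.no_grad():
            z_morph -= lr * z_morph.grad
            z_morph.grad.zero_()

    # (original, morphed) pair to show human participants
    return generator(z), generator(z_morph)
```

The design intuition is the one the researchers describe: because the morphed copy is pulled only along directions that move the first algorithm's prediction, any visible difference between the two images is a candidate for the signal the algorithm is using.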
If humans can consistently identify what feature the algorithm is picking up on, that feature can then be explored as a hypothetical explanation for the behavior being studied.
The researchers demonstrate this process using an algorithm trained to predict whether a judge will release a criminal defendant on bail or make the person await trial in jail. They limited the algorithm's input to just the defendants' mugshots. The algorithm performed better when given additional data about defendants, but Ludwig and Mullainathan find that even with those extra inputs, as much as 45 percent of the model's predictive power came from the mugshot alone.
Because the prediction algorithm took mugshots as its input, the communication algorithm generated pairs of artificial mugshots: one photorealistic face of a digitally produced “defendant,” and a second face similar to the first, but different along the lines the first algorithm found useful for making its prediction.
Ludwig and Mullainathan then had each of 54 study participants examine 50 pairs of these “morphed” mugshots. Asked to predict which of the images depicted a higher-risk defendant and given feedback on their answers, the participants quickly became adept at identifying the correct mugshot. After some practice, they were able to do so about two-thirds of the time. That success rate was well above the rate for participants asked to make similar predictions on the basis of real pairs of mugshots.
Nearly 40 percent of those shown the morphed image pairs homed in on a similar difference between them: how well-groomed the people in the images were. When the researchers held grooming constant between the morphed image pairs, participants identified “heavy faced” (that is, faces that were rounder, wider, puffier, or otherwise bulkier) as a second salient difference.
To confirm that the algorithm successfully communicated with the humans comparing the morphed images—that the patterns it detected were the same ones the human observers identified—the researchers asked other participants to rate actual mugshots according to how well groomed and heavy faced the subjects were. They find that the ratings for both correlated with the algorithm’s prediction of detention risk, suggesting those traits were meaningful to the algorithm. Participants were able to recognize and name characteristics the algorithm had noticed as important patterns in the data.
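In code, this validation step amounts to a simple correlation check, roughly like the sketch below; the variable names and the choice of a Pearson correlation are illustrative assumptions rather than the paper's exact analysis.

```python
# Check whether a human-named feature tracks the algorithm's predictions:
# correlate participants' ratings of real mugshots (e.g., how well groomed)
# with the model's predicted detention risk for the same images.
from scipy.stats import pearsonr

def feature_tracks_prediction(ratings, predicted_risk, alpha=0.05):
    """ratings: mean human rating per mugshot; predicted_risk: the first
    algorithm's predicted detention risk for the same mugshots."""
    r, p = pearsonr(ratings, predicted_risk)
    return {"correlation": r, "p_value": p, "significant": p < alpha}
```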
The division of labor created by this process plays to the strengths of humans and A.I., Ludwig and Mullainathan argue. “Our goal is to marry people’s unique knowledge of what is comprehensible with an algorithm’s superior capacity to find meaningful correlations in the data,” they write.
Although in this instance the researchers applied the process to an algorithm that used an image as input, they suggest that the procedure could work whenever researchers want to analyze statistically predictable behavior and have “unstructured, high-dimensional data” such as text, video, or location that can be morphed.
Such data, they note, are being generated constantly for nonscientific purposes. “Data on human behavior is exploding: second-by-second price and volume data in asset markets, high-frequency cellphone data on location and usage, CCTV camera and body-cam footage, news stories, the entire text of corporate filings and so on,” they write. “The kind of information researchers once relied on for inspiration is now machine readable: what was once solely mental data is increasingly becoming actual data.”
If scientists are able to work with algorithms to observe meaningful patterns in data, it will become all the easier to turn the data into fuel for useful insights about human decision-making and activity.
Jens Ludwig and Sendhil Mullainathan, “Machine Learning as a Tool for Hypothesis Generation,” Working paper, March 2023.