We often base our decisions on predictions of future outcomes, and those predictions never come with absolute certainty. If we think of ML as a black box that outputs a number, that number is best understood as one point in a range of values that most likely contains the truth. To improve machine-assisted decision-making in real life, statisticians are using theory and data to quantify the uncertainty of ML predictions. They measure uncertainty with probability in much the same way we measure temperature with a thermometer, Booth’s Veronika Ročková explains. How this is done depends on the type of ML method involved. There are many, including decision trees, random forests, deep neural networks, Bayesian additive regression trees (BART), and the least absolute shrinkage and selection operator (LASSO). (These sound complicated, and can be, but a decision tree is essentially a detailed flowchart that starts at a root, splits into branches, and ends in leaves.)
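One generic way to turn a single predicted number into a range is the percentile bootstrap. The minimal Python sketch below is an illustration under simple assumptions, not Ročková’s method or the way BART quantifies uncertainty: the “model” is just a sample mean, the data are made up, and the function name `bootstrap_interval` is hypothetical. It resamples the data with replacement many times, recomputes the estimate each time, and reports the middle 95 percent of those estimates as the range.

```python
import random
import statistics


def bootstrap_interval(values, n_resamples=2000, level=0.95, seed=0):
    """Return (estimate, lower, upper) for the mean of `values`.

    Hypothetical helper: a percentile bootstrap that resamples the data with
    replacement, recomputes the mean each time, and keeps the middle `level`
    share of the resampled means as the interval.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(values)
    resampled_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Number of resampled means to trim from each tail.
    tail = round((1 - level) / 2 * n_resamples)
    lower = resampled_means[tail]
    upper = resampled_means[n_resamples - 1 - tail]
    return statistics.fmean(values), lower, upper


# Made-up outcomes, for illustration only.
outcomes = [3.1, 2.7, 3.5, 2.9, 3.3, 3.0, 2.8, 3.4]
estimate, lower, upper = bootstrap_interval(outcomes)
print(f"estimate {estimate:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

Noisier data or fewer observations widen the interval, which is the sense in which the single printed estimate carries uncertainty with it.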
If we understand why a black-box method works, we can trust it more with our decisions, says Ročková, one of the researchers trying to narrow the gap between what’s done in practice and what’s known in theory. Booth’s Christian B. Hansen, often working with MIT’s Victor Chernozhukov, was one of the first to show how LASSO could be used to answer questions of statistical and causal inference, giving people more confidence to use and trust it. BART is a widely used ML method, available as free software, and Ročková’s research was the first to show theoretically why it works. Her research on uncertainty quantification for Bayesian ML won her a prestigious National Science Foundation award for early-career researchers.
ML, she says, is “a wonderful toolbox, but for people to feel safe using it, we need to better understand its strengths and limitations. For example, there have been studies where ML was shown to suffer from a lack of reproducibility. There is still a long way to go before we can delegate our decisions entirely to machine intelligence.”
Researchers are also testing their theories empirically. In a paper about deep learning, a subset of ML that can capture a particularly precise picture of even complex data, Chicago Booth’s Max Farrell, Tengyuan Liang, and Misra theoretically quantified the uncertainty involved in a predictive problem. Studying deep neural networks, they worked out in theory how and why these methods perform as they do.
To illustrate their findings, they compared their theoretical results with an actual corporate decision. A large US consumer products company sent out 200,000 catalogs to boost sales. About 6 percent of recipients made a purchase within three months, and those buyers spent an average of $118 each. The researchers compared these results with the predictions of eight deep-learning models and evaluated how close each came to what actually happened.
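The catalog figures can be turned into the concrete quantities a model would be asked to predict. The sketch below assumes that the $118 average applies to purchasers rather than to every recipient, and the helper `relative_error` is a hypothetical scoring rule for illustration, not the metric the researchers used.

```python
# Figures from the catalog example. Assumption: the $118 average spend
# applies to purchasers, not to every catalog recipient.
CATALOGS_SENT = 200_000
PURCHASE_RATE = 0.06          # share of recipients who bought within three months
AVG_SPEND_PER_BUYER = 118.0   # dollars

buyers = CATALOGS_SENT * PURCHASE_RATE            # implied number of purchasers
total_spend = buyers * AVG_SPEND_PER_BUYER        # implied three-month spending
spend_per_catalog = total_spend / CATALOGS_SENT   # average spending per catalog mailed


def relative_error(predicted: float, observed: float) -> float:
    """Hypothetical scoring rule: absolute prediction error as a share of the truth."""
    return abs(predicted - observed) / abs(observed)


print(f"buyers: {buyers:,.0f}, total spend: ${total_spend:,.0f}")
print(f"spend per catalog: ${spend_per_catalog:.2f}")
# A hypothetical model that predicted a 6.6% purchase rate:
print(f"relative error: {relative_error(0.066, PURCHASE_RATE):.0%}")
```

Under that assumption, the campaign implies roughly 12,000 buyers and about $1.4 million in spending, or $7.08 per catalog mailed; a model that predicted a 6.6 percent purchase rate would be off by about 10 percent in relative terms.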
“We have a truth out there, how people behave. We want to approximate people’s behaviors,” says Misra. “The first question is whether the deep-learning model can approximate the truth. The second is, how close is the approximation?” They measured closeness by determining the smallest amount of data needed to produce the right answer within a given margin of error.
How much data did they need to arrive at an answer that is, with reasonable certainty, correct? The answer depends on what a company is trying to do and find out, but the researchers provide theoretical guidance. “We’re confident now that the amount of data we used in that application is enough,” says Farrell. “There’s still uncertainty, but we’re happy with what we found.” (For more, read “How (In)accurate Is Machine Learning?”) The results, the researchers assert, will help companies compare and evaluate ML strategies they may want to use in their various decisions.
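For a far simpler problem than the one the researchers studied, the smallest sample that delivers an answer within a chosen margin of error has a textbook formula. To estimate a proportion p to within ±E at confidence level 1 − α, a standard normal-approximation rule is n = z²·p(1 − p)/E², where z is the 1 − α/2 quantile of the standard normal distribution. The sketch below applies it to the 6 percent purchase rate from the catalog example. It only illustrates the idea of “enough data”; it is not the theoretical bounds Farrell, Liang, and Misra derived for deep neural networks, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p: float, margin: float, confidence: float = 0.95) -> int:
    """Hypothetical helper: smallest n such that a normal-approximation
    interval for a proportion p has half-width at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z**2 * p * (1 - p) / margin**2)


# Purchase rate of about 6 percent, as in the catalog example.
print(sample_size_for_proportion(0.06, 0.01))    # within ±1 percentage point
print(sample_size_for_proportion(0.06, 0.005))   # within ±0.5 percentage point
```

These come to 2,167 and 8,667 observations: halving the margin of error roughly quadruples the data required. A deep network estimating a whole relationship, rather than one proportion, generally needs far more, which is why the researchers’ theoretical guidance matters.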
How close is too close?
Ultimately, ML is a tool that supports decision-making. Part of what makes it complicated to assess the uncertainty involved in ML is that the line is blurring between who or what is actually doing individual tasks, and how those tasks contribute to the final decision.
Many nonmathematical tasks inform a decision. Think about the search for a vaccine, which involves many separate tasks. Not long ago, researchers hunting for relevant information in scientific literature would have read it all themselves. Many still do, but some have assigned this task to ML. Similarly, ML can identify protein shapes of interest. Because of the complexity of the human body, there’s currently no substitute for testing a vaccine with a clinical trial. But ML may one day be able to comb through data from all clinical studies ever conducted and use them to predict health outcomes, potentially accelerating the clinical-trial process. “You’re running up against the boundary of how people think about using statistics to make decisions,” says Farrell. “Machine learning may be moving that boundary in ways we’re not 100 percent sure of. By changing the tools available, we’re changing how those decisions can get made.”
There is uncertainty involved in every task. When reading through thousands of scientific papers for relevant information, is ML overlooking anything important? When identifying protein shapes, is ML missing any? Research keeps making ML more accurate, but every task ML takes on for the first time introduces new sources of uncertainty.
Moreover, as ML moves closer and closer to delivering accurate answers, many questions in the field have to do with whether ML is too accurate. Privacy advocates, among others, worry that ML could be so effective at finding patterns that it could discover information we don’t want to share, and use it to help companies discriminate, even accidentally. Say a company can tell from cell-phone data the angle at which a phone is being carried, and that angle could be a proxy for gender (as many men keep phones in their pockets while many women carry them in purses). Could the company then use that inferred gender to charge men and women different prices? In 2019, a developer noticed that the credit limit offered to him by the Goldman Sachs–issued Apple Card was far higher than the one offered to his wife, with whom he filed joint tax returns and shared finances. In a tweet, he called the Apple Card sexist and suspected a biased algorithm. The New York State Department of Financial Services opened an inquiry.
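The proxy problem can be shown with a toy simulation on synthetic data. Everything in the sketch below is a hypothetical assumption: the phone-carrying angle is simplified to a pocket-or-not signal, the carrying rates (80 percent of men and 30 percent of women using a pocket) are made up, and so is the pricing rule. The rule never sees gender, yet average prices still differ by gender because the proxy carries information about it.

```python
import random


def simulate_proxy_pricing(n=10_000, seed=1):
    """Return average price by group under a rule that never reads the group.

    All probabilities and prices are hypothetical illustrations.
    """
    rng = random.Random(seed)
    # Hypothetical share of each group carrying a phone in a pocket.
    p_pocket = {"men": 0.8, "women": 0.3}
    prices = {"men": [], "women": []}
    for _ in range(n):
        group = rng.choice(["men", "women"])
        in_pocket = rng.random() < p_pocket[group]
        # The pricing rule uses only the phone-carrying signal, never `group`.
        price = 10.0 if in_pocket else 12.0
        prices[group].append(price)
    return {g: sum(v) / len(v) for g, v in prices.items()}


averages = simulate_proxy_pricing()
print({g: round(avg, 2) for g, avg in averages.items()})
```

With these made-up rates, the expected averages are about $10.40 for men and $11.40 for women, a gap of roughly $1 that arises entirely through the proxy. In this toy setup, dropping gender from the inputs does not prevent a gender gap in outcomes.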
Is it possible to prove mathematically what ML can and cannot learn? And if a business wants customers to know it’s behaving ethically, and to trust it, how can it convince them it’s unable to use ML to violate their privacy? These are issues that companies are trying to sort out before regulators step in and write rules for them.
These questions also straddle mathematical and social realms, and they complicate the bigger question about ML accuracy. Beyond how close ML is getting to delivering solid answers, how close do we want it to get? That’s a subjective question with no entirely accurate answer.
When Misra sat in a restaurant in Italy, wondering whether to trust his son’s health to a translation app, what he wanted above all was accurate information. A few words he already knew matched the app’s translation, which pushed him toward trusting it. “We did end up using the app, and it was fantastic,” he says. In that restaurant, and others during their travels, he and his son were able to translate menus, write their concerns in Italian to make sure the staff understood, and avoid an allergic reaction. “We wouldn’t have been able to function, in terms of making informed choices about food, as well as we did without the app,” says Misra.
In the moment, he recognized and trusted ML. But as ML becomes increasingly pervasive, many people may not even recognize that ML is behind the application they’re using. Unlike Misra, they won’t make a conscious decision about whether or not to trust it. They simply will.