What we found is something we’re finding again and again in all of our A.I. work: every time you see that an algorithm has done something really bad, there’s no engineering error. That’s very, very different from the traditional bugs in code that you’re used to: when your computer crashes, some engineering bug has shown up. I’ve never seen an engineering bug in A.I. The bug is in what people asked the algorithm to do. They just made a mistake in how they asked the question.
In this case, we said, “OK, look, it’s way off. How do we figure out what it’s doing wrong? Well, let’s figure out what people wanted it to optimize.” They wanted to find the sick people, but how did they measure sickness? They measured it using the data they had: claims.
So sickness was measured by how many dollars patients generated, which is very subtly different. Sickness doesn’t equal dollars. They’re highly correlated, but they’re not exactly the same thing. And it turns out that if you look at total dollars spent, you don’t actually see any racial bias in the algorithm. At the same risk score, the Black patients chosen and the white patients chosen have the same average dollars spent.
Again, costs are highly correlated with health, but the relationship is not the same across races. At the same level of health, we spend less on African Americans. So when the algorithm went to predict cost, it naturally did not rank the sickest African American patients as high as the sickest white patients.
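To make the mechanism concrete, here is a minimal sketch in Python. Everything in it is synthetic and stylized (a made-up 30 percent spending gap at equal need, two toy features, a plain linear regression standing in for the real model); it is meant only to reproduce the pattern described above, not the study’s actual algorithm or data.

```python
# Illustrative sketch only: synthetic data and made-up parameters,
# not the study's actual model or data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200_000

black = rng.random(n) < 0.5                    # two equally sized groups
health = rng.gamma(2.0, 1.0, n)                # true health need (higher = sicker)
access = np.where(black, 0.7, 1.0)             # assumption: 30% less is spent on
                                               # Black patients at the same need

# Features the model can see (race is not one of them):
comorbidity_index = health + rng.normal(0, 0.1, n)              # chart-based health signal
prior_claims = 2500 * health * access + rng.normal(0, 500, n)   # last year's spending
X = np.column_stack([comorbidity_index, prior_claims])

# The flawed setup: "risk" is defined as predicted cost.
cost = 3000 * health * access + rng.normal(0, 500, n)
risk = LinearRegression().fit(X, cost).predict(X)

# Who gets flagged for extra care (top decile of the score)?
top = risk >= np.quantile(risk, 0.90)
print("Share of flagged patients who are Black:", round(black[top].mean(), 2))

# At the same risk score (a narrow band around the 95th percentile),
# compare dollars spent and true health need by race.
lo, hi = np.quantile(risk, [0.94, 0.96])
band = (risk >= lo) & (risk <= hi)
for grp, name in [(band & black, "Black"), (band & ~black, "white")]:
    print(f"{name:5s}: mean cost = {cost[grp].mean():7.0f}, "
          f"mean health need = {health[grp].mean():.2f}")
```

Race is never an input here. The disparity in who gets flagged comes entirely from defining “risk” as predicted dollars: at the same score, spending is roughly equal across races, but the Black patients at that score are considerably sicker, and fewer of them are flagged.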
I should note, this is not a dumb thing to do. There were about five or six such algorithms built, and they all had this bug. Some were built by private companies, some were built by nonprofits, some were built by academics, but this bug was pernicious and it was everywhere because of the product-management side of it.
The way these algorithms are built is that a bunch of data scientists go in and tell these health systems, “We can build a risk score. What is the thing that you want risk on?” The health systems say, “Well, we want to find the sickest patients,” and they provide what data they have.
The health systems don’t realize the mistake. The data scientists don’t know much about the health-care domain. So between the people who know a lot about the context and the people who know a lot about the coding, something falls through the cracks. The translation from the domain to the coding is where we see problem after problem after problem.
The fact that computer code scales, however, also means that solutions scale—this is the great thing about it. Once we realized this was the problem, we built an algorithm trained to predict health. And now that is being scaled: by the end of this year, we’ll have this thing fixed for 50 million people. We’ll probably have the whole problem fixed by next year.
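Continuing the stylized, synthetic sketch from above: the fix amounts to changing the training label from dollars spent to a direct measure of health (here, a made-up count of chronic conditions). With the same features, the patients flagged at a given score now have similar health need across races.

```python
# Same synthetic setup as the sketch above; only the training label changes.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200_000
black = rng.random(n) < 0.5
health = rng.gamma(2.0, 1.0, n)                                   # true health need
access = np.where(black, 0.7, 1.0)                                # assumed spending gap
X = np.column_stack([health + rng.normal(0, 0.1, n),              # comorbidity index
                     2500 * health * access + rng.normal(0, 500, n)])  # prior claims

# The fix: train the score on a health label (a count of chronic
# conditions) instead of dollars spent.
chronic_conditions = rng.poisson(2.0 * health)
risk = LinearRegression().fit(X, chronic_conditions).predict(X)

top = risk >= np.quantile(risk, 0.90)
print("Share of flagged patients who are Black:", round(black[top].mean(), 2))

lo, hi = np.quantile(risk, [0.94, 0.96])
band = (risk >= lo) & (risk <= hi)
for grp, name in [(band & black, "Black"), (band & ~black, "white")]:
    print(f"{name:5s}: mean health need at this score = {health[grp].mean():.2f}")
```

In this toy version, the racial mix of the flagged group and its health need now track underlying sickness rather than underlying spending.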
I’ve never worked on any social science like this, where you find a problem of this scale, such inequality, and then suddenly you can fix it. This project tells us what should frighten us about algorithms, but also what should give us enormous hope for them.
How can regulation help address problems of algorithmic bias?
We should separate out two kinds of bugs. The bug that I just described was bad for the business and bad for society. So there, instead of a regulator, there’s probably just a pretty good arbitrage opportunity for people with the human capital to find these problems and properly formulate solutions. For the bugs that are privately bad, there’s a huge moneymaking opportunity or career-making opportunity.
Let’s come to a different kind of bug, a bug that is privately not that bad or maybe even slightly good, but socially very bad. Should there be a regulator that will look at these algorithms and audit them? I think the answer is yes.
Think about the case of employment. The U.S. Equal Employment Opportunity Commission is almost never able to get lawsuits through the door because it’s very hard to prove that one person discriminated. Even when we have statistical data that says the whole system is discriminatory, producing evidence that one action by one person was discriminatory is difficult to do. They’ll just say, “Sure, I didn’t hire that person, but that’s because there was this other person who was better.” But who’s to say who’s better? It’s a very complicated thing.