Tali Griffin: Okay, well, in the interest of time, I'm going to start, as I'm the least exciting part of tonight's event. Again, good evening, good morning, depending on where you're tuning in from, and welcome to today's Social Impact Leadership Series event hosted by the Hong Kong Jockey Club Programme on Social Innovation and the Rustandy Center for Social Sector Innovation at the University of Chicago Booth School of Business. I'm Tali Griffin, the Senior Director of Marketing, Programs and Partnerships at the Rustandy Center, and I'll be moderating today's event.
For those of you who aren't familiar with the Rustandy Center, we are the social impact hub at Chicago Booth for people committed to tackling complex social and environmental problems. We play an important part in the University's social impact and entrepreneurship ecosystems, and our work promotes innovation, advances research, and develops the people and practices that can accelerate social change. For those of you who may not be familiar with the Hong Kong Jockey Club Programme on Social Innovation, it provides resources and programs to help Hong Kong's NGOs, nonprofit leaders, and social entrepreneurs do their best work. The Programme offers a range of opportunities, including scholarships, social entrepreneurship workshops, and trainings for NGO boards of directors and board members. Our workshops and events this year will cover topics like impact investing, capacity building for NGOs, and nonprofit board service. After today's conversation, we'll briefly touch on some upcoming events and how you can get involved. But now onto today's event.
Our social impact leadership series explores trends and research related to social impact, and today we're excited to examine how artificial intelligence and machine learning can be used for greater good. We're hopeful that these insights will help you reflect on how AI can help address the issues you care most deeply about.
With that, I'm pleased to introduce today's speakers and offer a brief roadmap for the event. First, Professor Sendhil Mullainathan will share a presentation on using AI for good. Sendhil is the Roman Family University Professor of Computation and Behavioral Science at the University of Chicago Booth School of Business. His current research uses machine learning to understand complex problems in human behavior, social policy, and especially medicine. I encourage all of you to read his full bio on our event page to see why he is considered a leading academic and thought leader on the applications of AI in social and public policy.
After Sendhil, practicing physician Dr. Paul Lee will speak to today's topic through a local lens by sharing his experience with his healthcare platform startup, Tree3 Health. Dr. Lee is the convener and chief executive director of the Association of Doctors for Social Responsibility, and the co-founder and COO of Tree3 Health. The Association of Doctors for Social Responsibility serves over 10,000 underprivileged citizens in Hong Kong with public health campaigns and pro bono medical services. Tree3 Health, which you'll learn more about during this conversation, is a health tech startup whose mobile-based solution harnesses smartphone data to uncover users' health risks.
Following their presentations, I will moderate a conversation with Professor Mullainathan and Dr. Lee. We'll open the floor for any questions you may have. So please, please, please, throughout the presentation, submit questions for our speakers via the Q&A chat, and we'll try to get to as many questions as we can before the event ends. So with that enough of me. I'll turn over the floor to Sendhil, and we are really looking forward to your presentation. Thank you.
Sendhil Mullainathan: Thank you, Tali. I think Tali mentioned to me before how spartan my background is, but I want you to all know that all the money that's being saved on faculty salaries at Booth is going towards improving the student experience. So with that in mind, let me dive in.
So I'm a professor at Booth. I'm also the research director for something called the Center for Applied Artificial Intelligence that we have just started, and I hope this talk gives you a flavor for why I think there's really quite an amazing opportunity to do good using these tools. Before I start, let me just also say how terrific it is that we have such an opportunity. Twenty years ago, I think business schools were a place where people thought money-grubbing people just went to make more money, and in the last 20 years, thanks to people like you on this call, that image has really changed. I'm amazed at how many of the students in my classes actually want to make meaningful change and improve the world around them, and recognize that the skills taught at Booth are a meaningful way to do that. And whether they choose to do that using market forces or through social enterprises or working at nonprofits, I think it's an exciting time to be at a business school doing good. So it's really my honor to be here in front of this audience.
So let me start with AI for Good. I think it's pretty clear that AI is everywhere. And so let's talk about how to do good with it. I'm just going to give you one caveat. In order to align with Paul's work, which is really interesting work about health, I tried to choose more of my examples from the sector of health. But I know many of you might not be in healthcare, so I don't want you to think, oh, are all these just health examples? So what I'd like to do is to use the Q&A to just force us to broaden the discussion. How does this apply to my sector? And please make us do that. I think we'd both be excited to do that.
So how should you feel about AI? On the one hand, I think you should be excited about it. An example: if you want someone to read an x-ray right now, it costs a lot of money. Yet we're almost at the stage where a poor person in India, who could never get a good specialist to read an x-ray, can have an algorithm read that x-ray for them better than the best clinicians and radiologists in the world, and in fact, better than the consensus opinion of the five best radiologists in the world. That is an unbelievable advance we're sitting on. The expansion of access that is about to be opened up is sometimes lost in the discussion of job loss and things like that.
By automating, algorithms allow us to give people access to knowledge that sits in the heads of incredibly highly paid people and is not available to anyone else. I think that's just, you just have to understand that that is a huge gain.
Here's another reason you should feel pretty amazed. This is a very interesting book for those of you who don't know it. It's called "The Diving Bell and the Butterfly". And if you're like me and you're thinking, oh my goodness, a big book, good news: there was a movie made out of it. The book and the movie are about this fellow, Jean-Dominique Bauby. He was an editor of this magazine, and he had this unfortunate health ailment where, long story short, he was completely bedridden. But not just bedridden, he was locked into his body and could not move any part of his body except his eyelids.
Now he wrote this book. How did he write this book? By blinking, one blink at a time. You see the woman next to him? She's transcribing the blinks into language. To communicate one letter of this book took four blinks, depending on the nature of the blinks. It's a Morse code of blinks used to write this long book, and it's the story of his life and of his experience. I tell you this because we're now at the point where, if Bauby were alive, actually, I don't know, he might be alive, I'm not sure, you could put this EEG device on his head, and you know how he would type now? By thinking. It's not science fiction. We can take the EEG signals coming off of the head, and he can type by just deciding what he's thinking. You can buy this.
Algorithms can help us do the unimaginable. The reason we can build that is not the physical hardware of the EEG; that hardware has been around for 50 to 70 years. It's that the AI algorithms are helping us convert the signals of the EEG into, effectively, mind-reading, at least in this minimal way: what the person wants to say.
So these are two reasons I think you should be super excited. Algorithms can do the unimaginable and they can expand and automate what we already know. That's the good news.
On the other hand, these algorithms have some problems. Let me tell you a set of stories that illustrate the problems. A friend of mine at Google tells this story, because it involves one of the tools that he built. Around that time, this pathology imaging team, this computer vision team at Google, very early, before these things were thought to be possible, had this unbelievable success: they managed to build an algorithm that could detect pathologies in these x-rays as well as the doctor. This was around 2016-17, so it was a massive leap, and they were ready to declare victory; it was so successful out of sample. And in a way they got a little too curious. So they started asking the question, gee, I wonder what the algorithm is looking at.
So my friend at Google, Mohken, they brought him in, he's an expert on interpretability, and they said, "Can you use your interpretability techniques to just show us what the algorithm is seeing?" And so then what they found was, oh, look, on this particular one, here's where the algorithm is looking to decide where there's a pathology. They're like, oh, what is going on there? Let's zoom in. Let's hit contrast. What the, those are pen marks. This is not a pathology detector. This algorithm was a pen mark detector.
It turned out, unbeknownst to the team, that in their dataset people had put pen marks on the x-rays where there were pathologies. So they'd built an elaborate algorithm to detect not pathologies but pen marks on x-rays, which are of course easy to find, and that is an entirely useless algorithm, because guess what, most x-rays don't come with a radiologist who has already put the pen mark there. In that sense, this was a near disaster. If they had deployed this algorithm, it would've looked good on paper, but when deployed, it would've been dreadful.
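As a side note, here is a minimal sketch of the kind of interpretability check described above, using occlusion-based saliency. This is my own illustration, not the Google team's actual tooling; the model interface and image loading are assumptions.

```python
# Occlusion-based saliency: slide a gray patch over the x-ray and record how
# much the predicted pathology probability drops when each region is hidden.
# The `model.predict_proba` interface is an illustrative assumption.
import numpy as np

def occlusion_saliency(model, image, patch=16, stride=8):
    """Return a map of how much each image region drives the prediction."""
    h, w = image.shape[:2]
    baseline = model.predict_proba(image[np.newaxis])[0, 1]  # P(pathology)
    saliency = np.zeros((h, w))
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = image.mean()  # hide region
            p = model.predict_proba(occluded[np.newaxis])[0, 1]
            saliency[y:y + patch, x:x + patch] += baseline - p
    return saliency  # large values = regions the model relies on

# If the brightest saliency regions sit on pen marks rather than tissue,
# the model has learned the annotation, not the pathology.
```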
And these types of near disasters happen all the time. That's the danger of algorithms. To properly describe them, I want to help you understand why we have some phenomenally big disasters with algorithms, and to do that, I want to give you a sense of what they are. Once I do that, and I'm only going to give you a very brief sense, this is like two slides' worth, I think it'll help you see what the Achilles heel of AI algorithms is, and that in turn will help you identify pitfalls, but also give you a sense of where you can do good. So that's my goal: a little tutorial on AI, just enough. Okay, so what is AI?
I think if you try to read up on what AI is, you can find a bunch of Wired articles with photos like this. You can be pretty sure you're not going to learn much there. You can go to the next layer and find articles on Medium with things like this. You can be pretty sure you're not going to learn much there either, or rather it's just a lot of technical stuff. Or you could go one level further, and there you can find articles like this. So this is your trade-off. If you want to understand AI, you can understand it at this superficial level, or you can decide, I'm going to get a new degree in math and understand it at this level. That is a false dichotomy. You don't need to know any of this stuff. I'm going to tell you the one thing you need to understand about AI algorithms, and if you understand this really well, you will be able to identify what the problems are. So what is AI? I'll make one more try. First, the thing you have to know is that ML algorithms, at their core, are not built like other algorithms.
Most algorithms you encounter are coded. Somebody wrote down a piece of code. It's true, there's code underneath ML algorithms, but that's not the important part. That code part is basically a commodity. You can just buy that off the shelf. The important part is that code interacts with data. So you can almost say the key building block of the ML algorithm is the data.
So then let's go back and ask, what is AI? This is AI. If you understand spreadsheets, you can understand what an algorithm is. And if you don't know the spreadsheet underneath an algorithm, you don't know what the algorithm is doing, no matter what people tell you about it. You don't need to know the code, you don't need to know what a convolutional net is, you don't need to know what deep learning is. You need to know what spreadsheet the people building the algorithm were given, because the algorithm is just learning to copy whatever's in that spreadsheet. Now, there's one nuance here which I want to clarify. You're used to a spreadsheet where you might say, oh, I have a data set with tumors, people's ages, smoking status, BMI. The real big AI innovation was that we found a way to take other things and treat them exactly like you would a column in a spreadsheet. Specifically, I can now include a column that is the x-ray.
That's it. If you just think of it as, I have a spreadsheet, one of my columns happens to be the x-ray, one of my columns happens to be the doctor's notes, you understand everything you need to know about AI. AI is saying, here's this variable; I want you to predict this variable with whatever is on the right-hand side of the spreadsheet. There you go. One of those columns happens to be the x-ray.
So the big innovation is just that, but it's a huge innovation. It takes my PhD students a lot of time to stretch their minds and say, I know how to run a regression, but oh, I see, now I'm going to do the same thing with an image. That's kind of interesting. But otherwise you can think of AI as exactly the same as someone saying to you, I correlated Y with X. That's all they're saying. It just so happens that X is some complicated thing like an x-ray. That's it. They're not doing anything special. There's no artificial general intelligence; these things aren't going to get out of the box and come take over your home. It's just this, it's just people correlating Y with X. I mean, it's amazing, but it's just this. So if you spend your time thinking about superintelligence and Elon Musk, you're going to get very distracted. If you think spreadsheet, you'll have a practical understanding to move forward.
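To make the "spreadsheet" view concrete, here is a minimal sketch of it in code: one label column to predict, ordinary feature columns, and one column that happens to be an image. The file names and columns are illustrative assumptions, not any actual pipeline from the talk.

```python
# A "spreadsheet" view of ML: predict one column from the others,
# where one of the columns is an image flattened into pixel values.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("checkups.csv")            # one row per patient visit (assumed file)
# Ordinary columns: age, bmi, smoker; special column: path to the x-ray image.
xrays = np.stack([np.load(p) for p in df["xray_path"]])   # shape (n, 64, 64)
pixel_cols = xrays.reshape(len(df), -1)                    # the image as columns

X = np.hstack([df[["age", "bmi", "smoker"]].to_numpy(), pixel_cols])
y = df["pathology"].to_numpy()              # the column we were told to predict

model = LogisticRegression(max_iter=1000).fit(X, y)
# That's the whole recipe: predict one column from the rest. Deep learning
# swaps in a fancier model, but the spreadsheet underneath is the same.
```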
So if you want to understand AI and understand the problems, you need to understand: what exact variable in the spreadsheet was the algorithm trained to predict? What inputs was it given, on the right-hand side, to make that prediction? And what were the rows of that spreadsheet? In a nutshell, if you can answer those three questions, you'll know the algorithm, and all the problems of AI come from them. That's it. I know you don't believe me. In fact, I teach an MBA class for nine weeks. Basically, the whole class is me telling them this in the first lecture, and then for the next nine weeks convincing them this is it, and at the end they're like, oh my goodness, that is it. But let me try and see if I can do this in the 18 minutes I have left. So I'll give you an example of a failure, and I'll help you see that it was just a spreadsheet failure.
This is TayTweets. This was a chat bot introduced by one of the world's most successful companies, with one of the best AI teams, Microsoft. They introduced a chat bot into the world to show the power of their AI systems to engage in conversations. They just opened a Twitter account and said, Tay will talk to you. What happened?
This is one of the early tweets by Tay. You'll notice the timestamp, March 23rd, 2016. "Hey, can I just say that I'm super stoked to meet you? Humans are super cool." This is just TayTweets talking. Then, within about four hours: "Bush did 9/11," says TayTweets. "The Holocaust was made up." And it gets worse and worse. As one person put it, within 24 hours this algorithm went from, oh, humans are super cool, to saying some of the most outrageous things on Twitter in a very public way. Microsoft was very upset, and so should we be; these are outrageous things for this thing to do.
So what went wrong with Tay? Here is the thing you must understand. In one sense, the engineering worked fine. The coders did their job. The actual code was written correctly. Tay didn't crash. This wasn't the blue screen of death where your computer just doesn't work. It wasn't code running too slowly. In fact, the Chinese version of Tay had worked perfectly. The reason this caught the entire team off guard is that for about a year they had been running this at scale, with millions of people in China, not tweeting, but texting with Tay, and people were loving texting with Tay. So the team was like, what went wrong? Well, to see what went wrong, let's understand the data it was trained on. The data it was trained on looked like this: one-on-one conversations. So in some sense, the rows in the spreadsheet are pairwise conversations by people looking to have a meaningful conversation with a chat bot, users texting with Tay. In other words, this was the training data.
Then they deployed it into the world of Twitter. Now, I don't know if many of you know what Twitter is like, but it is certainly not Mr. Rogers. It is closer to this. They trained in a world of friendly users texting, and they deployed into the world on the right-hand side. You trained on one data set and hoped it would apply to a different kind of data. That is never going to work out for you. That's all that happened. The rows the algorithm encountered were different from the rows in the spreadsheet in a very meaningful way. Twitter, where users were actively working to make Tay look bad, was not the same as people chatting with Tay on the texting platform. The algorithm was trained on the wrong kind of examples.
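One way to catch this kind of mismatch before deployment is a "domain classifier" check: train a model to tell training examples from deployment examples, and worry if it can. This is my own illustrative sketch of that general technique, not Microsoft's pipeline; the feature inputs are assumed to already be numeric.

```python
# A simple distribution-shift check: if a classifier can easily separate
# training rows from deployment rows, the two populations differ meaningfully.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def distribution_shift_score(train_features, deploy_features):
    """Return ~0.5 when the samples look alike, near 1.0 under strong shift."""
    X = np.vstack([train_features, deploy_features])
    y = np.concatenate([np.zeros(len(train_features)),
                        np.ones(len(deploy_features))])
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()

# e.g. featurized one-on-one chat logs vs. featurized public tweets:
# a score near 1.0 is the warning that the rows you trained on are not
# the rows you are about to encounter.
```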
Now you might say, oh, okay, but is that a common kind of error? This is one of the biggest companies in the world with some of the best software teams in the world. These are not dumb people. These errors are easy to make. And here's a place where we're making that error at scale: healthcare is getting a ton of AI. What kind of data are we using? Well, it turns out there was a very nice study in 2020, and of all the AI algorithms that are put out there, here's something you might notice: 39% come from data on people in California, 27% from the next, and you might notice there are things missing here. There is a whole share of the world that is not represented by people in California, Massachusetts, or New York. Even within the United States, these places are wildly unrepresentative. The entire algorithmic basis of what we're doing to automate medicine is being built off of this unusual population. And not just the people; these are academic medical centers. They don't practice care like anybody else.
So the wrong kind of examples we saw with Tay is about to happen in healthcare. At Booth, we're trying to resolve this to a degree. We have a project, if you're interested, called Nightingale Open Research, which is trying to fundamentally change the healthcare data sets underlying the work that's going on.
Let me tell you about a second data problem. If any of you go into Google and just type "unprofessional hairstyles," you'll get this. Well, they may have fixed it in the last year or so. You'll notice all these people have a certain ethnic makeup. If you type "nurse," you will notice they all have a certain gender. If you type "CEO," you'll notice they all have a certain gender. Now this one, because it's in a paper, Google went and fixed by hand. So today, if you type "CEO," you get this. But literally someone went in and tried to fix it by hand, because the algorithm was doing this.
So what's going on here is that these algorithms codify all these other biases. For example, there are algorithms that de-pixelate images. I don't know if you all recognize who that is, but that is Barack Obama. I think once I tell you that, it's clear who that is. But give it to a de-pixelating algorithm, and it thinks it must be this guy. In fact, in general, if you ask these algorithms to clarify the image, they tend to pick white people. So if you give it AOC, you end up with a white person. If you give it her, you end up with this white person. They're biased towards the training data they have, which is disproportionately white. Think of how problematic this is; it's the database.
We are building algorithms that are inherently biased toward the populations in their data. In the United States, Black people are grossly underrepresented, and Asians as a whole are ridiculously underrepresented in all of our AI data sets relative to the global population. You just have almost no representation. And that's not a small problem. This isn't just, oh, isn't it too bad that we're not represented. It means the algorithms are just going to be ridiculously unable to deal with these patients.
So algorithms are biased based on the training data. Let me give you one last example and then we, I will hand it over. Yeah, let me skip this one. In the interest of time, I want to make sure Paul gets his chance to talk.
So let me give you an example of how, once you recognize these problems and build things carefully, we actually have a chance to do an immense set of good. I want to tell you about one project we just finished and one project that we are about to finish. And they're all going to be about the careful, meticulous construction of the right data set. So the first project is about something called sudden cardiac death. Every year, about 65 million people in the United States go in to get an annual checkup and they're told, Great, you look fine. We looked at your ECG, looked at everything. But some of them, especially males, especially males roughly in the age group between Paul and me, drop dead with no warning. Their heart just malfunctions.
This is actually a really big problem. It's an under-discussed problem, and it is a medical mystery. We look at the ECG and we say, we didn't see a problem; their heart looks fine. Then you look past that at the other things that matter: you look at their BMI, you look at their cholesterol. These people looked in the prime of fitness. In fact, if you ask around, you probably know someone, or know someone who knows someone, to whom this happened, and the story's always the same: I don't understand what happened, he was in such good shape. It's particularly tragic. If this were unavoidable, that's life, you know; that's the risk we face being alive.
But there is a very simple intervention that would've resolved this. We could have installed this device. It's almost costless to install, and it would've prevented this death. But we can't install it in every human being, and we don't know who the people are for whom this is going to happen. If we could just predict, say, that there is at least a 3% chance this is going to happen to you, we would just install this. It's cheap, it has almost no side effects. But we can't; they all look normal. Well, we can't, but maybe an algorithm can. If we form the right dataset, we might be able to make this prediction. And that's what we did.
My colleague Ziad Obermeyer and I went and put together a data set of about 400,000 people who had an ECG. We then matched it to what happened to them: some of these people went on to die, we found their death certificates, and we know that they died of sudden cardiac death. So here's your ECG; some people died, some people didn't. This is the kind of thing algorithms are good at. And it turns out that when we did this, the algorithm does a pretty good job. Here's the algorithm's predicted risk, and here's the actual one-year rate of cardiac arrest. And look, it's finding a pretty big set of people with risk well above that 3%, at 4%, 5%. These are people the algorithm would've flagged and said, install a pacemaker. We could have installed a pacemaker and saved these people's lives. And you can see here, this is the threshold at which we would've installed a pacemaker in this population; we could have saved a huge number of lives by doing that. That's the paper we've almost finished. And this is a thing where I hope you see, it's crazy.
This type of algorithm could be installed in every EKG device that's made. It could flag for the doctor automatically, and at a minimum, they can look into this, see what they think, and then decide whether to install a pacemaker. We have a chance to turn sudden cardiac death into something we can truly do something about, at least for 20% of cases. And 20% of 300,000 is a lot of people.
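As an aside, here is a minimal sketch of the flagging logic described above: a risk model scores each ECG, and patients above the intervention threshold are surfaced for a physician to review. The names and the model interface are illustrative assumptions; this is not the actual study code.

```python
# Flag patients whose predicted one-year sudden-cardiac-death risk exceeds
# the threshold at which intervention is considered worthwhile (~3% per the talk).
INTERVENTION_THRESHOLD = 0.03

def flag_high_risk(risk_model, ecg_waveforms, patient_ids):
    """Return (patient_id, predicted_risk) pairs above the threshold,
    highest risk first, for clinician review."""
    predicted_risk = risk_model.predict_proba(ecg_waveforms)[:, 1]
    flagged = [(pid, float(r))
               for pid, r in zip(patient_ids, predicted_risk)
               if r >= INTERVENTION_THRESHOLD]
    return sorted(flagged, key=lambda x: -x[1])

# In a deployed EKG device, this list would surface as an alert for the
# physician to review, not as an automatic treatment decision.
```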
Let me give you another example. This is one that we're working on right now; we don't have an answer yet. In the United States, for women between 20 and 44, these are the five leading causes of death: unintentional injuries, cancer, and so on, down to the fifth leading cause of death, which is being killed, homicide. Almost all of those deaths are actually domestic abuse. Domestic abuse is a huge problem that is rarely discussed. And this is just deaths; severe injury is a huge number on top of that.
Here's the problem: for every 50 people who visit the clinic with some sort of sprain, fracture, or dislocation, one of them, according to this Lancet paper, but let's just take this 2% as a base rate, is a woman who is actually being abused by her partner. That's how the sprain or whatever happened. The other 49 are not, so the doctor doesn't know; I mean, how is the doctor supposed to know which one of these people is being domestically abused? But if we knew that... In fact, if you take every woman who was killed and look back in time, she showed up in the emergency room at a time when you could have done something about it. We just didn't know how to find the needle in the haystack. It's this sort of invisible epidemic; we don't know how to reach it. But you'll notice this is like the sudden cardiac death problem. What do I need?
Well, I need a data set of x-rays of people who've shown up at the emergency room. And now, working with the Crime Lab at the University of Chicago, we've managed to match these data sets to labels: which women end up reporting domestic abuse to the police, or are actually found dead or found seriously abused. So we now have an algorithm in training to answer the question, just by looking at the x-ray, can I say whether this x-ray was due to domestic abuse or not? I don't have an answer for you, but again, much like with the EKG, if an algorithm could do that, then you can imagine a world where it's plugged into an x-ray machine, and we're able to put up a flag when an x-ray like this comes before the doctor. The doctors don't know what to do; it's hard to differentiate accidental injuries from man-made ones. But the algorithm might be able to find a signal.
Actually, the early results look, I hate to say optimistic on such a grim topic, but let's say the early results look like something we can do something about.
So to conclude, I think if we're careful to avoid the pitfalls that I raised at the beginning, algorithms have the potential to transform our lives for the better. Not simply to, you know, take people's jobs, but to actually make them meaningful. So let me stop there, and I will hand it over to Paul, who will tell us about the awesome work that he's doing, which is much better than any research, because you're getting stuff done in the world. So over to you, Paul.
Paul Lee: Thank you, Sendhil, for the wonderful talk, because you just described what I'm doing every day. My job every day is doing spreadsheets, adding a lot of columns, seeing correlations, and working with my engineers. So basically that's part of my daily work. And the domestic abuse case and the sudden cardiac death case are really interesting, and they really point out one of the pain points of medicine, of the medical job: the unknowns and the uncertainty outside the clinic, outside the hospital. What's going on? No one knows, and no one has the time to find out, to explore, and that is where AI, machine learning, algorithms kick in. So I really appreciate the talk, and it really triggers my (speaking faintly).
So let me share my screen first. Hi guys, I'm Paul. I'll introduce myself a bit first. I won't introduce myself as a doctor or whatever; I'm just a guy that keeps finding problems and trying to solve them. I was a medical student and then a clinical doctor in hospitals and clinics, and I always had a question, because I really wanted to chat with the patients more. I wanted to know about their daily lives and more about their usual behaviors, but then, sorry, there's the time limit. The nurse will be just outside the door saying, Doctor, you've only got one minute, you've got 100 cases outside waiting for you. And that's why, due to the time constraints, it's very hard for us to know more about a patient.
So then I raised the question: why? Is it just because of the time limit that we cannot find out more? It really affects patient outcomes, to be honest, because compliance with management, and knowing their behaviors and risk behaviors, together contribute around 50% or even more to a patient's outcome, apart from the medicine we give. If we don't know their compliance, if we don't know what is going on during the 99% of their lives spent outside the clinic, then how can we do the job of a doctor?
So I asked: can we solve it by having more time with patients? I tried to solve this problem by founding a local medical charity that pools the spare time of doctors and medical professionals so we can spend more time with patients in the community. But then it raised another question.
We've got more time with the patients in the community, but we've got no right tool, because patients always tell lies. I'm not discriminating or bad-mouthing them; it's not that they are being dishonest. It's just that they don't know. They don't know about their own lives. They don't know if their behavior is actually risky. They don't know how to take a drug, or whether they're doing it wrong. So, honestly, they just tell lies to the doctors and medical professionals, and then the medical community cannot make the right judgment or do the right things. So that raises another question.
If you've got the time, then what should the tool be? That's why I gathered some technical friends, some engineers from the field, and we started some pilot projects, trying out tools such as having patients take photos of their daily lives. But sorry, that requires the patients' input, and they don't want to input anything; they want maximal output from the algorithm, from the applications. And that is not possible, because AI, algorithms, machine learning, it's all about data. Data is the key to all the algorithms. It's garbage in, garbage out, and if you don't even have garbage, then there won't be anything out. So we tried to think of a way to minimize the input from users while maximizing the detection, the outcomes, the knowledge of their daily health. And that's why I quit my clinical job last year, and I'm now fully dedicated to my startup, together with some technical co-founders and some business co-founders.
So what Tree3 Health is trying to solve is this: we use a mobile application to collect mobile data from the device, because everyone now has a mobile phone, a smartphone. It's a joke that the phone has almost become part of the anatomy of a human being, because everyone is carrying a phone, one or two or three, carrying them everywhere: when they sleep, when they go to the toilet, when they go to work, when they go out for a couple of drinks. They always carry a phone. That's why it is the best sensor, the best detector, of their daily behavior, and that's why we work with this device that everyone carries every day and try to use the mobile data inside, because it carries a lot of data. In a single day, one user can generate around 10,000 to 100,000 technical observations, and it is real time, 24/7.
That's why we collect the data, and because we've got medical professionals inside our health tech startup, we can get at the ground truth. With the raw data and the ground truth, we can start the Excel table, the spreadsheet, column by column, and try to find correlations with the help of machine learning and algorithms. Once we do this, we find a lot of interesting correlations between daily life behavior and health status. And we can give the user a picture with minimal input. You don't really have to input anything; just carry your phone, just don't delete my mobile app, and it's done. Then we can give more health insights to the users themselves and also to the medical professionals, and try to optimize the management.
So I always say that it is healthcare, not health-cure, because health-cure is just trying to treat the disease with medicine. We are doing healthcare: we are considering the comprehensiveness of the whole lifelong health journey of the users. That's why we track the users' health data and then push alerts to our practitioners and also to the users themselves. And we try to initiate more precise and tailor-made professional medical service within the service loop, because it's hard for us to ignore the clinical service part and just collect data. In real-world scenarios, to apply such data collection and data intake, users won't just give you data without anything in exchange or in return. They want service inside the service loop as well. So we cannot just ask them for data. We have to give them quality of service: how fast can my medicine be delivered to my home, how fast can I see a doctor, within 10 seconds? If not within 10 seconds, they will just call up our customer service and complain: why is there no doctor seeing me when I press the button?
So when we apply this in the market, there is a lot of market consideration, a lot of service and product definition and delivery within this science project. It requires a lot of team power, manpower: there's the technical team, we've got a marketing team, we've got teams trying to work out the whole science project together. So what we are foreseeing is not just collecting data, running analytics, and giving some health insight. We are trying to create a preemptive community health alert for emergencies, for chronic disease, and for infectious disease management, for example, COVID right now. There are a lot of typical behaviors of COVID patients that show up in the data they generate from the mobile apps.
So this is a starting point, and it is really good timing for us to review the usage of data in community health and in general: personal health, community health, and also the delivery of healthcare management.
The second goal we are pursuing is the true medical ID. In a clinic or a hospital, all the doctors are just trying to capture the general health status in snapshots. They do investigations, physical examinations, history taking, but it's all snapshot investigating: asking questions or doing the (speaking faintly). Yet most people spend 95% or 99% of their lifespan outside hospitals, outside the clinic. So what is the true medical ID? There's a missing part, 99% missing. Can we compensate and complete the whole true medical ID by using data, by using daily health behavior data, to make it a real, true ID?
For example, some theme parks say that if you have heart disease, or if you have had a stroke before, then you cannot go on the rollercoaster. But people may not have experienced anything, or they don't know they have an underlying stroke risk or cardiac disease. So it's not a true ID, because they don't realize they have the problem. They haven't gone to the hospital yet, so the record doesn't reflect the true medical status of the user. That's why we ask: can we use data to complete the whole medical ID of everyone? That is one of our goals.
We are also trying to introduce a data-driven approach to our medical professionals as a start, because in Hong Kong the traditional healthcare system is quite conventional; it's a bit rigid, not very rigid, but a bit rigid compared with other places. For example, in Asia Pacific, Singapore is applying more of a sandbox scheme, and there are a lot of different policies supporting the application of data-driven approaches in medicine. However, Hong Kong is still catching up. So can we demonstrate some benefits for our medical practitioners so that they will adopt the data approach? That is all part of what we want to do.
But then there are a lot of challenges. We need talent and resources to develop all the data collection and the product service. We need to educate our users about the importance of data, and educate our service providers about the data and about our work. And there's what I just mentioned, the rigidity of the conventional healthcare infrastructure in Hong Kong, because conventional healthcare is always doctor-driven, physician-driven. Whatever the medical textbook tells you to do, you have to follow the instructions, you have to follow your seniors' instructions. I'm not saying that's not scientific, because it's all based on research; it's evidence-based.
But we have to be more creative. How can we explore the unrealized part of the ground truth? It's all about data, about being data-driven, and being truly data-driven is equivalent to being evidence-driven. In the medical world, everything is evidence-driven, and data is also the ground truth of the world. That's why I dedicate my time, my effort, and my career to this data-driven medicine, and I hope it can give some insights, and that you will find this sharing interesting or useful. So maybe this is the end of my presentation, and we can open the time for Q&A. Thank you.
Tali Griffin: Paul, Sendhil, thank you so much. This was such an interesting conversation. I truly appreciated hearing first from Sendhil that the research that you're doing, the practical applications too, of how AI can be used in healthcare, and then Paul, hearing from you, how you're trying to put it into practice in Hong Kong with your startup. I want to thank our audience members. The questions are starting to come in fast and furiously. And because of that, I'm going to do my best to both ask the question, let you guys answer them, and start to scan through the other questions that are coming through.
So, you know, the first question I want to put out, and thank you to Samhita, who had two great questions as the conversation was going on, is around this idea of how algorithms should be tested, or whether they should be tested before they're introduced into a patient care environment. You know, Sendhil, for example, you brought in an expert to find those pen marks, but how should that be regulated? I'll turn it over to you. Thanks.
Sendhil Mullainathan: And I'd love to hear Paul's thoughts on this. I think there are two layers that are important to keep in mind. There's the regulation layer, which is the government deciding what is allowed. And maybe it's not the government, maybe it's the American Society of Radiologists. I think that is way too late.
The problems we have are in the building process already. If you need the government regulator to be the goalie to keep the crap out, something is deeply broken in the innovation system. It's not good for anybody to invest tons of time and resources only to have the goalie reject it. That's just waste. So we definitely need a goalie, and we can talk about that, but that's not the first-order problem we have right now. The first-order problem is that there are a few people doing really interesting work, and I'm sure you've faced this, Paul, but there's a lot of bad stuff out there, and the bad stuff is crowding out interest in the good stuff. So on top of the regulation, and I couldn't agree more, we need that, we also just need to raise the level of discourse and understanding so that people can see the difference between a thoughtful presentation like Paul's and an all-hype presentation that has lots of problems. To me, that's going to be the biggest thing, because there are a lot of thoughtful people who are going to have to decide, hey, is this a company I want to join, and there's a lot of over-hype.
So I would almost frame that as the big problem. On the regulatory side, we actually have very, very strict guidelines in place that look a lot like clinical trial guidelines. I'm not saying those can't be improved, they can be, but we're starting at a pretty good place on regulation. On public understanding, though, we're starting way down here. If you just pick a random article in Wired, you're not even sure to get reasonable stuff; it could be just as much hype as reality. So I'm curious what you think about that, Paul.
Paul Lee: Yes, I agree, because the first question is: what is the ground truth? What is the gold standard for a test? How much sensitivity or how much specificity counts as the gold standard? Is it 100%? No single test can be 100% accurate; even the lab tests don't claim to be 100% accurate. So what is the gold standard?
That is the regulatory part, because if you reach a 93 or 94 or 97, then it can be used for formal testing. But on the public understanding side, people always strive for the 100% accuracy goal, and that is not really possible or feasible. What we are trying to do is to facilitate, to improve, and to give insights that raise awareness: raise the public's concern and awareness, and raise the practitioner's, the doctor's, awareness about whether this is really a normal ECG. After the algorithm flags it, the doctor gives it a second look and says, okay, this really is an abnormality. So we raise the concern, raise awareness; we don't claim to be the judge of whether it is wrong or correct. It's just facilitation using algorithms and the power of data.
Tali Griffin: That's great, thank you, thanks to both of you.
A few questions came in, specifically, Paul, about patient privacy and confidentiality concerns. And to extrapolate, your startup is not the only one that is built upon the incredible wealth of data that cell phones can provide. So how do you think about privacy concerns as you build your startup? And for apps generally that might use machine learning on this kind of data, or on Apple Health or other health platforms like that, how should we think about that?
Paul Lee: Thank you for the question, because privacy and confidentiality are always the question when we engage with healthcare data. We have two approaches: one approach is handling data at the medical standard level, (speaking faintly) different data infrastructure levels to handle the security part.
And second, we always separate the two databases. We keep the user information on their own mobile phone; they are identified on the mobile phone only, and we extract only characteristics. For example, I don't have to know that Paul has this set of data. I just need to know the demographics: this is a 29- or 30-year-old gentleman with smoking and drinking habits, who has this set of data, and then went to a clinic, saw a doctor, and was diagnosed with chronic heart disease. Okay, that is all the data I need, not that this guy is called Paul. So we always separate the whole thing, and we do the data and analytics part without using the user identifier. I think this is the general approach for research design, because we are not investigating the person but the data behind them.
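A minimal sketch of the kind of separation Dr. Lee describes, assuming a simple on-device pseudonymization step: direct identifiers stay in a local store on the phone, and the analytics side only ever sees a pseudonymous record of characteristics. Field names and the salting scheme are illustrative, not Tree3 Health's actual implementation.

```python
# Split one user record into an on-device identity row and a de-identified
# analytics row linked only by a device-derived pseudonym.
import hashlib
import uuid

def split_record(raw_record: dict, device_salt: str):
    """Return (identity_row kept on the phone, analytics_row sent to backend)."""
    # Stable pseudonym derived on the device; the salt never leaves the phone.
    pseudonym = hashlib.sha256(
        (device_salt + raw_record["name"]).encode()).hexdigest()[:16]

    identity_row = {           # stays on the phone only
        "pseudonym": pseudonym,
        "name": raw_record["name"],
        "phone": raw_record["phone"],
    }
    analytics_row = {          # the only thing the "spreadsheet" ever sees
        "pseudonym": pseudonym,
        "age": raw_record["age"],
        "smoker": raw_record["smoker"],
        "drinks_alcohol": raw_record["drinks_alcohol"],
        "diagnosis": raw_record["diagnosis"],
        "daily_steps": raw_record["daily_steps"],
    }
    return identity_row, analytics_row

record = {"name": "Paul", "phone": "+852 0000 0000", "age": 30,
          "smoker": True, "drinks_alcohol": True,
          "diagnosis": "chronic heart disease", "daily_steps": 4200}
identity, analytics = split_record(record, device_salt=str(uuid.uuid4()))
```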
Tali Griffin: Thank you. Sendhil, anything to add to that? You know, I think privacy in research is fundamental to the work that you do.
Sendhil Mullainathan: Yeah, the one thing I would add, and this is just building on what Paul is saying, is that there are two kinds of issues we shouldn't confuse in privacy. The first issue is: can someone find something out about me? There are safeguards for that, and sometimes they work; I would call that one kind of privacy.
The other kind of issue I would almost call not privacy but ownership, to a degree: even if they're not finding anything out about me, if my data is being used for something, who should be allowed to use my data for what? Very often when people talk about privacy, they confuse these two, and I would encourage people to clarify them in their own minds. I, for example, personally, yes, I'm worried about privacy, but the truth is, no one's that interested in finding out much about me. If I were Barack Obama, I'd be really worried about privacy; lots of people are trying to get his information. But this second issue can affect all of us: is my data being used for purposes I'm comfortable with?
And I think I would actually encourage people to differentiate those things. I think we have good regulations for privacy, but not yet a good answer to the question of who's allowed to use what data for whom, with what permission?
To make that very clear, I think we're tending towards a bad place on the second issue. We're tending towards something that looks like an opt-in model, where nothing happens unless I opt in. The truth is, use my medical data for research that benefits the world; I don't want to be asked to opt in for every one of those. I think most people feel, yes, I want you to find life-saving technologies that might save my life or somebody else's. So I worry we're not moving towards the world I think we should be moving toward: a world where there's a broadly accepted social use, and the assumption is that your data is going to be used for that, within that social use, with privacy always preserved. But right now, because people don't know where to draw the line, some places are drawing it in what I think is an unfortunate place: unless the patient has explicitly opted in, the data can't be used to answer a question that's going to save lots of lives. I just don't think that's what most people would want. Maybe I'm wrong, but I think those two issues have to be separated.
Tali Griffin: Sendhil, I think the follow-up event will be data ownership for good. That will be our follow-up event. And in addition to privacy and ownership being two issues that could use some nuanced teasing apart, data ownership for social good versus data ownership for profit, for capitalism, et cetera, is another very nuanced topic that we don't have time for today, but it's becoming a very noisy space in which to untangle those two questions. So thank you for that.
One of our participants asked about how to avoid bias in building algorithms. I know that in the States, we're thinking about algorithms for things like social justice, for how to better set bail, topics like that, but there's still a lot of bias in our criminal justice records. So I'm interested in how you think about avoiding bias. Perhaps, to this idea, Paul, that you raised, 100% is too much to hope for, but how do we decrease bias in our algorithms?
Sendhil Mullainathan: So I want to start by pointing out that I think one of the bad parts of the conversation has been that people, by locating bias inside the algorithms, have failed to recognize that this problem has nothing to do with algorithms. I'll give an example that affects a lot of people on this call.
Where do we get our judgments about what level of blood pressure indicates you have a cardiovascular problem to be dealt with? Those are guidelines, right? As Paul was saying, you look that up in the textbook, the doctors say, well, your BP is above this, blah, blah. Where do we get that? It turns out we get that largely from something called the Framingham Heart Study. As you may know from the name, Framingham is located in Massachusetts. Guess who is in the Framingham Heart Study: white people. And this is a guideline diffused throughout the world. So one of the very interesting puzzles, one that affects people like me, is that South Asians die of heart disease at ridiculously high rates. It's always been a wonder why. Are the treatments less effective for us, et cetera?
Then there's a great study from about three years ago called the MASALA Study. They simply redid something like the Framingham Study, but for South Asians, and what they found was, oh no, the thresholds for South Asians need to be lower. That's it. The reason we die of heart disease at high rates is nothing more than a stupid underrepresentation problem in a data set that has percolated throughout the world. Was there an algorithm? I wouldn't even call it an algorithm; it wasn't an algorithm, just a terrible data set. I'm saying this because we need a lot more attention to representation in the data that we use. The more people bury this inside "the algorithm," the easier it is to say, oh, algorithms are bad. It has nothing to do with the algorithm. And once you think of it as representation in data, this isn't a mystery anymore.
If someone runs a poll to figure out what is the average opinion rating of say Biden in the United States, we know what it means to have a representative poll, to have a non-representative poll, we have a whole science of that. But somehow that science is just forgotten. You know, when you do a clinical trial, you know what people never ask, who's in the trial?
Here's an amazing fact. If you have a medication for heart disease, you know who's not in the trial for that medication? People with conditions like diabetes, because they don't want people dying of some other condition. That means every one of your heart disease medications has not been tested on people with diabetes. We just do not think about representation in any meaningful way in our data sets. I think the way we deal with bias is by thinking about representation, and we have a science for it. It's not rocket science; we're just not doing it.
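As a minimal illustration of the representativeness check discussed here, the sketch below compares each group's share of a training set against its share of the target population, the same kind of audit pollsters already do. The reference numbers are placeholders, not real statistics.

```python
# Flag underrepresentation: ratio of each group's share in the data to its
# share in the population the model will serve (values well below 1.0 are bad).
import pandas as pd

def representation_gap(dataset_groups: pd.Series, population_share: dict):
    """Return a table of data share vs. population share, worst ratios first."""
    data_share = dataset_groups.value_counts(normalize=True)
    rows = []
    for group, pop in population_share.items():
        share = float(data_share.get(group, 0.0))
        rows.append({"group": group, "data_share": share,
                     "population_share": pop, "ratio": share / pop})
    return pd.DataFrame(rows).sort_values("ratio")

# Example with placeholder numbers:
patients = pd.Series(["white"] * 800 + ["black"] * 60 +
                     ["asian"] * 40 + ["hispanic"] * 100)
print(representation_gap(patients, {"white": 0.60, "black": 0.13,
                                    "asian": 0.06, "hispanic": 0.19}))
```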
Tali Griffin: Great, thank you. Paul, I do want to give you just a little bit to respond to that, and then I'm going to close this out to keep us on time for tonight.
Paul Lee: I agree, because it's all about data representation. Even if we could include every single data point from every single person on the planet, we would still have bias, because we don't know whether that data can represent, say, aliens or whatever. There are always limitations. There's always bias. So we can only try to avoid it and reduce it.
Tali Griffin: Thank you so much. Paul, Sendhil, thank you so much.
We really have just scratched the surface. Some of the questions that came in that were so interesting were around predicting versus analyzing data: how do we use AI for things we haven't seen before, like COVID, for example? There were questions about how to use AI in the sectors we didn't discuss that matter to many of our NGOs in Hong Kong: education, care for the elderly, services for disadvantaged populations. There's so much.
I think also, for NGOs, we're often cash-strapped and we sometimes struggle to attract talent. How can we tap into the power of AI for good? How can we partner with researchers, with startups, to bring this powerful tool into our day-to-day services? I really look forward to continuing this conversation and continuing to explore it. So again, Sendhil, Paul, thank you so much.
Before we close out today's event, I want to take just a few moments to preview a couple of upcoming events with the Hong Kong Jockey Club Programme on Social Innovation. We have two upcoming NGO Core Competency workshops, as we're calling them, part of a series of four. If your organization already missed the first one, it's not too late; these are really terrific, research-backed workshops. The next one will be on innovative leadership, and that's March 15th. Then on April 12th, we'll do another one on using operational tools. Each of these workshops involves a roughly one-and-a-half-hour research presentation and discussion with Chicago Booth faculty, followed by local insights from NGO leaders, and we hope you'll join us there.
If you want to stay up to date on all of our events and programs, I encourage you to sign up for the Rustandy Center monthly newsletter. Please check out our events page, and you can find us on LinkedIn as well, through these QR codes or through our event follow-up. Again, thank you so much for joining us. I wish you all a good day, and if you're calling in from a different time zone, I hope you have a good night. Thank you so much again to Sendhil and to Paul and to all of you for joining us. Have a wonderful evening, or morning.
Tales of A.I. gone wrong often become viral news stories, whether it’s a Twitter bot gone rogue or a problematic search engine, but was the problem really A.I. or did it lie elsewhere? What lessons can be learned so A.I. can better be used for good in the future?
These questions and more were explored by Chicago Booth Professor Sendhil Mullainathan and Dr. Paul Lee, the cofounder and COO of health-care startup Tree3Health, during the latest Social Impact Leadership Series in Hong Kong. As the Roman Family University Professor of Computation and Behavioral Science, Mullainathan brought his extensive research experience to the table, while Lee’s health-care background provided insights he gleaned while building his app and practicing as a physician.
The Rustandy Center for Social Sector Innovation and The Hong Kong Jockey Club Programme on Social Innovation co-hosted the event, which was moderated by Tali Griffin, Senior Director, Marketing Programs and Partnerships at the Rustandy Center.
Below are four key takeaways from the event:
The key to understanding how A.I. can be used for good—and how to fix its problems—is to first understand how it works, said Mullainathan. The words “artificial intelligence” may sound daunting, but at its core it functions much like an ordinary spreadsheet based on the premise of “what variable is the algorithm trained to predict.”
“The real innovation is we found a way to treat other things exactly like you would treat a column in a spreadsheet,” Mullainathan said, whether that’s images, text, waveforms, X-rays, or satellite images.
“Much like a spreadsheet, A.I. only works as well as the quality of its data and the way the algorithm is written and tested,” Mullainathan said. “Problems typically occur when A.I. is trained on one type of data, and then applied at scale to another,” he said, or when human error or bias has been introduced into algorithms and data sets. In one famous example, a Google image search of “CEOs” prioritized white men, which highlights the common problem of data sets that fail to be representational.
Another famous, or infamous, example is “Tay,” Microsoft’s Twitter chat bot. Twitter users were able to quickly manipulate Tay into saying outrageous statements because the chat bot’s A.I. had been trained in a friendlier, politer environment rather than the more unpredictable and strident English-language universe of Twitter. “They trained on one data set and hoped it applied to another kind of data. That’s all that happened,” Mullainathan said.
While designing A.I. is not without its challenges, health care is one area where it can be used for good. Appropriately trained A.I. could be used in remote telemedicine, for example, to analyze the X-ray of a patient in rural Asia or interpret bloodwork data for another, said Mullainathan. Apps like Lee’s Tree3Health make use of the millions of data points gleaned from cell phones and smart devices and interpret them for users and health-care professionals. Push alerts can also notify both that an issue may be brewing that is not immediately apparent.
“What we are foreseeing is not just collecting data and annotating and giving some health insight, we are trying to create a preemptive community health alert for the emergencies, for chronic disease, and for infectious disease management,” said Lee.
Relying on A.I. to manage health-care data inevitably raises questions about privacy and consumer protection, but both Mullainathan and Lee said those issues are not insurmountable. One solution is to factor privacy in as a key issue during app design, said Lee, whose app Tree3Health bifurcates user data and health metrics. "Data privacy and confidentiality are always important issues to tackle when we engage in health-care data. On one hand, we comply to international guidelines concerning electronic health record data handling; on the other hand, we separate the storage and handling of the sensitive user data and health data so that our data analytic is only based on anonymous data sets to ensure privacy," he said.
Another solution, said Mullainathan, is to reconsider how privacy is debated in an A.I. context. Sometimes, he said, when users talk about data privacy, what they are really concerned about is how their data will be used.
“I think we have good regulations for privacy but not yet a good answer to the question of who’s allowed to use what data for whom, with what permission,” Mullainathan said, adding that he was also concerned about how privacy restrictions could keep data from being used for good purposes like medical research.
“I think we should be moving toward a world where there is a broad accepted social use, and the assumption is your data’s going to be used for that,” he said, rather than always assume that a patient must opt in to share potentially life-saving data.
The Hong Kong Jockey Club Programme on Social Innovation provides resources and programs to help the city’s NGOs, nonprofit leaders, and social entrepreneurs do their best work. Operated by the University of Chicago Booth School of Business, the Programme offers a range of opportunities, including scholarships, social entrepreneurship workshops, and trainings for NGO boards of directors and board members.
Disclaimer: All the content presented is independently produced by the organizer, creative team, or speaker, and does not reflect the views or opinions of The Hong Kong Jockey Club Programme on Social Innovation or The Hong Kong Jockey Club Charities Trust.