Dismantling Algorithmic Bias with Patrick Ball (HRDAG), Brian Brackeen (Kairos) and Kristian Lum (HRDAG) | Disrupt SF (Day 3)
7:53PM Sep 10, 2018
Megan Rose Dickey
Has anyone seen Black Mirror, where you walk around and you've got that little social score going on about you? Well, that's all controlled by algorithms. What happens when decisions are made about us in this way? Here to discuss this really crucial issue, especially for the next few years, will be Patrick Ball with HRDAG, Brian Brackeen from Kairos and Kristian Lum from HRDAG.
These people are human rights data scientists. Yes, I know, I didn't realize that was a job myself until recently. Megan Rose Dickey from TechCrunch will be leading the discussion. Please join me in a big round of applause.
Thanks for joining me up here. I am excited to delve into what is a very important topic: algorithmic bias and algorithmic accountability. Brian, I wanted to just kick things off with you. You're the founder of Kairos, a facial recognition software company, and you're actually doing a demo a little bit later today on the main stage. So, quick plug.
So it was a few months ago, you published this post on TechCrunch.com about how your company will never provide its facial recognition tools to law enforcement. Why is that?
Yeah, concern about algorithmic bias is actually the core reason why. For regular use cases, we do facial recognition for the Fortune 100, Fortune 500; these are banks, retailers. So if we're wrong, at worst case, maybe you have to do a transfer again into your bank account, or maybe you don't see a photo taken during a cruise, things like that. But when the government is wrong about facial recognition, someone's life and liberty is at stake, right? They could be putting you in a lineup you shouldn't be in, they could be saying that this person is a criminal when they're not, and with all the issues around law enforcement generally, it just kind of spirals out of control.
I want to talk a little bit about the different types of biases. What's the specific type of bias that leads to some of that misclassification?
There are all kinds of bias, and we'll probably go through the different types. But in our world, facial recognition is all about human biases, right? And so you think about AI: it's learning, it's like a child, you teach it things and then it learns more and more. What we call right down the middle, right down the fairway, is pale males. It's very, very good at identifying somebody who meets that classification. Pale male: that's right down the fairway.
I know some of them. Okay. Yeah.
Yeah, I've never actually heard it called that. I like that. Pale males walking around. Yeah.
And then as you get further toward the fringes, right, you get women, different ethnicities, darker-skinned folks; even within, let's say, African Americans, you have different shades, right? The further you get from pale male, the harder it is for AI systems to get it right, or at least to have the confidence to get it right.
And Kristian, maybe you can speak to this, but why is that?
So why is it that it gets harder the further away you get from pale males?
So, you know, I think there are a couple of different varieties of bias that we're talking about, and I think the one you're really getting at is sampling bias, right? You have a lot of examples of, and it's funny because I hadn't heard that term before either, you know, pale males. And so your training algorithm can learn a lot about those types of faces, right? When you have other categories where maybe you have fewer examples, there's less data from which to learn, and so it's harder to make those classifications, harder to
correctly identify people, or whatever your application area is.
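As a rough illustration of the sampling bias Kristian describes, here is a minimal sketch, using synthetic data and scikit-learn (not anything the panelists built), showing how a classifier trained on a sample that over-represents one group tends to have a higher error rate on the under-represented group:

```python
# Minimal sketch of sampling bias; synthetic data only, not the panelists' code.
# A single classifier is trained on 5,000 examples from group A but only 100
# from group B, whose true decision boundary is slightly different.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, shift):
    # Two-class data whose boundary (x0 + x1 > 2*shift) depends on the group.
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) > 2 * shift).astype(int)
    return X, y

Xa, ya = make_group(5000, shift=0.0)   # well-represented group
Xb, yb = make_group(100, shift=1.5)    # under-represented group
model = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

# Evaluate on balanced held-out sets: accuracy is typically much lower for B.
for name, shift in [("group A", 0.0), ("group B", 1.5)]:
    X_test, y_test = make_group(2000, shift)
    print(name, "accuracy:", round(model.score(X_test, y_test), 3))
```

The learning algorithm is identical in both cases; the accuracy gap comes entirely from how the training sample was drawn.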
And Patrick, there's another type of bias: selection bias. What is that, and how is it different from the other types of bias?
In a lot of contexts in which we want to use machine learning, we go out to find data that we can use to train whatever model it is we're going to apply to make predictions. Well, where do we get the training data? Training data in a private-sector context probably comes from a business process. And if it comes from a business process, you can be reasonably confident that the data you've captured is representative of your business process. So the only assumption you have to make is that the business process in the future will be similar to the business process in the past, and you're good to go. Social data, particularly in the public sector, isn't like that at all. The way we get data from public-sector processes is through administrative interaction with people. So think about policing data: how do the police get data on people? Well, police get a lot of data about people that they police intensely. Whether or not those people are criminal, they have frequent interactions with the police, and every one of those interactions becomes a data point in a database. People who rarely interact with the police have much less data about them in those databases. If you use those databases to train models about who is a criminal, or who should be released on pretrial release after they've been accused of a crime, the model will tend to think that people who have frequent interactions with police are more criminal.
Okay, so the consequence of more frequent interactions with the police is that you're more likely to have a more dangerous prediction from a model making a prediction about criminality. This can be catastrophic. What we're doing, essentially, in a model like this is reproducing the bias on the input into the prediction. And this is a very common problem in public data: it affects credit scoring, it affects all these criminal justice applications that I mentioned, and it affects national security applications of risk classification. It's a giant, giant problem. And it's getting much, much worse. It's not getting better.
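A toy simulation can make the mechanism Patrick describes concrete. This is a hypothetical sketch with invented numbers, not drawn from any real police data: two neighborhoods have the same underlying offense rate, but one is patrolled far more heavily, so its residents dominate the arrest records that would later serve as training labels.

```python
# Hypothetical "bias in, bias out" simulation; all numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
n_people = 10_000                          # residents per neighborhood
true_offense_rate = 0.05                   # identical in both neighborhoods
patrol_detection = {"A": 0.60, "B": 0.10}  # chance an offense leads to an arrest

arrest_rate = {}
for hood, detect_p in patrol_detection.items():
    offenses = rng.random(n_people) < true_offense_rate
    arrests = offenses & (rng.random(n_people) < detect_p)
    arrest_rate[hood] = arrests.mean()

# A model trained on arrests as the "criminality" label would score residents
# of neighborhood A as several times riskier, though behavior is identical.
print(arrest_rate)   # roughly {'A': 0.03, 'B': 0.005}
```

Nothing in such a model needs to look at race or zip code for the disparity to appear; the label itself carries the policing pattern.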
So I mean, a lot of people call this algorithmic bias. But is it not just human bias that gets perpetuated by these algorithmic systems?
Yes and no. I'd say it's human, but I would move it away from bias at the individual level and call it bias at the institutional or structural level. If a police department is convinced they have to police a certain neighborhood, the individual racism which may or may not be in the heart of any specific officer is irrelevant. What's relevant is that the police department has made an institutional decision to over-police that neighborhood, thereby generating more police interactions in that neighborhood, thereby making people with that zip code more likely to be classified as dangerous if they are classified by a risk assessment algorithm. So there is a mix of this kind of human and structural bias, but I think that our emphasis should be focused on the structural bias.
To give a parallel example: in our world, a lot of algorithms are created at universities, and universities do tend to skew pale male as well, right? So what a professor will do is say, okay, I'll give you ten bucks to come in and give us 20 selfies, that kind of thing, to any students, right, hungry college students. And so the original training data comes from that group, and therefore the bias is implicit; it isn't, to your point, the individual professor's goal.
And then Kristian, you do a lot of work around fairness. So first of all, what would a fair algorithm look like? What does that mean?
Yeah, you know, that's a great question, and something that's been coming up a lot over the last couple of years in this growing fairness, accountability and transparency area in machine learning. And unfortunately, there is not a universally agreed-upon definition of what fairness looks like. This debate, I mean, it predated this particular article, but I think it really took off with the ProPublica article about risk assessment, where they found that the rate at which black defendants who
did not go on to reoffend were classified as future recidivists was twice the rate for white defendants who did not go on to reoffend.
And so that was one definition that was put forward, right, the false positive rate for one group, so that the burden of the errors was higher on one racial group than on another racial group.
Other people have looked at it from other dimensions. They've looked at it from, if you instead first think about the classification, what percent of people who are classified as high risk then go on to reoffend. This is a little hard to wrap your head around, but basically how you slice and dice the data can determine whether you ultimately decide that the algorithm is unfair. And so a lot of the work over the last couple of years has really revolved around different mathematical definitions of fairness, which are often mutually incompatible: you can't have both at the same time. And so I think, you know, what makes an algorithm fair is highly contextually dependent, and it's going to depend so much on the training data that's going into it. It'll often be a little bit circular: the training data will determine how you evaluate whether the algorithm is fair. If we have these problems, for example, where say one group is over-classified by the humans as being recidivist because they're arrested at a higher rate, it's even hard to make those determinations of whether the algorithm itself is fair, because you're comparing to some sort of biased and imperfect gold standard, right? And so, yeah, it's really going to be contextually dependent. You're going to have to understand a lot about the problem, you're going to have to understand a lot about the data. And even when that happens, I think, you know, there will still be disagreements about which of the mathematical definitions of fairness are most appropriate in which context.
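To make the two competing definitions Kristian mentions concrete, here is a toy calculation with made-up confusion counts (not ProPublica's actual numbers), contrasting error-rate balance (equal false positive rates) with predictive parity (equal precision of the high-risk label):

```python
# Toy illustration of two fairness definitions; all counts are invented.
def rates(tp, fp, tn, fn):
    fpr = fp / (fp + tn)   # non-reoffenders wrongly labeled high risk
    ppv = tp / (tp + fp)   # high-risk labels that actually reoffend
    return fpr, ppv

# Two groups with different base rates of rearrest.
group_a = rates(tp=300, fp=200, tn=400, fn=100)   # higher base rate
group_b = rates(tp=100, fp=70,  tn=730, fn=100)   # lower base rate

print("group A: FPR=%.2f  PPV=%.2f" % group_a)    # FPR=0.33  PPV=0.60
print("group B: FPR=%.2f  PPV=%.2f" % group_b)    # FPR=0.09  PPV=0.59
# Roughly equal PPV but very unequal FPR: when base rates differ, a classifier
# generally cannot satisfy both definitions at once.
```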
If I could, these guys are so good, I'm just going to keep piggybacking on that.
Yeah, please do.
So one of the issues with AI, and something people are really working on a lot, and we're going to show this in my demo later, is that we don't always know why the AI came to a conclusion, right? And so you train it, it goes into this magic black box, and then you send in the data and you get an output. So more and more, people are starting to ask the AI to explain the logic, which will help us figure out, you know, whether that's biased or not.
Yeah, and for you, how easy would that be, to explain how your software landed on this person?
So, easy now.
I would say that up until earlier this year, we couldn't. In fact, later I'm going to show that now we can; we've cracked it. But essentially, we've had to create a system, and use partners as well, to show us a vision of what the computer is seeing, so that we can then verify how the AI sees, essentially.
Okay, and is that transparent to, like, the end user? Or is that just for your company's own use?
We'll eventually make it transparent; it was only kind of invented in the last six months.
Okay, got it. And then Kristian, just going back to fairness a little bit. I just thought of when the ACLU ran some facial recognition software, it was Amazon Rekognition, to identify members of Congress, and they essentially only got the pale males correct. So that's an example of, okay, clearly there's something wrong with this algorithm. But if you were able to go in and look at that specific algorithm and that process, how would you conclusively determine, with the stats, with the data to back it up, that it was not a fair process?
You know, I think a lot of people, probably not in this room, but many people, think that when we're talking about biased algorithms, we're talking about problematic algorithms where the sort of human component of the bias really derived from somebody sitting there and saying, oh, if not pale male, then misclassify, or something like that, right, where there's some sort of malicious intent that went into it. And by and large, at least in the examples I've seen, that's not true. It's not like there's somebody sitting there writing, you know, if female, don't hire, or something like that. That's just not how this works. And so that would probably be the easiest sort of thing to audit, right? If you actually got a look at the code, you could look and see, okay, did somebody maliciously try to discriminate against a certain group. But, unfortunately from the point of view of the audit, though fortunately from the point of view of our vision of humanity, that's by and large not what's happening. And so in order to do that sort of audit, I think when we're talking about transparency, we have to really go back to the beginning. And, again, sort of coming back to the same points I think all of us are making over and over, we really need to get back to the input data and to see the training data that's
training the model. So, for example, and I think this was the first question we discussed: if your data set is over-representing, say, the pale male demographic, then there's reason to believe that the classifications
that come out of the model trained with that data might be poor. And so that's probably where I would start looking at a system like that. But there are, of course, a lot of other points at which you might want to do some evaluation. Unfortunately, in a lot of cases it's easier to get access to something like an API, or a model description, or a paper that was written about the development process, than to the data itself, or, in many cases, to what are perceived as the boring details of the data, like the cleaning and all the decisions that are made from the very beginning, like how you're even classifying the data that's being put into it. Those sort of end up being obscured, because for a lot of people it's not that interesting, but in my opinion it's probably one of the most important components of understanding what types of biases you're likely to see coming out of a model that's learning patterns in that particular data set. Yeah, if I could jump in
here, I think there are a couple of questions that users of systems like this can ask about how the model was trained that would inform whether or not it's likely that the outputs of the model would be biased. And I keep saying model rather than algorithm, because the model is the result of combining an algorithm, which could be a perfectly good algorithm, with training data, which may be appallingly bad. So the question is, first, to think about what the input data is after it's gone through the cleaning that Kristian focused on, and ask what's not there: what, in the universe to which I intend to apply this model, is missing from the input data? It can be that simple. Or, do we have too many of one category and not enough of some other category? Well, the model may learn more poorly about the less well represented category, and have a higher error rate for that category. Think about those dimensions in a way that is rigorous, and it's fortunate that we have a hundred years of applied statistics that tells us how to do this; the language here is statistical representativeness and sampling. So how did the sampling work for your training data? If it was just a pile of crud you found somewhere,
that may not be your best foot forward. But if you thought really carefully about how you sampled the space to which you intend to apply the prediction in order to create your training data, then that gives you a much better shot at a less biased outcome. In the old days, people used to say about computer predictions, or computer analysis: garbage in, garbage out. I think the appropriate statement we should use now is: bias in, bias out. So think hard about the input data.
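One simple form of the audit Patrick is suggesting is to compare the make-up of the training data against the population the model will actually be applied to. This is a hypothetical sketch; the group labels, shares, and the warn_ratio threshold are all invented for illustration:

```python
# Hypothetical representativeness check; labels and numbers are invented.
from collections import Counter

def representation_gap(train_labels, population_shares, warn_ratio=0.5):
    """Flag groups whose share of the training data is far below their
    share of the deployment population."""
    n = len(train_labels)
    train_shares = {g: c / n for g, c in Counter(train_labels).items()}
    flagged = {}
    for group, pop_share in population_shares.items():
        ratio = train_shares.get(group, 0.0) / pop_share
        if ratio < warn_ratio:
            flagged[group] = round(ratio, 2)
    return flagged

train_labels = (["pale_male"] * 700 + ["pale_female"] * 200 +
                ["darker_male"] * 70 + ["darker_female"] * 30)
population = {"pale_male": 0.30, "pale_female": 0.30,
              "darker_male": 0.20, "darker_female": 0.20}

print(representation_gap(train_labels, population))
# {'darker_male': 0.35, 'darker_female': 0.15}  -> badly under-sampled groups
```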
I think one clarification there, or maybe just an addition, is really important, which is that, you know, the word bias is used in a lot of different ways. And this is something we were talking about a little bit earlier, where you might mean, you know, maybe even implicit bias or unintentional bias, or how you use it colloquially or in the media. But if you have a data set like we're talking about that exhibits selection bias, so some people are more likely to end up in the dataset than other people, that doesn't necessarily have to come from malicious intent. It could come from the fact, like Patrick is saying, that you found this dataset somewhere, and maybe this data set is very useful for what it was created for. So it doesn't necessarily have to dovetail so perfectly with
sort of the negative connotations you might have about the word bias. We're sort of mixing a lot of,
you know, heavily loaded words and technical definitions, and so I think it's important to think very carefully about selection bias, even sort of divorced from whatever emotions might come up with that word.
I think those last comments also tie back to your first question, the public versus private question, right? So facial recognition in a public sense should be auditable by outside groups, to make sure, to your point, that the model is correct, and all that goes into it.
Facial recognition in the private sense, you know, for us, that's intellectual property, right? And so, you know, we've processed by now a billion faces, and all that data creates a stronger algorithm. That stronger algorithm is why people choose Kairos over something else. Now, in a police use case, you should be able to audit that information, right? Because it's that important. But again, for us, for our business customers, that's IP.
Yeah. And in the private sense, the cost of failure isn't necessarily that huge. Even if you think of, okay, a basic algorithmic model at Google: maybe Google's algorithm gets it wrong, but maybe you're just shown a weird search result; it's not going to change your life. Patrick, could you talk a little bit about the cost of failure in these models?
Yeah. And here I really want to reach out to people in this audience who are building machine learning applications. What happens if your predictions are wrong? Who pays that cost? I'm pretty sure it is unlikely to be you, right? If you serve the wrong ad, and somebody gets a sneaker ad when they were looking for a boot ad, man, not a big cost, right? The customer doesn't really suffer, the manufacturer doesn't suffer, and you don't really suffer. It's just not that big a deal; it's a little bit of noise. But if you build an algorithm which deploys police into a neighborhood over and over again, because the police keep arresting people there, and you over-police that neighborhood, you create a crisis of public legitimacy between the community and the police. Now we have a spectacular problem, notwithstanding all the people who got arrested in that neighborhood for trivial crimes, and notwithstanding the people who maybe should have been arrested in some other neighborhood but weren't, because the police were over-focusing on the first neighborhood.
Criminal justice, in my opinion, is probably the sharpest example of a very, very high cost of failure in machine learning applications. And we have dense application of these tools in criminal justice, and they're getting it wrong, and we're seeing people suffer. This is a big problem. So if you're thinking about building a tool, spend some time with a red team thinking about what can go wrong with this tool and who will suffer.
And if you can build the tool and maybe make a little money on it, but not bear the cost of being wrong, if some other community bears the cost of being wrong, maybe, as an ethical concern, you should take on that externality as part of your calculation about whether the tool should be built at all, because the cost of failure can be catastrophic. And it's not hypothetical anymore. If people want to talk afterwards, I'll be happy to read the list of vendors who are enthusiastically making these mistakes. I'll just say IBM, because they're in the news today. And this is a big, big problem.
Yeah, and maybe for those who haven't seen that news, do you just want to talk a little bit about that?
So, yeah: IBM and the New York Police Department have developed a tool to classify the race of someone in an image that is captured in surveillance footage or body-worn camera footage.
Classifying people by race has a long and sordid history of massive abuse,
not only in our own country but internationally. And this should be incredibly obvious, I think, to anyone with even a passing idea about what police departments are likely to do with that information. So we need to be really, really careful not to build those tools, because once they get built and integrated into administrative procedures, it's really, really hard to remove them, fix them or reform them. So we need to stop them at the point of not building them. But the larger, generic point that I'll close with is: think about the application of your tool, and what can go wrong. And in public contexts, especially with biased data, a lot can go wrong. I just want to close with another 15 seconds. I am not a machine learning hater. My two biggest projects that I work on in my day job right now are both machine learning centric. In one, I predict where in Mexico we will find disappeared people in mass graves. And in the second, I link different kinds of databases about war casualties to find the repeated mentions of a single individual all across databases that include hundreds of thousands of people. So I love machine learning tools; I am not an anti machine learning person at all. I am very, very concerned about the credulous and naive use of machine learning in contexts where it's inappropriate and the cost of being wrong is very high.
Kristian, you do a lot of work in pretrial and predictive policing. I mean, have you seen any good examples of machine learning in those areas?
Sure.
You know, I was reading an article not too long ago about the use of machine learning to go through the past records of people whose records should be expunged because of marijuana convictions, right? So in that case, I think that's a great use of machine learning, because it's sort of helping to make more efficient a policy that we need to push through: we need to get people who have suffered from past abuses their records, you know, cleaned up, so they don't continue to suffer from those. Whereas I think when we're looking at cases where we're making predictions about someone's behavior in the future, often where there's not a whole lot of difference between the two groups, so you say someone's high risk versus low risk,
I think those are a lot more dangerous. Particularly because, sort of speaking to the business applications, if you're able to get some marginal increase in predictive accuracy, maybe in click-through rate or something like that, maybe you make a ton of money and you're really happy. But here you get some small separation between this person and that person in their probability of future rearrest, which, again, would be different from future re-offense. But even sort of ignoring that difference,
is that sort of marginal difference enough to make a decision about someone's life and liberty, like we were talking about earlier? So it's about really recognizing the differences in application areas, and not thinking that just because something will make you a million dollars, or maybe even more, by getting some tiny increase in predictive accuracy, and machine learning was a useful tool in that case, that it necessarily translates to other cases where we're making very different types of decisions.
Let's say that these algorithmic models were 100% accurate, somehow didn't have any bias whatsoever, which, of course, is probably not ever going to happen.
That's the dream, right? Right, right. Exactly.
Um, in that scenario, which of course won't happen, Brian, would you still be against law enforcement using facial recognition? Well,
even though it's a joke, it actually is mathematically possible.
It is?
Yeah, absolutely, for there to be no bias. You figure, if you can
do it for the pale male and be 99.99% right, you could be that right for the other groups; you just have to have enough data for the other groups. And so in a 100% world, would I be okay with it? I'd say no, and here's why. Again, if this convention center were one of my customers, and they wanted something like an EZ Pass lane, like a lot of our customers do, where you just want to walk in, no searching, no scanning, just kind of walk in,
because we've got a picture of you that you gave us earlier. That's a good use case. But the convention center doesn't know who every single person in the audience is unless they give them their picture. The government? They have all of our passport photos, all of our driver's license photos. They could put a camera on Main Street and know every single person driving by. And we just recently, in the last month, turned down a government request; they keep asking us to do stuff, and we say no. The government, by the way, apparently not reading TechCrunch: Homeland Security came to us asking for facial recognition on people behind the wheel of moving cars. And this isn't the NSA or people outside of the US; this is Homeland Security, right? So they would be using facial recognition on Americans driving by. For us, that's completely unacceptable.
Yeah. And I mean, I guess what's scary is that, sure, you said no, but they'll probably just go to a different company. Somebody will say yes.
Yes. Yeah.
Yeah, legislation in this area. We're all for legislation.
And can I add one more thing
to that, actually: you know, even if we could have 100% perfect accuracy in predicting something, the question is, predicting what, right? So again, sort of back to the context that we keep talking about. Usually, the thing you're trying to predict in a lot of these cases is something like rearrest. So even if we are perfectly able to predict that, we're still left with the problem that the human or systemic or institutional biases are generating biased arrests, right? And so you still have to contextualize even your hundred percent accuracy with: is the data really measuring what you think it's measuring? Is the data itself generated by a fair
process? And if I can close on that: what would it mean for the police to have perfect information about every crime committed, constantly, in society? In order to build a fair machine learning system, we would need to live in a society of perfect surveillance, so that there is absolute police knowledge about every single crime, so that nothing is excluded, so that there would be no bias. Let me suggest to you that that's way worse even than a bunch of crimes going unsolved. So maybe we should just work on reforming police practice and forget about all of the machine learning distractions, because they're really making things worse, not better.
So before we introduce machine learning into law enforcement, let's first fix law enforcement. That's kind of issue
number one.
Yeah, yeah. For fairer predictions, you first need a fairer criminal justice system, and we have a ways to go to produce the fair data with which to train. All right, that's a bigger problem.
Yeah, we have a
long way to go, unfortunately. But thank you so much, everyone, for your time.