"Does Big Data Threaten Our Democracy?" Why? Radio Episode with Guest Cathy O'Neil
6:18PM Sep 29, 2020
Jack Russell Weinstein
Disclaimer: This transcript has been autogenerated and may contain errors; do not cite without verifying accuracy. To do so, click on the first word of the section you wish to cite and listen to the audio while reading the text. If you find errors, please email us at email@example.com. Please include the episode name and time stamp where the error is found. Thank you.
Hi, I'm Jack Russell Weinstein, host of Why?, philosophical discussions about everyday life. On today's episode, we're asking Cathy O'Neil how big data threatens our democracy. A few thousand years ago, the philosopher Pythagoras declared that the world was made up of math. He, like his predecessors, thought that all of reality could be reduced to one substance: wood, flesh, clouds, at root, they were all the same thing. Philosophers before him had suggested that the universe was made of water or fire, but he insisted that the primary stuff was the very numbers we learn about in arithmetic, whose relationships, he thought, could be discovered through geometry and music. We can sympathize. One plus one always equals two. E is always the same thing as mc squared. But it's not just our theories; even our experiences are defined by numbers. We see one person, 12 eggs, 101 Dalmatians. We gasp when movie aliens blow up the White House and when the Khaleesi controls her dragons, even though all of this is just computer imagery, zeros and ones in sequence that create a reality that doesn't actually exist. They look so real these days. Of course, we now know about matter, atoms, and quarks; we understand that the physical world is amazingly complicated. And yet I don't think we ever really got away from Pythagoras's idea that numbers are more real than everything else. Data has power over us. Statistics make us think that we know something about the particular even when all we have are probabilities. And price? Well, price is the number that allows us to take everyone's knowledge and determine whether a single object is worth buying. Price is the number we all understand best. Here's the problem. As humans, we tend to conflate knowing something with controlling it. Once we realized that numbers informed us about the world, we decided that they would allow us to remake it as well. Seven became lucky, 666 became evil, and the golden ratio defined beauty.
That's right, apparently 1.6180339887 is beauty.
We invented arbitrary numbers of years for adulthood, like 13, 16, and 18. We decided that a simple majority could justify almost any political decision. And we came up with quotas to tell us how many African Americans, women, or Jews are enough, so that we don't have to concern ourselves with equality anymore. We invested numbers with so much power that we decided they were objective, that they couldn't be challenged, and that they applied universally to everyone, no matter what. This is a fiction. Not all numbers have the same power. The number of protons in the nucleus may indicate a naturally occurring element, but the percentage of heterosexuals in a community does not make homosexuality unnatural. We may be able to establish how many calories an individual person needs to be healthy, but meeting that threshold doesn't entail that providing the poor with only the bare minimum allows them to flourish. We equivocated. We decided that numerical facts about the world revealed our moral obligations. We forgot that the choices we made rigged the game from the start. We ignored the fact that we determined our outcomes in advance. Stephen Jay Gould tells the story of the 19th-century scientist who thought human skull size correlated with intelligence. He compared the number of ball bearings that would fit in European, Asian, and Black skulls and determined that the European skulls held the most. Of course, this is nonsense, right? Brain size is not equal to intelligence, but it sure sounds like it could be. And if that weren't enough, it turns out that the scientist doctored the results. When faced with a fraction of a ball bearing, he would round up for a European skull and round down for an African one. The European numbers were always larger and the African ones always smaller, giving false credence to his pre-established conclusion that Caucasians were naturally smarter. It all looked so good on paper, because it was all data, carefully recorded and categorized.
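The rounding trick in Gould's story can be made concrete with a few lines of Python. All numbers here are invented for illustration (this is not the scientist's actual data): two groups drawn from the very same distribution end up with a persistent gap in their averages purely because of how fractions are rounded.

```python
import math
import random

random.seed(0)  # reproducible illustration

# Two groups of skulls drawn from the SAME underlying volume distribution,
# measured in (fractional) ball-bearing counts. Any numbers would do.
group_a = [random.uniform(70.0, 90.0) for _ in range(1000)]
group_b = [random.uniform(70.0, 90.0) for _ in range(1000)]

# The biased measurer rounds one group's fractions up and the other's down.
mean_a = sum(math.ceil(v) for v in group_a) / len(group_a)
mean_b = sum(math.floor(v) for v in group_b) / len(group_b)

# Rounding alone manufactures a roughly one-unit gap between identical groups.
print(f"biased gap: {mean_a - mean_b:.2f}")
```

Since rounding up adds about half a unit on average and rounding down subtracts about half a unit, the "finding" is baked into the measurement procedure before any skull is examined.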
But it was really just racism on a 19th-century spreadsheet. This is what today's episode is about: the way that math appears to give us objectivity but really cooks the books. Our guest has looked at this for years. She's examined how big data and algorithms determine in advance who wins and who loses, and how these biases undermine our political equality. She'll walk us through the most egregious examples, but in the process, we will all have to reconsider what we regard as reliable evidence, fair descriptions of human interaction, and acceptable methods of asking how individuals fit into society. Many people like to claim that statistics can be manipulated to support any position. This is probably not true. But more important, it's overly simplistic. It is not hard to mistrust all the data, to throw away the baby with the bathwater. It is, however, incredibly difficult to sort out which numbers give us reliable information and which don't. Our task is to tell the difference, not reject all of it out of hand. In the end, we are right where Pythagoras was 2,500 years ago. We're asking about the relationship between numbers and reality. But ironically, Pythagoras had an advantage that we do not. He didn't have computers, ledger sheets, or statistics. He wasn't already controlled by the data he needed to question, as we are. Pythagoras lived in a world that contained numbers. We, on the other hand, are raised among numbers that contain the world. The very tools we rely on to help us sort things out are the same ones that we have to doubt. And now our guest. Cathy O'Neil is a mathematician turned data scientist. She's the author of three books, including Weapons of Math Destruction, and hosts a math blog called mathbabe. She's worked as a college professor and in high finance, and was active in the Occupy movement. Cathy, welcome to Why?.
Thank you, Jack. I'm so glad to be here.
We've pre-recorded the show, so we won't be taking any questions. But if you'd like to send your comments, tweet us at @whyradioshow, post a comment on facebook.com/whyradioshow, or visit our live chat room at whyradioshow.org. So Cathy, I've been thinking about how to start, and as a philosopher, I see the world as a series of questions. I hear Taylor Swift and I wonder about the difference between being good and being popular. I read the newspaper and I lament the absence of justice. Does mathematics affect your personal experience in similar ways? Does being a mathematician affect your worldview?
I mean, I don't know, because I'm only me, and I've been a mathematician since I was five, essentially. But I imagine it does. When people ask me the extent to which the training of being a mathematician has helped me think about these things around the ethics of artificial intelligence and its effect on democracy, what I often say is that it has, in the sense that being trained as a mathematician means being trained in testing your assumptions, looking very carefully at what you're assuming and what you're inferring. That's number one. And number two, being trained as a mathematician means learning how to admit you're wrong. And you can tell me whether this is true in philosophy, but I know it's very rare in most academic fields, and most fields period, to be happy to be told you're wrong. As a mathematician, if you try to explain a proof to somebody and they point out a mistake in your reasoning, you thank them. You say, oh, gosh, thank you, because you've saved me a bunch of wasted time here. I feel like that doesn't happen a lot in other places, but it really does happen in math. And so it is helpful, because when you are not afraid of being wrong, and you are in charge of testing and interrogating your assumptions, then you have a kind of nimbleness. You can go places intellectually, with thought experiments, that most people don't go, because they're living in a box that they don't even see. So yeah, I think in that sense, which is really a philosophical sense, my training has been helpful.
How deep does that extend? Years ago, I wrote an article where I talked about how all the philosophers like to say that they believe that they know that they know nothing, and that there's this noble cast to philosophy. And I called that the Socrates thing, because people don't really believe it. When they write their articles, they think they're right. They know people are going to say they're wrong, but those people are wrong. How much of that is like mathematics? How much is that tenacious need to stick to your guns to see how far it gets you? Or, because proofs and mathematical problems are, I don't know what the word is, but so clear and transparent in front of you, line by line by line, if you understand what they're talking about, does that tenacity not work anymore?
I would say, in the vast majority of cases in mathematics, you really do want to be told if you're wrong, because it is something you want to correct. And being wrong is almost always completely clear, if you're a sufficient expert. Now, I want to make a couple of caveats to that ridiculous statement. The first one is, and I've been blogging about this for years, there's a brouhaha in mathematics going on right now, because this Japanese mathematician, Mochizuki, has claimed that he has a proof of the very famous abc conjecture, but nobody understands his proof. And so I came out years ago saying, well, then that's not a proof. I mean, a proof isn't an abstract thing. It is an argument with which you convince your colleagues, and he hasn't done that, so therefore it's not a proof. And so there are sort of philosophical questions about what constitutes a proof, and blah, blah. And he sort of sticks to his guns and says he has a proof, and other people say, no, you don't. And it's this kind of ongoing, very slow-motion feud. But that's a very rare example. I should say that there are lots of people who claim that they've proved something that everyone totally ignores. I still get emails from crackpots telling me they've solved Fermat's Last Theorem in three lines. So I don't want to make it seem like this is outside of social construction, because it certainly is not. But what I'm saying is that within the social construction that is the mathematical community, we really don't have feuds about what's true. We have feuds instead about what's important, what's interesting, what's actually progress mathematically, and who deserves jobs, of course. Everybody has turf battles. But the turf battles are much more about that kind of question, about what's actually important, than about what's true.
Does what's important change radically when you go from being an academic mathematician to a data scientist? I mean, obviously, being wrong is equivalent to losing money in that context. But how does the overall conception of things change when you make that move? What's the difference between a mathematician and a data scientist? And how does it change the way you think about the problems and the numbers and the way they all hang together?
Well, I'm going to disagree with what you just said a little bit. And in fact, I would say this is probably the entire point of my book and this conversation: being wrong as a data scientist doesn't just mean losing money. It means all sorts of things. And the more I explore that question, which is a very important question, what does it mean to be wrong as a data scientist, the more I realize that being wrong as a data scientist could mean destroying democracy offhandedly, because you're in the pursuit of money rather than the pursuit of long-term, positive societal goals. I don't want to oversimplify the goals of a mathematician either. The goal of a mathematician isn't simply to write down something that's true, right? We also want to be respected by our peers and write down something that's considered interesting. And that is a very complicated game to play. It depends completely on the local landscape of your math department and, of course, the ranking of your math department with respect to other math departments. It's just as complicated as anything else. But as a data scientist, first of all, what is a data scientist? There's not even a standard definition. I came from math to become a data scientist, and quite a few mathematicians I know have ventured into data science. Lots of other data scientists were originally trained as statisticians, others as computer engineers, some from physics, especially back when I was a quant in finance. And by the way, I should say that a quantitative analyst in finance is basically a data scientist doing the kind of data science that we call finance. So I don't really distinguish between those fields so much. As soon as you're applying statistical and mathematical techniques to solving a real-world problem, I call it data science.
But I'm just making the point that we're coming from all these fields in which we are trained to think there is one way to define success, one way to be right. And so, therefore, we have this very narrow conception of whether we got it right or wrong. And that is one of the biggest problems, and one of the reasons we're seeing such a slow response to obvious problems that are being created and perpetrated by algorithms. If you even just look at the response of Facebook to the recent scandals, and it's more of a data privacy issue than it is an algorithm issue, but I would argue that it's a similar type of response. These are people who were told to focus on a certain definition of success, which is, of course, engagement. And they did that, and they thought they were doing a good job, and they're really surprised that they're getting in trouble.
I'm not entirely sure this is a question, but are they misled by the idea that the phrase data scientist seems decontextualized? Here's what I mean by this. When you describe a data scientist, I think of Adam Smith, the founder of modern capitalism, in the Wealth of Nations, where he is looking at hundreds of years of corn prices, and he's looking at banking regulations and the way that these change things. That, of course, fits the definition of data science, although it's not digital. He considered himself a moral philosopher, and then later on we called him, among other things, an economist. Can you talk about data scientists independently of their context, a data scientist in finance, a data scientist in housing, in government? Does the term mean anything outside of its context? And if not, is that part of what leads people astray?
It's a great question. I wrote a book about this question. That's why I wrote the book I called Doing Data Science, because I really wanted, and this was at the very height of the data science hype wave, I was like, is that a thing? Is it even a thing? And I wanted to answer the question for myself: is this just pure hype? Is it complete bull? Or is it a field? And so, with a colleague, we organized a Columbia class and had a bunch of people who call themselves data scientists come in and explain what they did, and each chapter of the book corresponds to one of those lectures. And I ended up deciding, well, this is definitely a thing. People are calling themselves data scientists and doing these things, and these things are interesting, and they're important too, and possibly destructive, but possibly awesome, depending on the time of day. But it ends up being a thing. So that's number one. And number two, I would argue that Adam Smith was really a data scientist, for that matter. I talk to librarians and I say, you know, librarians are data scientists too. In fact, they're better data scientists than many of the data scientists I know, because they have this ethical code, and they understand the concept of dirty data, or data you can't trust. They understand the integrity of information in a way that we data scientists do not get trained to think about. They might not be as good at machine learning techniques, but they are in some real sense data scientists, and so was Adam Smith. And yeah, I agree with you, it doesn't need to be digital. And probably a lot of economists are also data scientists for that reason. And for that matter, maybe data science is just a bad term.
Maybe it's too broad, the way I've defined it. And, before I finish, it is absolutely misleading, because, and this is getting to my current project of proselytizing, data science isn't a science. We have to make it into a science, and we don't bother to do that, and that is a problem. In mathematics, our job is to prove things rather than just to say we think something's true, and we do have standards for that, and we have a lot of community peer pressure to make sure that proofs are complete. I'm not saying it's a perfect system, but we at least know what we're going for. Whereas in data science, we have such a badly defined field, with so many diverse techniques and different approaches, that we don't even have an agreed-upon definition of what's scientific about it. And it ends up not being scientific at all. And that's a very, very big problem.
And this is something that economics has struggled with for 150 years. In the late 19th century, the attempt to convert it into a hard science really changed the way economists talked, the way they thought about things, the way they presented things. On my good days, I argue that economics is a social science; on my more curmudgeonly days, I argue that it's a humanities discipline. But it's interesting that you're talking about the maturation of this as a science, because one of the things that I'm hearing from you is that there's also no ethical education alongside it. It sounds like the data scientists come together to do their task, but there's no commonly accepted ethical standard, which you said there is for librarians. There's no sense of where the moral boundaries might be. Does that end up being completely idiosyncratic from person to person? Or, and clearly this is part of your project, is there a developing sense of what ethical data science looks like, or is it really just people going everywhere?
Yeah, it's everywhere. It's chaos. Although there's progress being made, in the sense that I just got interviewed by a journalist a couple of days ago about this question of a Hippocratic oath for data scientists. And he mentioned that I wrote about such a concept in my book, Weapons of Math Destruction. I sort of took my version of it from Emanuel Derman's version, which he wrote for finance quants after the crisis. But at this point, according to this journalist, there are like a dozen different versions of this being floated around. And at the same time, there are still people who work in really important jobs, and I'll just name one person, Yann LeCun, who's the head of research at Facebook. As recently as a year ago, when asked about ethical considerations for his job, he would say, that's beyond my pay grade. So he was just denying any responsibility whatsoever of having an ethics. So yeah, it's chaos. And it really is not equally distributed chaos. The people who are less likely to acknowledge that they have ethical obligations are the people making more money from not having ethical obligations. So it's a problem.
How did people look at your book? In the book, you make a lot of claims, and you offer a lot of evidence to show how, and we'll talk about this after the break, the models end up promoting racism and division and deciding in advance who is being punished or even just looked at. Has there been a negative reaction to the book on the part of some people who say, you're asking us to consider things that are not in our pay grade, you're bringing in ethical considerations and that's not our job? Or did the data science community, whatever that means, look at it positively and say, yes, you've identified a real problem, we accept it, and we're working on learning from you?
I would say, I've gotten much more silence than either of those descriptions.
Welcome to writing a book.
I mean, I've gotten an unbelievable response from people who read the book, don't get me wrong. But from the data scientists who work on the stuff that I'm calling out in the book, I have not gotten a lot of response. You know, I'll tell you the most worrisome response I got altogether, if you'd like, which is a third option that you didn't bring up: my book was reviewed positively in Breitbart. And it was really interesting, because only the first half of the book was reviewed, essentially. If you've read my book, you know that what I say is, we need to do much better than this, because this isn't okay, and it's leading to all sorts of negative externalities and unintended consequences that undermine the relatively positive goals of these algorithms. Instead of improving the world, we're making it worse, often, or at least we suspect we might be, and we have no reason to think we aren't, because we're not even measuring the failures of these algorithms. And then I go on to say, we can do better, and here's how you would do better. Well, of course, Breitbart didn't cover that second part. They covered the first part, which was, we shouldn't trust anything. And that is their goal, to undermine our trust. And what I worry about, and this is a philosophical, sort of existential risk of data science, really, is that that is what will happen with the public. The public will hear about the bad things and stop trusting data science, because it is not diligent, it's not scientific, we're not holding ourselves to high enough standards as data scientists. Or, in other words, we're not being scientists. We're not showing evidence, we're not admitting uncertainty, and we're representing ourselves as exposing objective truth.
And when we're found out as liars, at least some of us, then people are just going to throw it all away, as you said, throw the baby out with the bathwater. And that, I think, would be our fault. It wouldn't be the fault of the people who, quote unquote, should know better and should trust us data scientists, because actually, I don't think we're behaving sufficiently admirably to be trusted right now.
When we come back from the break, I'd like to look at the process. We'll talk about models, we'll talk about how they work, and we'll obviously go into the myriad examples of the ways in which models that set up the problem wrongly lead to unfortunate consequences. So please stick with us. You're listening to Cathy O'Neil and Jack Russell Weinstein on Why?, philosophical discussions about everyday life. We'll be back right after this.
The Institute for Philosophy in Public Life bridges the gap between academic philosophy and the general public. Its mission is to cultivate discussion between philosophy professionals and others who have an interest in the subject, regardless of experience or credentials. Visit us on the web at philosophyinpubliclife.org. The Institute for Philosophy in Public Life: because there is no ivory tower.
We're back with Jack Russell Weinstein, on Why?, philosophical discussions about everyday life. We're talking with Cathy O'Neil about big data and algorithms and whether or not they undermine our democracy. You know, I'm a full professor; I'm actually at the highest rank at my university that I can be, a Chester Fritz Distinguished Professor. And there are a lot of nice things about that, including job security. But what people outside academia don't know is that being tenured and being a full professor is liberating in an entirely different way: I no longer have to be governed by student evaluations. I no longer have to worry about the data and the things that students say. And this isn't because I get to stop caring about my students; in fact, their opinion is very important to me. It's that the student evaluations are incredibly flawed questionnaires that ask and answer the wrong questions. There's a lot of evidence to suggest that they're biased against teachers who are women, and biased against teachers who don't have perfect English. But there's a whole other set of things that always bothered me in addition to that. For example, on the student evaluations there's often a question: does the instructor know the material well? And this is precisely the question that the student can't answer, because the student is there in the class to learn the material, and the student is not in a position to determine whether the faculty member knows what's true or what's not. A contrasting question, which is a good question, is: do you feel respected by the professor? That's a question our department looks at consistently. And so what happens is that people's tenure, people's raises, people's reputation at the school and in the department are based on these faulty questionnaires that then make the wrong thing the priority.
Professors respond by not giving tests, not giving homework, trying to be entertaining, all these sorts of things that move away from education. I will say, before I lead to a question for Cathy, my favorite example of this is from early in my career, when I taught what I thought was the best class I ever taught. It was an introductory class, and there weren't a lot of student evaluations, but one person wrote, "Jack should iron his shirts." And it was kind of funny, but incredibly frustrating, because it made my work invisible. And so, Cathy, all of this is to ask you: when we develop these criteria, when we look at these models, to what extent are the questions wrong? How often do people get the basic structure of the research so wrong that it actually creates more problems than it solves?
That's a real depressing question.
Um, you know, there are lots of ways to be wrong. So in a given example, I'll try to be as precise as I can about what I mean by that. For example, the one that's pretty close to what you were talking about, because it deals with colleges, is in, I think, the second chapter of my book, where I talk about the US News and World Report college ranking model. And there, the wrongness is not so much the question as it's framed to the target audience, namely parents, which is: what's the best school? That's a great question. I'd like to know that too. The problem is how they define the quality of a school. They define quality secretly, with proxies for, you know, all sorts of things about how and when...
Let's pause for just a second, because you talk about proxies a lot in the book, and it's really important. So talk a little bit about what a proxy is, and then continue with the US News and World Report example, because it's incredibly interesting and powerful.
Yeah, yeah. So, a proxy is sort of a substitute. Basically, you want to collect data on something, but you don't have a direct measurement of that something, so you take a substitute that you think will do, that's close enough. And you almost never have the actual thing you want to measure, almost never, so you almost always take proxies for whatever it is. Say you want to know whether somebody is healthy. Instead of knowing that, you ask, well, how far can you run? That's a proxy for health. Or, how much do you weigh? So there are lots of proxies for any given thing, and none of them are perfect, and you just make do with them. That is what we do as humans; it's not even a data science thing. But it is a problem when the approximations are not very good approximations. And one of the things that's interesting about algorithms, if I go back to US News and World Report, is that as your algorithm becomes more and more powerful, the failure of your proxies becomes worse and worse. I think there's actually a name for this, if you know it, Jack, somebody's law. You know, it's not the law that all conversations eventually lead to Hitler. It's the other version.
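The proxy problem she describes can be sketched in a toy simulation (all numbers invented; "health" and "running distance" are stand-ins, not real data): even when a proxy tracks the real quantity quite well on average, decisions made from the proxy misclassify a large fraction of individuals near the cutoff.

```python
import random

random.seed(1)  # reproducible illustration

# The latent quantity we actually care about (say, "health"), unobservable.
true_health = [random.gauss(50, 10) for _ in range(1000)]

# The proxy: the real quantity plus measurement noise (say, running distance).
proxy = [h + random.gauss(0, 10) for h in true_health]

# Decide who counts as "healthy" by thresholding the proxy, not the real thing.
misclassified = sum((h >= 50) != (p >= 50) for h, p in zip(true_health, proxy))
print(f"misclassified: {misclassified / 1000:.0%}")
```

With noise of the same scale as the real variation, roughly a quarter of people land on the wrong side of the cutoff, even though the proxy and the true quantity are strongly correlated overall.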
It's not Godwin's law. I'll think about that.
Okay, you're right, there's a different law. And it's basically, if a metric is really, really important, then it will become perverted. And that's exactly what happened with the US News and World Report rankings. So basically, the story is, they used metrics like the admission rate, how many kids who applied got in, how many kids who got in actually came, the average SAT score, and all sorts of metrics, things that they could measure, that in their minds indicated that this was going to be a high-quality school. They also did a survey of college presidents, what's a good school, and that was part of it. And the most important thing to know about the US News and World Report model, although we don't know all the details, we do know the broad strokes, is that they did not include the price. They did not include the price; remember that, I'll come back to it. And then what happened was, it was a hit. It came out in the 70s, and people just loved it. People love lists, they love rankings, they love scores. That's some kind of human thing which I don't really understand, but I think you touched upon it in your introduction, Jack: people love endowing these numbers with power way beyond what they deserve. And that's exactly what happened. As parents became more and more enamored with these rankings, college administrations realized that they mattered a lot. So they started gaming the rankings. And they did that by lying, by cheating, or by not lying or cheating but simply by becoming more restrictive about who they would admit, just because they wanted their metrics to look good. It really transformed the entire landscape of applying to college, as many people can attest.
And I think the last thing I'll say is that the fact that price was never part of the original quality score of a college meant that administrators could ignore the price increases as they were gaming this algorithm, because it didn't matter. You know, they could make it cost as much money as necessary to improve the ranking, and the scoring system would not even see it. It was blind to price. And that is what got perverted the most. I think that was the most perverse part of the whole college rankings and gaming system.
So first of all, I remembered, it's Goodhart's law, I think, yes, it gets named after Charles Goodhart, if I remember correctly. And here's an example of something that you're talking about. One of the things that US News and World Report asked, and I learned this from your book, is about alumni donations. They thought that if alumni are willing to give donations, then it must be a good school. So then a school spends millions and millions of dollars to get more donations, and then passes that off to the students, and even though it's earning more money, in some sense it actually raises tuition to pay for it. Does it have a good wellness center? Well, our own university a few years back built this amazingly gorgeous wellness center, and at the same time that it raised tuition and student fees in order to pay for it, really, it was firing faculty. So it moved up on the US News and World Report list, in theory, but there were fewer faculty, and the student-teacher ratio went south. And that's the kind of example, right, that you're suggesting leads to this, what did you say, 500 percent increase in tuition over the last few decades, in order simply to move up a few notches.
Right, and I don't want to pretend that I have a causal proof, you know, that all of that is because of this one model. But I do think it has an enormous amount of influence overall. And I just wanted to go back to Goodhart's law, because the real point is, whatever the quality score for a college should be, it should definitely have to do with, you know, the number of professors, and whether they're happy in their jobs, or whether they're freaked out that they might get fired, or that tenure might be removed at any moment. But what we've done is we've flattened the concept of a quality education into these gameable metrics, and that's certainly not a satisfying accomplishment. But it's a typical thing that you see with these weapons of math destruction, as I like to call them, these algorithms that just go wrong and that undermine their original purpose, the original purpose being actually exposing what's a great college to send your kid to. Now we're doing something else entirely.
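The Goodhart dynamic described here, a ranking model that is blind to price and therefore rewards spending your way up the list, can be caricatured in a short sketch. The weights, metrics, and numbers below are all hypothetical, not the actual US News formula:

```python
# Toy stand-in for a rankings model: it rewards SAT averages and
# selectivity, but price is simply not an input, so the model
# cannot penalize it -- education quality never enters the picture.

def ranking_score(school):
    return 0.6 * school["avg_sat"] / 1600 + 0.4 * (1 - school["admit_rate"])

school = {"avg_sat": 1200, "admit_rate": 0.60, "tuition": 20_000}

for year in range(10):
    # "Gaming": reject more applicants and buy metric-boosting
    # amenities, passing the cost on as tuition. The score only
    # ever sees the upside.
    school["admit_rate"] = max(0.05, school["admit_rate"] - 0.05)
    school["avg_sat"] = min(1600, school["avg_sat"] + 20)
    school["tuition"] = int(school["tuition"] * 1.08)

print(round(ranking_score(school), 3), school["tuition"])
```

After a decade of gaming, the score has climbed substantially, and so has tuition, which the scoring system never saw.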
Let me push you on this, because you talked about something in the book that I found incredibly interesting, and caters to my tendency as a philosopher and as a person who, of course, always gets everything right: the Flutie effect. And what I want to ask, and then I'll explain what that is, what I want to ask is, is the US News and World Report misleading or flawed because, ultimately, people make decisions for stupid reasons? You talk about how, when Doug Flutie won this big game with a big Hail Mary pass, Boston College, where he was playing, saw their enrollment rates skyrocket, because this guy won an exciting game. Now, I'm not a sports fan, my long-term listeners know that. But even if I were, I would get the fact that that's a pretty dumb reason to choose a college, that a big, exciting single game is not why you want to choose this place for an education. So couldn't someone come along and say, well, the US News and World Report is what it is because people make decisions about colleges for reasons that really have nothing to do with education? They talk about sports, they talk about partying, they talk about the aesthetics of the campus. You know, I have a lot of family friends now whose kids are looking at colleges, and the kids choose based on the brochures, right, whether they have an intuitive feel, whether the college advertising speaks to them, and that's a pretty stupid reason to pick a college. So suppose someone says, or I'm asking, what if the US News and World Report is what it is not because the data or the model is flawed, but because people make bad decisions? Is that an unfair defense of US News?
Yes, it is. By the way, I should mention that my oldest son is applying to college right now, so I am very, very deep into this thing. I think, to back up a little bit, a better way of talking about the college application process is that it's just impossible to do it right. It's impossible to have the right information. There are too many colleges. It would take hours to collect the information on a given college to know whether you'd actually want to go there, if you're the student, or whether, as parents, you'd approve of it. So it's, I'd say, a very asymmetrical marketplace, where the buyers just don't have enough information about the product they're buying. Now, what you're saying is that people are dumb, and I agree with you. My son didn't like the application website for one of his colleges, and that was a bad reason not to want to go to that college, but that was his reason. That doesn't mean that just because we're stupid, we should all instead defer to the same other stupid system. Right? So, in other words, what US News and World Report did was that they recognized that this is an asymmetrical marketplace, and that people just want to be told something that they can trust. So they turned to people's love and fear of mathematics, their intimidation by and trust of mathematics, and they said, here's a ranking system, it's based on a very complicated mathematical model, but you can trust it, because it's math. And it was false. And it continues to be false. I think it's probably worse now than when it started, because of all the games that the administrators have played. I'm not saying it doesn't have any information in it.
But going back to my biggest critique of it: I do think, as you said, people choose for stupid reasons. But I do think that, at the very least, any system that's telling us where to send our kids to college should incorporate the question of how much it costs, and that one doesn't. So you might be able to talk me into a system of college ranking that is better than the average stupid person's individual system, but it's going to have to include price. At the very least. Now, I should say that the Wall Street Journal and the New York Times have both separately built new systems to understand colleges, and the Obama administration, at the end of its second term, developed its own system online, which I think is now gone, for parents and kids to decide what they cared about, to actually say what mattered to them and weight all the things that mattered to them, and then the rankings would be computed based on their choices. And I think that makes more sense. And it probably would not include, was there a spectacular Hail Mary pass at the end of the game? So, you know, obviously, it would be edited, in other words, it would somehow nudge people into making choices along certain lines, like price, like graduation rate, like how many students get a job after graduating, et cetera, which might not be the things you actually care about. But I think it's going to be better than some random thing that the US News and World Report magazine built.
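The preference-weighted alternative described here, where each family chooses and weights the factors that matter to them, price included, might look something like this sketch. The schools, factor names, and weights are all invented for illustration:

```python
# Instead of one fixed ranking, each family supplies its own weights
# over the factors it cares about, and the ranking is computed from
# their choices.

def personal_rank(schools, weights):
    def score(s):
        return sum(w * s[k] for k, w in weights.items())
    return sorted(schools, key=score, reverse=True)

schools = [
    {"name": "A", "affordability": 0.9, "grad_rate": 0.70, "job_rate": 0.60},
    {"name": "B", "affordability": 0.3, "grad_rate": 0.95, "job_rate": 0.90},
    {"name": "C", "affordability": 0.6, "grad_rate": 0.85, "job_rate": 0.80},
]

# A price-sensitive family versus one that only cares about outcomes:
thrifty = {"affordability": 0.7, "grad_rate": 0.2, "job_rate": 0.1}
outcome = {"affordability": 0.0, "grad_rate": 0.5, "job_rate": 0.5}

print([s["name"] for s in personal_rank(schools, thrifty)])
print([s["name"] for s in personal_rank(schools, outcome)])
```

The two families get different orderings from the same data, which is the point: there is no single "best college" ranking, only rankings relative to what you value.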
So let's move from this example. I could talk about colleges all day, but that's because I'm a college professor. So let's move on, from the nudging to the shoving. I'm really curious to talk about your discussion of crime, and the discussion of the broken windows policy, and the computers that police now use to isolate problem areas, and the way that that sets people up to fail. Would you talk a little bit about what broken windows is, what's the, I forget the name of the computer program that you cite, that isolates individual squares of concern in areas, and how that, not backfires, that's the wrong word, how it establishes in advance who they're going to send to prison? So talk a little bit about that, because that also is incredibly important and has even more powerful consequences than just choosing a college.
Sure. So there are really two different kinds of algorithms. There are policing algorithms, and then, separately, downstream from policing, there are recidivism risk algorithms, which happen in the court system. But to back up a little bit, yeah, I'm going to back up even further than broken windows. I will talk about broken windows, but one step further back, I just want to make the point that we don't have crime data. We simply have never had crime data.
And explain that.
Well, you know, most crimes just don't lead to arrests, they don't lead to police interaction, they don't lead to reported crimes, either. And when I say that, just imagine what it would look like if we did have data on every crime. I mean, what would that require? As a society, that would mean we had a video camera in every room, and we had artificial intelligence that could recognize crime when it happened. It would be outrageous. It simply doesn't happen. So we don't have crime data. What we have instead are proxies for crime data. And what we take as proxies are arrest data, or reported crime data. And that's a really, really important thing to understand: we are using proxies, and they're terrible proxies. How terrible are they? It's hard to say, because what I'm talking about is data we don't collect, and I'm saying there's a lot of data we don't collect. But just to give you some indication, even with murders, where you kind of know there was a crime committed, only about half of murders lead to an arrest. With rapes, we happen to know that most rapes don't get reported. In fact, the question of how often rapes get reported is really, really hard and really interesting. Just to give you an indication of how culturally defined that is: since Trump has been in office, the number of reported rapes in the Hispanic community in Texas has gone down by 40 percent. We don't have any reason to think the actual rapes have gone down by 40 percent. I'm just making the point that we don't have data on this stuff. And it goes even one more step beyond that. We don't have data on most pot smoking. Just to be clear, you might know someone who smokes pot, you might have smoked pot. Did you get arrested? I doubt it. How often do you get arrested when you smoke pot? Okay, so if you look at it from that point of view, you realize, oh, yeah, not all crime leads to arrest, obviously, right?
But flip it over and say, okay, of the people who do get arrested for smoking pot, who are they? Where do they live? And I should say that I'm choosing that particular example because we have statistics on pot smoking: whites and blacks admit to smoking pot at basically the same rate in the same age groups. So we have pretty consistent pot-smoking rates among different races. But the rate of getting arrested for smoking pot is five times higher for blacks than for whites. And it actually depends on the precinct, like where you live; sometimes it's ten times higher, sometimes it's two times higher. So how much more likely you are to get arrested for smoking pot if you're black is much more a product of how the police behave in a given area than of who smokes pot, because, as I said, the rates of smoking pot are relatively stable and uniform. So that's all to say that the data situation in the world of crime is terrible. It's very, very bad. We don't know what the ratio of bias is outside of smoking pot; like I just mentioned, there it's about five to one, blacks versus whites. We don't know what it is for rape, because it's really hard to find out about the people who didn't get arrested. Basically, the smoking pot figures relied on asking people, hey, have you smoked pot and not gotten arrested? But you can't ask that same question, have you raped someone and not gotten arrested? You can ask that question, but you're not expecting to get reliable answers. So, long story short, we don't have crime data. We have terrible proxies for crime data, we have missing crime data, and the missingness is not equally distributed. The missingness is much greater in white communities and in richer communities.
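The arithmetic behind this point is simple enough to put in a short sketch: if underlying offense rates are identical but patrol intensity differs, the arrest data, our proxy for crime, manufactures a disparity on its own. All the rates and populations here are invented for illustration:

```python
# Two neighborhoods with the SAME underlying offense rate, but
# different policing intensity. The proxy (arrests) then shows a
# disparity that is entirely a product of where police are sent.

OFFENSE_RATE = 0.10  # identical in both neighborhoods

def expected_arrests(population, patrol_intensity):
    # Chance an offense leads to an arrest scales with patrol presence.
    return population * OFFENSE_RATE * patrol_intensity

neighborhood_a = expected_arrests(10_000, 0.25)  # heavily policed
neighborhood_b = expected_arrests(10_000, 0.05)  # lightly policed

# The arrest "disparity" exists despite equal offending.
print(neighborhood_a / neighborhood_b)
```

With these invented intensities the model reproduces a five-to-one arrest ratio between two neighborhoods whose residents offend at exactly the same rate, the same shape as the pot-arrest statistics discussed above.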
And the reason for that goes to the question of broken windows policing. Broken windows policing was this theory of policing, which was very explicit. It said, if we arrest people for low-level, nuisance, victimless crimes, then it'll prevent those people from becoming violent criminals later on. And they actually implemented this in the following way. They said, we're going to arrest people for stuff we would ignore in white communities, but we're now in poor black communities, and we're going to arrest them, in order to prevent them from becoming violent criminals later on. That was the theory. But what it led to was exactly what I just described, which is that you have way more arrests, especially in poor black communities, for low-level crimes like smoking pot.
You give a really good example of this in the book, where you're talking about how, in these circumstances, police are much more likely to arrest people for open containers, for drinking on the street corner. And you said that, in this instance, black and Latino men are being arrested for crimes that happen every single Friday night in a frat house, that white kids are never going to be hassled for. So there is the same exact crime, but because of the decisions that they've made about the proxies, and about who they're going to enforce it against, it gives an entirely misleading picture of who's an alcoholic, for lack of a better term, who's a scofflaw, to use an old-fashioned term. It sets it up so that we know everyone's drinking from open containers, but only some people are being arrested for it.
Right, and I'll jump in with another example. I don't think I wrote about it in my book, but it's been bugging me. I live in Columbia housing; my husband's a math professor at Columbia. So, you know, I hear about the sort of very consistent drug raids that happen on frat row at Columbia. Like, once every two or three years, they find that there are dens of drug trade going on right around the corner, on 114th and Broadway. And I know that when that kind of thing happens in Harlem, which is like three blocks north, you know, it's considered a gang bust. They're talked about as gang members, and there are extra penalties because it's related to a gang, which somehow becomes more criminalized because of that word. And I just feel like, you know, what makes those people a gang? Is it because they're black and they're not in college, versus the frat houses, which are largely white? I don't know what it is. But it's one of those things where you're just like, this is the same crime, but it's being treated very, very differently. And it's going to be in the data, it's going to be written down, because that's what I focus on, it's going to be written down as a different level, a different category.

I want to interrupt here just for our listeners, because that's a really powerful example. I want to personalize it for just a second, because I grew up just north of that. I grew up in Washington Heights, on 175th Street, and I grew up in the crack distribution center of the East Coast during the crack epidemic. And then, during the Giuliani administration, under Mayor Giuliani, they started this kind of policing, the early broken windows and things like that. And what you described is exactly right. A couple miles south from where I grew up was Columbia, and it was a college town, and there was occasionally this thing. But where I grew up, for a while there were no police at all, absolutely nothing, and then all of a sudden there were police everywhere. They were two entirely different worlds. And yet the crack epidemic and the drug epidemic were in the colleges as well. It wasn't just in my neighborhood. So I just want to interject that, because here we have two people who are having a very intellectual conversation on the radio, talking about data, and that could be, you know, okay, an intellectual puzzle, and those folks who are interested in this thing for political reasons could get angry. But here are two people who have been profoundly affected by it, because they lived in the area that the data describes completely differently.
Yeah, and you know, you're a little bit off topic, but you're getting at something, a recent trend I've begun to notice when I go to talks now, because there's starting to be this new field of algorithmic fairness. And there have been a bunch of talks about these algorithms in the context of the criminal justice system, not just with predictive policing, which we haven't even talked about yet, but also with the downstream thing of recidivism risk. And it kind of drives me nuts that the people discussing this are typically, you know, data scientists who sanitize their language so much that it's like they've forgotten that these are real people. You know, they're like, well, let's see, there's a statistical definition of fairness, blah, blah, blah, and you take the data and blah, blah, blah. Half of my frustration is the sanitized language. And the other half of the frustration is that, if you didn't know it already, you would come out of these lectures not knowing that the data is as messed up as it is. So that's one of the reasons I always start with: we don't have crime data, we don't have crime data. And I honestly don't think we are doing the right thing as data scientists when we don't mention that, when we pretend we have crime data and then we move on with it, like, oh, given the data, here's what we're doing next. No, we should start with: we don't have crime data. And then see if there's anything we can salvage. I'm sorry to vent.
No, no, please, because I am so glad you brought that up, because it relates to one of the questions I wanted to ask. It's the Kant question. Here's the question I want to ask: are human beings, when they are the object of data sets, solely objects? Or is it ever possible for someone to be a subject in a data set? And what I mean by that is, are the people whose actions we are recording so decontextualized that all they become is a data point, and their identity and their individuality, by design, has to be invisible? Or is there some way for the data to really recognize the fact that these are people, that they have a subjectivity and a personhood that we shouldn't violate by reducing them to just these data points? Is it possible at all to create a data set that respects subjecthood, or do people have to be objects by definition?
Okay, two things. First of all, I don't think it's possible within a data set. But I do think it is possible within the process of building an automated decision system, an algorithm. The process of building an algorithm is a human process, and therefore we can add to that process this consideration of human subjects. And the second thing I want to mention is that I'm not a philosopher, but I'm working with a philosopher named Hanna Gunn, from the University of Connecticut, and we're writing a chapter on the ethics of artificial intelligence for this combined volume, which is going to be out of this world, because a lot of the people writing for it are really making a lot of money in this field. So I'm going to be very curious as to what they think ethics looks like from their point of view. But what we're writing about is something that you might be familiar with, called the ethical matrix, and we're trying to apply the ethical matrix as a construction to the world of algorithms. What it does, and this is our contention, that we should insert this into any algorithm-building process, is it explicitly names the stakeholders of the algorithm as rows in the matrix, and it explicitly names their concerns as columns. So, in that sense, the people who are being targeted by predictive policing, or the people who are being scored by recidivism risk scores, would be represented in the rows of this matrix, and in the columns their concerns would be represented, like, we think it's unconstitutional, or we think it's unfair, or we think there's not enough transparency, or that it's inaccurate. Those would be columns of that matrix.
And then the idea would be that the cells of the matrix would be color-coded based on how likely it is that somebody's human rights, or legal rights, or constitutional rights are being violated, or something along those lines. Legally, ethically speaking, how risky is this for this stakeholder, with this concern? The short answer to your question is, we don't consider people as subjects, but we absolutely can. We absolutely can. We just don't. Another way of saying that is, right now, the ethical matrices that these big companies like Facebook are working with are the one-by-one matrix. Just one by one: I'm the only stakeholder that matters, and profit is the only concern I have. It's the one-by-one matrix. And what I'm saying is, we need to expand that view. We need to consider more than just you, and more than just profit.
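One minimal way to render the ethical matrix described here is as a table of stakeholders (rows) by concerns (columns), with each cell graded by risk. The stakeholders, concerns, and grades below are placeholders I've invented for illustration, not an actual analysis from the chapter:

```python
# Sketch of an ethical matrix for a recidivism-risk algorithm:
# stakeholders as rows, concerns as columns, cells graded by how
# badly that stakeholder's concern is at risk ("color-coding").

matrix = {
    "people scored by the algorithm": {
        "fairness": "high", "transparency": "high", "accuracy": "medium",
    },
    "judges using the scores": {
        "fairness": "medium", "transparency": "high", "accuracy": "medium",
    },
    "company selling the algorithm": {
        "fairness": "low", "transparency": "low", "accuracy": "medium",
    },
}

def worst_cells(matrix):
    """Return the (stakeholder, concern) cells flagged as high risk."""
    return [(who, concern)
            for who, concerns in matrix.items()
            for concern, level in concerns.items()
            if level == "high"]

for cell in worst_cells(matrix):
    print(cell)
```

The "one-by-one matrix" critique of a company like Facebook is, in this rendering, just `{"the company": {"profit": ...}}`: one stakeholder, one concern. Expanding the matrix means adding rows and columns before the algorithm ships.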
Okay, so suppose Facebook comes along and says, well, Jack, we are in fact trying to account for subjectivity, because when we have user data, we're not just looking at, I mean, we look at clicks, but we now know your political views. We know your sexual orientation, we know who your friends are, we know what gets you angry and sad, we know what movies you like and what music you like. We have, from your clicks, a complete picture of your personality. And so we make money because we know you as a whole person. What's wrong with that argument? Why isn't that subjectivity in the way that I was describing it?
I mean, thanks for teeing me up on this one, Jack.
You know, that's why they pay me the small bucks.
I mean, look, I think it's completely clear to anyone who thinks about it for more than a few seconds that the proxies they're using for us as human beings are weak, if at all valuable, right? So, to name just one reason: I have a specific goal in mind of who I want to seem to be when I'm on Facebook, right? I am giving off a certain image of myself. I want to be happy, I want to be successful, I want to be social. When I don't feel happy, successful, or social, I'm not even on Facebook. So, just for that reason alone, this is not a faithful representation of me. Not to mention that I clear my cookies whenever I can, and I do things intentionally strangely on social media, because I don't like them to think that they know me, right? So there are all sorts of ways that I can explicitly try to mess up their image of me. So that's the first thing. But I should mention that I reviewed this book, and I'm blanking on its name, but it was written by the guy who used to be the chief data scientist of Amazon, and he had a really, really troubling philosophical view of data science. Namely, he basically asked us to consider turning over our free will to the algorithms. He didn't even say it in that way, because I think it would have been too obviously a stupid idea if he had said it like that. But he just kept on making the point that, like, these decisions that we're trying to make, they're so predictable, and it would be so much easier for all of us if we just let the algorithm do it for us. And then, you know, he even went to the point where he was suggesting that there should be an algorithm that decided who gets the kidney transplant, based on a list that ranked people by their social utility. So there would be some kind of scoring system deciding how socially useful a person would be.
And if they were very socially useful, they'd be higher on the list to receive the transplant they need to live. It was very, very Orwellian. And my point isn't that he's wrong, because he is wrong, but that's not my point. My point is that there is a group of people currently living and working in Silicon Valley who think these things are good ideas. It's very, very frightening.
It's the ethics professor in me, but I mean, that's ignoring 150 years of ethics. It's the worst reductive version of utilitarianism. We had Peter Singer on the show a few months ago, and I think even he, the most consistent utilitarian alive, probably would bristle at that being the sole matrix. Okay, so I want to push this Facebook thing, and then we're going to go back to the predictive policing. I want to go back to the people-are-stupid problem for a second, because there's another aspect of the privacy debate that bothers me, that I struggle with, because, on the one hand, the argument for privacy, I think, is perfectly clear and justified. But the things that Americans want to keep private seem very strange to me. They don't want people to know what books they buy. They don't want people to know what meals they eat. They don't want people to know what porn they look at, which I understand for some people. But they don't object to the fact that there are cameras on the streets of every city, and they don't object to the fact that on highways there are license plate readers. It feels like, because we're a consumer society, we care more about consumer privacy than we do about actual privacy. And does Facebook take advantage of that? Does Facebook cultivate that, push us to not care about the right things? Is Facebook the result of that? What's the relationship? Why do people, at least in my view, have the wrong priorities about privacy, and get upset about the things that, I guess, are the least harmful to know? Or am I just wrong, and those things are much more dangerous than I think they are?
Okay, I have a funny answer, and then I have, like, a true answer. The funny answer is that I, at least until recently, was completely convinced that within 20 years, you know, everybody will know everything about everybody. So, in particular, when somebody runs for election for Congress, like, if we don't have naked pictures of them, we'll be offended. It'd be like, hashtag show me the dick pic. Like, where's the dick pic?
I have been telling people for years that in 20 years, everyone's gonna be naked on the internet. So, you know, you and I, at least, have found one piece of common ground. Okay, please, sorry, go on.
I have a sense that we have more than one piece of common ground.
I guess the longer answer is, I don't care about privacy, and I don't understand other people's concerns about privacy. And it's not because I actually don't care at all about privacy. It's because the privacy conversation is so messed up, because it is almost completely controlled by people who don't really have to worry about privacy, and I don't see a route to it becoming controlled by the people who actually do need to worry about protecting their privacy, which, as far as I'm concerned, going back to the predictive policing, are these poor black kids living in the over-policed neighborhoods, with, as you say, video surveillance on every single corner, and also possibly where they live, in their places of residence, because they live in the projects.

I really need you to talk a little more about the people who don't have to worry about privacy. That is incredibly interesting. Who are those people? And then transition into the particular piece. Is it because they're so rich? Or is it because they're public figures? Or is it because they understand the tech enough to hide themselves? Who are the people who control the conversation, and why don't they have to care?
Yeah, it's exactly what you just said. And I don't think they're bad people, I just think that they're distracted by their own sort of selfish concerns. The people who control the conversation, the EFF people, they're libertarian techno-nerds, mostly white, well-off men with jobs, who think about privacy all the time. And I'm going to have, like, haters hating me for this, but the problem is that it's a class thing. It's a power thing. If we should worry about privacy, we should worry about it because of the power asymmetry, not because somebody might know something about, you know, your sex life or something. I think we haven't actually asked enough questions. We haven't examined our assumptions enough around what matters about privacy. And the other thing I'll say, by the way, because I worry about algorithms as sort of my object of interest, not data, is that many of the examples in my book center around scenarios and contexts where there is no right to privacy. So, we didn't talk about the teachers yet, but the teachers are being scored by a statistically terrible algorithm that's almost a random number generator. Could they complain? Maybe. It's a secret. They don't have very many rights, but in particular, they have no right to privacy about what's happening in their classroom. For that matter, people who are trying to get a job, applying for a job, have to answer questions if they're going to get that job. They don't have the right to say, no, I'm not going to answer that question. I mean, they might technically have the right to, but they don't have the power to do it. And for that matter, the people who are being policed, and the people who are standing as criminal defendants in the court system, they don't have rights of privacy.
So privacy is not the thing that I care about. The thing I care about is justice and fairness. And fairness and justice have something to do with data. But let's be clear: the people who are the most vulnerable, the people who are most likely to suffer under unfair and unjust algorithms, are the people who have the least amount of privacy.
That is it. That's an incredibly well articulated justification for my feeling about privacy; I would never have been able to put it that way, but I think you're right on target. So let's get back to the predictive policing, because after that I have to ask you about the Russia Facebook scandal, and then I want to ask you what to do about it, what to do about the data, because we are running out of time. But let's go back to the people without power and the predictive policing, and where we left off, which is we hadn't yet talked about the predictive policing computer, and the way that it continues the problem of choosing who to enforce the rules against and who not to.
Right. So, just as a quick review: we don't have crime data, but we do have arrest data. Arrest data is very, very lopsided, and there are way more arrests for all kinds of crimes in poor black neighborhoods, because that's where we have historically sent police under our theory of broken windows policing. Now, I'm going to tell you what the predictive policing algorithm does: it takes the location of historical arrests to try to predict the location of crimes in the future. So using this proxy for crimes, which is arrests, it says, okay, well, if the arrests all happened in this neighborhood, then more crimes are probably going to happen in that neighborhood; therefore, let's send police back to that neighborhood to look for crimes. And so what they end up doing, as you won't be surprised to hear, is they send police back to the same neighborhoods over and over again. But we don't call it the theory of broken windows policing anymore, because it's out of vogue now to say that; it's politically touchy, because it's clearly such a racial profiling issue. Instead we say, no, we're using scientific policing, but we end up doing exactly the same thing.
Okay, and part of the issue is that the people who do have power don't experience it. You postulate, you offer a scenario in which they use stop and frisk in a neighborhood in San Francisco, I think, was it Golden Gate Park? I can't remember what it was. Give that scenario of what happens if we take stop and frisk and that sort of policing and move it into the rich white areas. What would that be? Yeah.
Well, actually, my favorite current example of that (I think that scenario was in Chicago in the book), my current favorite example is: what if, after the credit crisis, we had sent all the cops in New York down to Wall Street and arrested the bankers? For that matter, we could have just stopped and frisked them and looked for drugs in their pockets, because lots and lots of traders snort cocaine in the bathrooms. We didn't do that. We didn't criminalize bankers. We didn't criminalize people who worked on Wall Street, even though they committed plenty of crimes.
And that would affect infinitely more people than the petty crime that is being caught in the poor neighborhoods,
Which is almost all victimless, I should add. But yeah. So the thought experiment is, if we had done that, then the data would have exhibited this crime spree going on on Wall Street, and the data would have said, hey, send more cops down to Wall Street, because that's where the crime is. So in other words, what we're doing is we're establishing a police practice, and then we're using this historical data, which is extremely lopsided, to predict the future of policing to be exactly as it was in the past. So we're propagating whatever practice we have already established. So instead of calling it predictive policing, just to be cutesy about it, I like to call it predicting the police.
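The feedback loop O'Neil describes here can be sketched in a few lines of simulated Python. To be clear, this is a hypothetical illustration, not any vendor's actual system: every number is invented, and the two neighborhoods are constructed to have identical underlying crime rates, so any divergence in patrols comes purely from the lopsided historical arrest data.

```python
import random

random.seed(0)

# Two neighborhoods with the SAME underlying crime rate by construction,
# but historically unequal policing, and therefore unequal arrest counts.
true_crime_rate = {"A": 0.10, "B": 0.10}
arrests = {"A": 100, "B": 10}   # lopsided historical arrest data

for year in range(5):
    total = arrests["A"] + arrests["B"]
    # "Predictive" step: allocate 100 patrols in proportion to past arrests.
    patrols = {n: round(100 * arrests[n] / total) for n in arrests}
    # More patrols in a neighborhood means more arrests observed there,
    # even though the true crime rate is identical in both.
    for n in arrests:
        arrests[n] += sum(random.random() < true_crime_rate[n]
                          for _ in range(patrols[n]))
    print(year, patrols)
```

Run it and the patrol allocation stays overwhelmingly concentrated in neighborhood A every year: the algorithm predicts the police, not the crime.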
It's also the David Hume problem, right? David Hume in the eighteenth century asks, why must the future be like the past? It's just a matter of belief; we don't know that. And this is a self-fulfilling prophecy in the same sense: the future is like the past solely because you're acting exactly the way you were in the past, and you're just getting the same results, and you're not changing anything.
Just to be clear, I think it's a little bit worse than that. It would be bad enough if we were doing that as humans. But instead we're doing this by adding a mechanism that is supposedly making things more fair, because it's important to remember that when these so-called scientific algorithms are introduced to these systems, they are brought in as mechanisms that will make things more fair. So we have this added layer of trust; we are actually trusting these things more than we would have trusted ourselves. In other words, if we didn't have that system in place, we might actually say, hey, should we be doing this? But we have this system that's telling us, don't worry, this is scientific and objective, so you should be doing this.
I want to sit here and think about that for a few minutes, but I can't, because I'm on the radio. But I think that's right, and I think that's part of what I was trying to get at in the beginning: when we have this veneer of science, this veneer of math, we excuse things that we otherwise would never excuse. So let's transition for a second. Right now we're dealing with the aftermath of, or we're still trying to figure out, the scandal about Russia interfering with the American election system. And one of the ways that it did so was by creating all of these targeted ads to people on Facebook who are siloed in environments where they only encounter opinions that they agree with. And there is some question as to whether these fake news stories, these planted news stories, more than fake, affected the election to such an extent that they changed people's minds. So I guess the first question that I have for you is: do you, as a data scientist, believe that this tactic really had a strong enough impact on the election that it changed things? And second, do you think that Facebook is culpable for the consequences, and for allowing these organizations, these Russian proxies, to do it? Or are they, as they often claim, a neutral platform, where the people who post and the people who advertise are responsible for their own content?
Okay, so I wrote a piece for Bloomberg two weeks ago that said Facebook could do a lot more about the Russia ad problem. And in it I outlined a scientific test based on what I think they can do: a natural experiment that I think they can perform on their historical data. In other words, I think they have the data that they can use to decide whether the Russian ads affected voter turnout. And I think they could do it, and I think they haven't done it, or they have done it and they didn't want to show us the results. And so I'm really glad you asked this question, because it is exactly the point I wanted to talk about at the beginning, which was that we haven't put the science into data science. And this is exactly that kind of question. I mean, you asked me, do I think that the effect was big enough to change the election? I don't know. That's not something you can guess; that's something you have to measure. That is exactly what data science can do but isn't currently doing. Facebook hasn't looked into whether something influences our elections, because we haven't asked them to. I mean, literally, there's no reason for them not to do it, except for the fact that we haven't required them to do it. And the second question is, are they culpable? Absolutely, they're culpable. And just to be clear, they know how to do this stuff. They're not ignorant. They spent quite a bit of effort to prove to their advertisers, who are their customers (just to be clear, the customers are political campaigns and political pools of dark money; those are their customers), quite a bit of effort explaining how powerful the influence of Facebook is, showing them that it affects voter turnout. The "I Voted" button, if you remember that. So the "I Voted" button was being tested in the 2012 election cycle as well as 2016, and Facebook went to some trouble to brag about how much it mattered.
If someone saw the "I Voted" button, it really changed turnout. So on the one hand, they know that they're having an effect, and they're showing this to their customers so that they can get more money. But on the other hand, they're acting as if their platform couldn't possibly be used for nefarious purposes. I mean, they just can't do both. They can't have both. And I think people are starting to finally figure that out. So yes, I do think so. I mean, let me ask you this: if Facebook isn't culpable for this, who is?
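At its core, the natural experiment O'Neil is calling for is a comparison of voter turnout between users who were shown the ads and otherwise similar users who weren't. A minimal sketch of that comparison, using a standard two-proportion z-test, follows; the turnout counts are entirely made up, and Facebook's real analysis would need careful matching of the two groups, which is omitted here:

```python
from math import sqrt

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-proportion z-statistic for the difference between two rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)           # pooled rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))   # standard error
    return (p_a - p_b) / se

# Hypothetical counts: voters among 10,000 ad-exposed users (5,800 voted)
# versus 10,000 matched unexposed users (6,000 voted).
z = two_proportion_z(5800, 10000, 6000, 10000)
print(round(z, 2))  # prints -2.88; |z| > 1.96 suggests a significant gap
```

The point is the one O'Neil makes: whether the effect exists is not a matter of opinion, it is a measurement that this kind of data makes possible.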
Well, that certainly is the question, right? Is this just a conglomeration of nefarious evildoers with their mustaches and their, I don't know what, pipes? I don't know what evildoers smoke anymore. Or is it really this media company? And this has, of course, been a problem, or a question, since Napster: are these programs responsible for what goes through them? And Facebook, at least to me, feels incredibly responsible, because they do more than host it; they curate. And I guess that's part of what I see the algorithm doing in Facebook: it's a curator, a very clearly defined curator that's interested in making money rather than making people happy. Because, of course, all of the data shows that the more time you spend on Facebook, the less happy you really are.
Right. They have that algorithm, they have an enormous amount of money at stake, they profit enormously from their system. So yeah, I definitely think they should be held accountable. But I do want to agree with you that there are evildoers, and actually, the evildoers are all of us. It's even worse news, right? We are terrible on social media. And so I don't want to say that it's all about the Facebook algorithm, because it's not; it's also about the way we ourselves act on Facebook. In other words, I liked the idea that Tim Wu put forward last week, or maybe earlier this week, of having an alternative to Facebook, a sort of noncommercial version, an online public commons. But let's be clear: there's going to be fake news there, there are going to be conspiracy theories there, there's going to be crazy stuff happening that's messed up, because there are going to be people there, even if there's no algorithm. So I don't think Facebook is culpable for all of that, and I don't want to imply that once Facebook tweaks its algorithm, all these problems are going to go away, because they're not. There's something about this that is the nature of social media, the nature of being somewhat anonymous and not having to see someone cry when you hurt their feelings. That's just part of modern life, and I don't know what we're going to do about it.
That is a subject that I really want to have a whole show about. In the few minutes we have left: where do we go from here? What do we do about it? You talked about the Hippocratic oath. Could you talk a little bit about that, and about the ethical matrix? What's the next step for data scientists? And then, as part of the group that you just described, what can we do to contribute, to make a more ethical environment that doesn't undermine our democracy in the way that you articulate in the book?
With respect to the algorithms and the ethical matrix and the concept of ethics, I think we should start small. I think that the Facebook algorithm and the Facebook effect on democracy is possibly the hardest problem, even though it's the only one that people seem to be really aware of. There turn out to be a lot of algorithms that are being used for decisions like who gets welfare, who gets this job, who gets fired, who goes to prison, who gets policed. These algorithms are much more finite in nature. And so in those contexts, an ethical data science, or a framework for ethics along the lines of the ethical matrix, or something else, or some kind of Hippocratic oath (I don't really care what it looks like), but stuff that takes into account everyone who's involved, all the stakeholders, all the consequences, all the constitutional rights that might be violated, or what have you, desperately needs to happen. And it's going to be hard, because right now people are very comfortable with their one-by-one ethical matrices, which say: I'm in charge, I'm in power, I own the algorithm, and what I want is blah. The blah might be money, the blah might be efficiency, it might be lack of accountability, because that's another thing that these algorithms are very good at distributing to their masters. But we have to expand that view. That's the answer to that. In terms of the Facebook algorithm, again, that's the hardest one of all, and I do think you need another hour, at least, to think about how we can protect our democracy. It's not at all clear, because it's hard to measure. What is even a proxy for democracy? We've talked about the difficulties of proxies; this is a big one. So I'm kind of going to deflect on that.
And I'm just going to finish by saying once more that what I'm really calling for, in addition to expanding our concept of the stakeholders and the consequences, is that we should be putting science into data science. Every time there's a question that people don't know the answer to, and one of the most important questions is, for whom does this algorithm fail? Does this algorithm fail more for women than for men? Does this algorithm fail more for blacks than for whites, et cetera? Does it fail more for people with disabilities than for people who don't have disabilities? Every question like that deserves a scientific test. And we can do it, and we just haven't done it. So let's start doing it.
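The audit O'Neil is asking for ("for whom does this algorithm fail?") is concrete and testable: compute the model's error rate separately for each group and compare. A minimal sketch, with invented group labels and data purely for illustration:

```python
def error_rates_by_group(records):
    """Compute the prediction error rate separately per group.

    Each record is a tuple: (group, predicted_label, true_label).
    """
    totals, errors = {}, {}
    for group, predicted, actual in records:
        totals[group] = totals.get(group, 0) + 1
        if predicted != actual:
            errors[group] = errors.get(group, 0) + 1
    return {g: errors.get(g, 0) / totals[g] for g in totals}

# Hypothetical audit data: (group, model prediction, actual outcome).
records = [
    ("men", 1, 1), ("men", 0, 0), ("men", 1, 1), ("men", 1, 0),
    ("women", 1, 0), ("women", 0, 1), ("women", 1, 1), ("women", 0, 0),
]
print(error_rates_by_group(records))
# prints {'men': 0.25, 'women': 0.5}: the model fails women twice as often
```

Nothing here requires new science; it only requires that whoever owns the algorithm collect the outcomes and run the comparison, which is exactly the point.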
Well, Cathy, this is one of those instances where I have to use all of my willpower as host to end the party, because I could keep you here for about six more hours. We've just started, and this is incredibly fascinating stuff. I will have a link to your blog, Mathbabe, and your books, including Weapons of Math Destruction, on the website. Thank you so much for joining us on Why?.
Thanks so much, Jack. I have had a great time.
You've been listening to Cathy O'Neil and Jack Russell Weinstein talking about algorithms, big data and its effect on democracy, and a whole host of other things. I'll be back with a few more comments right after this.
Visit IPPL's blog, PQED: Philosophical Questions Every Day, for more philosophical discussions of everyday life. Comment on the entries and share your points of view with an ever-growing community of professional and amateur philosophers. You can access the blog and view more information on our schedule, our broadcasts, and the Why? Radio store at www.philosophyinpubliclife.org.
We're back with Why?, philosophical discussions about everyday life. I'm your host, Jack Russell Weinstein. We've been talking with Cathy O'Neil about big data, about its effect on democracy, and about algorithms. And that's where we ended up: we ended up talking about how to fix algorithms. And what she asked was, whom do the algorithms fail? Do they fail minorities? Do they fail women? Do they fail people with disabilities? Do they fail the poor? We don't have that answer, because we tend to focus on the successes; we tend to focus on the profits. And even when those successes are sort of make-believe, or those profits are on the backs of other people, we think of them as just great results. And I think part of that is because it's mostly invisible to all of us, right? I don't know what the algorithm looks like. Is it five variables on one line of code? Is it 750,000 lines with, you know, 75 different alphabets as substitutes? I don't know. I don't know what it looks like, because I'm not a computer programmer. What I do know is that Facebook doesn't feel satisfying anymore. I don't see the people I want to see, and everyone seems really unhappy about it. Everyone seems to be yelling and complaining and upset. But the real problem, as Cathy points out, is that that's all we're talking about. We're not talking about the policing. We're not talking about the questionnaires that people have to take before they get jobs. We're not talking about the way that teachers are harmed by evaluations when they don't know how those evaluations work. We're not talking about the smaller things that we can fix. We are in a transition time in our lives, we're in a transition time in the world: we're going from an analog world to a digital world, and we're going from a world that exists primarily on paper and in person to a world that exists primarily virtually, on computers. And if we are going to make that work, we have to look at the basic stuff.
First, we have to look at the more foundational philosophical issues. First, we have to look at justice, we have to look at fairness, we have to look at equality, we have to look at truth. Those questions don't change just because we're talking about algorithms. Those questions don't change just because we're talking about math. Math looks objective, and a proof is a proof: it either works or it doesn't. But we're not talking about that kind of math. We're talking about information about zillions of things, put in a way that is disguised to look like it's science. And it's not science. It's sociology, it's anthropology, it's political science, it is philosophy, it's psychology; it's all the things that we've been struggling with from the very beginning. The medium has changed, the computer is bringing us new circumstances, but the human problems have not. We are still asking the same questions. We are just being diverted from the places where we have to ask them. With that, you've been listening to Jack Russell Weinstein and Why?, philosophical discussions about everyday life. Thank you for listening. As always, it's an honor to be with you.
Why? is funded by the Institute for Philosophy in Public Life, Prairie Public Broadcasting, and the University of North Dakota's College of Arts and Sciences and Division of Research and Economic Development. Skipwith is our studio engineer. The music is written and performed by Mark Weinstein and can be found on his album Lua e Sol. For more of his music, visit jazzfluteweinstein.com or myspace.com/markweinstein. Philosophy is everywhere you make it, and we hope we've inspired you with our discussion today. Remember, as we say at the Institute: there is no ivory tower.