The California Accountability Project: Combining Cutting-Edge AI Technology With Traditional Reporting Methods
3:00PM Aug 26, 2023
Speakers:
Keywords:
ai
reporters
tip sheets
legislator
state
california
stories
cal
matters
data
work
project
build
transcription
great
organization
tip
democracy
information
news
whatsoever. So yeah, I believe there's a site, generative ai news.com, which is largely a Medill professor who has brought in other stuff. Nick Diakopoulos put it out. I forget if he wrote it or somebody else did, but they basically did a roundup of the organizations that have put out guidance, standards, and principles publicly, sort of a synthesis of who's saying what, and things like that. Yeah, but it's tricky right now because it's like a full-time job. I'm glad that there are obviously people who are starting to bubble these things up. But it's like a full-time job to be like, okay, so now I need to go over here and I gotta figure this out and who's doing what shows
up on the side or like when the reader is going through the content? That's what that part is interesting. Where does it make it on their policies to be highlighted during the story reading process or
getting to the to get to those consistent conventions, like even if civilians don't necessarily pay attention like photo credit conventions? You know, are consistent at least and it was like that and so it's like, how do we get there for for this stuff as well,
I just think there's gonna be a tsunami that hit just gonna be like the dam breaking. This is coming. You can see it definitely all of a sudden it's gonna be here. You know, I was just meeting with the digital first senators, you know, about this project and telling him about, you know, digital first and all that have a whole taskforce looking at how they can do stories with AI. Yep. And, you know, as soon as that starts happening, it's going to happen. Yeah, that's it's, it's going to happen fast. Mary. Whenever there's a year this happening like weather related updates, smaller piece of folks. Streaming like a party next door. Oh, yeah. Game show that the year over year prizes to get tipsy.
Give us a minute.
Once a professor.
This assures that is not my I think that might be what they were. Firstly
that is an old one sighs that thing it's got to be eight pounds
All right, well, I think we can get started. So next door is a game show. It's gonna start lively and probably stay lively. Robert Hernandez puts people on the spot. It's called "Blank Is the Future of Journalism." What those folks in that room don't know is that this right here is the future of journalism. So you're an hour ahead of them. Congratulations. Welcome to the future. And welcome to day 37 of ONA, because that's how my liver feels. I'm Marc Lavallee. I am a program director at Knight Foundation. Knight funds and supports the infrastructure of local journalism in the US and a whole variety of other stuff as well. I'm joined by two great colleagues. I'm just gonna tell you who they are, and they're gonna explain themselves. So to my left here, we have Dave Lesher. Who are you?
Well, good morning, everybody. Thank you for joining us. I'm Dave Lesher. I am the senior editor at CalMatters. I'm the co-founder of CalMatters and former editor and former CEO. Before that I spent a couple of years at various think tanks and then many years at the Los Angeles Times as a political reporter.
And Sapna Satagopan.
Hi, I'm Sapna Satagopan. I lead product at CalMatters. I've been there about two and a half years, and the product team works on the reader experience, the newsletter experience, and managing the infrastructure for delivering news to all the readers of CalMatters. So I worked on essentially the first version, the 1.0 iteration, of the Legislator Tracker, which is what this will be updating as well. So that's my background. And before this, I built my own news media company that was focused on news for a very specific demographic, the youth. So that's what I did before I got to CalMatters. Super excited to be here.
That's great. Thank you. So we have a couple of goals for this session, a few hopeful angles that you're going to get out of this. This is largely a case study: the past, present, and future of this specific project. Ideally, if you happen to be a reporter based in California, this is useful to you basically today, or very soon. One of the things that we're looking to do with this project is expand it to other states. So if you are here from somewhere else, start thinking about what would need to exist in your state in order for this to work there as well. We're eager to build those connections. There's also this as a case study for a type of project. There's a lot of interesting work happening between different organizations, universities, news organizations, et cetera, that I think is a model of a way of partnering between orgs to build things that are greater than the sum of their parts. I think we put AI in the session description because we're legally obligated to do so. There is some AI in here, and we're gonna get to that as well. The thing I'm personally excited about is that there's a clear, concrete problem, and this is a long-running project at this point. And so the AI work has been done in service of solving a problem. That's not to say that we can't or shouldn't ever be looking at solutions in search of problems, but that's not the case here. And then finally, because this is really about empowering and helping reporters do their job, I think the model, and this is the handout we're going to talk about in more detail, is tip sheets. We're also trying to figure out where we can be doing that type of thing as an industry in other places to really make reporters' lives easier, and also to create a dialogue and feedback on what specifically can be helpful about these types of things.
So before we get into all of that, just to level set a bit: you both work at CalMatters. Dave, you had a hand in making CalMatters CalMatters. So what is it? Actually, let me ask this: is there anybody who doesn't know what CalMatters is? Great.
Good. Well, most people do. That's great. We're eight years old; we started in 2015. We're a nonprofit news organization based in Sacramento, but we have reporters all over the state. I think our staff is up to 62 now, with 40-something in the newsroom. We cover the major policies and major issues in the state of California, really looking at the state government but also the major issues that bubble up from the street and throughout the state. We share all of our work with media throughout the state. So our stories are in newspapers, and we're on radio almost every day all over the state. And that's the thumbnail.
That's great. And then there's this project that sits within CalMatters. Its original name is Digital Democracy, which is coming just in time, because analog democracy is not long for us. So can you talk about what this is within CalMatters?
Yeah, so I just stepped down as editor about a month or six weeks ago to start to build this project. It's really been something we've been working on for quite a while. So it's within CalMatters. But the work I've been doing, from the LA Times and the think tanks and at CalMatters, has really been around state policy and state government and state politics. And one of the things you recognize right away is how especially invisible the policymaking process in Sacramento is, and the legislators themselves. So one of the problems that I really thought we needed to work on as journalists was to bring more transparency to that policymaking process and make those individual legislators more approachable, understandable, and accessible to their constituents. So that was the problem that we're addressing with Digital Democracy. Digital Democracy is basically pulling in a whole bunch of data from throughout state government. We have campaign money, we have transcripts from every hearing and every floor session. We have money, data on votes and legislators, and political registration and election results from the districts. And all of that goes into a database, and then the AI rolls over that to identify stories, tips for reporters. So the idea here, and the hope, is that a relatively small handful of reporters can really meaningfully cover 120 legislators in California. And I have three goals coming out of this. One is that you can change behavior by creating this transparency. It's a vast generalization, but when things are pretty much invisible, there's really no reward for a legislator casting a vote for the public interest
when it is a politically risky vote because of special interests, and there's really no penalty for siding with a special interest on a vote that is going to raise issues for the public interest. So I do think that when there's this kind of transparency, and legislators are notoriously thin-skinned, they will know this is out there. Just that it exists, I think, will change behavior. There are certain lists: we'll collect all the data on all the gifts that they get and print the top 10 lists of who took the most gifts, and it's not a list you want to be on two years in a row. So one goal is changing the behavior of the legislature. Another is that since this process is so invisible, for those who want to engage with their state legislators or with an issue that they care about, it's really hard to find that portal or that information, especially something that's trusted and not from the legislators themselves or their opponents or some biased source: a nonpartisan, trusted source of information to understand who these people are and what they're doing. And so I think the second goal is really to encourage more civic engagement. And then the third is really to help journalism. I mean, like I say, across the country a lot of the statehouse coverage has been diminished quite a bit. In California, the fifth largest economy in the world, the press corps is a fraction of what it was before.
And so if technology can do a lot of the reporting that we used to have when there was a really robust press corps covering state government, then I think we can do a lot more journalism with fewer reporters, using technology to do a lot of the reporting. So those are the three ambitious goals.
To pull out one thing there: I'm not a Californian, but I'm married to one. And I can tell you, for those who aren't in California, that anytime you go to California, you land at the airport and Gavin Newsom shows up and reminds you that it's the fifth largest economy in the world. The stakes are high, and the consequences are high, especially here. And so that's one of the reasons why it's such a great starting point and testbed for this. But accountability's sibling, especially with a funder hat on, is impact. Obviously, CalMatters has a good track record of demonstrating impact for Californians through reporting. And the way you're describing it, this is an accelerant for that, for CalMatters and the 250 editorial partners around the state. So can you talk through one example of how that cycle has played out, and how you see this speeding it up and making it happen more frequently?
Yeah, like I said, from the very beginning we have done a lot of our distribution, reaching a giant state like California, through our media partners. And so we've had various kinds of collaboration with local media, and we really have a good relationship with the major newspaper chains and radio, and especially some recent work with television. One example was a project on inequality called the California Divide, where we embedded reporters at five newspapers all over the state and coordinated the coverage. Almost 40% of California is in or near poverty, and you can't cover California without covering that issue. So that was a collaboration, and there was a lot of impact out of that on legislation for farmworkers, and there were a lot of COVID workforce issues and wage theft issues. This one was a collaboration with CBS TV, a local CBS channel. It was about PTSD and mental health issues among California firefighters. It's a great series; I would highly recommend it. It's very well written by Julie Cart, a former Pulitzer Prize winner from the LA Times and a tremendous writer. I had no idea how severe this issue was. There are suicide issues. There are a lot of PTSD and mental health issues among firefighters from the situations they've been in, especially in California in the last couple of years. So CBS TV and Julie Watts, the reporter at CBS, collaborated with us on this story, and we won a CNPA, California News Publishers Association, award for the series. And then we won an Emmy, our first Emmy at CalMatters, for the story that Julie Watts did. So we have done a lot of collaboration around the state and had impact through that. I think the Digital Democracy level is going to be a whole new chapter, because we're looking at individual legislators too.
So we're not going to do just statewide stories. We're going to really look at, say, this legislator in San Diego, and write for the regional media and work with them to write too, so that there's better understanding in that region or in that city about who this person is and what they're doing. And so there's more opportunity for direct impact.
And with those thin-skinned Assembly people, the idea is that when it shows up in their districts, in media that they know their constituents are paying attention to on a daily basis, they have to respond to that. That's part of the behavioral change that you're talking about as well.
Absolutely. Yeah. I mean, we're going to place these stories in the media in their districts. And when there's a story about them in their local media, in a lot of cases they just have to respond. So I think that will change their behavior and raise more awareness in their district about who they are, and that's impact.
And so one really meaningful and interesting interface for this is for reporters and journalists around the state directly. In addition to that, there's an opportunity to serve Californians directly as well. And so, Sapna, that's one of the things that you're thinking about: what's the public presentation of this data, so that people don't have to wait for a reporter to go dig through it? They can do it themselves as well. I'm curious about how you see that fitting in with the rest of the CalMatters offering and how that gets placed.
So in 2021 we rolled out the Legislator Tracker, or Glass House, which is intended to be a directory for legislators. That process was super helpful because it helped us understand, beyond a reporter or somebody who's plugged into this, how we can think about an experience for the regular Californian. One of the things that we discovered was that Californian readers who might be interested in this really break down into very specific groups. There are parents on the PTA board who want to understand what's happening with education. There are people who are plugged into this on a daily basis because that's part of their work. There are people who are in advocacy groups who want to be informed about what's happening with legislators. So what it really breaks down to is how we present this. The Legislator Tracker in 2021 was kind of the version one, so what we'll roll out with Digital Democracy is kind of this on steroids, in a way. There's going to be a lot more in it. But how can we think about a meaningful experience that caters to all those groups? We know that reporters will know exactly what they're looking for and how they're digging into things. And we know people who are really plugged into these topics know exactly how to navigate, and they have more complex needs. So we started thinking about that when we were developing this, and the question really comes down to the user experience and the user design, especially as we're thinking about complex use cases: how do they search through the information and sift through the information, and how is that going to look? So that is something that we need to factor in as we are building this out. How can we create one cohesive, welcoming experience for everybody who's interested in this? That has been a challenge so far, and even in version one it presented some interesting questions.
And the second part is also the assumption here. For us, with the Legislator Tracker as it is now, the majority of the traffic comes in from Google. Somebody searches for Catherine Blakespear, and they land on her page, and that's how they're digging into the legislator. But then there are the stories on the CalMatters website, stories that might be talking about her or about the work she's doing, where she's in some way connected to the story. So what we also think about is: how can we lead a person who's just reading a story, who's not necessarily doing legislative research, to that information, or give them a clue about who this legislator is? So we have these cards in these stories that say, hey, if you want to learn more about this legislator, click on this. And maybe they will go through and dig further into the legislator, or they can just get a snapshot of who the legislator is while they're reading the story itself. So for us, it's an opportunity to experiment with folding this into the overall experience as well. It is a destination site in a sense; people will go there to research, and the people who are in the know will. But for somebody who's just reading stories on topics they care about, how can we clue them in and lead them into learning more about that legislator? So we had a couple of interesting ways to do that in version one, and it'll be interesting to dig into it in version two as well.
So I do vote, but sitting here and thinking about this, I realized I don't know who my state rep or senator is in Miami. I'm curious how many folks know who represents them at the state level? Like half. Yeah, right. So I think when you think about these stories that may not necessarily be directly about the individual but are about what they are doing or not doing for you in a topic area you care about, economic mobility, wildfires, things like that, to actually know that this is the person behind it is super, super helpful to make that connection. So, back in the halcyon days of 2007 and prior, every news organization had like 50 statehouse reporters running around capitols, making sure everything was just A-OK, right? That's no longer the case, although obviously in some states like California there's a decent statehouse corps. Some of this is about helping those folks. But so much more of the opportunity in this era, particularly for the burgeoning number of publications that are truly locally focused, is being able to put this information in the hands of folks who are not statehouse reporters. They're covering a beat or an area, and this is a way for them to get essentially pushed information so that they're not having to stay on top of what's happening in Sacramento all day, every day. And the tip sheet, which many of you have (if you don't, Professor Chase here is going to walk around if you want to see this more closely, and I'll put it up on the screen here too), is a way of providing that alert about some sort of interesting thing that's worth following up on from a reporting perspective. Dave, can you talk more about the tip sheets? What's going on here? How have you been thinking about the purpose of this and making sure that it is genuinely useful to reporters and saves them time?
Yeah, I mean, I think this is just a fascinating and remarkable document. This is what the AI produces out of that giant database that we're putting together. The AI has been trained on what a story looks like and what kind of anomalies it can pick up in the legislative process. And when it sees one, it generates this tip sheet. To me, this tip sheet has all the elements of the story. Literally the first paragraph, the tip, is written by a bot. And amazingly, they have metrics for when the bot can use the words "significant" or "controversial," based on how much testimony there is, or how much debate there is, or whether there's a split vote, and things like that. So the tip is actually fairly analytical. I really was surprised it was written by AI. But anyway, you've got the lede. You've got the nut graf there, the background on the bill. It even pulls a quote from the hearing, so you can put that at the top of the story, and it can pull live images, so you can have a photo out of the hearing. And then it has all the data about the relationships between the vote, the legislator, and the supporters and opponents of the bill. It's really the body of the story. So all the elements are on that tip sheet. And then, I mean, I do think reporters still need to pick up the phone. I think there are things that we want to confirm, and there are things to elaborate on. The tip sheet is also linked. One of the things about this AI, not like LLMs or ChatGPT, is that all of the data is from official government documents, and the tip sheet links everything it has on there to the original source. So as a reporter you can trust the information that's in the tip sheet.
And the AI will pull the quote, but if you want to go look for a better one, it'll take you straight to that discussion in the hearing, and you can look at what other quotes there are. And it's the same with all the other parts of the data on that sheet. So in addition to the tip sheets, reporters will have a dashboard to query the database, to look further into some of the things that are generated on the tip sheet. We're gonna have a couple of reporters working full time just mining stories out of these tip sheets and sharing them with all the relevant media that relate to that bill or that legislator. And then I've been going around the state talking to editors all over, at newspapers and radio, saying: if you have a reporter who's available and can cover stories at the legislature, we will give you access to the tip sheets, and you can customize them. So: I just want to follow the legislator from San Diego, or I just want to follow environmental issues, and it'll send you a tip every time something comes up that it's found in the data. So we hope to generate a lot of good stories off of that.
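Editor's sketch: the talk mentions that the bot may only use words like "significant" or "controversial" when measurable signals (amount of testimony, amount of debate, a split vote) support them. The snippet below illustrates that kind of rule in Python. It is a guess at the general shape, not the project's actual logic; the `BillActivity` fields, the threshold constants, and the `allowed_adjectives` function are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BillActivity:
    """Hypothetical summary of what happened to one bill in one hearing."""
    testimony_count: int   # number of people who testified
    debate_minutes: float  # how long members debated the bill
    ayes: int
    noes: int


# Assumed cutoffs; the talk does not state the real values.
SIGNIFICANT_TESTIMONY = 10
SIGNIFICANT_DEBATE_MINUTES = 30.0
SPLIT_VOTE_MARGIN = 0.2  # vote margin, as a share of votes cast


def allowed_adjectives(activity: BillActivity) -> list[str]:
    """Return the loaded words a generated tip is permitted to use."""
    words = []
    # "significant" requires heavy testimony or long debate.
    if (activity.testimony_count >= SIGNIFICANT_TESTIMONY
            or activity.debate_minutes >= SIGNIFICANT_DEBATE_MINUTES):
        words.append("significant")
    # "controversial" requires a close (split) vote.
    total = activity.ayes + activity.noes
    if total > 0:
        margin = abs(activity.ayes - activity.noes) / total
        if margin <= SPLIT_VOTE_MARGIN:
            words.append("controversial")
    return words
```

Gating the vocabulary on counts like these keeps the generated lede tied to facts a reporter can check in the linked source records.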
So I'm a reporter in San Diego, and I get an email from DaveBot 5000 with a tip. I already have a full-time job. I'm already working on, like, five stories. So this is a thing that feels like somebody should publish this. So what else are you doing, beyond just finding your way to my inbox, to help me out?
So this is part of the reconnaissance I'm doing with all the media right now, just telling them: here's what we are doing, it's gonna roll out in January, and there are various ways we would like to be able to share this. The power of what we're trying to do is to create transparency in the legislative process. So the more eyeballs, the more district-level, constituent-level access to this we can get, the better, and that is what we're working on right now. And so there are three different ways that we've talked to local media about how we can share what we're doing here. One is that we're going to build a page for each of the 120 legislators. It's being designed by a web firm called 10up, which, if you know them, they've been great. It's a global web firm that's doing this in large part because they really care about this project. But the point is, it's going to be designed for a broad audience. The idea is that when you come to this page, it's not going to look like it's for insiders. It's not going to overwhelm you with all the data. It's really going to be: when I take a look at this, this is for me, this is for a constituent in a district who wants to understand these issues. So we will build these 120 pages, and we're going to white-label them for media that want to put their local legislators on their websites, so that they can blend in with their webpages. And each page will be automatically populated with data updates and with any relevant story that's written about that person. The second way is that, like I said, we're going to have full-time reporters mining these tip sheets for stories, and we will share those stories with the relevant media or with anybody who wants them, but some of them will be directly relevant to certain areas or topics.
And then the third way, like I said, is for those places that have a reporter who has the time, interest, and availability to write stories now and then about their legislators or about the legislature. We can give them access to the tip sheets, and they can customize how they want to get them. So there are three different ways that we hope to reach as much of California as we can with this tool.
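Editor's sketch: the customization described above (follow one legislator, or follow one topic, and receive a tip whenever something matches) amounts to filtering tips against saved subscriptions. A minimal Python illustration follows, assuming a hypothetical `Tip` and `Subscription` shape; none of these names come from the project, and the email addresses are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Tip:
    """Hypothetical tip record: who it concerns and which topics it touches."""
    legislator: str
    topics: set[str]
    summary: str


@dataclass
class Subscription:
    """Hypothetical reporter preferences: legislators and/or topics to follow."""
    reporter_email: str
    legislators: set[str] = field(default_factory=set)
    topics: set[str] = field(default_factory=set)

    def matches(self, tip: Tip) -> bool:
        # A tip is relevant if it names a followed legislator
        # or shares at least one followed topic.
        return tip.legislator in self.legislators or bool(self.topics & tip.topics)


def recipients(tip: Tip, subscriptions: list[Subscription]) -> list[str]:
    """Return the email addresses that should receive this tip."""
    return [s.reporter_email for s in subscriptions if s.matches(tip)]
```

For example, a reporter following only "environment" would receive every tip tagged with that topic regardless of which legislator it concerns.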
That's great. And so we're talking about a lot of pieces of this system, right? This is a pretty sophisticated thing, and it's been developed over the years. At the risk of putting up a shocking slide, this is what it looks like. So you can see that there's a lot going on here, and we're talking about multiple entry points to this as well. So something I'm curious about: with all these AI pieces that we're talking about, are there any highlights of what's going on here that would be helpful for folks in the room?
So this might be one of the easiest ways to simplify the different pieces of this. The best way to think about how the system works is that we are working with Cal Poly in San Luis Obispo. Their team is helping us essentially with what I would think of as the left part of this. So when you're looking at the bottom there, that's all the legislative information that we are gathering, parsing, and translating to make it available through an API for all these outputs. So you're thinking about bills, votes, financial data that is made available. That's the bottom part. The top part is where the transcription happens. So all the conversations, all the meetings, all the details are transcribed. And that is essentially making the transcriptions of what the legislators said, and when they said it, available. Those feed into what you're seeing in the center there, the API. So that is really making all of this information available for us, which again feeds into where you see the tip sheets, the AI part. The tip sheets are essentially generated using AI based on this data: what did a legislator say, and how is that an anomaly from what we expect from this particular legislator? So these tip sheets are generated. Everything up to that API part can be thought of as almost the back end, and what we're really focusing on right now is the green section, which is the outputs that come out of it. So we had Glass House in 2021, and this will be the powered-up version of that. That is essentially the public-facing tool where anybody could come and start looking at the details of their legislators. They can dig deeper if they want, or they can look at surface-level information about their legislators. The search and directory, that's really the tool that is going to help reporters or people who want to dig in further.
How can you dig into the data to look at, basically, those related search queries? How did this person vote on this particular topic? So it's looking at the relational elements of it. The search is intended, in a sense, for that power user. Maybe a Californian who's curious enough might dig into it, but that particular output is intended for the power user who really wants to dig into the details, to mine through the data, and is using the front-facing element as a starting point for their research. And then we have the tip sheets portal: for all the tip sheets that are generated, how can reporters subscribe to them? How can they set up alerts so that they get tip sheets that are relevant to them? So those are the three front-facing components that we are working on right now, because all of this needs to be set up for us to even conceptualize what's coming up in the front end. So that might be the simplest summary.
That was great. So we've talked about Cal Poly a couple of times in this, and I think it's worth surfacing the history there and their role. There's this public policy group that has been there for a while, and the first version of this was really incubated there, catalyzed by an investment from Arnold Ventures back in 2015. So what was the thing inside Cal Poly that gave birth to this, and how did that come together? How did they get from this agitation about a breakdown in public policy to this thing, which looks appropriately like it's from 2015?
Yeah. So just by coincidence, CalMatters and Digital Democracy were both started in 2015. And we were in dialogue back then; we both were starting up and had a lot of interest in creating more transparency in the state government. But Digital Democracy started at Cal Poly San Luis Obispo, where they have a tremendous computer science department. A lot of people go from there to Silicon Valley. It's very well known in the state for technology. The Institute for Advanced Technology and Public Policy was created in 2015, and that was to create Digital Democracy, which was this public-facing page. It was started by a Republican state senator, Sam Blakeslee, in partnership with Gavin Newsom, back when he was lieutenant governor, and some others. The interest was in just the problem that I've been talking about: not many people knew what was happening at the state policy level and with their state legislators, and that's a big problem for having a healthy democracy. So they started version one. A lot of what we are doing now was created then. They gathered most of the data that we're talking about; we've added a couple more significant databases in this second version. But they gathered a lot of the data, especially creating the transcripts and the video from every single hearing and every floor session, and they had a web page where you could really dig into what's happening in the state. The problem is that you kind of had to know what's there. You had to know what you're looking for, and then you had to go find it. And so I think in v2 there are two things that are the big difference, in what I call the last two miles. One is the AI: you don't have to go find stuff in all this data.
Now the AI is actually going to search everything there and prompt you with tips. The other is that Digital Democracy is now in collaboration with CalMatters. We're a journalism organization; it's our job to take that information and distribute it. So those are the two biggest differences. We are also doing a lot more collection of money data, the various ways that money comes into the process, than the first round did, and some other things, but those two steps are the biggest difference. The other thing, you know: back then, after they did it in California, they went to Florida, Texas and New York, and they did the same thing in those states. And like in California, you could go to their website, search a bill number or a legislator or a keyword, and it would pull up exactly where in a transcript those items came up, and it would link right to the video of the hearing. So it's a tremendous tool. But they built it in those other states, and obviously every state has different databases. So Cal Poly became very familiar with how to replicate this in other states, and that's what we're going to be looking at too.
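The search behavior described here (look up a bill number, legislator, or keyword, find where it comes up in a transcript, and link to that moment in the hearing video) can be illustrated with a minimal sketch. This is not the Digital Democracy code. The `Utterance` record, the `search_transcripts` function, the sample data, and the `example.org` URL pattern are all hypothetical, and a real system would likely use a proper search index, not a linear scan.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    hearing_id: str     # hypothetical identifier for a hearing
    speaker: str
    start_seconds: int  # offset of this utterance within the hearing video
    text: str


def search_transcripts(utterances, term, video_base_url="https://example.org/hearings"):
    """Return (speaker, text, video link) for each utterance mentioning `term`.

    Matching is a case-insensitive substring test; the link format is a placeholder.
    """
    needle = term.lower()
    hits = []
    for u in utterances:
        if needle in u.text.lower():
            link = f"{video_base_url}/{u.hearing_id}?t={u.start_seconds}"
            hits.append((u.speaker, u.text, link))
    return hits


# Made-up sample data, for illustration only.
SAMPLE = [
    Utterance("h-001", "Senator A", 95, "AB 123 would expand broadband access."),
    Utterance("h-001", "Senator B", 140, "I have concerns about the cost."),
    Utterance("h-002", "Assemblymember C", 30, "Let's return to AB 123 after the break."),
]

if __name__ == "__main__":
    for speaker, text, link in search_transcripts(SAMPLE, "ab 123"):
        print(speaker, "|", text, "|", link)
```

Keeping the start offset on every utterance is what lets a text hit jump straight to the matching point in the video.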
"If you build it, they will come," right? That did not happen here. Yeah, exactly. And so nowadays it's about getting the information in front of people in the right ways at the right times. You know, Sapna, like every product owner in a news organization: congratulations, you're now an AI product owner. This thing has existed for a while, and it avoids some of the more exotic things that we're grappling with today. But as you are rolling this out within your organization, and making decisions on behalf of hundreds of other organizations as well, are there any aspects of this, from an AI perspective specifically, that you're grappling with, or that you feel this is a good use case or case study for?
I think this might be the cleanest use case, where we at least know what we are solving for. To your point, we're looking at a solution for a problem; it's not a solution looking for a problem. And there is a human element on top of the AI part, so this is not AI giving you the perfect solution for everybody to run with. Take the transcriptions, which is one element of it. Cal Poly is using a transcription tool, and we've all used those tools before. They're not perfect, so there is still a huge human layer on top that has to correct and confirm that what the tool suggests is actually something we want to incorporate, because really we're looking for editorial accuracy at the end point. So making sure there is a human who is either accepting or rejecting what the AI recommends for these transcriptions is an important part. The video part of it as well: facial recognition might be somewhat simpler now than it was a few years ago, so there are elements of how we are using AI within that too. And definitely the tip sheets, because the tip sheets are essentially entirely powered by AI. Based on how reporters are referencing things, or the general trends, the tip sheet is really looking for anomalies, trying to identify what goes against the generally reported trend, and looking for facts and figures to support that and surface it in the tip sheet. But ultimately it's a tip sheet. It's still on the reporter to look at how they're going to use it and how they're going to verify certain aspects of it. So it is intended to be the starting point.
So that makes it not necessarily easier, but we have those layers on top of the AI. This is not AI just telling us what to do. This is giving us enough leads to follow up on or fact-check before anything is put in front of whoever the user is. So this is not the most, you know, ChatGPT, fun version of AI, but it feels like the use case where we know we're solving a real problem, one that can actually be solved mainly by AI and wouldn't be possible if there wasn't an AI aspect to it.
And that ongoing human-in-the-loop commitment through the whole data processing: who are those humans?
At this point, there are Cal Poly students who are working on the transcription part of it: verifying, fact-checking, and accepting or rejecting what the AI recommends. And then ultimately the end point would be that when the tip sheet is generated, it's on the reporters to also follow up on the leads.
Great. So yeah, a never-ending supply of students actually powering these things behind the scenes in exchange for pizza. And they get paid, they get paid more. I shouldn't even joke about that. So we had this thing in 2015, and now we have this cool thing. And we figured out, you figured it out. I shouldn't say "we"; I just showed up a year ago. In all seriousness, Paul Cheung was hit by Simon back. He's still there. I was going to say nice things about him, but now he doesn't need the ego rub. I showed up a year ago at Knight Foundation, and I was basically gifted a very clear, concrete thing to fund. One big piece of that was the 2015 version, living with it for a couple of years and recognizing that this citizen-facing site, for everyone and no one, was not gaining traction, and that there needed to be some pivot in order for this to be effective. One of the principles behind this is that Cal Poly and a few others actually dug into that and tried to figure it out through a research project that was thankfully written up in a paper at ISOJ. So then I had it, and I was like, great. Can you talk a bit about that, Dave, what that process was and what the outcomes were?
Yeah, absolutely. You know, Digital Democracy v1 was up and running for about five years, and it shut down about three years ago. But one of the things I absolutely love about working with the computer science faculty, and Foaad Khosmood is really the one who's the lead at Cal Poly, is that they look at problems and they go, hmm, okay, how are we going to fix that? Some of these data sources that we're pulling from are intentionally designed so you can't do what we're trying to do. I mean, literally, there's a form about lobbyists and what bills they're working on, and they put the numbers of the bills so close together just so you can't scrape them. And these guys look at that and they go, how are we going to do that? So we did it. Anyway, they looked at the problem with v1 of Digital Democracy and asked, how are we going to fix that? And like I said, they were looking at those last two miles, adding the AI and adding a journalism collaborator. To do that, they went out and did the survey and did this report to test whether there would be interest if they built this. They worked with Paul and a Knight grant to do this survey and focus groups. The results were, in a way, not surprising. They had 193 respondents to the survey, and 98% said news at the state level is important. And significantly, 37%, I believe, said they felt well resourced to cover that news. And then about 80-something percent said that if they had this kind of tool, or better access to news about what's happening at the state level, they would write about it; they would cover it more.
So, you know, there was that report, and CalMatters also brought relationships with media throughout the state that we have been working with for years. So when we put together this idea about v2, we had a good idea that if we could build this AI that could give tips to reporters, there would be an audience for it among the media, and that it would make a difference in the state if we could do it.
The product discovery rigor that went into that report, right, actually putting a version of those tip sheets in front of reporters and asking, "Is this specific thing useful?", was really helpful, because some of these questions can be like, "Should we have cleaner air?" Yeah, sure. So that was a big piece of it, the depth of the effort there. Okay, two lightning-round questions, and then we're going to open it up to everybody here. In this current phase, there are multiple organizations working together to rapidly get this stuff out the door, even in the past couple of months. I'm curious about anything you have learned along the way, and specifically where you've adjusted course in getting this up and running.
Yeah, definitely resources, to start. I lead product. We are a small team, and we are not necessarily equipped at this point to be building out such a powerful tool. When we started looking at the different roles we would need to actually implement this project, we came to the conclusion that it was about six to seven net new people we would have to go and find. That was useful to discover early on, because right now we are working, as Dave mentioned, with an agency that is going to help us build it, and that is really thoughtful about how they're taking in the requirements and how they're thinking about the user experience. So it's really good to have a partner in how this is getting built out, because Cal Poly is the partner helping us power this, and then we had to think about having a partner that's helping us build it out. So that part was definitely a learning. It's nothing that is easily doable within one organization. The second part goes back to the user experience. We're talking about a combination of search and the ability to filter down. In some of the earlier discussions, the question was: is this kind of like an Amazon, where you're able to search and then filter things down to find what you're looking for, or is it a Google? That's still a conversation in progress, because the idea that this is intended to be one go-to tool is great. Even when you saw the earlier Digital Democracy version, there was this button, you know, for Californians or something like that, and then one for reporters. But if you are really looking at a unified tool that is supposed to cater to all these groups, what is that experience going to look like? And how do we make sure it doesn't turn away one group in favor of another, and really build an experience? So that is still a work in progress.
And I think that really makes us think about the user experience in a way that's very different from a news site, where obviously we're thinking about articles and stories and who the readers are. This just has such complicated nuance to it that that aspect needs to be really thought out for this to be a successful, usable tool for everybody. So it's still a work in progress.
Thank you. Okay, so Dave, it's September 2024. We are pretending there isn't an election in a handful of weeks, and we're in the ballroom in Atlanta at ONA. What do you want to be true about the state of this project when you are telling folks in the room about it next year?
So there's a lot to be done between now and then. I think we have three phases between now and the next ONA. One is the construction we're doing right now, which will be done in January, when we're planning to launch publicly. Between 10up and Cal Poly and CalMatters, I counted more than 30 people building this thing right now, and we're in meetings constantly. The construction is a lot of work. So that's between now and January. In January the California legislature comes back into session, so from January through the spring, until the summer, we will be really up and doing this in real time: feeding local newspapers, having reporters have access to the tip sheets, seeing what kind of story ideas the AI generates, just really getting this thing running on its own legs. And then going into the summer, we're going to be looking at moving to other states. We'll be looking for the partners. I think the easiest steps are to Texas and Florida and New York, where Cal Poly is already familiar with the databases that the state generates. But I think, having had a record of what this thing can do in real time, we'd like to go find other states where we can start to launch it for 2025.
That's great. Okay, so we've got like 10 minutes. I've got a microphone. I'm around.
Questions? Neil. Oh, great. Thank you.
Julie from the San Francisco Standard. I was wondering, you know, we watch a lot of San Francisco city government commission hearings. Are you going to open source this code, so maybe we can take it and build a tool for San Francisco, or other localities, maybe?
Yeah, absolutely. I mean, I think this could be applied to city and state governments and school boards. Once this is built, it is replicable in all those places, and yes, it'll be open source to do that.
One of the funny parts about this is what the creator of this, the guy that Dave mentioned in San Luis Obispo, told us. We asked him how you expand this to city councils, boards of supervisors, school boards, and the answer is that we built it so specifically for legislatures that it's going to be easier to go to other state legislatures first, where they have the same structure, two houses and committees. And then we'll figure out how to do more complex ones. So the San Francisco Standard solves that first, also,
Trip Jennings with New Mexico In Depth, in New Mexico, where we have some familiarity with the State House. I know that in New Mexico we collect different data sources based on laws and regulations. So, like, we don't collect nearly as much information on lobbyists as California. So I'm trying to think about how we can prepare, because we've been okay with pushing for disclosure, lobbyist disclosure, spending money, in the past as a news organization, and how we can prepare for maybe 2025. I know we can talk, because I know Dave, right? But what is the best way that other states, not California and Texas, not Florida, can prepare for this?
You know, like I say, every state is different. And yes, some states don't have as much information as California. So I think start with an inventory of what data sources would be best to tell the stories. As a journalist, you can sit down and, you know, you know what you would want to know: what are the influences on the legislative process? I look at really three of them that are really powerful on a legislator. One is obviously money. One is the district that they come from: are they in a competitive seat or a safe seat, what are the demographics, the poverty level, you know, all that data is in there. And then the third is that legislator's background, usually, and what kind of topics they are interested in. So list all the data sources that will feed, you know, everything you can see to try and parse out why this person behaves the way they do, or why that vote went the way it did. And then that's the starting point for Cal Poly or the computer scientists to come in and say, what can we grab, what can we pull in, and then overlay the AI, and then
I guess my advice from meeting Bay Areas. I really appreciate the project. I want to make one point and then ask a question on product. The point is that, doing as we do, building technology in this sector for journalists and users, the importance of funders staying engaged over a period of many years. The tradition is that funders get really excited, fund something, it doesn't do exactly what it's supposed to do, and they walk to the next thing. The ability to pivot, or to have a technology development process ongoing long enough to actually pivot to meet the needs of your users, is what journalism needs, and it's fantastic to see in this project.
Great, jumping on that quickly. I mentioned Arnold was in for the first part of this, and what I didn't say is that they came in again. So this round is Knight and Arnold together co-funding this stage. And, I don't want to speak too much for them, but that was not a fait accompli, right? That was not a "we're going to support this forever, no matter what." It took Neil and others at CalMatters to help figure out how to reframe this. But I think part of that was obviously the opportunity that we all see here, and the fact that there's such a high degree of overlap between their very clear funding model and priorities and what this aims to serve as well. So anything in service of that, right, when there's that true lock, it works better.
And I will get to a very short question, I promise. But also, there's a lot of hype on AI; obviously, it's been the center of the conference. The profound impact of LLMs is lowering the cost of transcription, doing transcription a lot better, and doing summarization really well. So against all the hype, these are such good applications of advances that, frankly, my organization wasn't able to do three years ago because of the cost of transcription. So this is the AI revolution making this possible. I'm really excited to see the search and extraction from transcription. Is that something that you're exposing through the API? Can any journalist do a keyword search to extract quotes from policymakers? That'd be an awesome feature.
That's right. Yeah, in fact, all of this is public data. It's just very hard to find. So we're pulling it all together in one place, and then what we want to do is make all of that publicly accessible: build a dashboard that anybody can come in to and search any of this data, all in one place, and certainly the transcripts and the video. That was in the first Digital Democracy version, and we'll certainly have that again in the second one.
Super quick, tangibly, yeah: most chunks of this, and over time all chunks of this, the code itself is available on GitHub. I think the project name is Digital Democracy. So as those pieces are being resurrected, that's available there to see how it works. That doesn't provide the exact API that you're asking for today, but it at least shows the reference point.
Actually, the question I was going to ask you was about that. Parts of this are already available, apparently, right? So what would be the best place to find the beginnings? I'm sorry if you already mentioned it.
You know, there's nothing available right now for this that you can see. The first version, if you get the right URL, you can find it; it's still accessible. But this will be up in January, and that's when the tip sheets will start to generate and the stories will come out. There'll be this search interface for the public, and then the web pages for the legislators. That's all to go up in January.
Two questions. One is a quick one. I was just wondering if you all, sorry, I'm in Florida and North Carolina, called creo. I recently used ChatGPT to help produce some data, some scraping trackers, for an investigative project, and it generated AI hallucinations, some ISBN numbers around banned books, that took extra work to fact-check because I was noticing all these mistakes. And so I started with some inaccurate data. Lots of lessons there. Have you all seen this with the programs that you're using? Is that a problem? Or is that ultimately a language model issue? And
Yeah, I mean, this is not a large language model. It's not ChatGPT. It's an AI tool built on what's called statistical inference. It's just wrapped around this database, and it can only pull from this database. Everything in the database is from an official government source, and everything that it generates is linked back to the original source. So there is not a hallucination problem with this AI. You know, it's AI because it's actually a machine that goes through and looks for the patterns in the data that might identify a story. They actually trained the AI by reading thousands of stories written out of Sacramento, and identified what they call 25 "phenoms" that are indicators of a possible story. And then, in addition, it'll identify anomalies in the process. It identifies the patterns of how the policy process works. Like, if somebody votes against their money, it'll generate a tip. If somebody votes against their party, it'll generate a tip. But anyway, it's all contained within this database, and it hasn't read the entire internet like ChatGPT.
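The two triggers mentioned here, a vote against party and a vote against money, can be illustrated with a small rule-based sketch. This is not the project's statistical inference system, which is not shown in this session; it is a simplified stand-in that only demonstrates the idea of generating tips from a closed set of records, each carrying a link back to its source. The `Vote` record, the `donor_positions` mapping, the function names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    legislator: str
    party: str
    bill: str
    vote: str        # "aye" or "no"
    source_url: str  # link back to the official record (placeholder values below)


def party_line(votes, bill, party):
    """Majority position of `party` on `bill`, or None on a tie."""
    ayes = sum(1 for v in votes if v.bill == bill and v.party == party and v.vote == "aye")
    noes = sum(1 for v in votes if v.bill == bill and v.party == party and v.vote == "no")
    if ayes == noes:
        return None
    return "aye" if ayes > noes else "no"


def generate_tips(votes, donor_positions):
    """Flag votes that break with the party majority or with a major donor's position.

    `donor_positions` maps (legislator, bill) to the position a major donor favors;
    it is an illustrative assumption, not a real data source.
    """
    tips = []
    for v in votes:
        line = party_line(votes, v.bill, v.party)
        if line is not None and v.vote != line:
            tips.append({"rule": "against_party", "legislator": v.legislator,
                         "bill": v.bill, "source": v.source_url})
        favored = donor_positions.get((v.legislator, v.bill))
        if favored is not None and v.vote != favored:
            tips.append({"rule": "against_money", "legislator": v.legislator,
                         "bill": v.bill, "source": v.source_url})
    return tips


# Made-up sample data, for illustration only.
VOTES = [
    Vote("Legislator A", "D", "SB 1", "aye", "https://example.org/votes/1"),
    Vote("Legislator B", "D", "SB 1", "aye", "https://example.org/votes/2"),
    Vote("Legislator C", "D", "SB 1", "no", "https://example.org/votes/3"),
    Vote("Legislator D", "R", "SB 1", "no", "https://example.org/votes/4"),
]
DONOR_POSITIONS = {("Legislator A", "SB 1"): "no"}

if __name__ == "__main__":
    for tip in generate_tips(VOTES, DONOR_POSITIONS):
        print(tip["rule"], tip["legislator"], tip["bill"], tip["source"])
```

Because every tip is built only from records already in the input and carries its `source` link, a reporter can check each one against the original record, which is the property described above.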
How would smaller or medium-size newsrooms that have a project like this in house, 40 Plus on, I'm in North Carolina, have longevity? We've launched so many little tech projects that kind of fizzle out when people get laid off or when there are a lot of transitions in the organization. So I'm wondering how we can keep something like this sustainable.
You know, if this is successful, the template will be there, and it's going to evolve fast. I do think there are ways that the ChatGPT-style LLMs are going to come into play here somehow, but obviously it's not ready yet, by any stretch. I don't know. We're trying to make this as accessible as possible. And I think if it can live up to what we're hoping, then there'll be a lot of access for a lot of different media. There's
We will support it nationally as we build that out, so the core thing is always supported from one place. We're talking to Shannon doe and the local news funding North Carolina, and so there are ways to share with groups of publishers and make it available. Any local news organization is going to have to figure out how to do the reporting on the information that it generates. And to the earlier question: when they first started this project 10-plus years ago, one of the things it was tied to was a change in legislation and a proposition in California to make some of this data more available. So ideally this can also be kind of a prompt for some legislative action and some data transparency in the states. So that, plus our intention to keep this up as a long-term, supported thing where a lot of it comes from us, and then the local newsrooms take the stuff, and then it's up to us reporting resources when it against
There's scaffolding there that's independent of the technology itself, right? And so I think one of the things, as we think about the expansion, is: what is the organization in Florida that is performing at least part of the role of CalMatters, being the Florida experts who can work with small news organizations there? Ideally there's one technical installation at maximum per state, right, and for some of that there can be one for the entire country. I think it's really about making sure we're not duplicating efforts. You know, I look at North Carolina, and I focus on Charlotte because it's Knight's city, as a place that is more collaborative than a lot of places around the country. So it's a good testbed. But I think it's really going to be about using that as a lever for making participants collaborate as well.
And just to add to that, the model for maintenance and manageability going forward is very different for this project. So the idea is: how can we think about supporting this going forward, and what elements of that would be applicable in other use cases? We need to define that model and how it looks for us, because there is ongoing maintenance and manageability, from a technical point of view and even from the front-end point of view, that needs to be factored in for us
to collaborate. And that's exactly what, sustainability-wise, I mean. It sounded widely useful, you know, the people that are working on this, and that you're going to do like a profile for each of the assembly members. It sounds like it's a really big job. So are you guys hoping that with AI all of that is going to be less? And what is the current workload for each legislative session? I'm just wondering, again, about sustainability. And I like the idea of public policy really influencing how this is going to go forward with transparency, because I feel like if it's there and it's unusable, it makes no sense. So I look to your thoughts.
You know, what I'd like to do is give any reporter that has the ability or interest to write about the state access to the database, and make it as convenient as possible. CalMatters will be the host, but my hope is that we're just creating a tool that any news organization around the state can build on. And then the web pages that we're building also, that's something we can make available as a white label.
What's the manpower involved in making it happen year by year? So what's the expert? Yeah, no, no, for you. So the viability of the system, like what goes into it to make sure that it's going to be available for all to use? And with AI advancing, is the hope that the manpower gets less and less, and that whatever machine is making this work, because I don't know, this is not my expertise, is AI going to, like, learn and then do it in an easier way where the manpower is less? I mean, what is the expectation? Yeah, and then as I went to policy, painfulness, sustainability for years to come.
Part of that is, as this technology gets better, the cost of doing this comes down; the cost of all the servers, Microsoft, comes down. But part of the idea is that, and frankly, as a CEO, part of my job is to kiss our donors' asses and feet and be nice to them. And Mark has been an amazing partner. This is a funder that is not only coming in and giving us the money, which they have to, but actually getting in the trenches with us, helping us figure this out, and pushing us on some tough questions to make it worthwhile. What Knight and the Arnold Ventures folks have committed to is a five-year runway. The last couple of years are almost committed. But the idea is that we have five years to build this and make it effective, and make it so important, so valuable in so many places, that we'll find more sources of support for it. So the cost comes down over time and the sources of support grow as it proves its value. So we fire tip sheets at Victor 25 times a day, he cranks out amazing stories, and then people start to notice, more donors start to step up and support journalism, and that, plus the technology helping on cost, makes it more sustainable. And
There's also, I think, thinking about the layers: a technology investment that we make one time, an operational investment we make once per state, and then, you know, capacity within small news organizations. That's a variety of different funders. And I think the thing I was drawn to in having this situated with CalMatters specifically is the idea that CalMatters has access to fundraising potential that no local news organization in California can have, because of the scope of the mission and the scale. And so I think that helps as well, finding the right homes for the work.
And you know, I mentioned there are more than 30 people working on this right now. That's just to build it. The ongoing capacity is much, much smaller. Yeah, yeah.
It's a big one-time. Yep. Fundraising machine in the Central Valley, far bigger than CalMatters. And normally the sad
thing is that, last question. Last question. No pressure.
Okay, yes, real quick. It's kind of a two-for. So I'm imagining the way this is going to work. Like, for example, let's say we want to look for Congressman Valadao's record on gun reform. Do you have, like, a search engine where you can type in those kinds of terms and then get the results? Second question: obviously this will be great for looking at how legislators vote and who's supporting them while they're in the legislature. But during election cycles, how would this also be helpful for candidates that aren't in office, to find out some of those interests? They don't have a record,
per se. Yeah. Yeah. Obviously, this is all about incumbents, and we can't do that for every challenger. But we're looking at the next election cycle, and with a lot of the data we're pulling in about money and the district and the election results and things like that, we can capture a lot. We've done a voter guide every year, and the next one is going to be much more robust because we're pulling in some of this data. Their money is in the same data source as the incumbents', so we can capture money for the challengers. We can capture their bios. We can pull their press releases and social media. So it won't be quite as robust as everything we have on the incumbent, obviously, but I hope to have it grow into the best voter guide we can build out of this. And the first question, about Valadao: yes, that's exactly it. You could set up tip sheets, this is for the state legislature, but, you know, a tip sheet that just says "Valadao on gun control," and whenever that comes up, you'll get a tip sheet.
Most of this has to be state legislature, but, yeah, California representatives, you know, Congress: will there be really good information on that as well?
Not in this database. This is just the state legislature. Yeah, yep.
legislative bodies restructures.
So you're in California, it's January 1, you want a tip sheet. Who do you email?
Dave at CalMatters.
There we go. Sapna, Dave, thank you so much. Thank you, everybody. Great questions. And thank you for spending time with us. That's great. Yeah, thank you. Well done. Thanks. What time is your flight? 2:30.