Hello. Welcome to my talk, AI tools from before and beyond ChatGPT. ChatGPT is all anyone talks about, and we're going to talk kind of about ChatGPT, but kind of not. Mostly we're not going to talk about it, but it is going to come in. Mostly because ChatGPT is a great excuse to use a lot of tools that are not ChatGPT but kind of existed beforehand. No one trusted them in a newsroom. People were scared. They were too technical. That was terrible. But now everyone is so jazzed about ChatGPT that you can wave the wand of AI around literally anything and your boss is probably like, this is great. And so this is going to be a selection of "this is great" that you can use. Additionally, if you kind of go back in time and look at these tools, they're much simpler than ChatGPT. All right. The big issue I feel like with ChatGPT is it can do too much stuff. It is one of those Swiss Army knives that weighs two pounds. It has a hundred different little attachments, and you don't know what half of them do. I would say that ChatGPT and a lot of these AI tools, even though they are very powerful, have really poor discoverability: you're not sure what all the tasks are that are possible. There are ways you can kind of learn. You can talk to your friends, you can take a class about prompt engineering. But generally speaking, because they are so flexible, it is very difficult to know specifically what's possible. So in this talk, we are not really going to talk about specific tools. When you talk about ChatGPT, usually what you mean are large language models, or generative text AI. It's going to be the same thing for this talk, where I'm mostly talking about concepts or categories of tools, as opposed to here's a specific tool you can run out and use today that does this or that. Two reasons for that. Number one, I think that using specific tools narrows your focus too much.
It kind of puts blinders on what's possible. By knowing what kind of lives under the hood of these tools, what the categories of tools are, you're able to think a little bit more expansively and understand the possibilities in a way that you wouldn't if I just said, go use Grammarly and it makes your text better, or something like that. Secondly, I don't like vendor lock-in. If anyone here loves things like Twitter or Photoshop or Google Reader, all of these tools at some point in time have had something bad happen to them. They started charging subscription fees. They were destroyed by someone who just decided to be the king of them. They just shut the tool down. If you're able to control your own technology, then you're able to accomplish a lot more in the longer term, because you're not at the mercy of the whims of some other institution's finances. So most of the stuff we're going to talk about today is useful for investigative work. I like to think of AI generally being used internally in the newsroom, as opposed to externally, in things that you send to readers. But there is also a little bit of business stuff there too. So hopefully we hit a little bit of stuff for everybody. All right, I keep using the word tools. I'm not going to use the word tools again. So I'm going to say the word tools right now and you're going to say boo. Wait, no, you got too excited. Something something tools. Yeah, it's awful. What we're going to use instead is the word model. So I'm going to say the word models and you're going to say yay. I mean, I appreciate the enthusiasm. It's still early, the coffee's going. Models. So when you're talking about AI: AI is fundamentally built on models. Models are what all of these tools have under the hood, and anytime you think you're using an AI tool, you're probably just using an AI model that has a little bit of window dressing on it. And you say, so what's a model?
No, no, you say, Soma, what's a model? Yeah, okay, good question. If you're a technology person, close your ears for a second. A model is something that takes some sort of input and gives some sort of output back. And that's all it is. It's the most generic, general thing in the world. And that's why there are so many different kinds of them. So for example, if we're talking about ChatGPT, it is a generative text chat model. The input in this case is some sort of prompt, something you say to it, and then you get back some sort of text response. Again, seems very generic, but that's why people want to take a class about prompt engineering. There was a question I heard yesterday where someone said, how do we get good at prompt engineering? And there's literally a whole class sponsored by OpenAI about how to do a bunch of different tasks using ChatGPT or similar tools, called Prompt Engineering for Developers. Who here's a developer? Even if you're not, you can do it. It's fun. You'll still learn all this stuff. It'll be great. So a simple kind of model that's maybe more specific than a chat model might be a sentiment analysis model. It's always my go-to, because I hate sentiment analysis. You give it some sort of text, and it gives you back whether it's positive or negative. So let's say you're Coca-Cola, and you release a new kind of Coke. You're like, do people like this New Coke? You scrape a million tweets from Twitter and you look at every single one of them and you say: generally positive, generally negative. Turns out everyone hates New Coke. So you shelve it. So that is the sentiment analysis model. Very generic, but also very specific in terms of use: you give it some sort of text, it gives you back positive or negative. There are all kinds of models out there, and that's what we're here for right now. This is a website called Hugging Face. It's the place where all the models live. And if you're like, what do models do? I'm like, hey, here's what models do.
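To make the input-in, output-out idea concrete, here's a toy stand-in for a sentiment model in Python. This is not a real machine learning model, just an illustration of the shape of the thing: text goes in, a label comes out. The word lists are made up for the example.

```python
# A model, at its most generic: input in, output out.
# This toy "sentiment model" just counts words from two
# hand-made lists -- a stand-in for the real thing.

POSITIVE_WORDS = {"love", "great", "delight", "amazing"}
NEGATIVE_WORDS = {"hate", "garbage", "terrible", "awful"}

def toy_sentiment(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE_WORDS for w in words) \
        - sum(w in NEGATIVE_WORDS for w in words)
    return "positive" if score >= 0 else "negative"

print(toy_sentiment("I love kittens"))   # positive
print(toy_sentiment("I hate garbage"))   # negative
```

A real sentiment model does the same thing, except the associations between words and labels are learned from data instead of typed in by hand.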
Summarization, conversational, fill-mask, document question answering, image classification, video classification. And you're just sweating, looking at the list like, this is horrifying. Don't worry. We're going to break down some of the ones here that I feel are useful for journalism, and then you can go explore them later. So we had sentiment analysis, which is pretty easy to understand, right? I say a phrase, you tell me whether it's positive or negative. If I say I love kittens: positive or negative? I say I hate garbage: positive or negative? Amazing. You're all geniuses. Who needs sentiment analysis models? Nobody, we just need this crowd. So that is a specific version of what's called a classification model. A classification model takes some sort of text and then puts it in a category. In a sentiment analysis model, what is the category? Just positive or negative, right? But the category can be literally anything. And this is where the magic happens for investigative journalism. So let's say you're the Washington Post, and you have like 30,000 reviews that have been written of random chat apps. You've downloaded them from the App Store, and you're like, I want to find all the ones where people are complaining about dudes being creepy perverts. Now, do you want to read 30,000 individual app reviews? No, you don't. You're like, please God, let me train a computer to do this. And you can. And there's actually a video on this that shows them with a spreadsheet open, where they just went through a bunch of them and said, is this about a guy being creepy? Yes, yes, yes. No, no, no. Yes. No. And they also looked for things like sexism, racism, bullying, and stuff like that. And so what you do, in order to teach this model to put things into categories, is you categorize a bunch of it yourself. You go through 200, 300, 400, whatever, and you say, in this case, is it positive or negative: zero for negative, one for positive.
You do have to do a little bit of work. I'm sorry. The computer's not going to necessarily do 100% of it for you. And this is what the Washington Post did: they just went through a big long list, labeled, labeled, labeled. They didn't read all 30,000, though. What happens is you train the model: look, here's an example of 300 of these different app reviews. Here are the ones we're interested in, here are the ones we're not interested in. And then you feed all of this to the model. We'll talk about what that means later. And it says: love, that means something's positive. Hate, that means something's negative. Sick, cool. It learns associations between the words over here in a sentence and whether they're positive or negative, or whether they're about men being creepy perverts, or whatever it is. And then you say, all right, computer, go read those other 29,000 app reviews and flag the ones that seem to have words about men being creepy, or flag the ones that are about positive thoughts, or whatever. Now, there's a problem with this. And the problem is, language is awful. Language is a terrifying, weird thing. So let's say we're trying to make a classifier that's about fishing. Right? Wonderful thing, I guess. I don't know, never been fishing. But if we're talking about fishing: fishing, fishes, and fished, are those all about the same thing? We've got disagreement. Who says yes? Who says no? You all hate voting. It's terrible. So I would say generally these are about the same topic. But are they the same word? This one's easy. No, they're not the same word, right? Because there are word endings and things like that. And you might say, oh, I'm just going to remove the -ing's, the -es's, the -ed's, and that's going to be great. And then I'm like, no, it's not great. Look at running: running, runs, ran. Totally different words. It's terrible. And if this was like five years ago, I'd give you a lecture about things like stemming and lemmatization.
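The "learns associations between words and labels" step can be sketched in a few lines. This is a toy word-counting classifier, not what the Washington Post actually ran; the labeled examples are invented, and a real project would use a proper library. It just shows the workflow: you label some examples yourself, the computer learns which words go with which label, and then it predicts labels for text it hasn't seen.

```python
# A toy text classifier that "learns associations" between
# words and labels from a handful of hand-labeled examples.
# Label 1 = the thing we're hunting for, 0 = everything else.
from collections import Counter

def train(examples):
    """examples: list of (text, label) pairs, labels 0 or 1."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    # Score each label by how often its training words appear.
    scores = {
        label: sum(counts[label][w] for w in text.lower().split())
        for label in counts
    }
    return max(scores, key=scores.get)

labeled = [
    ("this app is full of creepy men", 1),
    ("a guy kept sending creepy messages", 1),
    ("great app, love the interface", 0),
    ("works fine, no complaints", 0),
]
model = train(labeled)
print(predict(model, "another creepy guy messaged me"))  # 1
```

The word-endings problem from the talk shows up immediately here: "messaged" and "messages" count as different words, which is exactly why stemming and lemmatization used to matter so much.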
And you would write all of this code. Do we want to write all of this code? No. We're so unhappy about that. Those were awful days. Those were terrible days. If you want to learn how to do this, I have a website called investigate.ai, and you can learn how to do it there. I just stole this from there. None of it matters anymore. None of that matters. We are past that point. Kind of. Somewhat. A little bit. All right, so let's say we need to hire somebody as an intern. Would you rather hire as an intern a newborn baby, or a grouchy teenager? Who's going to do a better job reading documents? You agree. I mean, they're still grouchy, but they are smarter than a baby. So if we do it the old-fashioned way, it's called training a model from scratch, where we literally say: this is language. Fished, fishes, fishing, these words kind of mean the same thing. And the model is like, goo-goo ga-ga, I am learning this a little bit at a time. But nowadays, or maybe even two years ago, you start to move into the land of fine-tuning a pre-trained model, which sounds very fancy. It's not very fancy. It's the idea that you have hired a grouchy teen that has read more of the world than just those few tweets or those few app reviews that you've fed it. So it looks more like this, where someone with more money and computing power than me said, computer, go read like 200 million tweets and learn what tweets are like. And then they take the resulting model that comes out of that, which isn't good for anything. Not necessarily, not yet. And then we take that model and we say: you're a grouchy teen. You know a little bit about language, though. You might have biases, you might have been trained on bad information, who knows, but you're smarter than a baby. And now we're going to expose you to what we're actually interested in. Here are the app reviews that are creepy. Here are the ones that are not creepy.
Here are the tweets that are positive, here are the tweets that are negative. And then it learns a little bit more, and we get to use that, and it's a delight. And you say, how am I going to do this? Again, who's a developer? Who can program? Yeah, nobody, right? Don't worry. It's all easy now. All, all easy. So Hugging Face, that website that has all the models: they also want you to make models, because eventually they're going to run out of venture capital and they're going to have to start charging people for something. So they have a way for you to fine-tune your own model. You go to this website and you literally just say, I want to make a text classifier, and you upload a spreadsheet that has things flagged zero or one, and it magically makes a genius, fantastic model for you. So whereas we used to live in a world where we had to write all of this crazy code, now we just upload a spreadsheet. It stands on the shoulders of, like I said, someone with more resources than us who made this initial model that read all the tweets, or read a bunch of stuff on the internet, and we just say, hey, tell us the results of this. And it says, great, this first one is not men being creeps, the second one is men being creepy. And we're so, so excited.
Now, even though these come from fundamentally different places, where one is poking around in code and analyzing how language works and making like 10,000 decisions at every step, and the other one is just clicking a button and uploading a CSV, it's still just classification. This is just a category of machine learning, a category of AI stuff. So by knowing this classification category, we're now able to be a little bit more flexible about how and where we use it, and the tools that we use. So, I'm sure we've all heard of ChatGPT. This is where ChatGPT comes in. Instead of just reading 200 million tweets, GPT has read, like, everything. Everything that exists. And so it knows way more than a grouchy teenager. It's like a grouchy old person that is fantastically knowledgeable and will sometimes lie to you. You do not have to teach it the difference between a man being a creep and a man not being a creep. You don't have to teach it the difference between positive and negative. All you have to do is say, hey, you know what these things are, put it in a category for me. So for example, I can say: categorize the following text as being about environmental, gun control, or immigration issues. Respond with only the category. And the text is: a bill to regulate the sulfur emissions of coal-fired energy plants in the state of New York. And GPT says, oh, that's about the environment. Yeah, absolutely. And so we did not have to fine-tune anything. We did not have to do any work. All we had to do was say, here are a few different categories you can put this in, and then it puts things in the categories for you. This is called zero-shot classification. Previously, when we fine-tuned, we had to give it a bunch of examples. In this case, zero-shot just means we don't have to teach it anything. It's already pretty smart.
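A zero-shot classification prompt like the one above is just a string you assemble and send off. Here's a minimal sketch of the assembly step; the categories and example text come from the talk, and the part where you actually send the prompt to ChatGPT (or any other large language model API) is deliberately left out.

```python
# Build a zero-shot classification prompt. You would send this
# string to ChatGPT or another LLM API; the sending is omitted.

def zero_shot_prompt(text: str, categories: list) -> str:
    cats = ", ".join(categories)
    return (
        "Categorize the following text as being about one of: "
        + cats
        + ". Respond with only the category.\n\nText: "
        + text
    )

prompt = zero_shot_prompt(
    "A bill to regulate the sulfur emissions of coal-fired "
    "energy plants in the state of New York",
    ["environment", "gun control", "immigration"],
)
print(prompt)
```

The "respond with only the category" line matters: without it, the model tends to chat at you instead of giving you something you can drop into a spreadsheet column.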
If there's a question that's a little bit more nuanced, when you're trying to put things into categories, you might have to do few-shot classification, where you say: hey, GPT, here are like four examples, do the same thing. Because, does it know everything? Yes, but it doesn't know exactly what you're looking for all the time. This is delightful. But you know what, a picture's worth a thousand words. You can do classification with images, too. So this is from 2018, Leprosy of the Land. It was a piece about illegal amber mines in Ukraine. They used aerial and satellite photography to take a bunch of pictures of Ukraine, and some of these have illegal amber mines and some of them don't. Do you, as an investigative journalist, want to look at like 100,000 different images and then categorize them as having illegal amber mines or not having illegal amber mines? No, right? So you're going to train a classifier to do it, in the exact same way that we trained a classifier for text. Previously, you had to jump through a bunch of hoops to make this happen. You had to design systems and pick out features, and there were actually a bunch of tutorials that talked about how computer vision worked that used this as an example. You know what you have to do now? You just upload like 20 example pictures with illegal amber mines and 20 example pictures without illegal amber mines to that same delightful Hugging Face auto-trainer, and then you have a model that just does it for you. It just puts things in categories for you. And then you say, here's a picture from the plane, is this an illegal amber mine or not? And it says yes, or it says no, and it's incredible. This one's less exciting, but who loves to put things in categories? Yeah, thanks.
So, words. Words can also go into categories. No one calls it that, though. Named entity recognition is what we call it instead, or named entity extraction. If you have a bunch of text and you want to find specific things that live inside of it (people's names, or companies, or countries, or dollar amounts, or anything specific), you can just say, hey, find me the stuff. And it says: this is a person, this is a person, apparently "renegade mercenary" is a product. It's not always perfect. It's fine. The Kremlin is an organization, Russia is a geopolitical entity, and 62 is a cardinal number. So named entity recognition, I feel, is oftentimes the lowest-hanging fruit for doing a lot of really good stuff in your newsroom internally, specifically things around source diversity. If you take all of the articles that have been written in your newsroom, you can very, very easily use NER to extract all the people who get quoted, and then ask: what's the gender split in all of this? And there are a lot of different examples from around the world of folks who have used this to try to get more parity in sourcing in their newsrooms, or at least be more aware of the way the gender split is going, or other sorts of splits, when they're looking at sourcing. Now you say, Soma, why don't we just use ChatGPT to do this, because it knows how to do everything? It's that Swiss Army knife that has a million tools in it, and one of those is, if I give it the right prompt, I can say: hey, I'm Jonathan Soma, I'm giving a talk at ONA 23 in Philly, please extract the entities out of this, here are the different types: person, event, organization, and location. And it says, great: Jonathan Soma is a person, ONA 23 is an event, Philadelphia is a location. Is this a smiley face? Why is it not a smiley face? What happened here? Did it follow the instructions?
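The labels in that example (organization, geopolitical entity, cardinal number) are the kind spaCy uses, so here's a minimal named entity extraction sketch with spaCy. To keep it self-contained, this uses a blank pipeline plus a rule-based EntityRuler with made-up patterns instead of downloading a trained model; in real use you'd load something like en_core_web_sm and read `doc.ents` in exactly the same way.

```python
# Named entity extraction with spaCy. A blank pipeline plus a
# rule-based EntityRuler keeps this runnable offline; with a
# trained model (e.g. en_core_web_sm) you would skip the ruler
# and read doc.ents the same way.
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Jonathan Soma"},
    {"label": "ORG", "pattern": "the Kremlin"},
    {"label": "GPE", "pattern": "Russia"},
])

doc = nlp("Jonathan Soma wrote about the Kremlin and Russia.")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

For the source-diversity use case, you'd run every article through the pipeline, keep the PERSON entities, and count from there.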
It did not follow the instructions. I said, use a comma to separate the entity and the type of entity. What did it use? A hyphen. This isn't what I wanted. I wanted to export this to a CSV file, but it says, no, I'm just going to do whatever I want. This is oftentimes the problem when you're using something like a large language model: because it is so flexible, sometimes it just ignores you and does whatever it wants to do. And so keeping the reins on with a smaller model, or a more constrained model, is sometimes a little bit more effective. But as we saw before, are the models always right?
No.
Text embeddings are my favorite thing to talk about, mostly because I get to talk about animals. Are these animals all the same animal? No. Okay, great. Which animal do you want to start with? Cat? Yeah, thank you to the person who said wolf, but no, we're going with cat. Is a cat and a tiger the same? Now, what's the difference between a cat and a tiger? Size? Give me another one, because that's not what I use as the classifier later. Yeah: wild and not wild. Wild versus domesticated. So we want to split them up, right? The cat goes on the left and the tiger goes on the right. If we have a dog, where's the dog going to go? Is the dog more domesticated or more wild? It's more domesticated, right? So it's going to go on the left. But a cat's not the same as a dog, right? Very different things. I just had a dog for a while. Horrifying experience. So much more effort. So a dog's going to go up at the top, just to have a little bit of spacing there. But now, where does the wolf go? Is there a place where the wolf belongs in this? Top right, right? So it's on the right. Why is the wolf on the right? It's wild, right. But why is it on the top? It's dog-like, yeah. So things at the bottom are cat-like, things at the top are dog-like. Amazing. Amazing. Where would a lion go? Somewhere probably by the tiger, right? It's a feline, it's wild. You know what, what if instead of using adorable emoji, we just put it on a graph? We have a graph, with things that are wild and things that are dog-like, and we assign everything a delightful number on a scale of zero to one for how dog-like it is, and zero to one for how wild it is. This is great, right? We can put all the animals in the world on this chart. What about socks? Where do the socks go? It's a mess. You're like, oh no. And I know the answer to this, and you're like, it's easy: we just need another dimension.
And the other dimension is how much something is clothing-like. So we have socks, or we have, I can't think of another item of clothing, but it can go really far back on the z-axis. That's why it's smaller. And it's great. So what if you just took every word and ranked it on everything: how much like a dog it is, how wild it is, whether it's clothing, whether it's fuzzy, and so on. How many different categories do you think you need in order to categorize all of the words that exist? Yeah, so how about 384? Maybe 512? People just pick a number out of a hat, because they're like, you know what, instead of representing all of these concepts with hand-picked categories like dog-like and fuzzy and clothing and wild, we're just going to let the computer figure out what all the categories are. And the categories don't really mean anything, and they end up looking like this. So this is the word cat, and I think it's about 384 different numbers that conceptually represent what a cat is. You could say, okay, this first number is how cat-like it is, and the second one is how domesticated it is, and the third one is how much like clothing it is, and it goes on. But these numbers don't mean anything individually. Together, though, they mean cat. And so if we had lion, the numbers would be kind of similar to this, but a little bit different. And if we had sock, it would be a lot more different. Why is this useful? Yeah, good question. So these two sentences: are they about the same thing? Are they very similar? Oh, come on. They have one word in common, and that word is "a". How can you tell me these sentences are similar? Right? If you try to search for things, if you try to say, is this similar or not, sometimes you really get constrained by exact word matching, and you're just like, no, I know these are the same, but they're not the same based on the specific words. They're the same based on vibes.
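The animals-on-a-chart idea can be sketched directly. These are toy vectors with made-up numbers on three invented axes (dog-like, wild, clothing-like), not real 384-dimensional embeddings, but the similarity math is the same one real systems use: cosine similarity between vectors.

```python
# Toy "embeddings": hand-made vectors on three invented axes
# (dog-like, wild, clothing-like). Real embeddings have 384+
# dimensions chosen by the model, but the math is identical.
import math

vectors = {
    "cat":   [0.20, 0.10, 0.0],
    "tiger": [0.20, 0.90, 0.0],
    "lion":  [0.25, 0.90, 0.0],
    "sock":  [0.00, 0.00, 1.0],
}

def cosine(a, b):
    # Cosine similarity: 1.0 = same direction, 0.0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(vectors["tiger"], vectors["lion"]))  # close to 1: same vibe
print(cosine(vectors["tiger"], vectors["sock"]))  # 0.0: no shared vibe
```

Swap the hand-made vectors for model-generated ones and this is, at its core, how "vibe matching" works.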
Like, the feeling of this sentence, the vibe of this sentence, is the same as this other sentence. Which brings us to vibe matching, also known as conceptual document similarity. It is the idea that if we use these text embeddings on not just words, but on full sentences or paragraphs or documents, we can take every single document, or every single sentence, or every single whatever, and give it 384 numbers, or 512 numbers, and say: if you're the same, you'll probably have similar numbers. So we're no longer just looking at words. We're looking at these embeddings. You say, why would this be useful? And I say, well, it enables something called semantic search, which is not just conceptual document similarity but the idea of saying, find me things that are like this. Great example: the Luanda Leaks. Documents in a bazillion languages, but text embeddings can work across different languages, as you see in this perfect, beautiful thing over here that includes languages I don't really speak. If you have sentences that are kind of the same, you can say, find me more like this. So I could take the embedding for "I love money laundering," and I could say, find me every sentence that is similar to "I love money laundering," and then I get a list of every single document in my data set that is about money laundering, even if it doesn't use the words "money laundering." Maybe it says, I love to steal cash, or I love to take cash that isn't mine and put it through a bank. I don't know how to describe it. But yes, it is a great way to search across documents without requiring exact matches. Has anyone ever seen people screaming about chatting with your documents on Twitter? And they're like, sign up for this class, buy this product, chat with a PDF, it's going to be great. It's literally all just semantic search. It's all semantic search. It's a two-step process.
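Semantic search, sketched: embed the query, embed every document, rank documents by cosine similarity. The vectors below are made up to keep this runnable; in practice you'd get them from an embedding model (sentence-transformers is a common choice), and the document texts here are invented examples.

```python
# Semantic search over toy embeddings: rank documents by cosine
# similarity to a query vector. The vectors are made up; real
# ones come from an embedding model such as sentence-transformers.
import numpy as np

docs = [
    "I love money laundering",
    "I love to steal cash and put it through a bank",
    "kittens are adorable",
]

# Pretend embeddings, one row per document.
doc_vecs = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
])
query_vec = np.array([0.85, 0.15, 0.05])  # the "money laundering" query

def top_matches(query, matrix, k=2):
    sims = matrix @ query / (
        np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    )
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

print(top_matches(query_vec, doc_vecs))
```

Note that the second document matches even though it never contains the words "money laundering": it matches on vector similarity, i.e. vibes, which is the whole point.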
The first step is it goes through your document and finds all of the pieces of text that are similar to the question. Then it sends the question and those pieces of text to GPT and says: GPT, based on the text below, tell me about this thing. This is my favorite example. It's me using GPT to understand Hungarian folktales. I do not speak Hungarian. ChatGPT doesn't know anything about Hungarian folktales. But I'm able to say, what did the kids steal from the devil? And supposedly these three paragraphs have something to do, conceptually, with stealing something from the devil. And then when you send it to GPT, it'll be like, oh yeah, the beating stick and the golden cabbage, and you're like, this sounds cool. Can't verify it, but it sounds great. All right. You can also do automatic translation with text models, but I just don't think that's a good idea. I think communication over text is mostly what journalists do, and if you're offloading that communication to an automated process when you don't speak the language, it's kind of a disrespectful move to the people who are reading the content you're generating. And additionally, I think it's kind of fucked from a labor perspective, because there are plenty of people who could translate that for you and add nuance and things like that, and you're like, no, I want to save like $3. So if you want to do that, it's fine. It's not fine, but you can just go on Hugging Face and look up a model. It exists. All right. We were excited about images before, right? Make an excited sound. Yes. So when you deal with images, there are two concepts, and those concepts are things versus stuff. Does anyone have any idea what the difference between things and stuff is?
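The two-step "chat with your documents" process above can be sketched like this. Step 1 here uses a crude shared-word count as a stand-in for embedding similarity (a real system would use the embeddings discussed earlier), step 2 just packs the question and the best chunks into a prompt. The chunk texts are invented, and the actual call to GPT is omitted.

```python
# "Chat with your documents," sketched. Step 1: find the chunks
# most similar to the question (shared-word count stands in for
# embedding similarity). Step 2: pack question + chunks into a
# prompt for GPT. The API call itself is omitted.

def similarity(a, b):
    # Crude stand-in for embedding similarity.
    return len(set(a.lower().split()) & set(b.lower().split()))

def build_rag_prompt(question, chunks, k=2):
    best = sorted(chunks, key=lambda c: similarity(question, c),
                  reverse=True)[:k]
    context = "\n\n".join(best)
    return ("Based on the text below, answer this question: "
            + question + "\n\n" + context)

chunks = [
    "The kids crept into the devil's house at night.",
    "They stole the beating stick and the golden cabbage from the devil.",
    "The miller ground the wheat all summer.",
]
prompt = build_rag_prompt("What did the kids steal from the devil?", chunks)
print(prompt)
```

The model only ever sees the retrieved chunks, which is why answers can sound confident while being impossible to verify: the quality of step 1 limits everything step 2 says.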
Yeah, just, who am I going to call on? No, I wouldn't do that to you. So things are things like this: an object is a thing, and that would be object detection. So if I have an image and I want to find some things inside of it, discrete things. This is an example where I say: here's a picture, find me human faces, rockets, star-spangled banners, and NASA badges. And it finds them. It says, here's a human face, here's a rocket, here's a NASA badge, here's another NASA badge. Just going wild with NASA badges. Amazing. And you're like, well, what is stuff, then? I'm not even going to ask you. I'm just going to tell you. Stuff is also known as semantic segmentation, which makes you sound very smart. Very smart. One thing that everyone loves to do these days is talk about how, in cities, if you have a neighborhood that has more poverty, it probably has fewer street trees, and if you have a place that is wealthier, it has more trees and it's probably cooler. Very popular now, in a climate-change world. Semantic segmentation basically classifies every single pixel. So you're able to say: find me what percent of this aerial photograph has vegetation on it, or has trees on it, or has fields, or was burned by a forest fire, or anything like that. You're just able to say, yo, find this for me, and it'll be a delight. Now, one thing to note: that is a fine-tuned model. Someone said, I specifically made a model to detect vegetation. It's open, you can find it on Hugging Face, whatever. This one right here, though, is not fine-tuned. This is a zero-shot object detection model. And we learned about zero-shot before: zero-shot means you don't have to fine-tune it. It already knows about everything. Apparently human faces, rockets, star-spangled banners, and NASA badges are things that this model just happens to know.
So if you find a zero-shot object detection model, because you want to detect stuff (honestly, it's usually going to be faces), you can just say, find those things for me. But sometimes you need to fine-tune a model yourself. Like, I personally was really curious about how many cars are in my neighborhood, because sometimes I borrow my friend's car and I need to park it, and I'm just like, there is no parking, because there are 10,000 cars in my neighborhood. So I just went on Google Maps and took a bunch of screenshots of my neighborhood. And then I went to an annotation tool, something like Prodigy, or you can use Roboflow, and I drew little boxes around a bunch of cars. And in the same way that the Washington Post had that spreadsheet where they labeled creepy man, not creepy man, creepy man, not creepy man, I was doing the same thing: this is a car, this is a car, this is not a car, this is a car. And then I trained the model. The model did a pretty poor job, because I didn't spend that long on it, because it was just supposed to be an example. But if you are trying to detect something that is kind of unusual or kind of niche, you just fine-tune. It's fine. You send an intern to do it, and it'll be delightful. This one right here, panoptic segmentation, does both object detection and semantic segmentation. And I just want to talk about it because it looks really cool. Doesn't that look impressive? You're like, wow, something really happened there. And it's highlighting every individual car and every individual person. So for example, for the aerial photography of cars, with object detection I would say, find me how many cars there are and count them. But with semantic segmentation, I would say, what percent of each one of these photos is taken up by cars? So it's two different ways of approaching problems. So, simple combo: what are we going to do with all this stuff?
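Once a segmentation model has labeled every pixel, the two questions above (what percent is cars, and how many cars) are just array math. This tiny 4x4 mask is made up for illustration; a real model's output mask works the same way, only bigger.

```python
# Once a model has labeled every pixel, the analysis is array
# math. This tiny 4x4 "mask" is made up: 1 = car pixel, 0 = not.
import numpy as np
from scipy.ndimage import label

mask = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
])

# Semantic segmentation question: what percent of the image is cars?
percent_cars = 100 * mask.mean()
print(percent_cars)  # 37.5

# Object detection question: how many cars? With a plain mask you
# can count connected blobs of car pixels.
instances, count = label(mask)
print(count)  # 2
```

In practice a detection model hands you per-instance boxes directly, so you'd count those instead of counting blobs, but the percent-of-pixels question really is this one line of `mean()`.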
You just chop out people's faces and then do stuff with those faces, make some sort of classifier. Here are a bunch of different examples. If you have pictures from a protest, you could automatically scrub people's faces. You could identify people: if you have a set of faces where you already know who is who, you could build a classifier to identify each one of those people. This is a piece about who got the most screen time on Big Brother Brazil, where they grabbed everyone's face in, like, every third frame or something and asked: who's in the picture right now? This one right here was about interviewees on broadcast, and again, gender parity: who gets more screen time, men or women? So there are a million different things you can do just with object detection of pulling out faces. Even though it seems kind of niche, there are thousands of opportunities right there. I don't know anything about videos. I really don't know anything about videos. All I ever do with videos is chop them up into individual images and say, please do some stuff. Sometimes it does some stuff. But there's one thing I think is cool. Anyone here ever make explainer videos? All right. If you want to be famous, here's what you're gonna do. There's a tool called Runway, and Gen-2 in Runway allows you to take a video, add a little bit of text, and it changes the video for you. And I think this is perfect for explainers, because you just set up some cardboard boxes and throw things around or whatever, and you say, turn this into X, and it does it, and you don't have to go outside and do anything wild or fancy. It's just magic. Does it cost money? Does it, you know, not do a great job all the time? Yeah. But I don't play around with these things; you can play around with them, and then win awards. Be famous. The first newsroom that does this, I swear to God, you're gonna be able to dine out on it forever.
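The "chop videos into individual images" step mentioned above is a few lines with OpenCV. A sketch, with the filenames and the every-third-frame interval as examples only:

```python
def video_to_frames(video_path, out_dir, every_n=3):
    """Save every n-th frame of a video as a JPEG; return the saved paths."""
    import os
    import cv2  # deferred import: requires the opencv-python package

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video (or unreadable file)
            break
        if index % every_n == 0:
            path = os.path.join(out_dir, f"frame_{index:06d}.jpg")
            cv2.imwrite(path, frame)
            saved.append(path)
        index += 1
    cap.release()
    return saved

# Hypothetical usage: video_to_frames("big_brother.mp4", "frames/")
```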
You'll be here giving a talk next year. It'll be amazing. Audio also exists. Transcription: clearly the easiest thing in the world. Right now I'm probably being transcribed by Otter.ai, with the AI-powered transcription. Say it: "AI-powered transcription." We could also just call it transcription, in order to sound less fancy. OpenAI, the people who control ChatGPT: we don't like GPT because they keep it secret. They don't tell you information. It's a closed system. Who knows what goes in, they can change it all the time, and you're out of luck if you're depending on an old version. But they have an audio model called Whisper, and Whisper is open, and Whisper is free, and you can download Whisper and it does everything for transcription. If you need to get something done fast, there is a smaller, lightweight model. If you want to spend forever doing a really good job, there's a heavy model. And in terms of programming, I know we don't have many programmers in here, but it's literally two lines of code. I'm just like, yo, use Whisper, here's an MP3, and it just does it for me. It's not hard. It's not hard to do. If you have video, if you have audio, if you have anything that you would like to turn into a searchable format or some sort of text format... who doesn't want a transcript? This is the way to go. Anyone here speak languages other than English? Yeah, a real tough thing about the machine learning world and the AI world is they're like, we love English. Do other languages exist? Not really sure, haven't heard of them. But Whisper is actually remarkably good at other languages. It's actually better at Spanish and Italian than it is at English. We don't need to talk about word error rates, but this is how good it is at different languages: the smaller the bar, the better it is.
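The "two lines of code" claim is roughly true with the open-source `whisper` package. A sketch; the model size and the filename are placeholders:

```python
def transcribe(audio_path, size="base"):
    """Transcribe an audio file locally with OpenAI's open-source Whisper.

    "tiny" is fast and rough; "large" is slow and thorough.
    """
    import whisper  # deferred import: pip install openai-whisper

    model = whisper.load_model(size)
    result = model.transcribe(audio_path)  # language is auto-detected
    return result["text"]

# Hypothetical usage: print(transcribe("interview.mp3"))
```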
And it just goes on and on and on, and it can do a billion languages, for better or worse, at these different levels. And if you want, you can always fine-tune it to make it better. I saw a blog post today, which I did not read, about fine-tuning it to be better at Dutch. Did it work? I don't know, I didn't read the blog post, but in theory it is possible. Now, are we ready for something terrifying? Yes. So let's say you are an evil company, and you're tired of firing all of your text-based journalists and replacing them with AI, and you want to start replacing the other sorts of journalists, the ones who do spoken-word stuff: podcast people, things like that. So let's take a tiny snippet of audio here.
"He continued his pretended search, and to get..."
That's all. We take that audio, and we take this model from Microsoft called SpeechT5, and we say, hey, take that tiny, tiny, tiny, tiny little sample and just read this big long text for us.
"That summer's immigration, however, being mainly from the free states, greatly changed the relative strength of the two parties."
Was that perfect? No, but it's kind of terrifyingly good compared to where we were two or three years ago, even compared to when Siri talks to you while you're driving your car. So please don't fire everyone and replace them with robots. But if you want to go down that path, it's right there for you. On that note, let's not listen to that again. Okay, another thing you can do when you're working with speech is speaker diarization, which is the idea that you have multiple people talking and you need to figure out who's saying what. This is actually really hard to get running on your computer, as opposed to everything else I've been talking about. So if you're like, oh, I really love Otter.ai doing transcription for me because it splits up speakers: yeah, you're fine, you're good. If you want to try to force someone in your newsroom to do this, great, there are tools that do it, but it's a heavier lift than other stuff. So this is just the tiniest, tiniest, tiniest sliver of the models and abilities that are out there. I showed you that screen from Hugging Face earlier that listed all of those models, and it goes wild. But what I want to encourage you to do now, having thrown some random things at you, is this: if you have the power to make it happen, whether you're a dev or a product person or you're everyone's manager, I think the most powerful thing you can do to make your newsroom better at AI stuff is an internal playground of accessible, connected components. This is not "let's have one person sit in a room and try to crank out a bunch of, you know, headlines and summaries that we're going to push out to our readers," things like that. This should be completely internal and completely open. Because, as was said yesterday, when things fail in the world of journalism from AI, the thing that fails is people's trust in the newsroom.
And so I think that keeping these tools facing inward until you're 150% certain they're not going to screw anything up is probably the best thing to do. So why a playground? What are you going to do? It helps with ideation. It helps with experimentation, because you're bringing different people into the room who can play with these tools: not just coders, not just the person who owns a product. You're able to bring in more people with varying ideas, and I'll have a fun slide about that in a second. It also builds trust within the newsroom. Because if you are someone who is not on the AI bandwagon, or someone who's mistrustful of these tools, you're going to be like, I don't know what they're working on in that room, they're probably going to replace me. There's a big fear that these tech people are just going to throw together an AI thing and fire half the newsroom. If people are more aware of what is going on, by being able to experiment with these tools, by having there be a little front end to them, they'll trust that that's probably not going to happen. And my favorite thing is red teaming. Red teaming is the idea that you try to break tools. You try to make them give the wrong answer. You try to make them sexist or racist, or all of those bad things that show up on Twitter, where they say, here's the awful stuff I made ChatGPT say, and then they say, oh, we should probably fix that. The people who are tech people are usually too in their own heads and too invested to think critically about what they're producing, because they love the tech, they're so excited about it. Whereas if you have someone who's like, this is going to put me out of a job, I'm definitely going to figure out a way to show everyone that it's bad: that's great for you, because you don't end up putting anything broken out there. So, have we seen this chart before? Yeah, it's the hype curve.
Everyone loves to have it in the AI pieces: we're on the AI hype curve. Are we just starting to get excited about it? Are we at the peak? Are we in the trough of disillusionment? Or are we starting to get productive with it? And I don't think about this as AI in general; I think about this as all of the people who are contributing in the newsroom. Every single person working with these tools, whether they're just playing with GPT on their own or whether they're a dev or whoever they might be, is somewhere different on this curve. And so I like to think that it takes a village to produce good AI-driven products, and that village is people at all different points on this curve. You want someone who's really grouchy about it. You want someone who's over it. You want someone who's really excited. They all bring different perspectives to the table about what could work and what couldn't. Even though some people are way too excited about what's possible, they're going to be able to think of things that the grouchy people, or the people who are over it and being productive, aren't able to think about. So what's an example? I made a model for, you know, the creepy Washington Post thing, and this is my delightfully easy-to-set-up model tester, where I said: "My comment was deleted because old men kept sending me dick pics." And it said, this is definitely old men being creepy, 100%. And I might be satisfied with that, because I'm just like, this is great, my model works perfectly. But then I send it around the newsroom. Because it's an accessible tool, people who aren't me can play around with it, and they can try to break it, and they can say things like, "Some weird dudes kept asking me for nudes instead of talking about how to slam dunk a basketball." And the model says: this is definitely not creepy, it's definitely about slam-dunking basketballs.
So if someone is able to reveal that to me, because they were red-teaming, I'm like, oh, my model is actually not good, I should probably tweak it a little more. Additionally, I know I've been through a lot of different tools and covered a lot of ground, but the best part is that these tools are all additive, and they can all work with one another to make a much bigger, fancier, really amazing product. So for example, let's say I have a transcription. I use Whisper, right? I use Whisper because I don't want to send it out to an external service: I feel like it needs to be private, or it's in a language that some, you know, transcription service doesn't support, whatever. You could just keep that on your computer and have a transcript that you can open and, like, Ctrl-F and search inside of. Fine, right? But what if, then, I combine it with some speaker diarization? I say, let's figure out who said what in this interview. Sure, easy. Then I have an auto-generated list of questions. So instead of having to Ctrl-F through this document, I can have a table of contents at the top that says, here are all the different questions. Me as the journalist, I don't want to read through a three-hour transcript; I just want to see all my questions, or all the things I said, and hop to those sections. Maybe I'm doing a bunch of other research in the same research project or the same story, and I have a ton of different documents, a ton of different websites. I could take all those documents and dump them into some sort of database, right? I have all of these text embeddings that will enable me to do, perhaps later, semantic search. But I could also take the transcription and dump it into that same database, so it doesn't matter where it comes from. You know, I have a bunch of PDFs from somewhere.
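The "dump everything into one database and search it" idea boils down to comparing embedding vectors. Here's a sketch of just the search step with NumPy, assuming you have already embedded your documents and your query with some model; the three-dimensional vectors below are toys, not real embeddings:

```python
import numpy as np

def semantic_search(query_vec, doc_vecs, top_k=3):
    """Rank documents by cosine similarity to a query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q  # one cosine-similarity score per document
    return list(np.argsort(scores)[::-1][:top_k])

# Toy "embeddings" for three documents; in practice these would come from
# an embedding model run over transcripts, PDFs, scraped pages, etc.
docs = np.array([
    [1.0, 0.0, 0.0],   # doc 0
    [0.0, 1.0, 0.0],   # doc 1
    [0.9, 0.1, 0.0],   # doc 2
])
query = np.array([1.0, 0.05, 0.0])
print(semantic_search(query, docs, top_k=2))  # docs 0 and 2 rank highest
```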
I have a bunch of conversations I've had with someone, and now I have an easy search I can perform across all of these documents, without going to an external service, without relying on anyone else or paying anyone money, except maybe a dev that I have who's doing a little bit of work. And so it's very easy for me to search through this large cache of documents. If we wanted to get really crazy, we could do things like take the interview, send it off to ChatGPT, and say, GPT, do you have any ideas about questions I could also ask? That one's in a different color because you're probably like, that's kind of a weird thing to do. It is kind of a weird thing to do, but it's fine. Maybe you're experimenting, who knows. But any time you're trying to use any of these tools in the newsroom, the worst thing you can do is just jump out there and use them, which I feel like is what most people do. I feel like you really, really, really need a testing and tracking infrastructure that goes along with trying out these tools, so you know whether they're working, so you know how successful they are. So this is from Generative AI in the Newsroom. This was a post from ages ago, but it's about document summaries in Danish, and it's about using GPT to take a whole document and turn it into a short summary. And you might think, you know, it's kind of the same thing as maybe headline suggestion in a newsroom, a similar kind of example. Does it work? Who knows? But what you can do, instead of just saying, hey, people in the newsroom, try this out, see if it works, see how it feels, is actually track what's going on. And this piece is amazing, because they say: look, 60% of the time I had to revise what GPT sent me. Why did I have to revise it? Here are all the reasons: missing important information, incorrect summary, the AI was evaluating things, irrelevant information, poor language.
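The kind of tracking described here doesn't need anything fancy. Even just tallying why editors revised each AI draft gets you those numbers. A minimal sketch with made-up log entries (the reason strings and the log format are illustrative, not from the Danish post):

```python
from collections import Counter

# Hypothetical log: one entry per AI-generated summary an editor reviewed
reviews = [
    {"revised": True,  "reason": "missing important information"},
    {"revised": True,  "reason": "incorrect summary"},
    {"revised": False, "reason": None},
    {"revised": True,  "reason": "missing important information"},
    {"revised": False, "reason": None},
]

revised = [r for r in reviews if r["revised"]]
print(f"revision rate: {100 * len(revised) / len(reviews):.0f}%")
for reason, count in Counter(r["reason"] for r in revised).most_common():
    print(reason, count)
```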
And by having those sorts of numbers, you get, one might say, actionable insights about what works and what doesn't, or what you might need to tweak. Because otherwise you're just kind of goofing around, and you don't actually know whether something's working or not, besides whether you feel it in your heart. So, all the different tasks we mentioned today: we talked about categorizing text, extracting people, text embeddings, detecting objects, detecting stuff, transcription, speaker detection,
audio, and turning text into new audio and video. Also, I didn't talk about it, but you can turn video into images and images into text, and the format-shifting between them is a really easy way to say: take a bunch of images, describe them with text, and then ask them questions, or something like that. But your homework is this: figure out what you can do with the stuff that you have, with the content that you have. If you combine all these bits and pieces, what is possible? And when you approach a tool like GPT, or some other large language model, or something that can handle PDFs or images or anything like that, you now maybe have more of a baseline of the different tasks you can ask of it. So, my slides live at that URL. But right now, that's the only thing that exists there. I have a bunch of code samples that, because I was slamming more and more things in here until the last second, I did not put up yet. But by, like, the end of the day, they'll be up there, so if you're a coding person, or you want to send them to the coding people in a newsroom, it'll be great. I'll also have all of the links from my slides in there, and a few extra magic fun links, about playgrounds, about how to use this stuff, how to build these tools yourself, and all of that. I also have three websites. aifaq.wtf is me just collecting tweets and posts and stuff that are usually dystopian or funny about AI. normalai.org is about the same kind of stuff we talked about today, with a lot of interactive examples. And investigate.ai is a more old-school machine learning kind of situation: if you want to learn about linear or logistic regression, or how to build classifiers from scratch, there's a great resource there. So yes, that's it. That's me. Hope you enjoyed it. Go do a bunch of horrifying AI things.
And now we have a hot second for questions. So folks with questions, I feel like I need to give you this mic, and then we'll make this happen. So, right here. From a practical perspective, how do we build a playground? Do you do it in Spaces on Hugging Face? Yes. So from a practical perspective of how to build a playground, there are two tools that are really, really good for building interactive stuff: one is called Gradio and one is called Streamlit. The example you see here is a Gradio demo app; it's literally, like, four lines of code to build this. Streamlit is a little more intense; it can do interactive data visualizations and things like that. But by default, all of the stuff on Hugging Face Spaces, which is the part of Hugging Face where you can play with the models, is based on Gradio. I wouldn't recommend using Spaces on Hugging Face, though, unless you maintain control of everything, because they switch things up in the background all the time, and, like, half my Spaces don't work anymore. It's a real pain in the ass. So doing it on your own infrastructure is definitely the way to go. But it's not a heavy lift. It's just a few lines of code and a little bit of installation. Other questions?
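The "four lines of code" pattern looks something like this sketch. The classify function is a stand-in for whatever model you're actually wrapping, and the import is deferred so the sketch loads without Gradio installed:

```python
def classify(text):
    # Stand-in for a real model call; swap in your actual classifier here
    return "creepy" if "nudes" in text.lower() else "not creepy"

def build_demo(classify_fn):
    """Wrap any text-in, text-out function in a shareable Gradio web UI."""
    import gradio as gr  # deferred import: pip install gradio

    return gr.Interface(fn=classify_fn, inputs="text", outputs="text")

# Hypothetical usage: build_demo(classify).launch() serves a little web page
# your whole newsroom can hit, which is all a "playground" really is.
```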
Thank you. Hi. Just curious: there are all sorts of tools here that feel like they're inevitably going to be part of, like, Google search, for example. Do you feel like we're in a temporary stage where ChatGPT is a destination? I mean, when do we get to the point where, on Google, you're just uploading an image and saying, hey, tell me how many pedestrians are in this?
I feel like they wouldn't know how to monetize that, and if they did have it, they would shut it down soon thereafter. So I think that even if these tools are offered by large organizations... for example, Google has offered a bunch of, let's say, machine learning vision tools over the years, but then they get tired of them and they disappear, like Google Reader or whatever else, or they start charging more money. And so for me, a lot of this is about independence. Even if something is a little bit worse than a tool that's sold to you by somebody, at least you have control over it. For example, with GPT, over time the answers have changed. There's an argument as to whether they're worse or not; it kind of depends on what you call worse. There's a paper that came out a while ago; you can find it on aifaq.wtf under, I think, evaluations. But yes, it's mostly about control and stability, as opposed to what is easy to use or not.
How do you deal with the challenge of proprietary data? We're giving all these AI tools, essentially, our transcripts of an interview that we did, something we've written, things like that. I understand for a playground you can set aside content to play with, but in general, when you're using these... I can't read through every terms of service. And what are you even looking for, to make sure they're not essentially owning your data after you give it to them?
Yeah, so all of this stuff runs on your own computer. You don't need the cloud at all. Even the playground could just be a computer that lives at the office that people VPN into, or sit down at a desk at, if, God forbid, you're working in real life. And yes, all the models on Hugging Face are models you can download. And usually what happens, if you're hanging out in the awful world of AI Twitter, is people are like, oh, a new model came out from Microsoft called SpeechT5, it's crazy, or a model got leaked from Facebook called LLaMA. You can go on Hugging Face, like, a day later, and it's there for you to play with. If you're using something like this, which is hosted on Hugging Face, you are sending things to them, but you can also just download it and use it yourself. You might need a fancier computer. Any more questions? This side of the room, you're so quiet. Is it because I was all the way over here? I'm sorry. Anything else? Anything else? All right. We are done. Have a wonderful day.