Hello, I'm Rob Hirschfeld, CEO and co-founder of RackN and your host for the Cloud 2030 podcast. In today's episode, we dive into the potential for biasing LLM models, both in good ways and in bad ways, and this idea that the expertise we're feeding into these models either is not sufficient to actually drive the outcomes that we're looking for, or is sufficient and we're going to be eliminating humans from the loop in a relatively short period of time. Both outcomes, at the moment, feel equally probable, which is troubling. We dive into how and why that happens, what's going on, and, even better, get some concrete tips for how you can improve your prompting to avoid these same pitfalls. It's a fascinating conversation, and I know you'll enjoy it.
Should we all jump back to Diana's original question about the death spiral on these platforms? Because I would love to pick that back up.

Sure. Let me articulate a little bit more where this is going, and then I'd love to hear more, because this is just a thought that came to me and I don't have anything to back it up. But it seems to me that if you've got a system writing itself and checking itself, it's going to operate inside of a bubble that has confirmation bias, or misses its own mistakes, maybe not even on purpose. And I'm curious how generative AI writing code and governing code supports humans being more productive and more informed, and where humans intervene. I'm just curious what you all thought about that idea.
Well, I don't see that you can necessarily take... sorry, Rob.
I'll hold my thought.
I don't think you can, at this stage of the technology, take humans out of the loop. Because there's institutional knowledge in each of us; regardless of where we work as the institution, we ourselves have institutional knowledge. And you need to inculcate that into the large language or small language models, regardless of the domain. Because if you don't, the way the tools are created and used will have none of it. So you'd never be able to correct a bias, and you'd never be able to say, your source may not have been 100% accurate when you were trained. Right? Like, I was just writing a little piece on this: our AI decided to hallucinate, so we've sent it back for retraining. But there are no guarantees that even that will help. And it's true, because whether it's the age at which the model stopped being trained and is now just taking in information, or the biases that may have been inherent in the algorithms, in other words, what we were talking about with point of view, the engineering point of view when people build software, I think those things have to be taken into account. So I don't see humans out of the loop for probably two to three years, if not longer than that.
That's a very short timeframe.
Whoa, wait a second. Think about December of last year, when ChatGPT was introduced, and where we are today, not even a year later.
What's so interesting about that is, haven't you already noticed that you're kind of picking up when it's ChatGPT as opposed to personal, human-generated writing? Aren't you starting to get spidey senses? Tom and I were reading a travel magazine, and we were like, I think every single article in this magazine has been AI-generated. It just has a flavor. So I think... I don't know.
It does. The thing that was interesting: on Tuesday, one of our team was doing a retro on Google Next. He went to Google Next, and we were talking about what he learned there. And the thing that was surprising to me is that they're vectorizing big datasets, not as training materials, but as prompts. The assumption that I've been making, that we would be retraining models or stacking models, is actually not what we're seeing happen at the moment. And this lines up with what RackN is doing with our own product: we're putting RackN's vectorized database in front of Hugging Face, and we're calling it RackN GPT, to help users find information. The strategy isn't to build a new model; the strategy is to vectorize the knowledge as a prompt, effectively, if that makes sense. The models are now starting to take in these huge pieces of background information, so all that knowledge is fed into the prompt. You're asking a question at the end of basically telling it your life story and all this background, and then it makes a recommendation. Which, to your question about coding, is actually producing a better feedback system. You could say, in the context of all this code I've already written, please help me generate some code, and ideally it's going to go and say, oh, you already have a routine that looks sort of like this. I think we're going to have to get better at prompting, but ideally what you're doing is leveraging the humans that have already touched the system, so the generative work is aligned with the work you've already done.
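The pattern described here can be sketched in a few lines. This is a minimal illustration, not RackN's actual pipeline: real systems rank documents by vector-embedding similarity, and naive keyword overlap stands in for that below. The function and variable names are hypothetical.

```python
import re

def tokens(text):
    """Lowercased word set, stripped of punctuation (toy tokenizer)."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def build_prompt(question, knowledge_base, top_k=2):
    """Prepend the top_k most relevant documents to the user's question,
    so the knowledge rides along in the prompt instead of in model weights."""
    ranked = sorted(
        knowledge_base,
        key=lambda doc: len(tokens(question) & tokens(doc)),
        reverse=True,
    )
    context = "\n".join(f"- {doc}" for doc in ranked[:top_k])
    return f"Background:\n{context}\n\nQuestion: {question}"

# Illustrative knowledge base; in practice this would be a vectorized database.
kb = [
    "Routine deploy_node provisions a bare-metal server.",
    "Routine parse_config reads YAML and returns a dict.",
    "The marketing calendar is updated every quarter.",
]
prompt = build_prompt("how do I provision a server node", kb)
```

The point of the sketch is the shape of the prompt: the retrieved background goes first, the user's question last, so the model's answer is grounded in work that humans already did.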
Yeah, it's setting tone. When you're talking about a piece being written in a magazine, Diana, someone sets the tone. Because if you go to ChatGPT, for example, and say, you are a world-renowned marketing expert, your tone of voice resembles blank, fill in the name of the person that you like, and write me a piece based on this, that, and the other thing, and you put all that in, it gives you a tone of voice akin to the character or the individual that you've named, and you can then play with it. But Rob, to your point, here's what I did with Hugging Face. I said, give me a set of prompts that I could use to brain-dump myself into Hugging Face as a persona, with the amount of information on a particular topic: what are the right questions that I should be asked about supply chain or manufacturing or any of the myriad other topics? And it was really good, because then when I did put it into Hugging Face, it actually gave that institutional-knowledge flavor to what I wanted to come out. That would be one way of capturing that. And I think from a coding perspective, ask the question, what was the point of view of the coder who wrote this, and give it a snippet of code. It'll tell you.
Yeah. See, that's where I'm talking about AI really helping humans.
Right, yeah. But it can also be detrimental, and you have to be really careful with it in some ways. Because if the person who was the so-called world-class expert was, like, out to lunch, that's the best way I can put it, and had great ideas but not great execution, let's say, you can end up with a lot of hallucinatory information, even in code.
Fake expertise.
Yeah, there you go. There are two dangers. One is the AI sounds really smart, so you listen to it, whether it's knowledgeable or not. And if you said, please explain why you thought this, it actually is going to invent the explanation. It doesn't know why it thought that. That's what's so deceptive about this. But you also have the potential for, like we've been saying, your biases, your inputs, whatever's coming through. You can probably do better if you pick your person, and the more writing they have, the better; the more you're basically feeding stuff in to make it happen. You and I both have enough writing out there that I know if I tell it to write in my style, it has enough knowledge of me to do my style. If I say, hey, take this outline and these facts and write something in my style, it will actually do a better job, or at least something more palatable to me, than if I just tell it to write whatever.
I know. I mean, I've put blogs in there. I actually tried to find a way to upload... like, I just did a podcast for Supply Chain Connect on AI and manufacturing, and I wanted to take the stream and put it into an AI, not to get the transcript, because they already had it, but to say, how could I have better improved what I was saying while still being me? I tend to use large words; should I have been simpler in some of the things that I said? So I'm playing around with the notion of tonality. But I also now know, just from experience, that if you give it a particular best-in-class, best-in-the-world, GOAT, whatever you want to call it, you-are-that-level-of-expertise setup, plus this is your audience, this is the topic, this is the tone, this is the context, you get much, much better responses.
Yeah, but getting back to coding, and I really liked that you brought up point of view. One thing that I've been trying to accomplish with a couple of different models, including ChatGPT, is this: I'm writing a story about chess, and I need this chess game to play out, and I need five things to happen in this chess game. Like, I need a knight to... I need a queen to vanquish the other queen. There are these things that I need to happen in this game to make the story work. And neither Tom nor I has been able to write a prompt for any of the AIs that we have access to that describes five things, and they don't even have to happen in order, that happen in this game. It comes back with the moves, right, like white moves, black moves, and it says, this fulfills all your criteria. But not once has it fulfilled all the criteria yet. So the concern that I have is... I mean, I may be feeding in an impossible situation; maybe those things can't play out in a chess game, although I'm pretty sure they can. But it tells you that it's accurate, and it's not. So how can I rely on that?
This is my problem with using it from a coding perspective, and I'm working on a presentation right now to help sysadmins and DevOps people use AI better, so some of these questions are similar. It doesn't actually know the rules of chess. What you actually need is an AI that understands how to play chess, and none of these models actually knows how to play chess. What they're doing is they have ingested a whole bunch of chess games, and they're probably smart enough to say, oh, these are the moves you're looking for; it sorts through all that knowledge. But it doesn't actually know the rules. Just like with programming, and this was scary: it doesn't actually understand programming at all. It doesn't understand syntax, it doesn't understand what the output should be, it doesn't even understand what the potential options are. This is part of what I'm trying to educate these operators on. If you're telling it to build Terraform or Ansible playbooks, it's not reading the schema for those commands that it's showing you. It's just going to its database and saying, oh, here's the common schema. So it's not going to actually know if what it's putting down is the best thing to do, or right, or even possible. But it does do a better job than people do out of the gate.
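Since the model isn't consulting the real schema, one practical defense is to validate whatever it emits mechanically before trusting it. A minimal sketch of that idea, using an illustrative subset of playbook keys rather than the actual Ansible schema:

```python
# Post-hoc validation: check every key in a model-generated playbook
# against a known-good set before trusting or applying it.
# ALLOWED_KEYS is an illustrative subset, not the real Ansible schema.
ALLOWED_KEYS = {"name", "hosts", "become", "vars", "tasks"}

def unknown_keys(generated, allowed=ALLOWED_KEYS):
    """Return any keys the model invented that the schema does not define."""
    return set(generated) - allowed

# A hypothetical model output with a plausible typo ("taks").
llm_playbook = {
    "name": "deploy web tier",
    "hosts": "webservers",
    "taks": [{"ping": {}}],
}
suspect = unknown_keys(llm_playbook)
```

The same pattern generalizes: for chess moves, a rules engine plays the role of the schema; for Terraform, a plan or validate step does. The generator doesn't know the rules, so a checker that does has to sit behind it.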
You know, I don't know if you use it the same way I do, but I use a kind of chain-of-thought method. I'll ask it first, are you familiar with, in chess I would say, Garry Kasparov, the world chess champion? And it will say yes, and it will give you his life story or whatever. Are you familiar with the methods that he used to play chess? And it'll give you a response. Once you go through those steps and then ask it about the moves, it will give you the moves. It's because, you know, there was the competition with IBM's Deep Blue; it is familiar with that, so you can get the moves. The other thing you can do: there's a new tool, which I can send you a link to through LinkedIn, and I'll send it to you as well, Rob, which is useful for citations in research papers and things like that. Because I submit work through ResearchGate for peer review, and I do peer review, and all those kinds of things, I needed something that would actually give me accurate citations from research. If you use that chat model, you will get very specific information with regard to any subject. So whatever the story is that you're trying to build, if it's about the moves, if any research has been done with machine learning or cognitive learning around chess, it'll pull up everything. You can then literally copy and paste it into ChatGPT and say, based on this, now give me the five moves, or whatever. You have to kind of think about the right situations, and chain of thought works best for me. I find it takes longer, but it's more accurate.
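The stepwise method described here amounts to front-loading context-setting turns before the narrow question, so the model answers with that grounding already in its context window. A sketch, using the message format common to chat-style APIs (the helper name and example turns are illustrative):

```python
def chain_prompts(warmups, final_question):
    """Assemble a chat history that asks context-setting questions first
    and puts the narrow question last, as in chain-of-thought prompting."""
    messages = [{"role": "user", "content": q} for q in warmups]
    messages.append({"role": "user", "content": final_question})
    return messages

history = chain_prompts(
    [
        "Are you familiar with Garry Kasparov?",
        "Are you familiar with the methods he used to play chess?",
    ],
    "Using those methods, describe five moves that could appear mid-game.",
)
```

In an interactive session each warm-up gets a model reply before the next turn; the point is simply that the final question arrives after the relevant context has been established, not cold.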
Well, yeah, I'm not getting an accurate answer now, so I would try that. And I kind of don't want to figure it out on my own; that would take a long time. I was kind of hoping that it could help me. But that's an interesting thought. And hey, I did beat Deep Blue, so there's still hope for humanity.
I'm interested, because one of the things I've been doing, instead of doing as much of a dialogue prompting approach, is building what I would call prompting templates. This is, to me, similar to the vectorization thing, where I'll provide more and more information in the question so I don't have to dialogue with it as much. I'm like, okay, here are all the things that I need you to know in formulating your answer, and then I'll dump a whole bunch of stuff in at once, which is effectively vectorizing my input. That actually gives me pretty good responses, but I haven't compared notes with enough people yet. Like, if we're writing marketing copy, or I'm doing customer journeys using this technique, I provide a page of notes and quotes and KPIs and things to highlight, plus frame and tone. We build those up over time and keep refining what those templates are, and then ask it to generate narrative from that.
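A template like the one described can be as simple as a function that packs tone, audience, KPIs, and raw notes into one structured prompt instead of a long back-and-forth. The field names below are illustrative, not a prescribed format:

```python
def journey_prompt(tone, audience, kpis, notes):
    """Build a single self-contained prompt from structured inputs,
    so all the context arrives in one shot rather than over a dialogue."""
    return (
        f"Tone: {tone}\n"
        f"Audience: {audience}\n"
        f"KPIs to highlight: {', '.join(kpis)}\n"
        f"Source notes:\n{notes}\n\n"
        "Using only the material above, draft the customer-journey narrative."
    )

prompt = journey_prompt(
    tone="plainspoken, practical",
    audience="infrastructure operators",
    kpis=["provisioning time", "error rate"],
    notes="Customer cut bare-metal provisioning from days to minutes.",
)
```

Because the template is code, it can be refined over time exactly as described: each revision changes the fields or framing once, and every future prompt inherits the improvement.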
Yeah. By the way, I have a list of 10 or 15 different sites that work with ChatGPT but are actually better for narrative writing. The person whose templates I'll try to get for you is Michael Margolis from Storied; he's the king of narrative. I've known him for a long time. But just as an exercise, try the chain of thought. The reason I started doing it that way was because I found I liked certain parts of what I would get back as answers, not all of it, but I would also be able to see the hallucinations much quicker, much earlier, and say, no, that's absolutely wrong. And I love the "forgive me, I've made an error" or "I'm sorry" responses. Which leads me to my question of the week for the two of you: what are the correct pronouns for generative AI? Because "it" and "they" don't sound right to me.
When you talk about it, or when you talk to it?
Well, when I talk to it, I say "you," and I talk about "it." I mean, I'm very friendly towards them, and gender-nonspecific, and all of that. What pleasantries do you use with generative AI? Because "please" gets you a better answer, by the way.
I've heard that.

It's true, absolutely true. Ask the same question twice, once with "please" and "thank you" and once without, and you will not get the same response.
That's a great question. If you look it up on LinkedIn, it doesn't give its pronouns.
But yeah, it is "it." You're not the first person to tell me that, either. Someone else said the same thing: no, you have to be polite to ChatGPT to produce good results.
Yep. And I always say, sorry, it was my mistake, or whatever. But this whole chain-of-thought thing: if you go down that road, you can actually say to it, go back and revise, because this is incorrect, or, I don't agree with that, and it'll keep doing it.
I'll do that. What I've found is it has been getting much more verbose lately, which is one of the things that I find actually very frustrating, and potentially takes you off track. Because I'm doing this for this presentation, I started with a very simple, I need your help to generate a cloud. And it gave me a thousand words about how to create an experiment that generates cirrus clouds in a Coke bottle. And I said, no, I mean a computer cloud, and then it went through that. The thing that's interesting is it's giving me thousands of words of output when it should be asking me clarifying questions. I said, no, I really want you to ask me some questions to figure out what I need. And then it did, and then it generated a crazy long list of questions, too. So ChatGPT is not designed to be Socratic.
Well, okay, I don't mean to be corrective, but I would have said first, are you familiar with cloud computing? And it'll give you an answer. Now that we know you're familiar with this, I need an experiment: write me the prompts for you to create an experiment for me to build a private cloud, call it Rob's cloud. What would I need to build? And then you'll get a list. You might get a long answer, but it'll be all the steps and all of, you know, the man-hours, the resources, the specifics.
I'll try that. I should probably incorporate asking ChatGPT to generate the prompts for ChatGPT. It's actually a little funny, because there's no real history in its database of writing prompts for itself, but it does it. I mean, this is the whole idea of AutoGPT: asking it what steps you should be following, and then letting GPT follow those steps. And that, to me, is actually indicative of the advice I'm giving, which is that there's a tremendous amount of expertise baked into these models, and you're trying to tease out that expertise. It's not just, can you write code for me. What you actually want to do is use it as a sum of human knowledge, and you're asking questions to try and identify the frozen experts inside the model.
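The plan-then-execute loop mentioned here can be sketched in a few lines. This is an editor's illustration of the AutoGPT-style idea, not its actual implementation; a canned stub stands in for a real model call:

```python
def plan_and_execute(goal, ask_model):
    """Ask the model for a plan, then feed each step back to it
    as its own prompt (the AutoGPT-style loop, simplified)."""
    plan = ask_model(f"List the steps to accomplish: {goal}")
    outputs = []
    for step in plan.splitlines():
        if step.strip():
            outputs.append(ask_model(f"Carry out this step: {step.strip()}"))
    return outputs

def fake_model(prompt):
    """Canned responses so the loop is runnable without an API key."""
    if prompt.startswith("List the steps"):
        return "1. Draft outline\n2. Write copy"
    return f"done: {prompt}"

results = plan_and_execute("write a launch post", fake_model)
```

Swapping `fake_model` for a real chat-completion call gives the shape of the technique: the model's own plan becomes the sequence of prompts it then follows.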
Yeah, I hear what you're saying; I do it all the time. I don't use ChatGPT as much now as I was a few weeks ago, because I was experimenting with it to get to these points about how you have to engineer the prompt. And I'll tell you the other thing, and I don't mean to sound nationalistic at all: be careful about the English you put in. If you use British English, I'm serious, you get a much better response, because it understands the language without the idiom of what I would consider American English versus Canadian English or British English or whatever. And the more of that you use, the better the responses you get.
There's a new one... we need to wrap up, but there is a new feature in GPT that allows you to warm up your prompts. It would be interesting to play with this: you could actually say, cheerio, good chap, to give it a British intro, and see if you can nudge it.
See if it dumbs it down.
Ah, howdy, pardner, here to round up some categories.
You realize that by doing that, when you do the "howdy" or some colloquialism, you are automagically setting the tone for the way the responses are going to come back to you. Yes. And so that's why I'm saying I tend to be more formal with it, in my case.
I'm actually going to go do this, exactly this. I'm going to create different chats, and I'm going to start one with the sort-of-British intro, and do the "howdy" in another. I want to see what the different results are. Okay.
So I have to tell you a story before we go. My husband worked on a ticket last night for one of his customers, and then the CEO of his company... they have their own AI tool that Tom built. The CEO asked the AI tool to recap how the ticket went and was resolved, in a medieval sonnet, and it made my husband a knight in shining armor: how this ticket was brought up, and what was wrong, and how he solved the problem. It was kind of like a love letter. It was a very sweet little thing to do. So anyway, fun times.
That's cool. Yeah.
It is really good. I've had it do poems and camp letters and things like that. It's really neat. Oh, that's a blast. Yep.
Even though we're over time, congratulations, by the way.
Oh, thank you, I appreciate that. You know... Gartner noticed. Gartner noticed us.
Yes, your ten listings, which I actually went through one by one. They were in the right context, and you should make hay while the sun shines. I will tweet about it later and tag you; it's on my list of things to do, and I'll put something on LinkedIn as well. I'm not sure I agree with everything they said, but that's just me, because I think some of them are cherry-picked. The point is, if you put all of those into ChatGPT, or one of the other writing tools that I will send you both links to, you will find some amazing tidbits to use for marketing copy.
Yeah. We were excited when we saw the citations rolling in, especially since we're on both sides of the hype cycle; we're referenced on the upslope and the downslope, which is sort of interesting. But yeah, we got the ten, and I started looking around, and I'm like, that's a very unusual number of hype cycles to be cited in.
Wow. In these discussions, we go down so far into the weeds and then pull so far back at the same time. It really is an amazing aspect of Cloud 2030 discussions. If this is interesting to you, and if you're listening to me now it was, I hope you will choose to join us and be part of the conversation. Bring your thoughts, your ideas, and your questions to the 2030 group at the2030.cloud, where you will find our agendas, topics, and Zoom links. I am looking forward to seeing you there. Thank you for listening. The Cloud 2030 podcast is sponsored by RackN, where we are really working to build a community of people who are using and thinking about infrastructure differently, because that's what RackN does. We write software that helps put operators back in control of distributed infrastructure, really thinking about how things should be run, and building software that makes that possible. If this is interesting to you, please try out the software. We would love to get your opinion and hear how you think this could transform infrastructure more broadly. Or just keep enjoying the podcast, coming to the discussions, and laying out your thoughts on how you see the future unfolding. It's all part of building a better infrastructure operations community. Thank you.