Super excited to be here. I've got a pretty big presentation planned, and I'm going to try to cover a lot of ground because it's a big topic and I think it deserves it. We're going to be talking about optimizing and fine-tuning large language models: essentially, how you can apply different techniques when working with large language models to get better results. So we're going to talk about how to improve the usefulness, the accuracy, and the relevance of outputs. Reduce hallucinations, in quotes, because hallucination is kind of a feature in large language models; really, all they do is hallucinate, since they're not actually connected to the real world. But we'll reduce the ones we don't want to see. We'll also try to prevent harmful or embarrassing outputs from our large language models. Unfortunately, there are a lot of ways that can happen, especially when you're doing something super broad using the generalist chat-based models, which really have the potential for toxic, biased, unethical, or illegal content, and prompt injections. There are ways to mitigate these things. We'll also look at extending the capabilities of large language models. As a self-taught developer, I'm very passionate about connecting traditional software to large language models; we don't have to throw out all of our existing tools just because we have something new, cool, and shiny to work with. We also shouldn't ask large language models to do things they're not good at. And we'll try to think about ways to do all this faster and cheaper as well, which primarily comes down to choosing the right size model for the task, at least at the level we're looking at things. I'm not going to dive into anything too low level; I'm going to zoom in and out, and I'm going to explain the terminology you see in machine learning that can really trip people up. It's not as hard as it sounds. So we'll cover prompt engineering, just briefly, because I think we've all heard about it, and inference parameters, where you can make some tweaks to potentially get better outputs. Retrieval augmented generation is a really big one. Function calling: OpenAI has an API feature for it, but I'll explain how it actually works. And then, of course, fine-tuning, which is what Entry Point AI, my software platform for fine-tuning, does. Of all these things, that's what I'm really passionate about. And we'll cap it off from there. Some underlying guiding principles to keep in mind as we go through all this: first, break down complex tasks into multiple steps. The biggest mistake you can make is to take a large language model and just try to make it do everything for you in one big step. I think that's where you'll be the least optimized. I mean, go ahead and try it; maybe it'll work, maybe you're fine. But in general, we break things down, and then some of the steps go to our traditional software tools, and some fall to a large language model, or maybe a series of models. The second is to just use traditional software approaches when you can. We'll look at some charts that put these things together. So let's start with prompt engineering.
With prompt engineering, there are some interchangeable terms that will be useful throughout the presentation. We can use the term prompt or input: same thing. Context window is a little more technical, but essentially the context window is your prompt, and it grows as you add tokens to it and to the completion. The context length, more specifically, is the maximum length of your prompt. With prompt engineering, typically what you'll see is something like some priming: "You're a plumbing Q&A bot that answers questions about plumbing in a helpful way. I want you to use plain language and explain things in a simple way that anyone can understand." So that's the style and tone, and the instructions can get quite extensive. You need to handle errors and edge cases: I'm trying to make a plumbing bot, so I don't want you to answer questions about interior design or something like that; just don't do it. And then you'll want some kind of response from it that you can use in your application. JSON is kind of the language of the web, so that's typically what we go to. So that's the system prompt. Then there's the user prompt; I'll explain the difference in a moment, but the user prompt is the user's input to your prompt. So somebody comes and says, "How do I fix a leaky faucet?" I like to call this the dynamic content, because if you think about your whole prompt, most of it isn't changing each time you call the large language model; the only thing that's really changing is this dynamic content. It doesn't have to be a question, but this is a Q&A bot example; it could be tons of different things. I think last year a lot of people were focused on chatbots, because they're new and exciting, it's this big thing, and they can do so much. I'm personally more passionate about much smaller, specialized use cases, which I'll talk about. To make these prompts better, you go on the internet and you find all kinds of prompt hacks. Like "I'll give you $20, do a good job, please, this is very important." I saw one in a Microsoft GitHub repository just yesterday that made me laugh: "Take a deep breath," and then the prompt. Or ending with "to be consumed by an application," which is supposed to make it more likely to put out JSON. And I would say, hey, just try it; if it works, that's fantastic. Evaluate it. But I hope all this stuff goes away, because I think it's taking advantage of biases in the training data these models are trained on, and if we train them better and have better architectures, these hacks shouldn't be necessary in the long term, in my opinion. But we're working with what we have, so go for it. I just think these hacks will only get you so far, and in this presentation I hope to show you some more robust ways to get results from large language models. In terms of practical things you can do to improve your prompt engineering, the first is to make sure you're using the system and the user prompt appropriately. When a model is fine-tuned to become a chat-based model, it's typically given some special syntax, which includes special tokens that you cannot insert through your own text inputs, so they're protected in a way. And it's trained to distinguish the stuff that's in the system prompt at the beginning from the stuff that's in the user prompt. The system prompt is for trusted inputs.
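To make that concrete, here's a minimal sketch of what a system-plus-user prompt like this might look like against the OpenAI chat completions API in Python. The exact instructions, the model name, and the JSON shape are assumptions I'm making purely for illustration, not the real prompt from the slide.

```python
# A minimal sketch of the plumbing Q&A bot prompt, assuming the OpenAI Python SDK.
# The instructions, model name, and JSON shape are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_prompt = (
    "You are a plumbing Q&A bot that answers questions about plumbing in a helpful way. "
    "Use plain language and explain things simply so anyone can understand. "
    "If the question is not about plumbing (for example, interior design), politely decline. "
    'Respond with JSON of the form {"answer": "..."} to be consumed by an application.'
)

user_question = "How do I fix a leaky faucet?"  # the dynamic content

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": system_prompt},  # trusted instructions and data
        {"role": "user", "content": user_question},    # untrusted, user-facing input
    ],
)

print(response.choices[0].message.content)
```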
So as the administrator of this large language model, you have some data that you trust: maybe it's metadata about a customer, maybe it's your instructions, your prompt. Put it in the system prompt. And then there are untrusted inputs. If we have a web application facing users, those users can come and ask questions, they can try malicious things, they can try to hijack the prompt. That's an untrusted input; it could be anything. So make sure to put that in the user prompt. Over time, as the training gets better, large language models should be able to distinguish better between these things and hopefully improve on issues like prompt hijacking or prompt injection, where somebody tries to get your model to do something you really didn't intend for it to do. There's also few-shot learning. If you're not familiar with this, essentially you can really boost your prompt by including some prompt-and-completion pairs as examples: questions like how to fix a leaky faucet and other plumbing-related questions, along with how you want the bot to answer. This starts to go in the direction of show, not tell, which we'll really focus in on when we get to fine-tuning. Three or more examples is a really good starting point. And then chain of thought is another really useful technique. I find it personally annoying when I'm using ChatGPT and I know that's what it's doing: it's reasoning through the problem before it actually gives me the thing I wanted. But it does help the large language model, because as it reasons, that reasoning gets put back into the context. This is an iterative process, which I'll show you, but the reasoning gets put back in, so the model can use its own reasoning to help it get to the final answer. You just have to make sure it reasons through before answering; if you ask it to justify its answer afterward, it's not actually improving the answer, because it already jumped to a conclusion. I'll put a small sketch of few-shot examples and chain of thought at the end of this section. So that's the section on prompt engineering. Now, inference is a really scary word, but it just means the point when the model starts generating text. These are interchangeable terms: you can say inference, generation, prediction, completion; it's all the same thing. A couple of other interchangeable words, if you're not familiar: token and word. Basically, the model generates sections of words at a time, but for the purpose of describing how models work, it's okay to just say they generate one word at a time, even if a model is really generating the prefix of a word, then the end of the word, the "ing," and then the punctuation.
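Here's that small sketch of few-shot examples plus a chain-of-thought instruction, layered onto the same kind of message list. The example questions, answers, and model name are all invented for illustration; treat it as a pattern, not a recipe.

```python
# Sketch: few-shot examples plus a chain-of-thought instruction, using the same
# messages structure as before. All example content and the model name are
# invented for illustration.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": (
        "You are a plumbing Q&A bot. Reason through the problem step by step "
        "BEFORE giving the final answer, then end with a line starting 'Answer:'."
    )},
    # Few-shot pairs: show, don't tell. Three or more is a good starting point.
    {"role": "user", "content": "Why does my toilet keep running?"},
    {"role": "assistant", "content": (
        "The flapper may not be sealing, so water leaks into the bowl and the "
        "fill valve keeps refilling the tank. Answer: Check and replace the flapper."
    )},
    {"role": "user", "content": "My drain is slow. What should I try first?"},
    {"role": "assistant", "content": (
        "Slow drains are usually hair or soap buildup near the trap. "
        "Answer: Remove and clean the trap or use a drain snake before chemicals."
    )},
    # The dynamic content (the real user question) goes last.
    {"role": "user", "content": "How do I fix a leaky faucet?"},
]

response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0].message.content)
```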
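And if you want to see the token-versus-word distinction for yourself, here's a tiny sketch using the tiktoken library; the encoding name is an assumption and varies by model.

```python
# Sketch: how a short phrase splits into tokens rather than whole words.
# Assumes the tiktoken library; the encoding name varies by model.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Fixing a leaky faucet quickly.")
print(tokens)                             # a list of integer token IDs
print([enc.decode([t]) for t in tokens])  # word pieces and punctuation, not always whole words
```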