Mon, Aug 12th: "Controlling and Constraining LLMs" 6:00 PM
11:51PM Aug 12, 2024
Speakers:
Josh Zapin
Dan Murray
Daniel Ritchie
Bill McIntyre
Uche Ogbuji
SallyAnn DeLucia
Aaron Bach
Keywords:
ai
prompt
generative
data
llm
talk
question
organization
model
copilot
people
individuals
gpt
output
user
large
engineering
language
tool
called
Are you plugged in?
Okay,
and I think we just need it for your presentation. And
then, yeah, so is this
Okay, sweet,
you can I think you can use the space as
well, okay,
and then,
will you help on the transition?
Yeah,
so I think what I'll do is I'll use the lav mic for the interims, okay, and then when I introduce the first speaker, I'll probably switch to this, okay? And then I'll have them come up and set up the layout, and I'll go talk. Yeah, sure, good,
yeah, I think we're good. I've unmuted it, yep. And then, yeah, I see your net. Yeah,
you can do a test, testing, 123, Testing, because it's going to be recording. Yep. Do
Oh, just gonna grab yeah, Stretch. Them apart.
It may done. Oracle has the system
that kind of the customer or business management system. What is it called? That runs everything, or am I thinking of Salesforce? It's kind of like Salesforce, but it runs your whole business, I don't know, or class sort of things. I think Salesforce does CRM CRM,
but it also runs your website. Does this? This is one big package.
Anyway, that was part of Oracle I would have never been in contact with.
No, it was just there was an article about how this they try to transition from all these different systems to this one in the UK, and then they couldn't collect any of the taxes from anybody in like the city went
that think about this. Oh,
I thought of you.
Thanks. It's like, you know you, it's like, you work for Microsoft too. You get people who, I used to blame, my friend, who look for Microsoft or Microsoft issues, yeah, I
had somebody really hounding me for a while, once they wanted to hire me as a vertical DBA. Oh, well, it's like, I have never used Oracle data. That's
what I think of, right? I
mean, the original I
know, and I can understand that person. That merely just looked into that microbial they would have understood that probably wasn't going to be very easy,
alright, although, when you're describing sounds like
Salesforce. No,
see you as she uses it. There was,
there was something called Fusion.
Think of it. It's another, like, it's a company.
Oh, are you thinking of the advertising
one on all the podcasts nowadays?
It used to be a separate company that Larry Ellison funded and then bought it. I
think, yeah, maybe,
yeah. And that
this week, that's it. Next week, yes, that's, that's what's bankrupted, but it's, but, I mean, it's like a different company, so
you can't really blame it, but I
like, Oracle literally has
a presence.
I mean, they're advertising like crazy. I mean, it's like, CU's using it.
Well, the CU bookstore's using it, I mean. Within the company, NetSuite is kind of a separate thing, yeah. I mean,
it's like an acquisition
Yeah, it was this kind of sketchy thing
that's built into
this huge company, persuaded, which may have been Richard, oh, really, yeah, so, but that was quite a while ago. There were actually some met screen people connected high up when I was in the folder.
But I've never Richard. I mean it. I can see everything. Why don't you turn on the microphone, on the
Okay,
just turn the
patty through his
last, Yeah, we're saying recording through his lap. Oh, it is,
that's
so that's how it goes.
But I thought you did you initiate? Didn't you initiate the Zoom call? Didn't you initiate the Zoom Yeah,
yeah, okay.
Everybody grab a
seat. We'll get started. Please sign you is fine. It's like,
tell me do this. Mia.
Welcome everyone.
See I we just
hit 1,600 members last week — we're already a bit past 1,600 now. Thanks to all the volunteers who make it happen. It's definitely a community of people that work on this group and run the subgroups, which you'll hear about in a minute. Okay, just a couple housekeeping issues. There are three exits: two in the front, one in the back. If you need to leave early, that's fine. When we get to Q&A, please use the microphone instead of shouting out the questions — I will walk it around. Bathrooms are down the hall, and please carry out your trash. And if anyone needs an escort to their car afterwards, please let us know. So I'd like to thank our board members. Richard Gann is my co-founder — there's Richard right there. Anna, Sean, and Pranjal are students here at CU that help, and Pranjal is the one that's with us tonight; he's helping with AV. Jacob is going to come a little bit later. And then the subgroup leaders: subgroups are special interest groupings of people that are passionate about one particular topic, and they get together sometimes once a month, or more than once a month, to go a lot deeper into their particular topic — whether it's engineering or marketing or ethics — than we can at the big meeting. So I encourage you to check these out. We'll have a link to our Linktree, which points to all of these different groups. So first, let's just give a round of applause to everybody that helps make this happen.
Thank you guys.
Do any of the subgroup leaders want to make an announcement about an upcoming event? So Okay, Bill has an upcoming meeting that he will tell you about.
Yeah, hey, thanks. So, I'm Bill McIntyre. I run the engineering subgroup. This Thursday morning, at 8:30 in the morning — very early risers — we're having the second in our sort of deep-dive series on large language model underpinnings: what's under the hood? Andrew Spott, who helps me co-lead the engineering group, gave a great lecture a month or so back on LLMs and the embedding stage; this week's will be on the output stage and prediction, and how all of that works under the hood. It's probably the most technical of this year's talks. Everybody's welcome, everybody's encouraged, and we take the time to really dig deep and explain things. So yeah, this Thursday, 8:30 in the morning. It'll be at Founder Central at Sweater Ventures, and it's on the calendar and the Linktree that was shown a bit ago. Cool.
Thank you. Oh, awesome.
Thank you. Bill. Any other subgroup leaders want to announce an
upcoming meeting? No, okay.
Susan, who runs Women in AI, asked me to mention her meeting, which is tomorrow. There's a QR code to get to the page to sign up for this. This is a really cool meeting — she already has 44 RSVPs. So if you're a woman working in AI, I highly recommend you check out this group; it's very cool. I'll leave that up there for a sec, in case you need to capture the QR code. The next slide is going to be the general RMAIIG Linktree. Everyone good here? Okay, cool. So this is our general Linktree. It has information on our Slack, the shared calendar that Bill mentioned, the meetup page, the website, the videos. So this is kind of like the overall portal to all things RMAIIG — how to buy one of those cool T-shirts with the friendly robot logo, you can find out about that here. So this is like our kind of global page
to reach everything.
Okay,
You guys good there? Okay, so I wanted to thank our pizza sponsor, which is NexusTek. Mark Richtermeyer is here tonight, and he will say a few words. Thank you, Mark, for sponsoring this.
We're a national MSP, one of the larger ones, focused on mid-sized markets, and we're focused on hybrid cloud solutions, security, and then, of course, AI solutions. So we have all kinds of capabilities — private cloud efforts to do AI, and lots of capabilities for data engineering and things like that. So thank you. Great to be here.
Cool, thank you so much, Mark. Mark is actually my old boss from a million years ago. Right after Al Gore invented the internet, I was at a startup and we did a website — the first website for Universal Studios Florida. It launched in the mid-90s, and it had some really cool stuff, like static pages with lots of graphics on them. You know, those were fun days. We flew down to Universal Studios, we went on all the rides, we met with the VP of Marketing, and spun up a site. So it was quite a time. Okay, upcoming meetings: next month is going to be AI and education, and Jacob is putting that program together — Jacob Corona, who I mentioned previously — and then some TBD sessions. The dates are all set, so it's always going to be second Mondays in this room, six to 8 PM.
Any chance of changing back to Wednesday nights?
It's all on room availability. Yeah, I'm sorry — we're sort of like the last ones to the table, because everyone else books the room and we have to take what's left, so for this semester we had to switch to Monday nights. Yeah. So we're looking to do a startups meeting, a robotics meeting, and a healthcare meeting, and on the topic of robotics: if you know anyone that's working on an AI robot platform, you know, get in touch, because we're trying to put this meeting together. I'm in discussion with this company right now, Engineered Arts, that does Ameca — has anyone ever seen this robot in Vegas at the Sphere? It's in the lobby, this talking humanoid robot. Anyone ever seen it? Super cool, like zero latency — you're asking it questions and it's just responding, fully LLM-driven. And I'm hoping they're going to bring the desktop version, which is this one. Any guesses on how much that costs, this desktop robot? A hundred grand. And it could be yours for that. Furhat used to have a Boulder office — there's a number of Furhat robots locally; we've had them present, and I'm hoping to get one in the room. Figure 02, of course, Optimus, and so on. So if anyone knows someone working on an AI robot platform, we really want to talk to them. We had Spot the robot dog in this room at our last meeting, walking around — you know, Spot walking up the steps, walking down the steps — from Boston... is it Boston Dynamics? So that was cool. Okay, next, we're going to try something new here. I asked Colorado AI News to give us a brief update on some of the developments in the Colorado AI space. We're fortunate to have the publisher of Colorado AI News, Phil Nugent, with us tonight — Phil, if you raise your hand; Phil's right there. So Phil's bringing two main stories to our attention for this update. The DEN AI Summit is coming to the Denver Convention Center September 19, and the theme of the summit is AI for Good. And then secondly, Denver had an AI Tinkerers hackathon where RMAIIG speaker and member Daniel Ritchie served as a judge. Daniel,
are you here? Yay, Daniel.
That was a very successful hackathon, and Phil is highlighting it because it dovetails with the story about Boulder and Denver's rise in the tech world and their central position. So if you have any sort of story ideas that you'd like Colorado AI News to cover, or if you want to pitch your own monthly column in Colorado AI News, Phil's the guy to talk to. Those are pretty lucrative gigs, right, Phil? Okay, yeah, that's what I thought. So see Phil afterwards if you want to pitch him on story ideas. Yeah, there's a subscription page too.
Okay, um,
Tonight, after our meeting — we've been doing an after-party at another secret, undisclosed location, totally unpublished. It starts after we leave here, and it's a dive bar that's nearby. I can't tell you where it is, but Vanna has a clue for you. Okay, you guys are gonna have to piece it together from there. Okay, so just for announcements: is anyone here looking for a job right now? Any job seekers? Okay, so about six or eight. Is anyone hiring right now — anyone hiring for positions? I see one hand. Do you want to talk about what you're hiring for?
sure briefly, but it's not
making it exciting.
So if you see my T shirt eating the right word, and we're hiring, so hit me up afterwards,
and I'll give you some links. We'll connect
on LinkedIn, and we'll see what we can find.
If you email me, I will post it to our channel on the Slack — send the details in an email. Anyone else hiring, any other job openings? Okay, yes,
oh, Daniel,
but our hackathon sponsors for this weekend's hackathon car garage raised double what they were looking for. So they have like, 300 extra million dollars that they're not sure to do
it. And
Nvidia just led a round with Twelve Labs — they raised a $50 million Series A — so I guarantee those two companies are looking to hire people,
two cool companies
that I would consider applying to, so if you're interested, I'll try to send some info to Dan for the Slack channel, but we can make some contacts locally as
well. That's right,
confuse us.
Wait for your trademark lawsuit. Yeah,
thank you. Dan. Okay, I
usually like to ask a few questions — or, no, actually, let me open it up. Does anyone have any other announcements around sort of events, other groups, things coming up that we should know about? Does anyone want to announce anything to the group here? We usually open it up in case there's announcements. Dave
Mayer, Technical Integrity, one of the sponsors. We have the Boulder Builders meetup, as you may know — we have a meeting coming up Wednesday. Always glad to coordinate with Dan so we work around one another. And we do have the Denver Startup Week version coming up, so please RSVP — there are a few slots left for that, so I recommend that you get ahead of it, because it does sell out. And also, I just want a big, big round of applause for this guy, who does a lot. Yes, sir.
Thanks for mentioning Boulder Builders — that's an awesome event. If you haven't been to it, it's fantastic, but you do need to sort of book ahead, because it fills up. Um, anyone else? Other announcements, events, other groups? Going once. Okay, so I usually like to ask a few questions of the audience so our speakers can get to know who's here. So first of all, for how many people is this your first RMAIIG meeting — the first time you've attended? Okay. And how many people have been here before? So it's 50/50. That can be good news or bad news; I never quite know how to interpret that.
Okay. How many are from Boulder?
Who joined us from Denver tonight? Okay, good. You know, there's a Denver subgroup too, which you can go to as well — run by David J, usually the last week of the month. Um, how about anyone from Longmont?
Golden. Okay. Laramie.
Laramie, okay. So who came from the farthest south? Did anyone come from... alright, Castle Rock is our current winner. Did anyone come from further south than Castle Rock? Florida? Okay, I've gotta go with Florida. So we're going to give away a free RMAIIG mug to the person who joined us from the farthest south.
Okay, so,
how many of you have a paid version of ChatGPT?
More than half — my hand is up there. How many have a paid version of Claude? Maybe a quarter. Paid version of Google Gemini? My hand is up there too. Paid version of Perplexity? I'm on that one as well. Okay. How many people have ever used a large language model in a production environment?
Wow, that's amazing.
How many have experienced implementing guardrails or constraints in AI models?
Quite a few.
Have you ever changed what you pasted into an LLM to protect sensitive or confidential data, so you paste it and you make some changes before you hit send. I've certainly done that. How many are familiar with the concept of LLM observability? We're going to hear more about that tonight. How many feel confident in their ability to control the outputs of AI models.
See, like one and a half hands up on that. That's why we're here. Hey. Okay, how many have worked on projects involving sensitive information along with AI?
Quite a few. Is
anyone currently exploring ways to improve the reliability of AI systems in their work? That's that's pretty many. Yeah, Has anyone experienced issues with inconsistent AI model outputs in your projects? Right? That's everybody here, okay? And for how many people is enhancing AI model performance and reliability a priority for your organization?
That's that's a lot of people as well. Okay, cool. So let me introduce
first.
I'm going to start introducing our first speaker. And let me do a handoff here on the lav mic.
I will switch to the handheld.
So,
okay, so our first speaker is Uche, and he's talking about LLMs helping us enhance reliability in our work. He was really the one that came up with the idea for this meeting — we were sort of talking about bringing someone in, and he offered the idea, and I jumped on it and added some more suggestions for, you know, a full panel. So thank you, Richard, for doing that.
Uche is an AI engineering lead consultant and startup founder with a long history in data and network technologies. He contributes to open source projects like OgbujiPT. He co-founded the AI DIY YouTube show, and he's also a writer, public speaker, and artist. He furthermore leads the AI for Entrepreneurs and Startups (AES) subgroup for RMAIIG. Please join me in welcoming
Uche.
Now, am I coming through now? Perfect, alright.
So, anyone who pretends they can control a large language model — I guess I've got my work cut out for you. Alright, so we're going to talk all about that, and we're going to do it in 20 minutes. Wow. Okay, just a little bit about me — I mean, that intro pretty much said everything that matters. He mentioned AI DIY and the AI for Entrepreneurs and Startups subgroup — if you're involved with a startup, if you're an entrepreneur, etc., please get involved; we need more activity. But you know, I work with data, with RAG: how can we get multiple large language models to work together to solve, you know, problems in products, etc.? How do you, you know, get large language models to act as agents? I'll be talking a bit more about that. LLMOps: how do you do traditional engineering on all this new technology? Most of the time, the first time you see AI stuff, it's in a Jupyter notebook, and let me tell you, a Jupyter notebook is not what you deploy to production — so we help deal with that. And how do you deal with self-hosted, private models? So the company is called Oori, and we're just a consultancy, and this is some of the stuff that we end up having to deal with in our work that I'll be talking about. Okay, watch out for that shadow there. So, okay, I'm a soccer coach — Albion Boulder, used to be Boulder County United. Go Boulder County United. A lot of kids in the area play for that soccer club.
I coached
the 2013 girls — under-13s, basically, right. Here's my style of coaching: you show up at the game and I'm like, hey, we've gotta win — for me, okay? Here's the starting lineup. Now I need you guys to play defense, okay? And I need you guys to win every ball. Got it? Excellent. Have a good game — I'm going home. Alright? I mean, that's the way to coach, right? Well, no, of course not, but that's how we use large language models
every day. That's kind of what we're doing,
right? The coach shows up to the game, gives all the instructions, goes home. Let's hope we get the right things out of the large language model from just one initial prompt at the start. No, no, let's fix that. Okay, so we need the coach on the sidelines throughout, guiding the players as to how to actually score the goal and prevent the other team from scoring. That's how we're more likely to win the game. So if we are able to do that — if we're able to stay with the large language model as it's going through its process, and not just give an initial prompt — what can we do with that? Well, we can grab structured data from unstructured information. If anyone's ever given an LLM support documents and said, hey, can you find me the address of the producer of this product in this large document? It might work, it might not. If it works, who knows how it's going to come out — it can come out in any format whatsoever, but you want it to come out in a format that you can feed into a process so the work can continue, right? So that's why you need to control the output and how it generates it. If you want to generate code — to some extent we already have a handle on this: people who code use, you know, copilots or that sort of thing, and they're pretty good at kind of following along with what you're doing. But if anyone's ever used one of those, they can also get really crazy really quickly. So there are situations where you want to have constraints on the code that's generated by large language models. If you want to produce specialized content — it could be, like, features, functions, benefits for a product page or something like that — you need it to follow the corporate standards, right? You need to be able to control and constrain it. And then this is what I'm going to be talking a lot about, which is calling out to code, interacting with other apps and other APIs. It's also called tool calling, agentic frameworks, agentic networks, etc. More on that. But in order to do that — if anybody's ever tried to just try out some new tool they'd never tried before and fed it instructions with the wrong syntax, whether it's a command line or a spreadsheet or anything else, we already know how finicky tools are. So we need our large language models to be very precise in what they put out if we're going to have any chance for them to interact with other tools. So that's what we can do if we get there, and even better, we can improve integration and we can reduce waste. So what do we do now, if we want a large language model to do some of these things we're talking about? Well, you do the old prompt engineering thing, right? You try a million times, you tweak a little bit, you add a bit more here, and you say, no, please do not say "Here's your JSON," just give me the damn JSON, right? We do all that — that's prompt engineering, right? We do error checking: after doing all that, we have to check it every time, and then we retry sometimes — like, okay, no, you got it wrong, try again. But those are really fake it till you make it, right? That's not really guiding the large language model; that's kind of going home and hoping something happens, right? And really, they don't work all the time, and that can be a problem. And on top of that, it's money.
You know, each request costs you money, each token in each request costs you money, and it's not just money, it's also energy. It's just waste; it's unnecessary. So yeah, let's try to avoid doing that if we can. So what are we doing instead? I think the way that is most pragmatic right now for controlling and constraining large language models is something that is called structured output. It can be called grammar-structured output because, like the way computer programming languages are defined, it uses something called grammars. You can create one of those and then tell the large language model: you have to follow this grammar. Now, there are two sides of it. You have to train the large language models to understand the rules of a grammar, right? It's almost like teaching a large language model how to know, in any language, what's a verb, just from reading novels, etc. Once you teach it that, then you can guide it. You say, okay, now we need a subject, because we're in English and we're building a sentence; now we need a predicate for the next word; and now we need an object, right? We need to teach it, and we need to guide it after we've trained it to take that guidance. So it's kind of a two-step process. Now, that approach to guiding it works through what's called sampling, and I'm going to go through an example of what I mean by sampling. Sampling is basically the process of how a large language model decides what to say next — a pretty straightforward concept. A token, which will come up a few times, is the basic unit of how the large language model operates. It's basically a number, but that number means something in language, and in sequence the tokens become the language that we get as output; it's also the input. It's basically how the large language model operates in the end. So yeah, just a few examples of things you can go find out there, mostly on GitHub right now, and use to apply grammar-based sampling. llama.cpp is a really popular one; it's for hosting your own local large language models, and it has grammar support. Guidance was a project by Microsoft, but it's now on guidance.ai — but yeah, that's on GitHub. And hey, if you're feeling this one, Toolio, go give it a star. Maybe you use it, maybe not. But, you know, made in Boulder, y'all. Anyway, it basically uses a JSON control language that says, hey, the first thing I need here is a subject, and the next thing I need is a predicate, and the next thing I need is an object — you can do that sort of thing. And you can do that sort of thing with any of those, really; this is just the only one made in Boulder. So I do want to mention this — this is cool, just last week. You know, as I said, we were talking about this meeting a little while ago, and I'm like, yeah, let's do that, it's a good time for it, because you're actually about to be able to get your hands dirty on this stuff. Just last week, OpenAI announced Structured Outputs in the API. So OpenAI — I want to say November, at their last Developer Day — gave the ability to have JSON mode, so you could tell ChatGPT, or GPT via the API, to basically produce JSON.
Right now it's pretty good, but, you know, it started out at something like 50 percent fidelity, and each time they got a little bit better, but you're still, with the best options, at something like 85, 90 percent. That's not enough — you're back to that fake-it-till-you-make-it problem, right? So now they're saying we're talking 100 percent: you give me that JSON schema — the same kind of schema Toolio uses — and we will guarantee 100 percent that the output is going to conform to that JSON schema. This is a big deal, guys. It's so nice. Isn't it cool? And even though, of course, I'd love everyone to use Toolio, the reality is most people use GPT, whereas Toolio is for models that run on your local machine. And it's great that you can now do this, because all the stuff I'm talking about, you can go out there and do now. You talk to your developers, you talk to your product people, you say, we need to integrate with this API. What is a JSON schema? A lot of the ways that APIs talk to each other on the web are through JSON; you develop JSON schemas, and you can have GPT — what do you call it — conform to that. So that's super cool. So that's one way of doing it, that grammar-based approach.
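To make that concrete — this was not shown in the talk — here is a minimal sketch of a Structured Outputs request using the OpenAI Python SDK. The menu-order schema, the prompts, and the model choice are illustrative assumptions, not anything presented on stage:

    # Minimal sketch: OpenAI Structured Outputs with a JSON schema (illustrative).
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    order_schema = {
        "type": "object",
        "properties": {
            "item": {
                "type": "string",
                "enum": ["ham and cheese croissant", "ham omelet", "ham and egg bagel"],
            },
            "quantity": {"type": "integer"},
        },
        "required": ["item", "quantity"],
        "additionalProperties": False,
    }

    resp = client.chat.completions.create(
        model="gpt-4o-2024-08-06",  # a model that supports Structured Outputs
        messages=[
            {"role": "system", "content": "You take breakfast orders for a small bistro."},
            {"role": "user", "content": "Something with ham, please, and make it two."},
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "menu_order", "strict": True, "schema": order_schema},
        },
    )

    print(resp.choices[0].message.content)  # guaranteed to conform to order_schema

With strict mode on, the output you feed into the next step of a workflow is valid JSON matching the schema, which is exactly the "coach on the sidelines" guarantee described above.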
Another way of doing it — I'm not going to go too deep into this, because it's a bit more futuristic — is still guidance, still the coach on the sideline. But instead of the coach on the sideline saying, okay, you play defense and you clear the ball away no matter what, it's more like saying, here's the feel I want on the field today. You know, it's just kind of more touchy-feely, but it's research that came out where, basically, you're steering the large language model towards styles — like sentiment, emotion, writing style — by adding what's called a style vector. Vectors are these big bunches of numbers that get mathematicized; that's how large language models work — they spit out all these numbers that turn into tokens and language at the end. So this is a system for adding what are called style vectors, which will guide the large language model in its process of generating output. And some of the stuff that they're doing — if you read the paper, which, yeah, remind me afterwards, it's really cool. But like I said, it's not quite there yet. It is coming, though, and I know there are some products trying to implement it; that's why I wanted to mention it. But I'm trying to focus on stuff that you can do today, and another thing you can do today — I'm calling this one option 2.5, because this is really fake it till you make it — but if you're faking it till you make it, at least use some tools to help you. I do lots of projects where it's like, okay, you've got to write the whole workflow, you know? You have to try out a prompt, and if the large language model response does not conform, you need to figure out a way to check that it even conforms, and then you need to figure out the process of retrying. There are tools out there that already help automate that process: Outlines, LMQL (language model query language), SGLang. Those are examples — again, you can go out there to GitHub and grab those right now, and they will help you through that process. It's still fake it till you make it; I just want to make sure we're clear on that. So, to dig a little bit deeper, I wanted to give an example. Like I mentioned, I had this product idea for, you know, a new company built on an LLM: I can never figure out what to order off the menu, so let's have an LLM do it, right? I told you — yeah, this is why I'm not a billionaire; I don't know how to make product-market fit, people. But okay, so this is sampling; this is the process of sampling. We start with "I would like to order a", right? Everything the large language model can say next is available, with a probability next to it. So that's sampling. There's another process called selection, which is saying, which of these should I pick? But it's going to be based on that probability. So obviously it's more likely to say "I would like to order a custom made suit for my..." — and that's going too far. So we want to find a way to constrain it. For what it's worth, these are actual numbers that I ran through Toolio, to look at the guts of what a large language model on my laptop was actually doing — what are the choices it's actually making as it processes these constraints. So obviously, the first thing that you might want for a bit of a breakfast bistro is down here — maybe I want a ham something, and the probability of that is 0.02 percent. Yes, yes, yes, you can improve all those probabilities by having a better prompt, a page-long prompt: this is actually a menu, and here are the items on the menu, and everything else like that. And that's cool, but the soccer coach who gives a one-hour spiel before the game is still not as good as a coach who's right there. So the idea of control is simply: okay, let's just cancel out a whole bunch of those options.
Sorry about that. So the words are now constrained to be structured, not just left to the probabilities. Let's say that "ham" now is the top choice for me, right? So it's no longer going to be "I would like to order a custom made suit for my" dot, dot, dot. We're going to start with "I would like to order", and once you process that, your next probabilities kind of shift. You say: it's not going to be "from", it's not going to be "for", it's not going to be any of these. At each step, you're telling it, I'm taking away a whole bunch of your options; I'm limiting what your options are; I'm controlling what your options are; I'm constraining what your options are. And so you go step by step through that constraint — I'm not going to walk through the whole thing, you know. So what I did is — and I didn't actually show it, because, again, there are actually more technical people in here than I expected; I kept thinking, okay, I can't show any code, so sorry about that — but basically, I had the JSON schema, which would basically say: after "ham", you know, you can say "ham and cheese croissant", you can say "ham and egg", or "ham omelet", etc. You can't say "hamburger", for example, which is also really high up the probabilities, up the selections, right? So now, of course, what you end up with is "I would like to order a ham and cheese croissant." So I hope that illustration helps a bit, just to get a broad picture — again, like I said, if I'd known people would be more technical, there would have been more technical bits. So, do I save questions till the end? Or —
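As a rough sketch of the kind of constraint just described — not code from the talk — the same menu idea can be expressed as a grammar and enforced during sampling with the llama-cpp-python bindings, one of the grammar-based local options mentioned earlier. The grammar, model path, and prompt below are made-up illustrations:

    # Illustrative sketch: grammar-constrained sampling with llama-cpp-python.
    from llama_cpp import Llama, LlamaGrammar

    MENU_GRAMMAR = (
        'root ::= "I would like to order a " item "."\n'
        'item ::= "ham and cheese croissant" | "ham omelet" | "ham and egg bagel"\n'
    )

    llm = Llama(model_path="models/local-model.gguf")   # any locally hosted GGUF model
    grammar = LlamaGrammar.from_string(MENU_GRAMMAR)    # compile the GBNF grammar

    out = llm(
        "Take the customer's breakfast order: 'something with ham, please'.\nOrder:",
        grammar=grammar,   # tokens that would leave the grammar are masked out at each step
        max_tokens=32,
    )
    print(out["choices"][0]["text"])   # e.g. "I would like to order a ham omelet."

Because "hamburger" can never complete any rule in the grammar, it is cancelled out of the options no matter how high its raw probability is.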
yeah,
I think we'll do that — if you can remember
and hold on to that. So yeah, if I'd known there were more technical people, I might have given a few more technical bits. But that's the basic idea, and I can expand on any of that, if anybody wants, during our Q&A. But let me keep talking a little bit more, because these 20 minutes are going quickly. Oh no, I went one too far. So I do want to talk about tool calling, and this is kind of, to me, where the rubber meets the road. Toolio does implement structured data generation, but the reason why I wanted to do it is because most of what I do in my work is having the large language model actually call external APIs, etc. — well, a lot of it is, I should say; there's a lot of text generation for various purposes, but a lot of it is having it call external APIs. And what you get with that is it now has real-time, real-world knowledge, right? You can look up all these cool things — like, you know, what's the currency conversion? Do a search engine query right now and give me the latest results. Look up in our enterprise database, what's our order backlog? You can do all these things in real time in the whole workflow of the LLM. Or you can actually affect things in real life — say, turn on your house lights if you're using home automation. All that stuff becomes possible. And in that case, the LLM is an agent. Basically, it can participate in workflows like any other code that we write, and it can even collaborate with people and other LLM agents. And I think this is really cool, when you start, you know, having the LLM — obviously you need control and constraints — trigger that process with human oversight, etc., right? You start to think differently about the whole problem of how LLMs are going to replace people and stuff like that — which, incidentally, I don't believe they will, and that's from someone who makes their living, you know, helping LLMs be better. But this is very hard to implement effectively without some sort of constrained generation. OpenAI actually implemented constraints a long time ago, but they didn't want to expose them to us, because they wanted to implement tool calling; it worked under the hood. And really — I don't work there, but from what I've heard — I'm pretty clear it was just a matter of testing to make sure this was okay for everybody to use through a JSON schema. And Toolio — given the name, you can imagine — is actually geared towards tool calling, and it implements structured generation in order to enable this, which I already said. So this is another cool thing, just last month: McKinsey. It's one of those things where you're like, I've been saying this for, like, oh, not six months — but, you know, McKinsey, with their $1,000-an-hour opinions: once they say it, now you'd better believe it, right? McKinsey says agents are the next frontier of generative AI. And, you know, it is early days, let's be honest, but it's early days for everything. There are so many opportunities once you have this idea of virtual colleagues, which are basically just LLM agents. So what this all means is that prompt engineering is going away. We're going to go back to real engineering — sorry, my electrical engineering degree told me to say that. No, no — if prompt engineering is going away,
it's because, you know, computer engineers made LLMs better at following guidance, and the LLMs could become better at designing agent networks. And, yeah, I have no idea — so, no easy answers. But I think the important thing to note is that tool calling and structured output generation are the future, and an important way to, you know, build towards that future is working with observability and making sure you know what the LLMs are doing. There are tools out there that you can use right now, etc. And, yeah, I look forward to talking more about this and seeing the rest of the talks. Thank you so much for your time.
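For readers who want to see what the tool calling described above looks like in practice, here is a minimal sketch using the OpenAI Python SDK. It is not from the talk, and the currency-conversion function is a made-up stand-in for the kinds of external APIs mentioned (currency rates, search, an order-backlog lookup):

    # Illustrative sketch: letting the model call out to code (tool calling).
    import json
    from openai import OpenAI

    client = OpenAI()

    def convert_currency(amount: float, from_ccy: str, to_ccy: str) -> float:
        # Stand-in: a real implementation would call a live FX-rate API.
        return round(amount * 0.92, 2)

    tools = [{
        "type": "function",
        "function": {
            "name": "convert_currency",
            "description": "Convert an amount from one currency to another at the current rate.",
            "parameters": {
                "type": "object",
                "properties": {
                    "amount": {"type": "number"},
                    "from_ccy": {"type": "string"},
                    "to_ccy": {"type": "string"},
                },
                "required": ["amount", "from_ccy", "to_ccy"],
            },
        },
    }]

    messages = [{"role": "user", "content": "What is 250 USD in EUR right now?"}]
    first = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)

    call = first.choices[0].message.tool_calls[0]   # assumes the model chose to use the tool
    args = json.loads(call.function.arguments)      # constrained output makes this parse reliably
    result = convert_currency(**args)               # we run the real code ourselves

    messages += [first.choices[0].message,
                 {"role": "tool", "tool_call_id": call.id, "content": str(result)}]
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    print(final.choices[0].message.content)

The model never executes anything itself; it emits a structured request, the surrounding code performs the action, and the result goes back into the conversation — which is why precise, constrained output matters so much here.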
Thank you. Thank you. So now I can set up our next speaker, SallyAnn DeLucia, and she's going to explore the use of guardrails and experiments to enhance AI application performance and reliability. This talk is going to provide specific insights into optimizing the development process to achieve cutting-edge results once applications hit production. As I mentioned, SallyAnn is a product manager at Arize AI, which is a leader in LLM observability and evaluation. She's a passionate machine learning enthusiast and generative AI specialist, and she holds a master's degree in applied data science. She combines a creative outlet with a dedication to developing solutions that are not only technically sound but also socially responsible. Please join me in welcoming SallyAnn DeLucia.
All right. Hey,
Can everyone hear me? All right, yes, good. Awesome, awesome. I think Dan said everything there really is to know about me, but hey, I'm Sally, and I work for Arize AI. In case you're not familiar with us, we are an observability platform for AI across the board — your traditional ML models as well as LLMs — and we have a suite of tools that will help you across your development journey. Recently, I've been really focused on our AI assistant for observability, Arize Copilot; we released it about a month ago. So for today's presentation, I thought we could talk a little bit about our journey and how, specifically, we're harnessing experiments and data sets to optimize Copilot. With that, I think we should probably start with what Copilot is, starting with our architecture. Uche actually gave a really great overview of agentic workflows and tool calling, but this is the high-level overview of how our Arize Copilot works. It is an agentic workflow: we have a user who's going to give some kind of input — they ask the copilot for some kind of insight on their model. What's really important for any type of agent workflow is you're going to have this router, or planner — I'll kind of use those words interchangeably — and that router's job is to pick the appropriate skill to answer the user's question. So here it picks a skill first, which is going to make a call to our API, grab some data, pass that to the LLM with our prompt, and then return the analysis to that user. So I'm going to drill a little bit more into the router and our tool calling to understand a little bit more of how we do that. Here we have our router, or planner — they're one and the same — and there are a few different components that go in that allow us to help the LLM really choose the right function to call for the user. The first is platform data, and exactly what data we pass is going to depend a little bit on what router we're using. Copilot has a few different routers that can work together, so we'll pick some data based on which router we're in. The next part is actually going to be some debugging advice: it needs a few guidelines to help it decide which functions might be most helpful where. And then lastly, we'll give it some state. State is something that we've found to be really important for Copilot: it needs to understand exactly where it is in the analysis. We can't have it running in loops, calling the same function over and over again for the user, and we can't have it taking the user's message out of context, so we include state in there. We actually have a blog, which I'll post at the end here, where we go through all of our learnings for Copilot in depth, including on state. The last piece is important for any type of tool-calling architecture: through the chat completion we're sending a list of all of the functions it can call, as well as some debugging results as well. And then on the other side, we have a skill. There are lots of skills that Copilot has, but at the root of them all, we're going to have some platform data — again, specific to the tool that we're calling — and then we're going to have some debugging advice. So we don't just give it the data and say, do whatever; we tell it a little bit about what it should do with the data, and then we also tell it how it should respond.
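As a generic sketch of the router-plus-state idea described above — this is not Arize's actual code, and every name in it is an assumption — one simple way to keep a planner from looping is to tell it, in the system prompt, which skills have already been called this turn:

    # Generic sketch (not Arize's implementation): a router call that carries "state"
    # -- the skills already invoked this turn -- so the planner doesn't loop.
    from openai import OpenAI

    client = OpenAI()

    def route(user_msg, skills, called_so_far):
        system = (
            "You are the copilot router. Pick the single best skill for the user's request.\n"
            "Debugging advice: prefer the narrowest skill that can answer the question.\n"
            f"Skills already called this turn (do not repeat them): {called_so_far or 'none'}"
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user_msg}],
            tools=skills,               # one function definition per skill
            tool_choice="required",     # the router must pick some skill
        )
        return resp.choices[0].message.tool_calls[0]

    # e.g. route("find frustrated responses", SKILLS, called_so_far=[])

Platform data, the real debugging guidelines, and richer state would all be folded into that system message in a production router; the point is only that skill selection happens through constrained tool calling rather than free-form text.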
So that's another thing we found to be super important: we just don't want the LLM to come up with a response that's different every time, so we give it some instruction like, okay, your response should be formatted in this way, or, take a look at the data, determine these things, and then, you know, surface the most important ones. So, as you all know, prompt engineering is a big part of getting these things to perform, and we did quite a lot of it when we were working with Copilot. These two components largely work together, and that's how users get insights and results from Copilot. And I just wanted to throw this slide up — I'm not going to go too deep into it, but I wanted you all to be aware of some of the things that Copilot can do as I talk through this. So, about a month ago, we released our MVP at our conference, Observe. And, you know, as the PM of this project, I had to start thinking, okay, we launched the MVP; now what should I go take a bet on? Luckily, Arize has a dashboard where I can actually track all my traces through our application, and I started plotting to see, like, okay, what's my most popular skill that I should probably go and focus on? And by far, it was our search function. We call it AI search; it's a semantic search over a column of data, so you don't have to use traditional filters to curate your data — you can just ask, say, find me frustrated responses, so you can add those to a data set. You can see here, this was like a few days after we released it, and immediately the search function was by far the most used skill. So that told me, all right, that's probably something I want to invest a little bit more into. And so I thought, all right, well, how are people using the search skill right now? What I found was actually something really interesting: there were a bunch of users that were using this skill in a way that I never intended. This is just a data set that I have here — I'm showing you some of the examples. Like that example there: they were trying to get Copilot to give them a structured filter from their text. This was just a general schema question, because they were trying to understand what they could filter on. I had some users that were trying to analyze their data prior to filtering, and then I also had some users who were trying to search across multiple columns. So these just go beyond what Copilot was designed to do. And so from this, I realized that Copilot needed a new version pretty soon, and I realized that it needed to actually consist of multiple skills, not just one. So I kept the column search — we're getting a lot of value from doing a single-column search — but I also needed a table search; this is going to be a broader search, for the looser asks users had, and they also mentioned the SDK, so that would be like a multi-column search. I also said, you know, let's make filtering as easy as possible: I'll give them text-to-filters — this is "I want you to turn my text into an attribute filter." And then lastly, what I'm actually really excited about — I'm calling it LLM-lite analytics — is "I want you to categorize my inputs," or categorize something in my data into segments, so that I can then use that to filter my data later on. So this was the v2 that I determined we needed. I, you know, wrote out the prompts, added it to my router, and added all the skills for the router to call from.
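The "AI search" skill mentioned above is, at heart, semantic search over one column. As a generic illustration of that idea — not Arize's implementation, with made-up data and the assumption of the OpenAI embeddings endpoint — it can be as simple as ranking rows by cosine similarity to the query:

    # Generic sketch of semantic search over a single column of data.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def embed(texts):
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in resp.data])

    rows = [
        "Thanks, that solved it!",
        "This is useless, I give up.",
        "Can you clarify step 2 for me?",
    ]
    row_vecs = embed(rows)                           # embed the column once
    query_vec = embed(["frustrated responses"])[0]   # embed the natural-language ask

    scores = row_vecs @ query_vec / (
        np.linalg.norm(row_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    for score, row in sorted(zip(scores, rows), reverse=True):
        print(f"{score:.3f}  {row}")                 # highest-scoring rows match the ask

No hand-built filter is needed; a query like "frustrated responses" should surface the second row because its embedding is the closest.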
But I had this problem, which was: how can I be sure Copilot picks the right function, now that I've just introduced a bunch of them to my router? When we first started developing Copilot, my way of testing was that I had a document
of inputs for each of my functions, and then I would build a little Jupyter notebook, I would call my router, and I would just manually test: I would put an input in, see if anything needed to change, and just kind of do that manually. That really inspired us to double down on our investment in data sets and experiments, and it was really game changing. So for the rest of my talk, I'm going to show you a little bit of how we actually use data sets and experiments for this new AI search skill that I built, and we'll talk a little bit about how that works. This is the process I'm going to talk through. The real idea here is testing as you build. So instead of building and then pulling out my document and pulling out my Jupyter notebook and doing all this manual testing, we really found that a GitHub action — kind of taking the classic testing that we would do in software engineering and applying it to LLMs — is super powerful, and it takes all that manual headache out of, you know, iterating on prompts. In this example here, we just have a prompt change; in the use case I'm going to talk about, I have one for my search router, but then I also added a bunch of functions, so that's what I'm going to be testing for. The idea here is we have a bunch of use cases, which will be our data set, and then we have the traces that we're going to test in our experiment, and it's going to run automatically anytime I make a change in Git. So I'm going to break down each of these, but that's an overview of what we're going to go through here. Building an experiment in Arize really is comprised of these four parts — five if you include creating the GitHub action. The first one is actually going to be defining your data set; each record in a data set, you'll see it in my code here, is called an example. Then you're going to want to define a task — that's what's going to take the example from above and then return us an output. You'll then pass that to an evaluator, and that's going to evaluate the output. Then you put all of those together when you run the experiment, and lastly, create a GitHub action. So I'll talk you through building a data set first. With Arize, there's actually a lot of tools you can use to curate your data. I mentioned AI search — that's a really great one — but we also have the ability to use just regular filters, like I'm doing here. I was specifically after my chat completion spans, which would have the function output, because that's really what I want to test here: I wanted to test, did the router make the right selection for a function? You can also just do it manually, looking at your data. What you see me doing here is, because I wanted to test this output message — the function that was chosen — I actually used our editing feature to edit it to the correct one. When all these traces were created, when these users made these new queries, I only had one function, so every output was going to be that one function. So what you see me doing here is just updating it to the correct one, and I'm going to use this as my ground truth when I run my experiment. We have annotations that you can also use, but for this use case, editing the output message was the best choice. So we have the data set here, we'll edit it, and I'll eventually export it in my code. The next step is going to be defining a task.
A task in our world is any function that produces a JSON-serializable object. Typically, this is going to replicate whatever part of your LLM application you're wanting to test. For me, this was whether or not the router selected the correct function, so we're just parsing out, down here, just that tool call that was generated. There's a lot of different ways you can do this; it's just the way I did it for my experiment here. I'm loading in the LLM router template right from my repo — this just exists in a Python file within my experiments — doing the chat completion, and then parsing the output in the way that I'll need it for my evaluator. Now, an evaluator is a function that takes the output of a task and performs some kind of assessment, so this is really what serves as a measure of success for your experiment. You can have multiple evaluators, some code-based, some LLM-as-a-judge — if you're familiar with evals, this evaluator class is going to implement one of those. This is really central to testing and validating the outcome of your experiment. In here, I have the template hidden, just so I could fit the whole of the code on the slide, but basically what I'm doing is taking the output from that task, comparing it to the output from the data set that I updated, and just asking the LLM: did it get the right function? I also parse out some things like the label, which is either correct or incorrect, a 0 or a 1, as well as an explanation. Explanations, I think, are really, really powerful, because they give us an idea of why the LLM, as a judge, made the call that it did. I like to view this as kind of LLM explainability — it's not quite as technical, but it allows you to kind of peel back the layers and understand why an LLM made the call that it did, and it helps you understand where maybe some of those gaps are for any negative labels. So we have our data set, we have a task, and now we have the evaluator.
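To make the dataset → task → evaluator pattern concrete, here is a generic sketch in plain Python. It is not the Arize SDK, and none of these names come from the talk; the router prompt, skills, and examples are made up, and the judge is an LLM-as-a-judge call that returns a label and an explanation:

    # Generic sketch of dataset -> task -> evaluator (not the Arize SDK).
    import json
    from openai import OpenAI

    client = OpenAI()

    ROUTER_PROMPT = "You are the copilot router. Pick the single best skill for the user's request."
    SKILLS = [
        {"type": "function", "function": {
            "name": "ai_column_search",
            "description": "Semantic search over one column of data.",
            "parameters": {"type": "object", "properties": {"query": {"type": "string"}},
                           "required": ["query"]}}},
        {"type": "function", "function": {
            "name": "text_to_filter",
            "description": "Turn natural language into a structured attribute filter.",
            "parameters": {"type": "object", "properties": {"text": {"type": "string"}},
                           "required": ["text"]}}},
    ]

    dataset = [  # each record is an "example" with an expected (ground-truth) function
        {"input": "find traces where the user sounds frustrated",
         "expected_function": "ai_column_search"},
        {"input": "turn 'latency over 2s and model gpt-4o' into a filter",
         "expected_function": "text_to_filter"},
    ]

    def task(example):
        """Replicate the part under test: run the router and return the chosen function name."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content": ROUTER_PROMPT},
                      {"role": "user", "content": example["input"]}],
            tools=SKILLS, tool_choice="required",
        )
        return resp.choices[0].message.tool_calls[0].function.name

    def evaluator(output, example):
        """LLM-as-a-judge: label the routing decision and explain why."""
        judge = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content":
                f"User question: {example['input']}\n"
                f"Expected skill: {example['expected_function']}\nChosen skill: {output}\n"
                'Reply in JSON: {"label": "correct" or "incorrect", "explanation": "..."}'}],
            response_format={"type": "json_object"},
        )
        return json.loads(judge.choices[0].message.content)

    for example in dataset:
        print(evaluator(task(example), example))   # e.g. {"label": "correct", "explanation": "..."}

For an exact-match check like this one, a purely code-based evaluator would also work, but the judge's explanation is what gives the "peel back the layers" view described above.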
What really makes this whole solution magical is being able to actually define a GitHub action. If you're familiar with pushing to Git or creating GitHub actions, you can outline these workflow files, and Git will run your action anytime your criteria are hit. Here I did it on push, specifically for the folder where my Copilot search functions live. You can imagine you could put this on different skills, so you can have experiments that run for each leg of your tool, or maybe you have some that are just general, to make sure things aren't broken. You know, you just define some kind of steps, but this is going to be the important part here: down there, you can see I'm just pointing it to my AI search test, which contains all of those components I previously walked you through. And so this is where the magic really happens. Now, anytime I make a push to that Copilot search folder — you can see this is like a PR, you can see that it's listed there — you can watch it run. Go into it just like you would any other kind of CI test, watch it run; it'll install what it needs, run the script, and if we give it a few seconds, it will pop up green. And what's really cool about that is I can then go directly to Arize and view my results. Give it one second to catch up with me. All right. So here I'm going to go back to that data set I showed you all, where I created and edited the outputs, and now I have a list of experiments from all my different runs. If I had multiple experiments that ran, they would all be listed there, and then I have all my data here, so I can actually compare, look at my labels, and do that check immediately, rather than having to go into that notebook and do it all manually. So yeah, I know I promised you all some conversation on guardrails as well. I decided not to add it in here, but if anybody wants to know what we did in regards to guardrails for Copilot, I'm happy to chat about that. I did want to highlight a little bit of the other product offerings that Arize has — not going to, you know, pitch you guys, but this is all the other features we have. We really help you out from early development all the way through production, and from doing it myself, I can say that our products really do help you out. And then this is the link to the blog where we go in depth on all the lessons we learned with Copilot. One of our motivations for building Copilot was, yes, to help our customers get to value faster and make their lives easier, but we also really wanted to get our hands dirty with LLMs and share everything that we've learned. So we've done an introduction piece; this is the lessons-learned piece; my next piece will be a deep dive on the architecture. We really want to share everything that we're learning along with everybody else, in hopes that it makes building these LLM apps easier. So this one is for Copilot, and then these two are for our platform — everything I described to you. I really recommend checking it out; it's a really lightweight solution for any of your applications headed to production. If you want to leave us a star, feel free. And that's my time. Thanks, everyone.
Thank you very much. Oh, let me switch over to Aaron. He's going to discuss how organizations are navigating the challenges of data security and privacy when it comes to generative AI, including employee attitudes toward AI security policies and the tools involved. Dave Mayer talked earlier about the Boulder Builders meetup — you know, that's right; first of all, sorry about that. So, Aaron is a seasoned product development leader with over 15 years of experience with Fortune 500 companies. He has led teams and delivered impactful solutions, including overseeing venture concepts and patentable IP at FIS. Previously, he was SVP of software development for Lens Interactive, leading people, acquisitions and platform development. Please join
me in welcoming Aaron.
Yeah, some of my happiest moments. You insurance,
insurance, government — any sort of regulated industry, for a set of reasons. These industries really understand the power; they understand the building power. Anybody in those organizations,
those individuals who are trying to serve cybersecurity needs in the regulated industries, often see it on the best days and on the worst days. We saw a lot of pull in these industries — they're especially attracted to generative AI. First and foremost, they, like all of us, understand that there is real power and capability here. Last year at Harvard Business School there was a study: a competition between two consulting teams — same groups of consultants, same role makeup, same kind of workload and timeline — except that one of those teams used generative AI. Twelve percent more work done, twenty-five percent more
quickly.
Now, higher up, you're looking at those numbers and thinking: that's like a hundred new bodies' worth of capability for my organization. Exciting. But we have individuals in cybersecurity, compliance, and legal waving the flag and saying, hold on a second — there are some very real concerns that we have to think about with generative AI. Everybody in this room knows these already. There's the notion of regulatory compliance, right? Healthcare organizations are under HIPAA, financial services organizations are under PCI, there are numerous compliance frameworks, and if you do not comply with those — I mean it when I say you get hammered, financially, sometimes legally. So there's tremendous risk for these organizations sitting on troves of sensitive information that presumably might have some sort of benefit with generative AI; they don't dare touch it because of the regulatory compliance risk. There's the risk of leaking sensitive data — we've all heard these examples before. We actually heard a story recently of an organization that was consolidating NPS — right, Net Promoter Score — feedback from their customers, basically their customers' attitudes about different parts of the organization. Not knowingly doing anything wrong, they shoved that into ChatGPT, because they've got jobs to do and they needed a quick data summary of that information. Lo and behold, what ended up happening is somebody not associated with that organization came back and said, what's the general sentiment about XYZ company? And it said: based on their most recent NPS survey data, their customers are not very happy with them. Whoops. Nobody did anything malicious there, but there's the danger of leaking sensitive data. And of course, we all know the reputational risks around inappropriate exchanges — a model either ingesting or spitting out information that is sexist, hate-speech-filled, or otherwise not representative of the image of that organization. Remember, these are regulated companies, and these are names that we all know; image matters to them. So here's the paradox: you've got incredible capability, and you've got cybersecurity saying, hold on a second, we have to take care of these concerns. How do we handle those two together? Well, Liminal rose into the breach here. We did a survey very early on in our life, and we found some kind of terrifying data from the thousands of regulated-industry employees that we interviewed and surveyed. Right now, almost three quarters of them report being blocked in their organizations from using generative AI, right? They're saying, my boss will not let me use ChatGPT, etc. Okay. A little bit over 55 percent, actually, of those same users report that they're using unapproved generative AI tools at work anyway. They're using ChatGPT on their phones, they're emailing things to themselves, they're off corporate networks — they're getting around the guardrails because they know the productivity gain. These are not malicious people; these are people who are trying to get their job done. Now here's the big one: 64 percent of those individuals who are using that generative-AI-powered content for work are representing it as their own when they go to their managers and their leadership teams. I won't touch that one too much. But there are issues here: organizations are developing policies, procedures, and processes for generative AI, and employees aren't following them. And again, there's not a whole lot of malice here, folks, right?
The typical healthcare practitioner, the front-office person in a doctor's office, the typical insurance underwriter, the typical wealth management advisor, the typical individual working in these regulated industries is buried under work. They're trying to get their job done. They want to do their job well, because don't we all, and how could they not be attracted to technology that says: I can help you do that better, faster, at higher quality? So that is why we came into the breach. Steve and I came from FIS. We saw firsthand what it was like for a regulated enterprise to run these kinds of businesses while also innovating, while also sitting under the compliance frameworks we adhere to. So Liminal exists to help regulated enterprises say yes to generative AI in a couple of key ways. Fundamentally, we want to give the Chief Information Security Officers of these organizations the ability to say yes. These individuals, and I've been one, say no for a living. That's what they do, right? They are the ones catching the javelin. They are the ones handling data breaches. They are the ones handling attacks. They are the ones ensuring that the organization is compliant. They never get gratitude or thanks for it, but they're doing it. Our platform is designed to help security organizations be an organization of yes by enabling the safe, secure usage of generative AI, regardless of how you're using it (I'll show you what that means here in a moment) and regardless of the data sensitivity. And there's a lot of sensitive data in these industries: PII, intellectual property.
Liminal exists to ensure that interactions with large language models are properly cleansed of this information in a way that doesn't destroy the overall flow of communication. So how do we do this? I'm very fortunate to have a head of machine learning who is a multi-competition-winning model designer, and at the core of our platform is an algorithm that is incredibly good at understanding where sensitive data is likely to exist in a prompt. We don't have a big database of every bit of sensitive information under the sun. We don't have every single regular expression or rules engine out there that handles sensitive data. We've got an algorithm that understands the ways in which these individuals are likely to talk about sensitive data: what modes, what contexts, etc. So what Liminal does is act as a secure gateway for the employees of that hospital, that insurer, that wealth management firm, that life sciences company, in a way where we don't put the burden on the end user. To the end user we say: do your job; that's what you're trying to do here. We'll take care of the security side of things for you. Now, crucial to this: it's actually really easy to redact sensitive information, but as we know, large language models thrive on context. If I give a bunch of redactions to a large language model, I'm going to get a bunch of garbage back; it won't do what I've asked it to do. So the latter half of this whole exchange is the ability to retain all the rich context, the intention and attitude of the user in that prompt, so that even though the sensitive data is taken out and the prompt is no longer non-compliant for that particular organization, we can ensure that the end user, again, that front-office staff at a doctor's office, that insurer, that underwriter, whoever that is, gets the interaction they're expecting from generative AI.
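A minimal sketch of the redact-and-rehydrate flow described above, with a trivial regex detector standing in for the ensemble model; the patterns, placeholder format and names are purely illustrative, not Liminal's implementation:

```python
import re

# Toy stand-ins for the real detection model: in practice detection is contextual,
# not a fixed list of patterns.
SENSITIVE_PATTERNS = {
    "PERSON": re.compile(r"\bAaron Bach\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(prompt: str):
    """Swap sensitive spans for placeholders; return the cleansed prompt and mapping."""
    mapping, counters = {}, {}
    for label, pattern in SENSITIVE_PATTERNS.items():
        def substitute(match, label=label):
            n = counters.get(label, 0)
            counters[label] = n + 1
            placeholder = f"<{label}_{n}>"
            mapping[placeholder] = match.group(0)
            return placeholder
        prompt = pattern.sub(substitute, prompt)
    return prompt, mapping

def rehydrate(response: str, mapping: dict) -> str:
    """Put the original sensitive values back into the model's response."""
    for placeholder, original in mapping.items():
        response = response.replace(placeholder, original)
    return response

clean, mapping = redact("Write an insurance proposal for Aaron Bach (aaron@example.com).")
# clean == "Write an insurance proposal for <PERSON_0> (<EMAIL_0>)."
# ...send `clean` to the LLM, then rehydrate its reply:
print(rehydrate("Dear <PERSON_0>, here is the proposal you requested...", mapping))
```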
And we do this across the three big modalities. I think we've all seen this; there are probably three big categories of use around generative AI. The easy one: I am talking to a large language model through some sort of chat experience. Shocker, we all know that; it's been around forever. But I will tell you, sitting in a room of wealth managers, sitting in a room of life sciences researchers, chat is the most fascinating, useful app to them right now. We can talk about what generative AI is going to be 50 years down the road and get lost in that future; these individuals, remember, are trying to do their job, they are buried under manual work, and direct chat interaction is actually very intriguing to them. We're not going to overlook that. We also cover generative AI in the apps that they're using, whether that be in the browser or on the desktop. And we cover it in apps that they're building, right? All these organizations are racing to figure out not only how to enable generative AI for their employees, but how to enable it for their customers, because their customers are beginning to demand it; they're beginning to expect it. So Liminal does all three of these modalities pretty uniformly. We could talk at length about how we do that, but I think it's actually a bit more fun to show you. So I'm going to take a look today at the actual Liminal platform, and I'm going to show you two different angles. Remember those two primary masters, if you will, that we have to serve in a regulated industry: number one, that end user, that person at the doctor's office, that underwriter, that wealth management advisor, whoever that is, the individual trying to get their job done; but also the cybersecurity organization whose job it is to ensure that these interactions are safe, secure and compliant. So I'm going to start here. This is the admin dashboard. I'm in cybersecurity; I like dashboards. This is where I start my day. One of the things I love to highlight about Liminal is that we have the pleasure and the honor of not having a particular horse in the generative AI race. Today we support about 13 different providers, all the names you know: Perplexity, OpenAI, Azure OpenAI, Anthropic, etc. We do that because our earliest hypothesis is proving true in these industries. As you all know, there is such an arms race right now, with all of these providers coming out every single day, one-upping each other with models. Everyone in these industries is looking for how to get their job done better, and they recognize that there are different models trained for different purposes across these different modalities and functionalities. Liminal allows you to connect as many instances of any model under those 13 providers as you like. Crucially, as an administrator in these organizations, I can say whether that model is licensed or available to the entire organization, or only to a particular team. You can imagine Aaron the sales guy (I'm not a sales guy, but if I were) being granted one particular model with very restrictive policies, while maybe a research group gets access to six or seven models with looser policies. This is the first part of observability and governance over generative AI. These individuals in cybersecurity are used to being able to provide fine-grained control over experiences based on who you are and what your job function is; we allow them to do that for generative AI.
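A rough sketch of the per-team model licensing and policy assignment just described; the field names, providers and model identifiers are invented for illustration and are not Liminal's actual configuration schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelGrant:
    provider: str                    # e.g. "openai", "anthropic"
    model: str                       # e.g. "gpt-4"
    teams: Optional[set] = None      # None = licensed to the whole organization
    policy: str = "strict"           # which sensitive-data policy set applies

GRANTS = [
    ModelGrant("openai", "gpt-4", teams={"sales"}, policy="strict"),
    ModelGrant("anthropic", "claude-3-opus", teams={"research"}, policy="relaxed"),
    ModelGrant("azure-openai", "gpt-4o", teams=None, policy="strict"),
]

def allowed_models(team: str):
    """Models a given team may use, with the policy that governs each."""
    return [(g.provider, g.model, g.policy)
            for g in GRANTS
            if g.teams is None or team in g.teams]

# Aaron the sales guy sees one restricted model plus the org-wide one;
# the research team sees its looser-policy models as well.
print(allowed_models("sales"))
```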
I'm going to spend just a moment here, too, in our policy controls. It's a simple little screen, but this is where the magic happens. What do we do when we find sensitive data inside a generative AI prompt? Liminal comes out of the box knowing a lot about categories of sensitive information: PII, medical information, financial information. We also allow organizations to define their own custom terminology; as you might imagine, just because you're a hospital and you're under HIPAA regulation doesn't mean you might not have data that is specifically sensitive to your healthcare organization. Regardless of what that is, we give administrators the ability to say, either globally or across all of the models connected to my organization: what do I want to do when I find sensitive data? I mentioned this before: the core of our algorithm is a set of models arrayed in an ensemble that we have trained for the purpose of looking for sensitive information in context. That doesn't mean we know every name under the sun, or every location or medical facility or occupation under the sun. The model understands the diverse ways in which people talk about these concepts. Sounds obvious, right? But regulated-industry tooling is full of regular expressions and huge databases and rules engines; it's kind of a novel approach for them to believe that an algorithm can understand what you're trying to represent even if we don't happen to know the specific manifestation of it. Okay, so we detect a lot of different categories of sensitive data, and the policies we apply are very straightforward. Again, we want this to be a very clean user experience; in a lot of security tooling, as you'll know if you've been in cybersecurity like I have, user experience is not high on the list. We allow administrators to say: what do I want to do when I find this data? The two I'll highlight here: redaction is exactly what it sounds like; we replace Aaron Bach with Person 0. It's like I never existed. I don't care if the downstream model provider says they're not training their model on my data; you're going to learn about Person 0, not Aaron Bach. Intelligent masking is a little bit different: we apply a heuristic to that particular category that eliminates the sensitivity but leaves behind context that's going to be useful for the LLM to produce a response. An easy one: masking ages. I'm 38 years old. Under the HIPAA guidelines, that's identifiable; you cannot use that information, because it could identify me. But "late 30s", as long as there's no other identifying information in that prompt, is okay. So intelligent masking is a way we apply heuristics to different categories of data such that we remove the identifiability but leave behind the context. There's a lot of other stuff we could touch on in here, but you now have an understanding of how we empower the cybersecurity organization with the ability to enable safe, secure generative AI.
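A minimal sketch of the intelligent-masking idea for ages: under the HIPAA Safe Harbor rules an exact age can be identifying, so the heuristic keeps only coarse, still-useful context like "late 30s". The bucketing below is an illustrative guess, not the product's actual heuristic:

```python
def mask_age(age: int) -> str:
    """Replace an exact age with a coarse, non-identifying description."""
    if age >= 90:
        # Safe Harbor requires aggregating ages 90 and over into a single category.
        return "90 or older"
    decade = (age // 10) * 10
    position = age - decade
    qualifier = "early" if position <= 3 else "mid" if position <= 6 else "late"
    return f"{qualifier} {decade}s"

assert mask_age(38) == "late 30s"   # the example from the talk
assert mask_age(24) == "mid 20s"
```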
Now I want to show you what this looks like from an end-user standpoint, starting with the chat experience. Remember, chat is old hat to all of us, but to individuals in the regulated industries it's the most common modality through which they engage with generative AI, and it's not what they're used to. I'm going to come to my prompt here and paste in one that I created for this event. I want to show you something: I pasted this prompt, and notice it flagged some sensitive data right away. To prove to you that we don't have a database of every name under the sun, that's the actual name of one of Elon Musk's sons. I don't know how to pronounce it, and it's not a name I've seen before, but our algorithm understands the context in which a name is likely to occur. Crucially, notice what we're doing here. If you've been in cybersecurity at all, around data loss prevention software, when you put sensitive information into a platform you get your hand slapped: don't do that, Aaron; rewrite your prompt; get rid of the sensitive data. We know that in generative AI that's not going to work, because that data is the whole point of me writing this prompt. So we don't do that. We say to the user: hey, we identified a couple of things here, some blue things, some yellow things; you don't need to worry about that right now. Go ahead and submit your prompt. In this particular case, I'm going to pick GPT-4 from OpenAI. A little behind the scenes: we're going to cleanse that information on the fly on its way to OpenAI, and as the response gets streamed back to us, chunk by chunk, we're going to rehydrate it on the fly; I'll show you what that means in a moment. But crucially, you can see what I'm getting here: a comprehensive insurance proposal. I'm an underwriter of some sort, and I'm getting what I expect. Wherever my client's name should appear, the real name is there; it doesn't come back as a placeholder.
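A sketch of rehydrating a streamed response on the fly: because a placeholder such as `<PERSON_0>` can be split across two chunks, the code holds back a small tail that might still be the start of a placeholder. Again illustrative only, not the actual implementation:

```python
import re

PLACEHOLDER = re.compile(r"<[A-Z]+_\d+>")

def rehydrate_stream(chunks, mapping):
    """Yield rehydrated text as chunks arrive, buffering possible partial placeholders."""
    swap = lambda text: PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        cut = buffer.rfind("<")   # a trailing "<..." may be an unfinished placeholder
        safe, buffer = (buffer, "") if cut == -1 else (buffer[:cut], buffer[cut:])
        yield swap(safe)
    if buffer:
        yield swap(buffer)

mapping = {"<PERSON_0>": "Aaron Bach"}
chunks = ["Dear <PER", "SON_0>, here is your proposal."]
print("".join(rehydrate_stream(chunks, mapping)))
# -> "Dear Aaron Bach, here is your proposal."
```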
I've got my job done here. This individual needs me to provide advice for them on insurance; I was able to plug in some data and get that information quickly and easily. I can copy and paste this message and move on with my day. Now, again, that serves one end user, but we've got to make sure that we serve both. If I quickly go back here to our admin dashboard, as an IT administrator in this organization, you'll notice here is a log entry that shows the prompt as the user input it, but, crucially, also what we actually sent to OpenAI. Very straightforward: you see that, based on the policies of this model for this team inside this organization, we redacted that individual's name. You see an example of intelligent masking of their age. You see an example of intelligent masking of a street address; HIPAA says that a state is okay as long as there's no other identifiable information. Maybe the large language model writes differently to somebody in their mid 20s who lives in California; I don't know. But nevertheless, we enabled that interaction for that individual, and did it in such a manner that we retained data security, sovereignty, governance and observability, in a manner where I can show my auditors: yes, I am compliant, I've done what you've asked me to do, and I've ensured the highest standard while enabling this technology that can impact so much of my workforce. Now, again, I could talk all day about how we do this, and maybe we'll have a little time at the end, but I want to highlight one thing as I talk about the founding journey of Liminal. I was recently part of an accelerator program where we got invited by a very, very large bank, a name you would know, to talk about the idea of responsible deployment of generative AI, and really AI at large. And of course, everybody in the audience talked about important topics: removing bias, handling hallucinations, data sovereignty, all that type of stuff. One thing has become particularly pertinent to me when we think about the responsible deployment of AI, regardless of our industries, regardless of our experience or craft. I'm thinking about those individuals in those industries. I'm thinking about the story I heard of a doctor in the network of one of our employees, a doctor whose PTO was docked because they didn't complete their patient notes on time. They didn't complete their patient notes on time because they had been working around the clock for the last year making up for staffing shortages within their hospital. Nobody was trying to harm anybody, but they got a raw deal. What they need more than anything is time off, and they got that taken away from them because their organization had a policy and, maybe, followed that policy. I'm thinking about a large wealth management firm I spoke to recently, watching individuals talk about the long 50-, 100-year plan for generative AI, and watching all these wealth managers just about fall asleep, watching them lower their eyes, and then watching them perk up when we started to talk about: how can we help you serve your clients better, your clients who are relying on you for their retirement? When we talk about the responsible deployment of AI at large, among all the other topics we're already used to, I want to encourage you to think about what is on my heart, really, which is the legions and legions of everyday people around the world in these organizations who are just trying to get their job done. They're trying to do a good job.
They're trying to meet their performance reviews. They're trying to get their boss to be happy with their work. They're trying to go home and be with their families at the end of the day. We have an opportunity to unlock the capabilities of generative AI in a safe, secure and compliant fashion to ease the burden on those individuals. That's what we carry forward when we think about the individuals in the regulated industries, legions of them just mired in endless busywork. Part of the ultimate dream and vision of generative AI and its responsible deployment is pulling that load off their backs by enabling a safe, secure and compliant experience. Thanks so much.
Thank you very much, Aaron,
I'd like to ask the three speakers to come up. By the way, Aaron, I think a lot of people in the audience are thinking, well, could this guy be my boss? Please?
Sure. You know,
I love someone,
so I'm going to trade microphones here.
Okay, SallyAnn, yeah, yes, sorry. Okay, so, questions for the panel.
Question, okay, Mark,
So I have a small startup, and I can't afford a controller. Given what we're talking about here,
could this product be my virtual controls person, or whatever, for my small startup?
We have the luxury of not being OpenAI or Microsoft; we don't have to pay the large infrastructure costs of running a foundation model. What I tell the industries and individuals I speak with is that if you are not using Liminal, you're probably paying too much for generative AI. Candidly, we have individual practitioners. One in particular is a plastic surgeon; they're an office of one, and they're a customer of ours. So it's certainly possible.
Next question. Do we have someone over here who had their hand up recently?
Yeah, also a question for Aaron. Just wondering: dealing with a space like healthcare and HIPAA, does your organization already sign, like, business associate agreements, so that's covered? Okay.
Yes. And then just
curious, what's your perspective on applying, like, qualitative analytics to understand what users are actually doing? Is that something you guys do, and something you try to be transparent about?
Yeah, that's a great question; really appreciate that. You know, really, it depends, and I imagine you both would say the same thing. I think organizations are on a spectrum in terms of adoption of AI, or generative AI. One of our customers is a very large pharmaceutical manufacturer; they've been using AI forever, they've got research teams embedded in it, and we help them in a very technical capacity. Another of our customers is a healthcare network, and one of the very first questions we got asked was how AI could even help them. So they're very, very early on, and part of that, and I think this touches your question, is that they've heard a lot about it, they're excited about it, they have some ideas around it, but they don't quite have that use case defined for their organization yet. They have a sense of how they might want to utilize it, but there's not really a critical, again, use case that that organization has defined. In a lot of cases, we will come in and help organizations define that or identify it. We have a lot of wonderful material and data from other regulated industries: through a bank, we know what works for banks; through a healthcare organization, we know what works for healthcare. And I would say a crucial component to rolling this out in a meaningful fashion is not just having technology for technology's sake, but actually being able to define a use case that's going to impact your particular organization in a way that ultimately communicates value to whoever that value is to be communicated to; sometimes that's your CFO, who wants to understand, what am I saving by implementing generative AI? But there are a lot of organizations that are on that earlier side of the life cycle, and they're trying to figure out: great technology, but what do we do with it? It's a very, very new space.
I don't know how
many more questions we'll have, because
we've got tons of time, so jump in with comments. Yeah. Well, it's more of a product question. Just to clarify
the question,
You're capturing your users' input. Like, how concerned are you with what users of your tool are actually doing?
Great question. Part of the benefit of building ourselves as a security company from day one was that we forced ourselves to do the SOC 2 Type II attestation; we're going for high trust. Right now, we have locked down our technology in a very, very robust fashion, such that even if people are putting really sensitive information in there, there's no way that we can access it without an explicit reason. So I'm not terribly concerned about it yet. We entered the world really wanting to help good people do their work well; we're looking out for the individuals who are just trying to get their job done. Look, there are malicious actors out there who are trying to mount direct or indirect prompt injection attacks and other inappropriate exchanges with generative AI. It's a challenge to figure out exactly how every single industry responds to that. But from our standpoint,
as a platform, rather than the organization having to be constantly concerned about hypothetical problems: today they don't know how their employees are using generative AI at all; their security organizations are blind. We want to give them insight into that so that they can begin to peel apart different scenarios and deal with them. As an example, we had a prospect, not a customer, that built an internal chatbot on Azure OpenAI to answer benefits questions. Before long, people started saying: well, my name is Aaron, I was recently diagnosed with this condition, how do you treat that? That's very HIPAA non-compliant, and suddenly that organization, which is technically the tenant of that Azure OpenAI model, now has that information sitting there. And by the way, deep in Microsoft's terms of service, they have individuals reviewing prompts at regular intervals for the purposes of abuse monitoring. You can't just turn that off; you have to reach out to Microsoft and file a use-case exemption, and if they tell you it's approved, then they'll pause it. So I would say organizations are thoughtful, certainly, about not wanting to put themselves in a position where data is submitted in an inappropriate manner, but they can't fully control that, hence why we try to assist with it. It's a complicated question, but it is one that, from a platform perspective, we certainly care about.
If you don't mind, for Arize: I'm also a product manager, and I'm going to try sort of the same question. So you showed the graph where your search function was the most popular part of your MVP, but are you also looking at qualitative analytics, like a screen recording of people using your interface, watching whether they're, you know, editing the code or JSON it generated, like you specified? How does that factor into your product strategy as you're seeing people use your product with data?
Yeah, totally. So we've definitely looked into some of those analytics tools, where you can block out a lot of the PII and block things off. But for copilot specifically, because of what this application is, and because we do LLM observability, a lot of the tools I use are our own. Like, I'm looking at our traces to see what users are actually doing; I showed you that graph, and then I showed you some of the traces of what users were actually doing. So for me, where we're at with copilot, it's only been a month; I'm really just invested in understanding what people are trying to do with copilot, and is copilot actually helping them? So it's a lot of looking at our traces, making graphs and visuals, trying to segment my data to understand, and then a lot of user interviews, like: hey, I saw you using copilot a lot, what are you trying to do with it? Is it actually helpful to you? That's where we are currently. But we've definitely looked at tools; you mentioned FullStory, that's one that we've looked at before, and they can certainly be helpful. I just don't think, for copilot specifically, that's where we're at, or what's going to help me personally drive adoption of that feature.
I have a question for Uche. If one of the companies in the regulated industries just says, okay, we're too worried about sending these prompts out to the cloud, to the commercial LLMs, we want to set up an internal-only LLM and we want to pick the most powerful one: which one might they likely pick, and what would they be missing out on by doing that?
Yeah, so that's a good question: what is the likeliest in-house LLM? And it's a tricky one, because LLMs kind of have personalities. You know, even among the ones people are using, Claude is a little bit different from Gemini, which is a little bit different from GPT; they all have their personalities. So it depends on the problem you're trying to solve, the nature of what you're trying to do. Are they doing retrieval augmentation? Are they doing function calling? For instance, there are LLMs out there which are open source, where you can take all your information and add a layer; you can basically fine-tune your own information on top of them. You can also do that with a lot of the commercial ones, but of course that's what you're trying to avoid. So there's no easy answer. I will just say there are a few out there: there's, you know, Llama 3.1 405B, which is the biggest one from Meta AI, and then there's Command R, which is from... shoot, I
forgot. Oh,
yeah, thank you. So there are so many, that's the thing, in terms of open-source LLMs that you could bring in house and deploy. And I know that, for instance, IBM Watson will allow you to basically deploy a whole bunch in house that are a little bit closer in size and style to the larger commercial ones. So yeah, there are just so many options. But if I were to pick one that most people would probably start with, and probably fine-tune on top of, it would probably be, you know, Llama 3.1.
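For concreteness, a hedged sketch of the "bring it in house" option using Hugging Face transformers, so prompts never leave your own infrastructure. The 8B Llama 3.1 variant is used here only because it can fit on a single GPU; the 405B model mentioned above needs a dedicated multi-GPU serving stack (for example vLLM), and the checkpoint is gated behind Meta's license:

```python
from transformers import pipeline

# Loads an open-weight instruction-tuned model locally (requires accepting the
# model license on the Hugging Face Hub and having sufficient GPU memory).
generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    device_map="auto",
)

out = generator(
    "Summarize this internal incident report in three bullet points: ...",
    max_new_tokens=200,
)
print(out[0]["generated_text"])
```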
cool. Thank you. Next question from the audience. Okay,
Thank you. So this one is for you, Aaron. Your model's output directly impacts what the customer receives, like the example you showed where a complete policy proposal was created, right? So how are you verifying the accuracy of your translation, of the masking and unmasking, so that there's no error?
yeah. Are you speaking specifically to how we ensure that we rehydrate sensitive information, or more, how do we know that response is valid in general?
Response is valid in general? Yeah, great
question. We're not doing that today, candidly, and rationally so. From a technology standpoint, that one has been asked of me many times. I'm not convinced yet that that problem is going to be solved by us. I tend to be in the camp that the foundation model providers themselves are in the best position to validate the accuracy of the output, insofar as you define accuracy as the most statistically relevant completion of the sentence. But that may be a bad bet on my part; it's entirely possible that there's a third-party solution that is more accurate. What we've seen in the regulated industries that we have served, given their very stringent cybersecurity regulations and policies, is that they have already conditioned their workforce to understand, whether you're hospital front-office staff or an insurance underwriter: this is not replacing you. This is intended to be an augmentation of your capabilities, a.k.a. you have the responsibility of validating the information that you receive. It's an early bet, and one that I'm going to let play out a little bit before I look at something in that direction.
Yeah, and I don't know if this is exactly what you were asking, but one thing that came to my mind when I was looking at the examples is less about GPT itself. The process of masking and rehydrating will always introduce some potential for an issue, for an artifact. So for instance, you replace a name with Person 1: what if it's a RAG operation where that happens to be similar to a field retrieved from the database? Or what if there's a policy implication of not being "mid 20s" but specifically being, you know, 21, or something like that? I think the question I have is more: for areas where you are a bit more responsible, because you did the redacting and rehydrating, what might your process be for quality, right? Great
question, I see. So two things I think about there. Number one, part of our service to our customers is exactly that: service. It's not just the platform; it's our ability to consult and say, based on your organization's standards, what policy would be appropriate for you. And the policies on day one are probably not going to be the same as the policies on day 300, so, number one, there's ongoing tuning of the policies for different categories of sensitive data. There's also the aspect that there are times when our customers will report back to us: in this particular example, you applied a heuristic to a medical facility name, and that brought back something that actually skewed the results. One in particular: if I go back to that customer of ours, the Fortune 100 life sciences company, they set up a policy in their Liminal instance where they replaced their company name with "pharmaceutical manufacturer", which turned the prompt into "Tell me about a pharmaceutical manufacturer." OpenAI stayed much more with the generic concept of a pharmaceutical manufacturer, whereas Claude actually put in the name of one of their competitors. So in that particular case there's an interesting opportunity where, as a software engineer, you say, oh my god, let's go solve that problem. But from a customer standpoint, it's about helping to educate people on what generative AI is: statistically probable output. That helps create not just individuals who use generative AI, but ones who are going to be critical thinkers about the results they get through generative AI. And if they do get a result that isn't quite correct, part of their engagement with Liminal, part of the relationship with Liminal, is that we train them to be skeptics of what they're receiving and to keep having ongoing interactions with different types of models to see what responses they get from each. So it's kind of a sideways answer, but I think it does highlight the need, for any such solution that's going to enable generative AI: it's not going to replace your ability to reason about whether what you're seeing is appropriate. It has to augment you as an individual in your industry, not replace you. That's something we reiterate heavily with every customer.
Cool, thank you. Next question, okay, right here. While I'm walking over, I would remind you guys: if you're a job seeker or a hiring manager, later this week we're going to send out our quarterly email where we ask all the job seekers to write in, and the people with jobs, and then we'll send out a summary email with all the people looking and all the jobs. So if you're on our email list, you can participate in that if you're looking or if you'd like to
hire. So the question I have is: recently I got an opportunity to use the Enterprise version of ChatGPT while I was interning for H1. It says at the bottom that the prompts we put in will not be used for training purposes. So don't you think that could address the sensitive data that these people have a
problem with?
Great question. It depends on how trusting people are; maybe I'll flip it around. The CISO of that large pharmaceutical manufacturer, a customer of ours, was quoted to me as saying, "I will never trust Microsoft with anything important in my life again," for a whole host of reasons. Now, Microsoft does similar things; they represent their data security policy. So I don't second-guess my customers on that; if you want to trust them, that's your business. More importantly,
I can't do that. But
what about Cohere?
What about Perplexity? What about your own Azure OpenAI instance? What if you host the model yourself? More and more models are being introduced, with different governance, procedures and policies on top of them, for different organizations. And again, remember, security teams are small and understaffed. I was recently at a conference talking about generative AI, and all the security professionals in the crowd were laughing about the fact that 80% of vulnerabilities still come from unpatched operating systems. These individuals have plenty of other things to worry about; they do not want to have to remember: wait a minute, what's OpenAI's policy? Well, shoot, now we're talking to Perplexity, what was theirs? They both go together. Oh, God, now there's a third provider. So even if one provider is okay from a policy standpoint, the organizations we're working with are interested in the capabilities of different models, and the security organizations are instantly saying: I don't want to manage policies across all these different providers; I would rather have a third party I can trust to do that. Now, if I take the Liminal hat off for a moment: I don't think OpenAI is out to hurt anybody; maybe we can debate that. If they're really not training their models on your data, wonderful; they should be doing that. But the moment you start to introduce other providers into the equation, I think the typical cybersecurity office is going to start to pull its hair out.
Great question. Uche or Sally, do you guys want to speak to any of that? Okay, next question here from James.
Hello. Uche, in your talk about LLM constraints: is it analogous to, like, a negative prompt in AI image generation, with, like, Stable Diffusion?
Yeah, so that's a good question. If you've used image generation, especially Stable Diffusion, you can do something like: show me a picture of a mountain, but I don't want it to have snow on the peak, or something like that. That's a negative prompt. There are several layers to that; it's more like prompt steering. I talked a little bit about vector steering, where what you're doing is almost adding additional clauses to the prompt, but those clauses are not always active; they're only there if the LLM itself detects that it has to bring them in. So it's like: okay, I made this nice mountain picture and put snow on it. Oh wait, they said no snow, right? That's the negative prompting example. So yes, negative prompting is very similar to vector steering. Now, llama.cpp, which is a very popular open-source large language model runtime, actually supports negative prompting for language as well: you can give it a prompt and you can give it a negative prompt. I did mention, when I talked about vector steering, that it's coming out a bit late and still being worked on, because the sort of vector steering the papers are talking about is so much more sophisticated than even the idea of negative prompting. So hopefully that's useful. Thank you.
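For readers curious what activation ("vector") steering can look like in code, here is a very rough sketch in the spirit of the steering-vector papers: take the difference between hidden activations for two contrasting prompts and add it back in during generation via a forward hook. GPT-2 and the layer and strength choices are arbitrary stand-ins; real steering work is far more careful about layers, scaling and token positions:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
LAYER = 6           # which transformer block to steer (arbitrary choice)
STRENGTH = 4.0      # arbitrary steering strength

def block_output(text):
    """Hidden state of block LAYER at the last token position."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = model(ids, output_hidden_states=True).hidden_states[LAYER + 1]
    return hidden[0, -1]

# A contrastive pair of prompts defines the steering direction.
steer = block_output("I love this") - block_output("I hate this")

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    return (output[0] + STRENGTH * steer,) + output[1:]

hook = model.transformer.h[LAYER].register_forward_hook(add_steering)
ids = tok("The movie was", return_tensors="pt").input_ids
print(tok.decode(model.generate(ids, max_new_tokens=20, do_sample=False)[0]))
hook.remove()
```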
Another question, great, and this one is for Liminal. So today, prompt injection is still a vulnerability in GPT-4 and OpenAI. Given that fact, and how it probably won't change in the near future, how can you guarantee your security on something that is, in essence, insecure? And is your guarantee like what LifeLock does with identities, where you have a million-dollar guarantee that if they work with Liminal
and data is lost, you
pay out up to a million dollars? I do carry plenty of cyber risk insurance, my friend Daniel. An interesting thing happened when we started to talk to CISOs in these industries who had similar thoughts around: hey, prompt injection, tell me what you do about that. Early on, candidly, as an early-stage startup we had a lot of different problems, and prompt injection was not one we particularly focused on. As we got further along, some of our early customers and additional prospects started to ask, and with those CISOs I kind of turned it back around and said: have you experienced a prompt injection? And the CISO, to a T, 100% of the time, will say no. I said: well, how do you know that's going to be an issue for your organization? There's not really an answer there. It's a little bit of a cheeky response, but CISOs pattern on logic and experiences they've had before, and these individuals and these organizations are not at risk of any prompt injection directly, because, remember, they're not allowing generative AI within the organization at all. Now, for us, as an early-
stage startup: as we start to roll out with more and larger organizations, we're going to find out really quickly that prompt injection is something being encountered across these industries writ large. We had better have a response; we had better be good product stewards and come up with one pretty quickly. I won't claim to be a genius in that regard. But that is one area where, as part of our approach to working with cybersecurity folks (I've been one myself, so I can say this): they are so focused on the different problems that might occur, which is crucial, that's part of their job, that sometimes it can get a little out of control. They can start to imagine every possibility that they either read about, or heard somebody talk about, or heard somebody theorize about, and that can become a large enough real thing in their mind to prevent moving forward at all. We tend to think, and we tend to show in these industries, that the vast majority of individuals using generative AI want to use it for completely legitimate purposes.
Let's start there.
And then, if malicious actors of any form start to show up, we will have to have a response. Just being candid.
Next question, Ryan, so
I see kind of a common theme between each of your talks, which is this idea that you're trying to preserve a rule while abstracting it, right? You have some metaphor for behavior; in your case, you're trying to, you know, preserve the essence of LA without actually saying it's LA, but somehow avoid things like fingerprinting in future prompts. So I wonder how you deal with that. And maybe really for Sally, with your strategy: I think you were close to saying something about chain-of-thought and reasoning-type prompting strategies. I wonder, how do you think we should start to share our higher-order prompting strategies, and that metaphor of, okay, you have a very similar prompt being asked across totally different domains of your organization, totally different user personas, those types of things? How are you starting to explain those strategies to non-technical users?
I think that's just a tough, tough problem. Something that we're doing is trying to be open about it and do that knowledge sharing; I think that's one thing that will be really powerful, people continuing to talk and share what's helpful. But I also think there are a lot of frameworks that are making those kinds of abstractions. If you're familiar with, like, DSPy, they're kind of changing prompt engineering; I think that's another way we'll progress forward, so we maybe don't need to think about all of that all the time, and we take more of a programmatic approach. So I don't have an exact solution, but those are the things that are top of mind.
That's a really good product manager answer, by the way; she's selling it right here. I would answer that slightly differently, in that we're placing a bet, and it is a bet, so no, I don't know that this will work for the individuals we're serving, but I'm going to try to place it anyway. I don't want them to worry about it. I don't want the hospital administrator or the insurance underwriter to have to know about the intricacies of prompt engineering, and that's based on my philosophy, our philosophy, that these people are just trying to get their job done. So here's what we're going to try: in a little over a month, we're going to release the notion of what we call model-agnostic assistants. Rather than just talking to one model directly, we're going to create, for example, a marketing bot, and we're going to explain how we want it to sound and what we want it to think about, give it some instructions. Under the hood, we've got some empirical data on all of the providers we support, on who's better at what, functionally and topically, and based on the prompt given to that marketing bot, we're going to try to route it to the correct downstream provider for that topic. We're going to see. It's an architectural change that fits into our platform very nicely, but it's an experiment we want to run to see: can we take that burden off the end user? I don't know; I sure hope we can, because I feel like, writ large, massive-scale adoption of generative AI can't rest on telling everyday people: you just didn't write your prompts well enough, here are the 75 rules you're going to have to follow. That's just never going to work. It's a bet; I may be wrong. I'll let you know.
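A hedged sketch of the routing idea behind such a model-agnostic assistant: classify the prompt's topic, then send it to whichever connected provider the organization believes is strongest there. The routing table, keyword classifier, and model names below are all made up for illustration; a real version would presumably use empirical evaluation data and a proper classifier:

```python
ROUTES = {
    "code": ("anthropic", "claude-3-5-sonnet"),
    "marketing": ("openai", "gpt-4o"),
    "current-events": ("perplexity", "sonar"),
}
DEFAULT = ("openai", "gpt-4o-mini")

def classify(prompt: str) -> str:
    """Stand-in for a real topic classifier (which could itself be a small LLM call)."""
    text = prompt.lower()
    if any(w in text for w in ("function", "bug", "stack trace")):
        return "code"
    if any(w in text for w in ("campaign", "tagline", "brand")):
        return "marketing"
    if any(w in text for w in ("latest", "news", "today")):
        return "current-events"
    return "default"

def route(prompt: str):
    """Pick the downstream provider/model for an already-cleansed prompt."""
    return ROUTES.get(classify(prompt), DEFAULT)

print(route("Draft a tagline for our new savings account"))   # ('openai', 'gpt-4o')
```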
Just a quick follow-up, Uche. Do you think the same math that gets us those prompts is the thing that powers the rules, to kind of follow on from that description?
Yeah, and it's the same, in fact. I made the point that prompt engineering might be going away, and it was a facetious, very intentionally facetious point. But I think it's not only going away in the sense of: no, you can't do prompt engineering, all you get is an
endless scrolling chat box.
I do think it's going to become absorbed into the user experience in other ways as we go along. So you are not going to have to think about how to basically write college essays in a style that an LLM can understand, because that's what we're doing with prompt engineering. To your point, we're having to, in effect, try to find those abstractions
to translate what
we want into a language that we know the LLM can understand. There definitely has to be a breaking point. Now, when I say that tool calling is the future, I'm specifically talking about situations where you want the LLM to have creativity, but within constraints, right? You want it to be creative about what goes on inside the grammar, but you want the rules of the grammar to correctly communicate with an API or something like that. In that situation, I think prompt engineering will start to, not go away, but become less important, because the constraints will be the main thing. You just want the LLM to do its thing and replicate human language, but the shape of what comes out of it, you want that to be controlled, and you don't want somebody to have to hold all those instructions in their head to figure out how to control it. That's going to be a bit different from the more open-ended, creative sorts of problems you described, for example, where you're
showing samples. Okay, next question.
Well, I'm
going to go... so many questions, so little time. Let me focus on one. Uche, you mentioned earlier, you know, retrieval augmentation, and that the data it's pulling from has got all sorts of sensitive information in it; we've got access control problems with that stuff. But Aaron, coming back to you on that augmented data: you're checking the prompts, you're redacting, you're masking things out of the prompts. But how do you solve that problem with augmented information that's coming in?
Great question. You're talking about a specific situation I didn't get to show you; that's the third big bucket I would look at, which is apps people are building. Everything I showed you there, we provide an SDK to be able to talk to, so that's how we do it, right? And it's a yes: don't talk to OpenAI directly; we handle that for your developers, kind of inline. But it is an interesting problem in that that genie is already kind of out of the bottle, and now we're trying to roll in and say: well, hold on a second, try to come our way, one or two organizations at a time. I think several organizations we've worked with, who have pretty sophisticated application development groups, were already down the road of thinking: oh my gosh, the sensitive data I'm getting from my users; oh my gosh, the sensitive data I'm pulling out of this database. They were already thinking that way. We're trying to pitch them on Liminal as an opportunity to give them the framework to take care of it, whether or not this
is what they're going to do.
I mean, observability,
guardrails, evals, that's part of it as well.
Yeah, exactly. One of my favorite sayings is: deciding to adopt retrieval augmentation is not where the engineering ends, it's where the engineering begins, because all of your classic problems have now started. Once you adopt retrieval augmentation, the observability, security, high availability, all that stuff is just
starting.
Bringing the mic down the road: this is a conversation relating to prompting, and the fact that prompting is still critically important to getting the results that we're looking for; the prompts that go into our LLMs highly influence what comes out. So the question here, if I understood correctly what your tool is doing: in the demo you showed, you highlighted some areas that were potentially risky, things that you were going to mask out, and then you sent the prompt off to the LLM as if it had been sent from the user, and the user never saw what you showed us in the background. So isn't it important to me, as a user, to be able to interpret the results I'm getting back, to be able to understand what the actual prompt was? You gave an example of California, right? Well, what if my prompt was really location-specific? What if it was important that it was, you know, somewhere in downtown Los Angeles, and you abstracted that to California? That seems like it would have an impact on how I interpreted the results that came back. So how did you come to that decision?
Come to the decision not to expose that to you, the user? Not
to expose it, right? I'm
an old-school, like, Microsoft engineer and designer. I would love to show every dial and toggle in our user experience; I would love to go crazy. Early on as a startup we tried that, and we got vicious user feedback about how confusing it was. We hired a product designer who's world class; he couldn't simplify it. We realized early on, and I have our head of product here to thank for this, that the typical user we were serving didn't want that. And it drove home for me, I think, the danger of building solutions for you and me versus building them for the legions of others who are trying to access generative AI, who don't necessarily have either the proclivity or, frankly, the capacity to tweak and tune those elements. Now, it's a fair question whether what we do materially changes the output such that it's no longer important or appropriate; that's a problem we have to solve. But in the very first generation of our product you could, man, you could go crazy. Even as a user you had settings on top of settings, and people were befuddled. And as a startup vet, this isn't my first; I've had a lot of issues in the past with building too much too quickly and missing where my users want to go. So, this time.
Great. Thanks.
Okay, next question. Anyone in the back have questions? Okay, here we go.
First off, three great presentations, so thank you for taking the time this evening. My question is around controlling and constraining LLMs, and I think all three of you have different perspectives, so I'd be interested in each. When you put attachments into a prompt, you're adding context via that, like Uche talked about, whether you attach it as a menu or it's like: hey, I'm just doing my job, but here's an Excel spreadsheet of people's names, addresses, you know, ages, all these things. How does that
stay secure across all of it as well? Oh, my
goodness. Okay. Our application does all sorts of document upload, attachment, contextual stuff. Number one, I was stunned at how many tokens even a very simple spreadsheet consumes; we won't go down that path. In that case, text is very easy, right? A wealth manager is dealing with Aaron, his client; here's Aaron's latest prospectus, uploaded. You kind of find the same textual information. The one we're working on currently, which is really interesting, is vision, image recognition. Go back to the pharmaceutical example: say SallyAnn and Aaron are taking a group picture together, and behind us on a whiteboard is the syntax for a bespoke custom molecule for that pharmaceutical manufacturer that represents intellectual property. Now, detection is one thing; then we have to decide: do we want to blur that, or do we want to just drop the image? I would say multimodality is one of the most critical user experience workflows to be hammered on in regulated spaces, because it is so crucial, people expect it, and it is pretty darn difficult once you get past pure text. So what I would say for us: we handle PDFs, docs, spreadsheets, that's great. We're starting to get into image and audio; you can spend a lot of time there. And I would just say there are two things we think about: number one, how do you find the sensitive stuff, and two, what are you going to do about it? Those are the questions you've got to answer.
Yeah, for copilot, data is actually, like, the cornerstone of how copilot works. We have to get data from our platform and perform the analysis, and so we did a lot of testing of, okay, how should the data be formatted? If you read the blog that I threw up at the end there, that was something we battled with: the right format. We ultimately decided that JSON was the best way for the LLM to understand it. So there's that, and there's the balancing of tokens. For us, it was: I need enough data so that copilot can find trends and find issues, but not too much, where things are getting lost in context or we're reaching context limits. So it's definitely a balance. I think what we found is grabbing the right data, giving it enough so that it can perform the analysis but not too much, and then a lot of instructions associated with it. So not only just saying, here's the data, but here's the data and a description of what this data actually is; that kind of helps it as well. So that's my take on that question.
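A small sketch of the pattern just described: serialize just enough data as JSON, then pair it with instructions that explain what each field means, so the model has both the values and the context to analyze them. The field names and data are invented for illustration:

```python
import json

trace_summary = [
    {"route": "/checkout", "p95_latency_ms": 1240, "error_rate": 0.031},
    {"route": "/search", "p95_latency_ms": 310, "error_rate": 0.002},
]

prompt = f"""You are analyzing API trace summaries.
Each record has: `route` (endpoint path), `p95_latency_ms` (95th-percentile latency
in milliseconds), and `error_rate` (fraction of failed requests).

Data:
{json.dumps(trace_summary, indent=2)}

Identify the route most likely to need attention and explain why in two sentences."""

print(prompt)  # this is what would be sent to the LLM
```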
And I'll just quickly mention that, in the general case, adding an attachment or something like that is a bit of a pass-through problem, because if you look at the example, again, all we're doing is taking away certain token choices, so it's still dealing with the same initial input. In fact, that's the whole point, right? We're not changing the prompts. What you're doing is saying: okay, you had a tendency to do X; we're going to let you only do Y and Z. So it's not really going to fundamentally change how the LLM goes about it, but it will reduce the forms that it can output. So it makes less of a difference whether you're using an attachment versus regular text, in the general case.
Okay, we have time for one or two more questions. From here in the back? By the way, I'm seeing a lot of chipper faces in the audience, people who are anxious to get out to the after-party. So we'll wrap up shortly. Okay, okay.
So, Jason here, just want to ask a quick question. There's been a lot of discussion about generative AI, and to me, I've been doing more extractive AI, and I don't hear that term very much. For example, I believe there's abstractive AI, then there's generative AI, and you restrict a little on top of it; then when you get your AI response back, you have to do extraction again. So I'm kind of wondering, you know, why is extractive AI not the terminology that we're using?
Can I just ask: by extractive AI, do you mean extracting data from materials? Correct. So
I'm using generative AI with some of the stuff I'm working on. But, for instance, I'm telling the AI to put, you know, information in a specific place, in a JSON structure or whatever, so I can map it, and then when it comes back, I'm going to do extraction on that. But everybody just talks about generative AI, generative AI, when a lot of the work that I'm doing is actually some form of extractive AI.
Yeah. So the terminology is kind of confusing, but I think it comes more from the fundamental nature of the AI. For instance, in traditional AI, going back to the 60s, the biggest thing was classifier systems, right? You're classifying images and stuff like that. Then you had, you know, different types and genres of neural networks, etc., with different applications. And then this whole idea came up where we just pass data through a whole bunch of transformations and something comes out at the end. Now, what comes out at the end can be a viable extraction of what was put in at the beginning. But the reason it's called generative AI, even in that case, is more a matter of people saying this is kind of different from classifying or from sentiment analysis. That's fine too, but there's no reason why you can't use that term, you know, data extraction,
etc.; it's a perfectly good term. Sorry.
Thank you. One last question here.
Hey there, Bob Fuentes. You earlier had the example of the word salad at the bistro, the "can I order" prompt, and you knocked out a bunch of words. Is that where the schema is essentially defining the list of knockout words, and do you need to essentially pre-crawl all of the likely bad apples that you're going to define as negative knockouts?
Yeah, that's a good question. So I really struggled with how to visualize the process of implementing the constraints. It's kind of hard, because you're dealing with a vocabulary of, you know, 32,000 tokens in that particular LLM, so how do you visualize it? I just picked common words and put them in there. But it's less that you're knocking out words, or picking words to knock out; it's more saying: this is what is permitted here, right? And then everything else that's not in that list is automatically knocked out, if you will. That's that whole idea of a grammar. But one point I want to make is that there are two aspects to it, and I probably should have made this a bit clearer. The first aspect is that we want it to be JSON. For folks who are familiar with JSON, most of it will start with a curly brace, or it can start with a square bracket, etc. So you want to say it has to start with whitespace, a curly brace or a square bracket. You want it so the LLM can't go off the rails saying "Here is your JSON," because that "H" is immediately not in the options it's given. So that's a format restriction, but you can also put in content restrictions. I actually did implement the restaurant menu example, very super simple, with like three things on the menu, in JSON Schema, in order to get those numbers. The way you do that is, JSON Schema has something called enumerations. So you can literally say: these are the things on the menu, how much the croissant is, the ham, and so on. And then not only will it get the structure of the JSON right, but the things it puts between the quotation marks in the JSON will now conform to those restrictions you put into the schema.
So in that example, the menu is the positive list, and everything not in the menu is
knocked out? Exactly.
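A sketch of the enumeration idea just described: a JSON Schema that fixes both the structure of the output and, via `enum`, restricts item names to the actual menu, so a schema-constrained decoder (for example llama.cpp's JSON-schema-to-grammar support, or a provider's structured-output mode) can only emit valid orders. The menu contents here are invented:

```python
import json
from jsonschema import validate   # pip install jsonschema

order_schema = {
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        # Only these menu items are permitted between the quotes.
                        "enum": ["croissant", "ham sandwich", "espresso"],
                    },
                    "quantity": {"type": "integer", "minimum": 1},
                },
                "required": ["name", "quantity"],
            },
        }
    },
    "required": ["items"],
}

# Validate a candidate model output against the schema.
validate({"items": [{"name": "croissant", "quantity": 2}]}, order_schema)
print(json.dumps(order_schema, indent=2))
```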
Okay, everyone, that's a wrap. Please join me in thanking our panel. Applause.