No, I'm just going to enable social media be testing 123
Okay, perfect. Okay everybody, please grab a seat. We're gonna start the meeting everybody could please grab a seat. We will start the meeting voluntarily
so by the way
everybody Welcome. Welcome to the Rocky Mountain AI interest. Can you hear me? Thank you so much. We appreciate it. Okay, can you guys hear me okay? All right. Welcome everyone. Glad you can make it. We will be handing out snow shovels at the end of the meeting. Because the storm is coming. So thank you for coming. I heard that CU has canceled classes tomorrow and Boulder Valley School District has canceled classes tomorrow with the impending storm. So we are over 1300 members and we hit the one year birthday this month. So give yourselves a hand
on Sunday, nine of us intrepid explorers went up to Brainard lake. So this was an AI ski and snowshoe de with Daniel, Greg, Aaron, Cynthia, Susan, Ben, Bob, Jacob and myself. How many people are here that were on that trip? Raise your hands. Hi. All right. Awesome. Sorry, that was the last day of the year but we might be doing a hike when the when the weather gets better is a gorgeous day up there. And I just want to tell you, we didn't talk about AI the whole time. Okay. There was a short discussion about where we should stop for lunch. So that was the non AI part of the of the event. Okay, a couple of reminders housekeeping. We have two exits in the front, one in the back. If you need to leave early, it's fine. We're going to be doing q&a. And we are recording the talk. So please use the mic. If you have a question. Raise your hand we'll run a mic to you. Bathrooms are down the hall. If you brought stuff in please carry it out. And if anybody wants an escort to their car after the meeting, just see any of our board members we're happy to help you with that. Okay, we're gonna start with a few goodbyes. We have some departing board members. May was a student here at CU she served on the board and ended up transferring to a different school so we're saying goodbye to May and then Laura rich started to different of our special interest subgroups. She launched the AI marketing Forum, which is now run by Jason, where's Jason right over here. And she also launched the women in AI that is now run by Susan Adams. is Susan here are Susan right there. Good. Awesome. So we're saying goodbye to me and Laura. Secondly, Andrew spot started our AI engineering subgroup, which is now run by Bill McIntyre. Where's bill? Bill's right here so he's running AI engineering. Andrew has handed off the reins and he'll probably come back after a short break. And then Shawn fimea Shawn here. Sean's coming from Golden so maybe he's still on his way. Shawn just graduated with a MSc in quantum computing. From mines. He was co running our AI book club. And the AI book club is run. Is is now run by Diane Sieber from CU and then and Phil Nugent. Where's Phil? Phil is right here programming. So Shawn just took a position. He finished his master's and he took a position at Los Alamos National Labs, subpostmasters position in their quantum algorithms. Group. I won't tell you his new nickname. Um, but I do want to say that he won the Oscar for the most convincing portrayal of someone who's very smart. So okay, and then finally, our AI in product group is has decided to break off from AirMech and become its own group. And this is sort of a complicated process called our Brexit. That involves Yeah, it involves a referendum and many muscles campaign campaign. No, actually. They chose to leave it's run by Lisa Cabrera who doesn't often meeting so you know, we wish him all the basketball still coordinate with those guys. And I encourage you to look up her group if you're involved in the product side of things. Okay, let me give a shout out to somebody our mag board members. Richard Gam is my co founder, Grace Wilson runs our meetings and tech. Ana is here. was right there. Shawn couldn't be with us and then all the subgroup leaders, so please join me in giving all our volunteers
Okay, so we have a bunch of these special interest subgroups if you really want to go deep on a specific topic. We have a number of ways to do that with these groups and this group meets once a month, but the subgroups are meeting every single week and we have a calendar will show you a link to our link tree page which has a link to the calendar. So I just want to walk through this and see if any of the subgroup leaders want to make an announcement. I'll start with Jason, do you want to say a few words Jason I'll give you this mic.
Yeah, so we've actually been meeting once a month as well instead of every week because we've been heads down and creating an online community that features courses and resources and experts and things that you would typically pay for. And no, Dan was generous enough to allow me to come alongside Laura and launched the marketing Forum, which is a business and also the subgroup of this group. And in exchange for that what I promised Dan was that anybody who's part of our mag would have free access to this community. So even if you're not a marketer, you could find this actually pretty interesting. And so I'm launching it tomorrow, tomorrow morning. I expect there will be cancellations at the meetup. But if you're really brave, come on over. There. It will be hybrid, actually, tomorrow, it won't be. So just because I'm not prepared for it. But I invite you to connect with the marketing forum on the slack group, obviously. And if you're interested in the community, knowing that you're probably not going to schlep yourself at 8am Tomorrow, feel free to get a hold of me. I'm Jason. I'm marketing forum.com Cool.
All right. Thanks. So
yeah, founder Central 2000 2000 Central Avenue. Sweater Yeah, by sweater ventures. That's right. There you go. Yeah. Gotcha.
And by the way, when I say the subgroups meet every week, I don't mean every subgroup meets every week. All I mean is in a given week, there's generally one or more subgroup meeting, some of them would meet once a month, some of them meet twice a month, etc. Chris is not here tonight. He runs the legal AI subgroup. Phil as I mentioned, co runs the book club with Diane Sieber. And so you know, get get plugged in there if you're interested. Bill runs engineering. Bill. Did you want to say something? Sure. Thank you, Dan.
I'm Tuesday morning at 8:30am and I'm going to put this up in the slack and hopefully get it on the calendar. We're going to have an engineering session on a demonstration of LM QL language model very language and sort of practice. I think ways of packaging, different kinds of queries and examples running it locally. The last couple of meetings we've had really been focused on what can you run on your own hardware, and we did some things with Obama. So I think he's kind of pursuing that a little bit. Again, the slack will be on the link tree and I'll post but it's also at sweater sweater ventures Tuesday at 830 in the morning. I'll bring some coffee.
Awesome. Thank you, Bill. And thanks for taking over for Andrew. Matt runs the entrepreneurship group. Robert and can run ethics safety and AI I don't think they have any announcements tonight. And then Susan runs the women in AI, which is having an upcoming meeting. Here's the mic so you can announce it.
Thanks so much, everybody. It's gonna be the first Tuesday of every month the next one is April 2 and we are going to be in the engineering centers AI Lab. Thanks to Anna Anna who's on the board and also Tom yay, who's given us that wonderful space for us to use here on the campus but it is in the Discovery Learning Center building and I now have the correct address. The door will be locked but we're going to have volunteers at the door ready for you to go in. But you can text me if you're late. We're gonna be talking about mental health, societal impacts ethics. And we'll also do a speed networking activity at the beginning of each meeting. So please come join us and let your friends know.
That's awesome. Thank you Susan and thanks for running the group. Next grace from our board is going to announce a project and she's going to ask for your help on this.
Give your thoughts on AI you all have them you're here. I'm doing five to 20 minute interviews either zoom or in person whatever you prefer. All perspectives are welcome and the results will be anonymized and basically making a book trying to explore the speeding train that is running at us right now. No technical expertise is needed. So help out a grad student and an arm Egg Board member. There's a little form it's pretty brief. And I would be so so grateful if I can get just like five people be awesome. Thanks. Cool.
Awesome. Thank you. Great. We also posted that to the slack as well. So if you didn't get a chance to catch it here, reach out to me or check the slack. Thanks, Grace. This is the link tree. So this QR code points to one page that has all of the links for our calendar, the meetup page the subgroups. You can order a cool t shirt. You know all the different resources for the group. So that's something that you can check out if you're interested in plugging in. Okay, I just want to give a quick thanks to some of our supporters. These are the people that kind of make the thing keep running. They're volunteering, they're spending their time. They're creating this, you know, letting us meet at this beautiful space. So let's give a hand to some of our supporters. Out there
okay, what would a meeting be without free pizza? So I wanted to thank free play. Free Play is they they have a observability platform that helps software development teams. And did you want to say anything or Eric? Okay. Yeah, so en will be speaking tonight. But this is very cool, and it's great for Thank you very much for sponsoring our pizza tonight. Upcoming Meetings. Next month we're going to do a member AI showcase so this is five minute demos. And you get to show the cool stuff that you're developing with AI. And I don't know how many demos we're gonna have and we don't really know what the selection process is yet. But by show of hands, is there anyone here who thinks they might want to submit a project to do a five minute demo next month? All right, good. That's a that's a doable number. That's that's reasonable. Following month we're probably going to be doing a sort of a meet up event at this SW to con and that's going to be at Omni interlocken in Broomfield and details coming on that June we're looking at how Gen AI is transforming companies pick a break and in July and then we have some some fall topics on top. Okay, so we usually open it up now for announcements from the audience and I'll often start Is there anyone that's an AI job seeker? Anyone here this looking for? Work? Okay, so if you're hiring look for these hands up and talk to him afterwards. Is anybody hiring? Ai talent right now? Does anyone have any job openings and would you like to talk about it? Okay, Carolyn, our speaker is going to from Google.
Yeah. And you'll hear from me in a moment. But I don't know if all of these are posted yet. But we're going to have a few AI engineer job openings on my team. So we help Cloud customers implement AI systems. And David who's going to be presenting with me momentarily is in that role right now so he can tell you a little bit more about what's involved. But I will definitely post more information once those are live. They're not live yet but should be up there soon.
Awesome. Thank you so much. Anyone else hiring? Oh, here we go.
Hello. I'm actually not hiring for myself. I'm advising for a startup that is funded and looking for AI engineer. They're doing they're applying AI for a landscaping imaging get ideas of like landscaping of yard. Super cool product looking for AI engineer. If you're interested. Come talk to me.
Cool. Thank you. We also have a job channel in our Slack. So if you want to post it there or if you send it to me, I will get it out to the wider group. Anyone else hiring, hiring AI positions. Last call on that, okay. Just gonna close this door
that's okay. Okay, so what about other announcements? Other events? I know Ryan had something he'll start out with Ryan. Hello, hello.
Boulder. Startup Week is right around the corner. And yeah, it's gonna be fun this year. There's a boulder Startup Week hackathon that some of us are working to make a thing. So if you are interested in participating, we got a lot of people who've organized pacap past hackathons, some of the folks that were involved in the Rocky Mountain AI one, if we're debating putting it into two parts, one, maybe three day and one mini ones if you just want to drop your GPT that you already built and showcase that there'll be a spot to do that. So that's going to be the week of May 13. I forget the exact dates, but will probably be Tuesday to Thursday. So if you want to get involved with that sponsoring, helping organize just telling me how to run a hackathon. Feel free. Ryan at threshold labs.io or Ryan St. Pierre on Slack. Thank you guys so much. Oh, and if you want to sponsor loves sponsors.
Cool. Thank you. Are there other announcements? You know, other groups meetings hackathons? Is anyone from the crowd want to make an announcement to the whole group here? Going once, okay, so let's move on. To Let's race. That's good. So, I usually do a few questions to the audience. So the speakers sort of know who's here. Okay, how many people this is your first Armagh meeting that you've ever been to? That's amazing. That's like half the people I don't know. If that's good news or bad news. You know? How many how many have attended before? Okay, that looks like more than half. How many people like that pizza selection? Okay, that's pretty good. All right. How about like, where are you from? How many people from Boulder came from from Boulder? How about Denver? Wow, that's amazing. Route 36. Sort of between Denver boulder. points north. Fort Collins. South. Okay, who thinks that the furthest they drove the furthest to be here Okay, where did you first said okay. Fort Collins. Okay. Anyone drive farther than Fort Collins? So we all agree. What is your name? Troy. Okay. Troy, you win. A a special prize, which is an armored mug. extremely limited. You know, limited quantity. Thank you for making the drive. And we have a cot and a sleeping bag for you are in the lobby. Okay, um, who, who currently uses Google Gemini. Okay, and who subscribes to the Gemini advance that's the paid system. My hand is up for that. I'm a subscriber. So that's $20 a month the first two months are free. It's their one Dotto Ultra model their most most sophisticated model. Does anyone have experience with Google's generative AI foundation models or similar technologies has an okay couple couple here. Does anyone work on improving developer efficiency through AI or other means? Okay, quite a few there. Has anyone worked on integrating generative AI into product development? Water hands there. Does anyone here have experience in machine learning deep learning or data science? Yeah, a lot of people. Is anyone using AI tools to write code currently? Tons. Yeah. Has anyone seen Devon AI, which was just announced the world's first fully autonomous AI software engineer. Yeah, I know that's just going on. But by the way, this Devin AI might be good at writing software, but I hear he's terrible at making small talk around the watercooler. So that's that's kind of the trade off. And what about the robot from open AI? Did that just drop today? Did you guys see that? That's looks crazy. Yeah. Has anyone is anyone worked on using large language models in their projects? Like using them right now? Is anyone building an LLM into their product? Currently? Okay, probably 1012 people and has anyone used an LLM observability platform to help evaluate LLM in their product. Okay, couple hands and someone that's some of that's already creating it. Okay, I'm going to introduce our speakers. Let me just switch. Switch microphones for this. And I will hand off the lav mic. To David.
Okay, so, Carolyn and David, both from Google are going to co present. So let me just introduce the speakers and then we will we'll start that part of the presentation. Carolyn is director of AI services at Google. She leads an organization of AI consultants and engineers as she mentioned earlier in the cloud Consulting Group, and she helps us customers adopt AI and enterprise. She has held various positions at Google since 2010 when she joined in the ml and ml engineering manager, AI consultant, fiber Learning Lab lab lead and global training lead. David is a data scientist at Google specializing in machine learning, deep learning and software development. He has over 10 years of experience in system architecture, IP core networks and cloud solutions and he has a passion for continuous learning and delivering accurate data driven results to enhance company decisions. Please join me in welcoming Carolyn and David.
All right, thank you. We're going to get our slides set up here and get ready. We've got three demos for you tonight. So hopefully, that'll be something that's interesting for you. And we're going to focus really on developer efficiency. So there's a lot of different products and solutions that Google offers. But we wanted to start by focusing on that because we know we've got so many people here who code in their work every day.
And we're just getting the slides up so bear with me. No all good. And again, thanks for joining us tonight. We're going to start out with some intros as well. So you can learn a little bit more about David and I and how we kind of got to Google and what we're doing today, but we'll give you a bit of an overview of some of the options that we have for developer efficiency and then again, we'll go through three demos. We'll talk a little bit more about some other potential uses of Gen AI as well beyond just developer efficiency. And then we'll wrap up for the evening. So I know we've got another great speaker following us that also covers some of the developer and product development lifecycle so I think you got a nice lineup for tonight.
All right. And I think we're in business. Alright, so as I mentioned, we're going to start with just some brief intros. You heard a little bit about us, but we'll tell you a little bit more fun information, kind of how we got to where we are in our careers really briefly, and then we'll give you a bit of an overview about some of our solutions and products as it relates to developer efficiency. And then David's going to do three demos. So you can see listed here cogeneration code summarization. And Gemma on your device. Who knows what Gemma is? Oh, good. Lots of hands. I don't have any prizes today. I'm sorry. Next time. I promise I'll bring some some swag. And then we'll go for a few other use cases and have time for q&a. We'll also pause for questions after each of the demos briefly, just so if there's something that's really interesting to you or something that calls out to you can ask questions about that, too. All right. So next slide. So again, we already heard a brief intro on us. But I'm Carolyn, and this is David. So I think something that's helpful is to talk a little bit about kind of how we got to our current roles at Google. So I've been at Google for a long time did not have the intention of doing this job. When I started it didn't exist. So I've always kind of followed what is interesting and what I'm curious about. And so just took opportunities to learn a lot along the way. Kind of did a lot of self taught learning is it related to AI and ML, and then was able to just kind of jump in, in the hype cycle of ML and AI when it was really ramping up around 2016 and have kind of never looked back. It's been quite a ride. So, David, if you want to share a little bit more about how you got to your current role. Sure.
is working. Yep. Okay. Yeah. So, first of all, like, thank you for this opportunity. For me, it's like it's really good to be here. I'm super excited. like seven years ago, I just came to here two years and I was sitting different jobs. I was assistant to multiple of them just because I didn't have anything else to do. My My wife was working and my daughter was in the school. So yeah, so I think that this is a really good opportunity. I'm super happy to be here. Now presenting has been like a part of a really good journey during this last year. I joined Google just last year, so I'm pretty new. But anyways, I've been doing a lot of like startups things, crazy things about AI ml and other kind. All in between. So yeah, that's super happy to be here now.
Yeah. And, again, I know we mentioned we have some openings. Again, once those are live, we'll make sure we promote them to the group as well, but I'm doing the work that David does. So you know, we work across different industries. He's working with an online retailer right now and more of a biomedical research company. I'll doing some cool stuff with Jenny I TPUs, GPUs all sorts of cool stuff. So if that sounds interesting to you, again, David can tell you more about his experience as well and we'll have those job postings up once their life but hopefully that's helpful in terms of kind of how we got here today, and why we're excited to present more to you. So again, we'll start out with a few just overview slides around developer efficiency and Gen AI. And then we'll jump into those demos. So again, when it comes to developer efficiency, there's really a lot of opportunity. We've already seen some pretty compelling results when we look at the impact that Jenny I can have when it comes to a lot of the tasks and the activities that you might be doing your day to day. So we've actually seen 20 to 45% improvements in productivity and we also find that based on the adoption that we're seeing, I think within the next three years, they're estimating about 70% of all developers will be using Gen AI. So again, another great tool in your toolbox. I'm not sure if he will be replaced by some of these AI developers that are being launched. But I see this as a way of really enhancing your productivity and your team's productivity as well. We'll talk a little bit more about that. But you can see some of the common tasks where we think Jen AI can really play a role are listed there. code generation code completion, code documentation, code refactoring, code explanation and code testing. So again, these are all really core to develop a role, a software engineering role. And these are all things that Jen AI can really help you do more effectively. You can also see there's also a business impact. So again, if you're not a developer yourself, but maybe you manage developers or you have a company where you've got developers that are working, there's also a big business impact of this as well. So you can see some of the kind of levers within your organization that this can affect scale. Obviously, there's more that you can do cost, experience and revenue. So I'm not going to read the slide to you but again, lots of really interesting ways that Jenny I can kind of fuel your business too. And this next slide goes into some more information as well on kind of some of the other outcomes and some of the other ways that we really see Gennai benefiting developers, obviously there's risks too, but I think it's also important not to lose sight of some of the positive benefits that we're really trying to bring when it comes to Gen AI. So you can see here capabilities, we've talked a lot about that already in terms of how we can help you with things like writing code faster, or improving the quality of your documentation. But we also hope that this will help with things like culture, right, because you'll be able to distribute work maybe more efficiently, more effectively, you'll be able to use your talents to do what you do best. We also hope that it brings users to the focus of what you're doing. Again, hopefully it'll free up more time so you can really understand those user journeys in more depth and how you might be able to integrate that into your development workflow. And again, hopefully if you adopt these tools, you'll have greater job security. Again, I think it's a really great competitive advantage to know how to integrate some of these things into your workflow today. And then beyond that, you can see some of the impacts to software delivery performance as well as organizational performance. And again, ultimately, we hope this will also help your well being there's a lot of tedious tasks that I really think Jenny I can help with whether you're a developer or you know someone who's just churning out some emails in the corporate world. So hopefully this will be ways again, that you can spend more time on things that are important, that are part of your kind of core capabilities and your core talents. Alright, so again, I think we, this one's kind of repetitive, so I won't go through this one in detail. But one really interesting stat that I like on this side that I think it's important to highlight is teams with faster code reviews have 50%, higher software delivery performance. So again, that's just one example of one area where Gen AI can really help you to be more efficient, not only as a developer, but also as a team member of developers. So again, these are some of the things that we're hoping we can help make better as we're looking at how you develop in some of the tools and tasks that you do. And then I know a big part of these meetups, Seltzer and education. So Google has released some great books over the years and some thought leadership. This is not really Jen AI specific, but this is more around software development best practices. So I thought it was worth highlighting some of these. I don't know, has anybody read any of these books? Like the SRE book from O'Reilly? I think it's a really good one, if you haven't read that one. So again, just some more food for thought. A lot of these kind of best practices apply in the world of Jenai. But they're not new. Right? So we don't want to kind of forget some of those best practices around software delivery. All right. So with that, now we're gonna get into the fun stuff. So I'm gonna turn it over to David, who's going to take you through three demos. These relate to some of those key activities where we think Gen AI can really move the needle when it comes to developers. So I'll turn it over to you. Thank
you, Carolyn. Yep. So here I have like, three different demos. First, we're gonna go over this UI that is just a GUI that is interacting directly with the API's of Google for cogeneration LLM. So I already have some kind of pre loaded some some of the examples. So I'm going to click here. So for example, here we are saying, write an API endpoint to fetch the record from the query, or the given customer ID received in the request from the client side. And then return the response in JSON format. So then, we're going to just click here. This is connecting to the Google API's that are exposing the LLM for our generation. And here we can see like a pretty detail. I will say, endpoint where we are receiving like the customer ID and based on the customer IDs tuna acquiring on the query, and based on that also, this has been exposed in this endpoint here. So this is like of course, like some of the capabilities, but also we can see another of the capabilities that are also available is like to call we can like, have some kind of explanation about the code. So I'm going to just copy here, and I'm gonna go to this code explanation base here. And then I can get some kind of like, explain explanation in detail about the code and also check if there is like, some kind of errors. So first of all, yeah, that's an issue here like a fifth dimension that is exactly says something pretty similar to what we were requesting. And then there is like a detailed step by step like what is doing every part of this particular code. And at the end, there is like errors and unknowns. So in this in this case, is talking a lot about the you know, if the customer is not provided the request for the parameters API will return an error or we are not handling very good by the ballast of revision and insufficient permission for example, if they if they be correct client does not have necessary permission to access the specified table. So there is a lot of error handling that is not happening in this code. And this is like a good way also to type to address the different kinds of stuff that you can see. So, of course, I have another example. Here, for example, I just click there and I populate this. This is like the classical Dijkstra algorithm to get the shortest path in a row. So again, I'm going to ask to explain how this is working. Let's say they call implement Dijkstra algorithm to find the shortest path and also something that we can see here is that like the input to this for example, to this function is our is A is a matrix of six six by six. And then when we go here to the errors or announcing stuff, we can see here that the code assume that the bat is a six by six matrix. If the graph has a different side that four will need to be modified. So I'll provide you some insight about the code we are providing. So this is the first part of the demos. And now I want to jump into something a little bit more interesting because of course, you are not going to be developing and asking in another interface. So here we have like the typical Visual Studio code. And we have also implemented a dual AI that is like a plugin that can be attached to the Visual Studio code where you can also do all of these interactions while you are coding. And so when when we are working on a team on a team of different software engineers, something that we usually see a lot of order like different qualities of the code. And sometimes we get and we find something like this one. Of course this was not written by any of you here. But so we'll get a function that it say get something. Okay. So what it is that there is not so much clue about what this thing is doing, right. So I want to just try to get an idea of what this thing can be doing. So I'm going to click here and say okay, you can use play me about this code. And after some seconds if the internet allow me we should see Yeah. Yeah. So, here they say the cause depends. You provide some function that calculate the intersection over union any of here is Amelia with the IO U. 's. Yeah, yeah. So typical measurement of cold tubes on the boxes are overlap. So when you're evaluating, for example, object detection models, you need to know like, how how good your prediction of your bounding box is, between the ground truth and and what your model is baked in. So here, even when they this function doesn't have like, are really good comments, right, or I would name anyways the this is getting in some way. What is doing it right. So of course and you're in your team, okay, okay, this somebody played this function. Now, we know what that this thing is doing. We can say okay, but what about if we create some kind of like? Of course, there is no test cases. So let's try to generate some uniqueness. And again, here, we should see that the test cases can be generated automatically or automatically for you. And something interesting here also is like, it just not is looking into the inputs and outputs. It's really providing test cases that make sense. So for example, in this in this case, when you have one box, there are edge cases when one box, the box is here and the other here the intersection is zero. So that's like the first test case, no overlap. And then it's doing some when there is like complete overlap that you want. The boxes are perfect. So is expecting a value of one. So that's good. And also there is also the it this test case where there is not a whole overlap. So it's a partial overlap, and also starting to figure it out. What is the develop. So this can be something like super powerful, because there are like, there. There have been like different trends of people that say, oh, yeah, you should first write your unit test and then call but I don't do that. I don't know how to do that. But usually it's the other direction. You just write some code and then okay, I need to write the test case on at some day of the week in the spring serious. Start working in your unit test. So this can make you like. And of course this is not perfect at all. So it's not perfect at all. So here maybe everything looks perfect, but the partial overlap between one and the others. And one in this case is not 0.25 to be like 0.12 for whatever are the number but it gives you like a good baseline and a good starting point. And I realized that it was not perfectly good. But then then I you can also start here asking I think that the overlap case is not correct. And you can start interacting and and get into the also to get a good result. So this is like the first two demos and I have like the bonus track that is the gym apart. So
Western pause for some questions. And you know, we did show you kind of like a custom UI that was built using Google Cloud vertex. So and then this is a little bit more of a consumer version. But most of the work we do uses the live version, but I'm gonna walk around and get questions with this mic so we can hear on the recording.
Yes. Hi. Thank you for the demo. It's really interesting. I use some of the code generating GPS for that I do at work actually a different brand. And what I found is there's some tends to be sometimes some lack of contextual awareness or situational awareness with the GBT. So for example, if you're working in developing a class and you've defined the class variables already and have some methods already defined, if you try and feed it text it say, Please generate a method using class variables are so forth, that sometimes doesn't have the awareness to actually use the correct variables that are already defined in the code or from methods that are imported from other API libraries and so forth. How does this work on tackling that challenge
at all? Yeah, so So the typical problem about data about it has been it is because of the context window that the LLM model have. So in the, at the beginning, the first version have like our limited content. I don't I don't know about the chargeability but probably was 16k tokens or something like that. And at that thing has been expanding. In our case also we have for the code generation, there are the current one that we have, like API's that we have available, the maximum that you can put there is like 32,000 tokens of all of context. So that may make you 32,000 tokens can be like 1000s of lines of code. So it's it is supported now you're going to be able to provide like more context and and make these things work better. But there is also a caveat also, when you are using like when you're typing and you are getting like suggest to him about coding, that also sometime is different, because that use are usually given that you need like a better response time. They use like a smaller version of the model, that in even given the number of parameters that model is just a smaller modal, because it's just just just because of something that is about the delay or the response time that you need. You don't want to be waiting five seconds to get like a suggestion. Right? So that's why sometimes when you are typing and those kinds of things probably is using a lower version of the model. But when you are like asking directly to an API without the context, you can get better, better like context and better answers. Right.
So maybe just one quick follow up question. I'm sure other people have questions as well. But do you feel like in terms of like where the next advancement will be on this is that it has more token access? And the ability to actually see more contextually in place? Or do you feel like it's going to require kind of a different architecture to move it to the next level for cogeneration? The reason I say that is for example, if you do have a large complex method you want to code at some point you feel like you're writing more, you're describing the method so much in detail to try and get to generate the by the time you're done. You should have just written it yourself. Right? So the benefit is gone. Right? Yeah.
No, yeah. And you're right. I think that, that therefore that everybody is like trying to do is first to try to because there is some kind of race about who will like the biggest model, right, who have like the biggest number of parameters and put that to people. But something that we have realized during the last time is like you don't need the biggest model for all the tasks. So, so what we are trying to provide is different sizes of model different flavors that you can adapt to your different kinds of tasks that you are doing, because sometimes you're some of the advantages that you have today with an LLM that you can create LLM to do a classification of whatever you know, you can, you can put out a good problem and you can have a classifier and you can build a classifier without any kind of data, because he just kept inheritance. It kept that knowledge by itself, right. So for some applications, a big model Yeah, can be super, super, super good. But some applications that just need something that is not so big. And that's it that the good thing is about to have the right size for the right device for the right environment and the product that you're trying to build.
Yeah, and just to build off of that, too, so there's always trade offs right if you're using a large model, you know, as he mentioned, sometimes the performance and the response time might be longer if you're using a large model. So this is something you know, again, if you dig into the Google Cloud options, there's a lot more opportunities to select different model types and different model sizes and to customize. So a lot of the consumer products don't let you do that if you want to kind of fiddle around under the hood a bit more. I think what's nice about the cloud options is it gives you a pretty wide selection and we host first party but also third party models too. So you'll find a lot of different options. And again, I think you can play around with that. Make sure you're finding the right model for the job. Let's take at least one more question before we jump into our next demo. So
can I get an extension for cloud code today? In Visual Studio,
it'd be as David, you you you you have the extensions but this needs to be much you need to login into your GCP account in order to make it in order to enable it. Yeah, but that's the extension is available is the name of our you install the cloud coders extension. And then you need to log in with your GCP credentials of the or the cloud because this is using your prior
Yeah. And what's your context window? Is it one file or is it everything you have open?
It you can be humbling the contrast window for example, by highlighting something but if you can just type here a question and then the contact window is all the stuff that you have open. Yeah.
Okay, so I think on the recording, you might not have heard all the questions, but we were just getting a question about this extension and how to get access. So as well as the context window. So in the interest of time, we're gonna do this this last minute demo, and then offer some time for a few questions as well. Again, there's obviously a lot we could cover. So you know, definitely feel free to connect with us on LinkedIn as well and we can do some follow up but hopefully this you know gets the wheels turning I know all of you are very smart folks probably see a lot of opportunities to apply this in your job. But I want to make sure we have time for this demo demo Gemma demo, because this is one of our newer releases. So David over to you.
Yeah, so the Jeremiah is one of our new releases. I assume that was just one or two weeks ago. And when we released like this open source LLM that you can use, and we released this in two different sizes as seven video parameters more than 2 billion parameters. So again, someone that can be used as in different kinds of devices. So what what I'm gonna show here is that you can run this model, even the seven video parameters model in your like this laptop, this does not require any connectivity that is not internet. Everything is happening here locally in your laptop. And the way that we can run these 7 billion parameters model or in this case, using the GPU of the MacBook is by doing our quantization of the models. So lower in the position to in this case 2444 by four bit so and so then I want to run here First I need to hear and make sure that I have access MPLS is the GPU of the MacBook here so I have access to this. Of course when you run this run this the first time it needs to download the model from from internet, but you can of course have that model for you and you don't need to download it anymore. So then here is I'm loading and this for this particular case, are seven, 7 million parameters, the JAMA version and the quantized version of this model. And that was quantized specifically for this GPU of this Marco you can do the same for other kinds of NVIDIA GPUs. And and so there are like quantized version also in hugging face. So, after we load the model, again, we can ask a similar question. Just to follow in the topic about generating software. I am writing here and this is all happening just locally. So it's creating in this case is creating a polygon class with metadata and get the area the perimeter and also drawing the polygon. So you can see here that is aiding you can also specify like almost the maximal amount of tokens and the temperature of the model to help co create co creative Do you want that is that the answer of the model is going to be so here in this case you can see that is providing that information. And again, this is just all running locally. And but of course the context of this model before in the previous models that we were interacting there were models that were trained specifically for calling but in this case, this is more like general LLM mode. So can be used for multiple tasks as for example here for calling but here for example, we can just add a chain the question let's say that I'm gonna do a travel to London for three days and give me some kind of feat in Harare. And please consider that I'm fan of football, of course. And by football I mean the real football
let's see. Here. They one morning we see that our that our of London that will reach and I think it knows because here's the catch up Premier League match at the iconic Wembley Stadium so that's really really really cool. So, so yeah, again. We this is something we're also for, I think that is super useful for for us to when or when you're doing research or when you don't want to spend money shooting API's to get the same because you can just run this in your machine and you are not paying anything. You can do all your experiment here with this size of models. And later yeah, if you decide that the performance is not going so good on you and move forward, then you can move to some kind of payment, right? But this is like something that I think is gonna help a lot and this GMM model in their version also is superior to the other competitors. Models, by monitors. So yeah, I think that this is, this is the kind of thing that I liked because, yeah, I remember when I was studying the past, you know, you don't have the money to spend in stuff. And then you can do this in your laptop.
So I think this is super powerful. Yeah, who thought the cloud people would teach you how to run the models outside of the cloud. But we actually we have a lot of flexibility in the options we offer. So hope that was helpful. Again, we showed you kind of like a custom UI created through cloud, also the duet version, you know, which is more consumer facing and then even this open source local option, which I think is pretty cool. So lots of options for developer efficiency. I know we've got about two minutes left. So I think we can take just a few more quick questions on the demos. And again, we're happy to talk to you more. I'm just going to bring up one last slide as well. I'm going to skip ahead a little bit because I know not everyone here is in an area where they're coding all the time. So we did a deep dive into developer productivity today, but you can see across an organization and these different functions. There's a lot of different ways you can apply Gennai so again, take a look at this slide. We did focus primarily on developer productivity, but you can see a lot of other areas where we see a lot of promise is marketing, digital commerce, customer service, as well as streamlining those internal processes. So I think we've got time for one or two questions. I know there's also usually time for questions at the end. But with the impending weather, we may not want to stick around too long. But does anyone have another quick question? And again, we're happy to connect and keep talking so firsthand, I saw was over here. Let me come over to you with the mic.
So I'm going to address the elephant in the room. There's been rumors, not what I'm faking, but there's been rumors that chat GPT roll replaced Google. And not that I think that but other people have said that. And I just wanted to see what your your answer would be for that.
Yeah, so I can offer an opinion. I wish I could give you a better answer. I'm not usually very good at predicting the future. So I'll just say that first of all, but you know, I think chat GPT is really powerful. It's been amazing thing to see how much adoption it's had worldwide with consumers. If it will replace Google, Google does a lot of things. So if you're talking about search, I think a lot of people are using it for some things that they used to search for. But I think there are still differences and reasons why some people might prefer a search interface versus chat to PT in its current format. So I think over time, we'll just see better integration of chat GPT like technology into search options. So I don't know if it will replace Google. I think it's exciting because I think it's driving a lot of technological advancements. But yes, I'm not sure. We'll see. Only time will tell. I actually don't really work on search. I've never worked on search, so I'm probably not the best person to answer that question. But yeah, I think it's it's definitely been amazing to see how quickly the world has adopted it. Your thoughts, David?
Yeah, so that's like, I think it's a really good question. And and like in general, like what we're seeing, okay, first of all, I will say like, the technology that everybody is using today, to do LLM is transformers. And the transformer were created by Google engineers. That's the first thing. Okay, and the other thing, yeah, Google is some kind of elephant right? So we were moving probably too slow. And this is just my personal opinion. And probably we were moving too slow in some aspects. And when new companies like open AI or whatever companies came with this, like great idea or or they are tend to put these technologies to the public in a in a faster way, but this is also making it nice to change. And I think that's something good and so it's gonna depend on how we can react to that. And that's what we're trying to do. We have been maybe now everybody can see that. We will have been a little reactive in some aspect, but also we are trying to do the signal in a good way, you know, a follow we know principles and all that kind of stuff. So I seen that. Yeah, we we don't know what's going to happen, but I think that we were anyway, we are in a pretty good spot to fight. Yeah.
Good question. All right. Are we out of time? Yeah. Okay, we're gonna move on to the next presenter, but again, happy to stay connected, and talk more. So thank you for your time.
Thank you. I'm going to introduce our next speaker. David, I'll ask you to hand off the lav mic to en and then we'll get Ian's laptop connected. So in Karen's is CEO and co founder at free play. And he's going to present a talk called Intro to MLMs for builders, challenges and opportunities of using Gen AI in your products. In his free play is an AI infrastructure startup based in Boulder. They build experimentation, testing and monitoring tools that help product development teams make use of generative AI in their products. He spent most of his career in product management for developer products. Including as a pm at the boulder startup Knapp. And as head of product for the Twitter developer program. He's also a University of Colorado graduate please join me in welcoming in cans
about oh, look at that fancy, cool. Course my mouse is like gone away. I don't quite know what's going on here. Try one thing real quick. Someday, we will all learn to use Zoom efficiently. But until that time I'm just making sure that I can switch between screens and like try to make sure I'm using the right mode for doing that.
Yeah, did that so that's working
am on
Alright, well, we're just gonna wing this and you know I won't do anything that embarrassing I hope. But you know, always dangerous to do a live demo. Cool. Thank you grace. Resume share
you really know you're living on the edge and you just present your whole screen. Cool. All right. So hi. I mean, Dan introduced me I don't have a lot else to say. Other than just it's cool to be here. I have known what Dan and crew are doing. We actually see Dan a lot and a bunch of you too. We run an AI builders meetup in Boulder that is kind of like the good close cousin of this and we'd love to get to support the boulder tech scene. Spend a lot of fun what's happening in Boulder in the last year. It just feels like there's this new life in the tech community. So cool to meet all of you that I haven't yet. Like Dan said, I've been doing developer products for a long time. Been around boulder since 2006. Doing tech companies. So it's like it's fun to see this new era. And we bought pizza tonight. So it's like cool, cool to start you off. But so yeah, I mean agenda wise, like we have this side thread. You're actually gonna see it in a minute as part of the demo, but we're just talking about like, this group's a real mix of people. I think a lot of hands went up when people were talking about who's an engineer in the room or who's building a product. So maybe, like those of you in the room in that category, you're gonna be like, Oh, this could have been more technical. And I bet there's people in the room that were like that last thing was way more technical. And this is like immediately technical. So we'll see if we hit a good note. But, you know, like Dan said, like wanting to talk about just like this intro to building with MLMs in a product and what it looks like to build a good product experience. And for people that are in this space, building software thinking about getting in this space, because I think maybe when the question was asked if he was like building today, not as many hands went up, maybe what might you run into? And if you're not building software, maybe it's just interesting, and you're curious, but this is the stuff you're going to be using soon. So I wanted to just start with like a question and would love like anybody through three people that are willing, who's used software product that you like use on a daily basis otherwise, so like, not Gemma running locally on your laptop, but like, you know, something you just like? logged into an app that you normally use? And there's been a great AI experience that you thought was really cool. And will you like, be tell us what it was and like yeah, what made it great?
For recording, yes. Then they're using. They're using AI to summarize call, to interpret sentiment and to interpret, and five willingness objections. Something that used to take us quite a while to do
Yeah, and that's, there's a recording happening. So like, what was your name? Holly Holly mentioned Gong, which is like, been around for a while doing cool AI stuff before Transformers were cool. And like now I think they probably are using transformers and doing better call summarization, and then helping Mike detect intent out of it. Who's Who else has a good one. I saw your hand up in the back. Oh, sorry, Dan. I like I threw this on you. I didn't tell you I was gonna ask people to say things. No worries.
So I actually just saw this right before I came here. My wife runs a short term property management company. And she was just showing me on VRBO that and now automates property descriptions, which I hadn't seen before. Yeah, so that was pretty cool. Like,
basically listing your property. Yeah, she was. So she was like she was just
starting to list the property and just put in some descriptive words and it just automated that and she edited it of course, because it's just kind of it seemed pretty wonky. Like the description it provided. But I thought that was pretty.
Zillow does something a lot like that two people have seen so there's like one more over here somewhere. Somebody had a good one or if they've given up someone else, Ryan Ryan raycast doing really? Yeah, tell us what raycast is. Yeah, no.
raycast is a spotlight replacement if you use a Mac so or like when you pull down on Siri and you gives you quick actions you can think about that way. So anytime you hit Command space, it just makes a good guess what you might want to do and it just makes whether it's deep links into Spotify to be like, you probably want to change your song right now or queuing up confetti to after a build is done or like it just chaining a ton of workflow things together, and getting ready to do a lot more deep links into apps and getting apps to play together that weren't done more. They're doing integrations with chat GBT. And they're also doing some just good, I think classical suggestion work so it's a nice mixture. There. You don't have to pay but if you do, it's good. That's awesome.
All right. So like those are good examples. Those are the kinds of things that I want to talk a little bit about. A little bit more about here. This is my like, what happens next when I demo my full screen? We'll see but I wanted to show you all some stuff that we have been impressed by because we also we get this question a lot. People are like, what's good out there. So first one I'm going to show you is loom there's a video I share with myself in Slack, testing and evaluating rag systems. With free play, demonstrate how to build valuations with prompts and free play and I'm using the arc browser which you're gonna get a double demo. Oh, come on. Yeah. All right.
Why is it doing all right. I was showing you my slack. I'm like, I don't know what else is in there. But loom slack. This is how people most people get loom. If you don't know what loom is. It's a way to record a quick video and share with people. So a whole new way to communicate. Look at this nice title here for me and this description that's gonna matter in a second on a open arc. Which also has like some bonus cool AI features for us. But you know what's going on here, like the loom team, basically have this problem, which is they've got a really cool format for sharing short videos and doing screen recordings. And like most of us don't like to record ourselves talking much less on video. So it's like a big barrier to entry. You've got to get over the fear of recording. People don't like editing stuff that takes too long. And so the looming AI team, I think you've done some of the like smartest integration with MLMs just inside of their normal product of anything that I've seen so far. It's hard because this one's already been recorded. But like first of all, this title was generated by AI read for the recording ended by taking the transcript through and that sounds really simple. Except it was their biggest growth driver in the history of the company for every single feature ship. And like why was that the titles of these videos used to all be generic timestamps and nobody knew what they were. And when they all get shared in Slack and you see some random video you know, it's about then why would you open it so just like doing that one little thing alone drove big growth for the company. There's also this thing in the sidebar where like I can respond to Jeremy. I don't really know what I think about giving, like fake authentic feedback to somebody but I can do that. And then down here you see like the summary Where did the title come from? You know, I can read about what this videos about their chapters. All this came out of just like reading the transcript using normal transcript of technology and then summarizing it and then as an editor, right. So you know, I use this a lot, I record videos, send them to my team, they go to my library, here's one that I can edit. There's a couple other like bonus features. So I just recorded this, the title is already made, but I get some of their stuff in the sidebar here. So you know, this is a message I can share with my team. It's written that for me, so I can just copy the link and go back to slack. And here's the loom about new features in free play. It lets me know that some chapters were added. It lets me know that 24 filler words were removed and a second of silence was taken out. Which is like an amazing feature if you use loom on a regular basis. I don't know how many times I've said or like in this conversation already, but at least 24 times and three minutes, the last time that I recorded something and then you just kind of get a good feel for like this stuff's happening and it's generated by loom AI. So if you're building products, like I think this is a good one to look at as an example of just like little simple things that you can do to make the experience better for a customer. And it's not like it just generated the whole video and an avatar for you and looks magical and like AI. It's just it's just a better user experience. One other thing that I was going to show people and then we can move on superhuman. Does anybody use superhuman for email? One, two, yeah, so superhuman is a little ridiculous because it's 30 bucks a month for email. And if you have a business where you can justify expensing, that it's like, I think highly worth it. I've never been addicted to an email app before. But here's a good example. Right? So like, what are they done with AI? There's this whole summarization thread at the top you saw I'm scrolling I was just doing just trying to look for the one example I wanted to show you. But here I can see like, Hey, what's going on back and forth here. Big thanks to grace and Dan for coordinating us all. And then you know, this feature I wanted to show you here if I go to reply, all right, I screwed it up because I started going back and forth with grace and Dan after the started. All right, probably with live demos. So what I wanted to do see is a bunch of little snippets down here. And this is a good opportunity for me to toggle back because I had to like just pull the ripcord option in case the Internet didn't work, and they're in my slides. So what did this look like before? There's just slides. So you see at the bottom it says like, do work for David. That's for Carolyn. From Carolyn. And then it says great news or need more details are understood, well prepare. These are things that I could have clicked in instantly center response, except it would have said Hey, Dan, thanks for the info. And that's just wrong, right. So this is also a good example like how things fail. One last example and move on to the rest. of the slides. They also give me a good feedback opportunity here. So there's this like, little thumbs down icon that I clicked. They've given me some categories, they're kind of all over the place, but one of them was replied to the wrong person. So you can tell these categories are actually coming from the problems that they see and regular basis. And then I can actually like tell them what I should have done to make this better. Pros and cons here, but like, I love that they do this. And then I also think anybody building in this space, we'll talk about this in a minute. You're going to want to build a feedback loop like this because stuff will go wrong. And that's one of the key differences of like building in this space. So you know what I think was cool about this, like, simpler, easy workflows for people just like normal flow of doing email or using loom. A little bit of delight that made it more fun for people. The viral promotion part, if you don't know this, like we pay five bucks a month extra that's 33% Extra for Luma AI and then pay for that for my whole team and it feels totally worth it. If you're building a business like there's a good upsell opportunity here. You want to promote it. There's definitely some poor quality answers and that happens and that's part of life like this is different from traditional software where you build it, test it, ship it and it just keeps doing that thing again until you stop, you know, until you change something. And the feedback loops important. And so I wanted to talk a little bit about like, what did they do to build these products like what are they doing today? Teams like them, you know for thinking about like, what's different building software with MLMs. Then like before, unless you're an ML engineer, and you've been working in this space for a while, but like everybody else has been building web apps and mobile apps and distributed systems for a long time. One is that like your data really matters. So you know, we just looked at the feedback loop. Like you need to know what an LMS is producing what it's sending to your customer. You need to be able to keep track of that in some kind of structured way that lets you actually build a learning loop and improve with it. And this is really different from if you think about, like most applications prior to now where you need to capture some data. People are engineers and using data dog if you're working in Web world news, you know, Google Analytics, it's like good signal. About how your application is doing. But it's not like a core part of making your application work in the future. marks here. Like if you want to be able to capture data and then do fine tuning later. Like you need to keep tabs on all this data that's being generated. Another thing in this space, it's I think, confounding to people when they're getting started. We talked about a bunch of different models. Google's amazing, there's this other company that builds a thing. Bunch of open source stuff out there, right, like you could just turn knobs for days. I think this is the next part of this where people kind of pull their hair out between prompt engineering, different models, different configuration for those models. If you're doing anything, like some people are gonna be like what like reg systems, you know, where you're retrieving data from another database and trying to pull it into a large language model that has a bunch of configuration around it that you need to optimize. And I think one of the things we see in this space is people get really frustrated, and eventually just kind of give up and they decide it's time to ship it and they move on, or they don't ship and we're lucky to work with a lot of people that already have shipped. But I think that's a part of this world to just be prepared for and think about how to address it and related to that, like you need to actually really think about and define what good means. And if we look back at like the loom summary, or the email generation where I was like saying something wrong to Dan, and categorization of like the problems that were happening there. If you're building traditional software processes kind of been the same for 15 years, like somebody writes a product spec and produces some mock ups and engineers build it and somebody tests it. And like I said before, if it works and you ship it, it probably won't break unless you do something wrong.
You know, whether something works or not in this space is very multi dimensional a lot of times and we talked to a lot of teams where you say like well, how are you going to decide if it's good quality output at the end? How do you know superhuman if that email is good, and they say, Well, it's kind of random. All right, and like that works when you're just getting started. But you've got a problem at some point. Once you have, you know, like they do millions of these flowing through the system. It's almost like you need a social scientist mindset. I'm sure there are people here who do this kind of work. You know, people who design great surveys, learn to ask really good questions and be really specific about what they're asking, is it addressing the right person? Is it a good tone? Is it a good format like these all become dimensions to evaluate? Related to that you've never done? So? model change, even if their owners say that they shouldn't I think we see that like publicly hosted models change behavior. In some cases, maybe you go deep into it, you know about the seed parameter for open AI or things like that. You can control it a little bit better, but like things can change in unexpected ways. Certainly, if you're building your own data pipelines, things can change. You need to keep an eye on what your systems are doing. new models come out. You learn new things from your customers. So there's this like need for a learning loop that is much different from traditional software. Again, like you're not opting to optimize something because there's a business opportunity sometimes you're forced to with MLMs because stuff just keeps changing. And then the thing that we have found, if you're working in a company environment, whether it's like a startup or big company, this is no longer just the domain of software engineers building software. And we see not just product managers and designers and QA people wanting to get involved in product engineering and data QA. We also see like, just people from all over the company who you know, I see customer support people who are saying I see an opportunity to fix that weird thing that it did with Dan and superhuman, like, give me give me the chance to do that. It's just English, right? Like, can I write a prompt that will actually fix that problem? And we see that happening a lot. It's been really interesting. Like, that's every paradigm shift and, you know, technology, like roles and responsibilities change and like this is really happening again in the sign of how big it is that like roles are changing. So what do you do about it if you're building I'm not here to hawk free play. But this is what we're doing about it. And I think some of the things we're doing help, right. So we've built this platform that is designed for software development teams, including the PMs and the designers and the QA people to use together with engineers to ship better products, and I'll walk through kind of like, how have we thought about this? Whether you use a tool like ours, or you know, do it yourself, you know, what are you doing here? So I talked about like, Hey, your data really matters. We need to do about that, like capturing the inputs and outputs that you send to a large language model and being able to make use of those becomes really valuable. And we see people doing this not just in production, but from early development days through staging and being able to really get a sense of like, what's my code doing? What's the model doing, knowing what's happening and being able to capture that as an asset so that later you can use it for not just QA but for testing for fine tuning. Other reasons. So this is one big part. You want to know, like, what's going on in your system? You especially want to know what's going on in production. And I think this is the thing we see with a lot of ml teams. There's a long history of like building eval datasets. And, you know, looking at the same couple 100, a couple 1000 examples for a model and just testing them over and over again. But if you're stuck on a couple 100 Or a couple 1000 examples, and you're running a million, something through your system, like who knows what's happening with your customers, especially if you're building a chatbot or something and they can just say anything to it. So being able to keep an eye on production is important. Another thing like I mentioned, you could just turn knobs for days what we have found here if if there's anyone in the room who has done this, like it'd be fun to raise your hand but so many people come to us and they say, Yeah, I keep track of prompt and model versions and all the different knobs that I'm turning in a Google spreadsheet where I have kind of like, you know, a row for every different configuration. And then I have a column for every different test case and I just kind of copy and paste stuff into the open AI playground and see what I like better at the end so we've built a system to help with this where basically, if you're familiar with feature flagging or like server side experimentation platforms, we let people push prompt and model changes from the server to your code. You can use an environment variable and call like a different version of a prompt or model. And do that with an SDK where no changes are needed in the code. So a product manager wants to change something. They don't need to ask a software engineer to update a string and do a deploy like it can just happen. So that's been a big part of helping out there and then we log from that last screen, all the data related to those versions, so people can really like measure and track change over time. The next part, like defining what good means. There's this phrase I mentioned before evals in this space are evaluations. Basically think about like, how do you look at the behavior and performance of a model and decide if it's good or not? In this space, people call that an evaluation. One of the things that's been interesting folks have worked in traditional NLP or ml for a long time. There's a bunch of like common practices there. That just aren't working in the same way with MLMs. Getting like really nerdy like a roof score or like a blue score. These things that are about semantic similarity, sometimes just don't matter at all when a model is producing a valid sentence for a human and there could be a totally different phrasing of that sentence for human. And the similarity of how much the words overlap doesn't matter. So what we see a lot of folks doing there are some metrics like that, that look at semantic similarity Levenshtein distance those help you still need to curate a ground truth and be able to compare to something. You know, we see a lot of folks just using MLMs. And this is what we're showing on the screen where people can use free play to configure basically prompts that study prompts in order to develop metrics around what's happening and can ask very custom questions like, Is this response happening in my brand voice? Or, like, you know, was this interesting on some criteria that matter to them? So being able to just like figure out what are the criteria that matter? Run those on a regular basis, including on production data, that's very important to the You're never done part like this builds on the last one. We talked to us a little bit. You need to be able to build a feedback loop. And this is unfamiliar, I think, to a lot of product teams that have been operating in traditional software. There's like a little feedback button in the corner of a lot of apps and maybe somebody in the support team reads what gets submitted. Building like a pipeline that actually leads to product improvement is a very different thing. And like you saw a little bit of what superhuman was doing as an example. We log all that data back into free play. We use it as part of measuring test results, how many people identify issues so they can improve? And then that also leads into being able to automate testing. So I mentioned before, like one of the practices in this space is to curate example datasets, you know, both representative good examples, edge cases, maybe failure scenarios, we let people curate those kinds of datasets on freeplay. So they can see something happened in production, save it for later, they can write their own whatever they want to do, but then be able to play those back anytime that they version their code or change their model or change their prompt and be able to score instantly with these kinds of eval criteria and see how the old version do versus the new version. And that's a lot of how people decide whether to ship it's very rare. I haven't met anybody yet. Who like only decides to deploy software with an LLM. When it's perfect. Generally, it's just like, is it better and are the failures not terrible? Some some criteria like that, and then the last part to like getting the whole team involved, you know, whether you use our tool or something else, I don't know I capitalize the P. Just not editing my slides carefully enough. But you know, we've we've really focused on trying to build a workflow that like gives developers full flexibility. We have very lightweight SDK, we think like if you put us on a spectrum there are frameworks like laying chain that are very opinionated and try to control business logic and tell you how to handle you know, working with an LLM. We're on the other side. We're kind of like us length chain or not do whatever you want. We basically do two things. We give you the right version of a prompt we like you record what you send to us. And then we do the rest of the work on our server. But then you can get a product manager or designer or QA person in the loop and building with this integrated system. It's kind of hard to see what's happening here. I should have probably just done a live demo. Maybe you can if people are interested but what this is showing is like a playground that people have seen before, but integrated with this whole test case concept where you can actually load up your saved test cases in the playground, try different models, try different problem changes. And the whole system just kind of works together in large batch tests from the UI. So those are some of the things that we're doing to solve these problems. But again, like just things to think about solving if you're building in this space, and that's the end of me talking. Looks like we got plenty of time for q&a from Dan's timer. glad to answer any questions. Thank you.
Thank you. Yeah, yeah, I'm glad I didn't use any bad words in those emails to
let me didn't ask for permission, but I figured it was safe. Let
me trade. I'll take the last night you guys and let me ask all the speakers to come up and join us for q&a. We'll start with questions for Ian's since we haven't done any. Okay, thanks, Becca again. Okay, so we'll start with a few questions for en specifically. And yeah, what so why don't we all come up right here and for q&a, okay.
Really bright yellow screen but I don't know. We'll leave it there for now.
Yeah, so it's interesting. You mentioned how LinkedIn is really opinionated. And of course, that means you're doing everything it's way if you're doing a rag, you're doing rackets way etc. Now, I saw in the slides for instance, you had an example prompt would rag So the advantage of that is opinionated and I don't agree with a lot of its opinions. The disadvantage however, is that does not mean that you need to like the knobs are not only in the prompt, right? So for instance, in rack are using the ranking contextual compression. What's your top k that sort of thing? Are those also part of the things that you would be able to track and to manage with the software that you're developing? Yeah. So
how many people in the room know knew what he was talking about? That's like half that's, I mean, maybe a third. I don't know. Decent? Yeah, that was that. That's good. So what are we doing at that level and Zack, like we're basically giving people the ability to log those different experiment configs with us as part of all their sessions, and then being able to group by and filter by those things. And then, you know, look at those aggregate scores. We're also giving people the ability to run evals on the rag content. itself, and you can do like raga style comparisons in free play. We're not playing a role in the ranking. We're not so it's like, you know, we have customers using Mongo customers using pine cone customers using all kinds of different things. And they're basically like controlling a lot of those knobs and we're giving them feedback on those experiments that they run. So yeah, we're not deep in the rag instrumentation. So,
Ian, I have a question for you. There was a recent case with Air Canada where their chat bot made promises. To a customer about a B Reedman fare. And then the person of the customer came back and said, Okay, I want my refund like, no, no, no, that's not how it works. He took them to court. And the court said, Yes, you'd have to honor what your chatbot said. If they were using your system. Like where where would you have flagged that or how would you have helped them avoid situations like that? Yeah.
So it's an example of how people are using our system for things like that. I mentioned you can create different data sets and you can write these custom evals. So we're not being really opinionated about what that solution should look like. We're giving people the tools to do it, if they want to do it. I think just it's such a diverse space. But if you're studying a chatbot, you know, things that people are looking at, like they'll create like a red teaming dataset of, you know, attempts to get something to, you know, bad to happen. Various versions of bad, right. So, making promises it shouldn't make, you know, making like harmful statements doing whatever it might do. And then being able to run an eval panel on those so like you can you can run that whole set of like, you know, manipulative or whatever statements through the Chatbot as a as a batch, and then score them and see, is it giving like a deferral answer, or is it answering when it shouldn't be? So that'd be like an example that people are using batch testing in our system. Cool. Thank you. I wish we could have helped them in advance.
I I am asking this question with a humility that I'm not a technical person. I'm a small business owner. And I guess, as I'm watching this presentation about AI products, you know, just from the very little that I understood, so there could be a lot of gaps here, but in the loom example, it's like this, the software that that helps, sort of make it more efficient to create and share videos and information about the videos. And it was funny because as a non technical person, I was watching you, and you spent maybe five minutes or more trying to haggle with Zoom to get it to work. And so I guess there's
a big opportunity in video conferencing, to start a company.
So I guess my question is, and I do this with love, but I'm gonna trash talk Google a little bit. I was trying. I spent probably an hour today trying to copy like a list of groceries from Google Sheets to Google Docs, and I copied the table over, but I couldn't figure out how to get rid of the table without Wow, keeping the data and Google Docs. So this is just one of the one of the many examples but I guess my question is, is to me in a certain way, the efficient the added efficiency of having an automatic labeler for the titles of my videos, seems relatively trivial in the face of a lot of other just general dysfunctions with software that perhaps only affect the weak minded such as myself. So I'm curious if you have any. I know that's just survey of one as a non technical person. I'm just curious if you want to speak to that a little bit.
I'm going to talk about Google because they probably won't say the same thing, like in a defensive way. I mean, I have a feeling that somebody at Google working on the Drive team is building that feature into Docs right now, that would allow you to just like drag and highlight the whole thing and say, like, reformat this as a bullet list, and it would just like, do it and get it out of the table. I'm just gonna guess right? And I think like, that is on par to me like it's a level more helpful maybe than titling things. to my point like, looms titling thing was actually more probably to their benefit, because it helped. It was a growth hack for them. But like those small little things that happen, I don't think we were like giving enough credit to like the accumulated benefit of just having these little embedded moments that don't feel like a chat.
Like, make it better. And I think that would probably be like, the point I would hope you take away from some of what I was talking about is like, do you haven't seen those things yet? Like, I think you'll start seeing them a lot more soon. And I don't know how many people have like, had an amazing experience with a web three app, probably not as many as raise their hand and the first you know, your this trend right? So I think there's just gonna be more and more cool things like that. I don't know if you want to add anything. It's
like a direct thing at Google. But ya know, and I get it. And I think we've all had those struggles from time to time. So even the more technical folks I think in general, though, another trend is just more multimodal stuff. So I think we're trying to make you know, different formats easier to kind of flow between right, just in terms of the way we're thinking about AI. So hopefully, in the future, we'll make that sort of transition right? Easier where you won't even notice, right, and it just something that can be much more seamless in your experience. I don't know the drive engineer that's working on that feature. I think I do know, a way can do that. But again, I think we're just trying to really appreciate that people get information and need to move it around to different formats. Right. So again, I think multimodal will hopefully help as we see better technology in that area too. Then David, I don't know if you want to add anything.
I'm not not not really okay, about pace the table into chat GPT or Gemini and say blow away the table and just give me the data.
Okay, other questions? Let's open it up for any of the speakers. Sorry, not gonna go back to you. Yeah. Thank you. So, you had mentioned feature flag, right. And monitoring and as you said, a lot of times, especially with MLMs, and chat bots. What I've seen from clients is that they're reluctant to switch over to a new model when they have something that's working and so mentioning feature flags can be really important. But are you also accounting for like Canary releases so that you can let out 1% Into the Wild half a percent into the wild because, like you were talking about things change over time. You might not see a great you know, the first week might not be your best week for that. But that doesn't mean that you should go back to the drawing board. Are you working on that? Or is that part of the feature flag? I know sometimes, like with rags, we've
talked about forecasting, and there's different depths of these things. I mean, in short, I think there's an opportunity to do a lot more there. It is actually possible with our SDK today to do what you're describing. And we have people that are working in that kind of a scenario where they're like, Hey, I'm gonna do an A B test and like run two versions of a prompt. If they wanted to control what percentage that would happen with they could do it in their code. So I think we have an opportunity to make some of that kind of classic experimentation stuff more of a first class citizen. This is like way in the nerd weeds. But yeah, just the trade offs that we've taken. We're really opinionated. Then we constrain some of our customers flexibility. So that's not a place where we've built a lot of
opinionated functionality. Yep. Thank you. Other questions?
Ryan in the front.
Maybe for kind of the nexus of you guys is so in to your point about feedback loops, like the query what you could be asking of these systems is pretty broad, right? And so one user's intention and journey, but like, is not the same as like going to kayak and just being like, it can be a really broad landscape. And so I think this is where the Nexus is, like Google's dealt with these problems. Before of like, your search journey might be just exploitive, or it might be I just need a thing now. And so I wonder how do you think about that feedback loop as something that can be really discreet and have a you know, tight? You know, value proposition for one user, but has a very different user journey that could be off in the weeds for somebody else? How do you think people should start to create schemas around that? Maybe? Yeah, I'd be open to everybody kind of thinking about that. I'm curious also, David, because you mentioned like your Dykstra is out there. As a as an example, I don't know if that was your example that you wrote up or someone else's, but it seems like there might be an escape hatch here to looking at more classical algorithms to look at the landscape of what people are trying to do, as opposed to saying that the answer exists within
the LLM itself.
I don't know if that makes sense or not, but two very different questions.
I don't know if I understood the question. Yeah, you can ignore the second part. If you like, don't worry. I was it just
can we look to some of the other options with AI and machine learning? How do you think about like pulling? How do you think about pulling out the feedback criteria? If you're not optimizing if it's not obvious what metric you're engaging for? If one user you're not necessarily optimizing for duration on page for one user, you might be looking for something you know, what? The first thing that I do when I start interfacing with a new LLM is trying to tell it this is when you should tell me I've spent too much time with you and when to get back to work. And like that's a very useful thing in my workflow. But I'm curious how you guys think about designing
features that get people to their answers. Yeah, yeah. So. So I would say like, in general, this is like, quite a new field, right? So everybody's like current to to do a piece of it and but something that we have like discover a lot in in the different kinds of engagement or different kind of the way that we will want to use the other than it is, is that like the prom engineering part is like, a super important like, is one of the key because in that way is how you can try to get like the best from the from the LLM. And, and different and also, regarding the question about the feedback that's also some something like super important point, because all we can evaluate, like the answer of the LLM are good or not, right. So, so something that once things that can be like sounds a little silly, sometimes but many times you evaluate the quality of the answer of the LLM using the same LLM. So so that it's something that is
yeah because in some for some
Yeah, can be Yeah, of course can be some issues but but especially when use of have like grown data, like don't true data, or when or there are ways to mitigate the problems for for example, some of the questions that you can ask to them to evaluate something is to say, is the information that the LLM is providing is complete or not based on some other barriers that you have, you can ask about the completeness of the of the information that is providing, you can ask about that is information correct with the context that you're providing? So there are like different ways, but again, it's it's better to because it's in LLM? It's not like the answer is yes, no, or it's a number and you can easily compare right? Here is like, can be like a sentence and how do you avoid that sentence, right. So so the best way to evaluate a sentence is just provide enough context in another question to the same LLM. And say, given this context, what do you think about because in that way, they can like reason about the context information that you're providing. But and but yeah, I assume that there are like different kinds of stuff that are appearing in this area. about moderation but in other aspects about how to
extract information and provide better context.
Thank you. I think we have two different questions here. Yeah, this is for so the developer productivity. So the demos are cool. A common technique for minimizing your webpage or something is to kind of minimize or obfuscate your JavaScript. And so the first question is, do the LLM give you the same results? And you're writing test cases are summarizing on obfuscated code and a follow up? There's reasons why yes or no. And the follow up is if that's the case, could you then do that? So you have more tokens to throw
in and then unmap or whatever? Yeah, that's like, that's a really good question. And to be honest, I haven't tried. But it's something like should be like really cool to try, right. Yeah. Because it's like to say, Yeah, this obfuscates the code, what it's really doing, and then can you provide like a clear version, right? Yeah, I haven't. I never. I haven't
tried that. But something that I think it's a good idea to.
Yeah, I was thinking you could also shove way more context. Yeah, yeah. Yeah. Now we're the Yeah. The context that we currently have, like for public now is 2000 tokens for the calling LLM. But probably you can even use the old version of Gemini. That is a bank that I seen that have like 128,000 tokens and yeah.
So hearing that if the problem there is that, oh, I can take this table and Copy To another place in the old world before MLMs. A like the way we would address that or like, oh, yeah, we have a bug. We have a feature. We'll go We'll fix this. And then the next problem, we'll go and we'll fix that with an LLM. Now, suddenly, you can just throw it in into an engine and say, like, make it better, like fix it. So suddenly, kind of the same solution can be solving a lot of different problems. But I think like is the result kind of software becomes like much more less deterministic. So like, you can't really predict as as well like, what's going to do. So it's easy to say what's, what is bad behavior, like what's wrong, like it's doing things wrong, but is it even going to be possible to define what's good like, what like, what is good output? Like, what are we just kind of getting this for, like, yeah, like, we're both we'll throw it to another LLM. And we'll try to evaluate and maybe it's maybe it's okay, maybe it's not, and we'll just leave with it. Or do you see a world where we can actually evaluate the outputs of like this AI powered products?
reliably?
Yeah, I think that's a good question. I don't know if I have a final answer for that. But, you know, I think that that's a almost like a philosophical question. Like what is good? Who makes that judgment? Right? What values do you embed when you think about those things? So I think those are questions we haven't totally grappled with, I would say across the industry, but I do think over time, we'll be able to use MLMs to help us answer that in a more intelligent and nuanced way. But I
don't think we have that yet. I mean, I would just add I touch on this a little bit, right. But the point you made about like, there's a temptation today to just throw a bunch of code at an agent and say, fix it. I think that's probably like, not the right mindset at the state in time, right? There's still a need to like know, what the variants are and like, what are you controlling for? And a lot of times when we see people building products in that space, like they can actually go from what we just want to if it's good to they can quickly describe 20 things that they would use to determine if it's good, and then they kind of narrow in on or there's probably four or five or six that we need to really pay attention to and there's all we can give attention to. And maybe if you spend time there for a bit you optimize for those. I was talking to like big Fang company that runs a huge at scale service like this and like they were talking about how as they do their human evals over time, they learn about new criteria that they start to care about more than they cared about two months ago. And I think it's kind of the same thing like problem solving is still a little bit constrained by our limited human minds until the robot overlords takeover. So, you know,
I don't know. Scope it fix that thing. For a bit
some questions in the back
may be a question for both but more apt en is what's changing the development process really, that there's more actors in it, or that the cycle times faster and not and as such, we're pulling in support? What used to be or could be docks, which is now maybe knowledge management teams, which is then tied to data analysis to things which are doing predictive pieces on the software. Is it more actors or is what's going on in the teams that you're seeing in session? Or is it just because it's so fast? We've got to have all these people coming in a way that we've never had to have
them in a in a traditional deterministic code coding cycle. I bet there's a lot of people that wish it was because of cycle times. I feel like cycle times are actually slower right now. Okay, because I you know, we talk to a lot of folks in this space. And a question we often get is, do you know anybody who's actually in production with MLMs? Yet, gotcha. And it's always a little bit of a shock. And we say, well, the only people we talked to are already in production with all limbs, but it's also a relatively small group, right. So I think there's a lot of that where people I mentioned the knob turning like they get far enough down that path, and they still don't feel good, and then they just get stuck. And like I talked to a very big company like you guys who don't know the name where they made a big investment all through q3 and q4 last year, and they just don't feel good about it. And they're just sitting on it and haven't, they're like, We don't know, we're gonna do they're just jaded. Dun dun. So, I think that there is some of like, it's harder right now. I think other things I mentioned, stand out more to me and like, just what I've experienced of why there's always folks involved is less about the cycle time and more about the like, just the continuous cycle like this is always on there's always something to be paying attention to. And I also think that the point about like your whole team is participate. Part of that is just like there's just some new candy, you know, stuff to play with. And a lot of folks just and you'll even see like teams that should not be working on this kind of stuff are because they want to touch it. Or like on the other side of the spectrum. The CTO at HubSpot like built their whole first integration. You know, the CTO at iron clad, which is a big legal tech, like he built their first integration all by hand. Everybody's
just like wanting to play with this stuff. Yeah, and I think we see a lot of that as well. You know, to be honest, I think there's still a lot of hype right around Gen AI specifically. And I think folks who actually using it into production are fairly limited, right? And I think some of that is the ROI needs to be there. And right now there's still a lot of investment upfront. I think that's required from a lot of people. So, you know, again, I think that's still important. A lot of people want to experiment, but you also want to look at the business value that it's going to generate over time. And I think we still need to figure out exactly where that is for a lot of organizations. So they'll pull through things to production, because I think we saw the same thing years ago with more classical machine learning approaches and just some limitations and getting those things deployed as well. But I think over time, right as more of these issues get addressed, and as we identify some more ways that this will have a positive impact in return for folks, you know, customers users everywhere. Hopefully, we'll
see more that's pulling through. So any thoughts? Good question, Ryan. Thank you. One more question. In the back.
We'll just do one or two more questions, and then we're going to run to our cars before they close the roads. Okay. Yeah, this question is for Google. At the end of your presentation, you talked a bit you had a slide about the business capabilities and using Gennai for for the different business functions capabilities. Can you talk at all about the enterprise architects and the solution architects roles you know, that's often a diagram based deliverables?
Are you looking in that area at all? Um, I think that areas still super important. So, you know, again, and you know, David can speak to this. He's working with some customers who are deploying right, I think you still have some of the same challenges, right with deploying these solutions that require that, you know, systems design solution architecture background. I personally haven't applied Gen AI to solving some of those problems. So I think there's still a huge opportunity there and especially for folks who get more sophisticated and understand some of the unique challenges and trade offs associated with different model sizes, right, some of the monitoring, right and the other things you want to do over time. So I think there's still a huge opportunity there. I haven't seen a lot of Gen AI applications for those specific roles. But again, I think a lot of the efficiencies we pointed out still apply. And a lot of the efficiencies generally for knowledge workers, right that we've seen kind of start to take place as well, but I don't know if
you have any examples on the solution architecture side. Yeah, so um, well, first of all, I assume that we have been joking with our friends like software engineers and saying that, yeah, probably they're going to be some kind of AI software engineer as like the one that was released. And, but but I think that there is a lot another kind of Baillieu on the people that just put things together. You know, like, like, that kind of stuff is just for me, I think that that is just like another level of stuff. And when you when you just put different kinds of things. Together require, like, just more contexts require more expertise. And so I think that that part is like, harder to just like automate as other kinds of tasks. And there is going to be like still a lot of value on doing that kind of job. And I think that's the kind of thing that we do that I do in my in my role, you know, like this is putting hidden different kinds of API's, maybe some time and do something, but also preparing a lot about how we can create some kind of piece of information infrastructure, in order to solve the problems that the customer want to build. Because this is like, yeah, again, he's, you can create like, a lot of products over the top of his just feeding the API of chargeability of Google of whoever provide to you like an LLM. But I think that that's going to be there's gonna be a lot of samples like like that, per se, there is gonna be a lot more also of companies that are going to try to create their own alarms for some kind of specific cases. Or maybe not even turning from from scratch, but at least to kind of like, customize the output or just to some kind of like, fine tuning of the models. And so,
I think that there is like a lot of
opportunities also on that on that field. Okay, I think David gets the last word. Let's
let's thank the speakers for a great presentation.
Night everyone Drive safe. Thank you.
Ones.
Know most programmers, reviewers and evaluators, maybe somebody that over to the