LlamaIndex Fireside Chat - Eddie (Glean)

    12:03AM Apr 6, 2023

    Speakers:

    Jerry Liu

    Eddie (Glean)

    Keywords:

    models

    people

    search

    document

    generative

    retrieval

    question

    data

    embeddings

    queries

    user

    build

    generative models

    summarization

    thinking

    lm

    customers

    bit

    approach

    domain

We're happy to invite you to this. And so, I think someone asked, is this going to be recorded? Yeah, it is being recorded right now; there's a Discord bot that does it. And so afterwards, I'll clean it up and send it out on the Discord channel. Sweet. Well, without further ado, Eddie, do you want to give a quick intro about yourself and what you do?

Yeah, sure. Thanks, Jerry. Thanks for inviting me. This is an awesome community, and I'm happy to talk to all of you. I think we'll have a really great discussion today. So I'll tell you a little bit about myself. I lead our AI team here at Glean; I've been at Glean for just under four years, and we actually started Glean just around four years ago as well. So I've been slowly building up our AI stack, even before all the generative stuff in the past six to twelve months, kind of setting those AI foundations, which I'll talk a little bit about later. But before Glean, I was on the Brain team at Google. I was on one of many different applied teams; we focused on driving research from publication-focused groups into product. So I worked on the first deep learning model we put into web search back in 2015, worked on that for a while, as well as Google Duplex, which was an Assistant feature, and several other kind of classic Google-scale problems, like ads and YouTube. Yeah, so that's kind of where I'm coming from. I can also set the stage a bit about Glean itself, Jerry, if you feel like that's productive.

    Yeah, that sounds great. Cool. Okay.

Yeah. So at Glean we're basically trying to build workplace or enterprise search. And we really think about that as part of our vision to bring people the technology that they expect in their day-to-day lives. For example, using Google to find information, or, more recently, interacting with ChatGPT-type interfaces. We really want to bring that interface and that technology to everyone who is a knowledge worker, basically anyone who works with a computer. People have just gotten used to not having access to this kind of technology at work, asking their co-workers repeated questions, looking for documents that no one can find. And so we connect to dozens of different data sources, your kind of typical suite of enterprise SaaS apps, and provide a search and question answering experience on top of all of them, in a consumer-grade web app and Chrome extension for users. So yeah, we are a Series C company; we raised our Series C from Sequoia about a year ago, and we've been around for four years. From a customer angle, I always get confused about which customers we can and can't talk about, but you can check out our website for the logos we're definitely able to share publicly; a bunch of names most folks have probably heard of, across mid-size tech companies, both private and public.

Super cool. So we'll talk about the stack in just a little bit. But before that, I'm actually curious, because pre-generative AI I was actually quite interested in this space myself. I had a pretty similar idea, I just never executed on it, especially in the space of enterprise search across your workplace apps. Just out of curiosity, why do you think Google or existing players didn't get into the space earlier? Because search has been around for a while, so what was the challenge of enterprise search?

Yeah, that's a fantastic question. And I will say Google actually did try to get into the space. There's a product called Google Cloud Search, which several of our prospects and customers have either tried or heard of, so they kind of dipped a couple of toes in there. If you asked me to opine on why Google didn't execute well on that, I could go on for a long time about it, and mention things like just general big-company dysfunction or inability to execute. I think, at the end of the day, it just wasn't prioritized by them. That doesn't mean it's not a massive market and something that's solvable. But they also didn't build their product from a product-first standpoint, and this is the case for a lot of other players out there as well. They're selling search infrastructure rather than an end-to-end product, right? They give you some sort of platform where you're expected to invest several engineers' time for several weeks to get things connected and tune knobs and stuff. And we're not trying to sell search infrastructure. We're trying to sell a product that you can plug in and get your employees using right away. I think there's also something to be said... sorry, Jerry,

Did you—? Oh, no, I wasn't saying anything, go ahead. Yeah.

Yeah, I think there's also something to be said for, fundamentally, Google Cloud being paired with its own kind of office suite, G Suite. I don't want to say it's a disadvantage, but you know they're going to focus on pushing the G Suite offerings and treating those as first-class, right? And it's the same story with Microsoft and the kind of search through Office 365 that they drive. But we know that in reality there's just massive SaaS fragmentation. Sure, recently, in the past few months, companies have cut back a bit given the macro climate. But in general, there's just this massive sprawl and proliferation of SaaS apps with information spread across everything. And if, to Google, your Confluence integration is always a second-class citizen, or Slack, all your information in Slack, then they're never going to build a product that puts you as a user first and meets you where you are, where all your information is. I'm trying to think if there's other stuff I can rip on Google for, but I think that's probably the main one.

Okay, that makes a ton of sense. Awesome. Cool. So given that, pre-generative AI, when you were first starting to build the company, how were you thinking about search? What were some of the AI techniques that you were using, both in the space of NLP and maybe even recommender systems? You know, my knowledge of search obviously is not going to be as deep as yours, especially with respect to this space, but I'm curious how you were thinking about it.

Yeah, yeah, definitely. And, you know, in the past we've been a little bit more cagey about this technology, but I'm going to try to be pretty transparent. I think we're not in the space anymore where we feel like, oops, if we talk about what we're doing here, someone's going to be able to just build all this right away. So we definitely want to engage the community and share a bit more of how we've been doing this. You know, we were built with a lot of Google DNA; besides myself, a lot of folks worked at Google, in web search and other places. And so, kind of from the start, what we built was your classic hybrid search system. In the vector search community it's pretty well understood, and the literature does reflect this, that especially in out-of-domain settings, vector search is actually not better than hybrid, or even just your sparse, BM25-based, term-based methodology. And I think it's difficult to tease apart the product offerings and the way people market stuff from the real research, the benchmarks, and what works in practice. But even though vector-driven methods showed massive upside on certain benchmarks, I think it became very clear that there were limitations, and so we went with a hybrid approach. People kind of shy away from the phrase keyword search, or BM25, like it's this really naive thing, which it is in isolation, but once you throw in very aggressive and powerful query expansion, as well as a lot of richer document-side work, a term-based approach can get you very far, right? Now, that being said, there's a reason why it's hybrid, and that's because we also fuse it with your classic vector-driven, embedding-type approaches. And that kind of fusion, I think, has led to us feeling confident about our net search performance.

That makes sense. Yeah, so that's actually super interesting. So you were already thinking about hybrid search. I know that term has started to pop up a bit more recently as people are thinking about retrieval, and retrieval with LLMs. For some of our listeners, do you think you could give a slightly more detailed explanation of how hybrid search works? Specifically, how do you actually combine keyword lookup with semantic search with embeddings?

Yep, I'm happy to do that. I think there's a variety of ways it can be done, and I think a lot of different vector database providers are figuring out how to take the best of those combined technologies and surface them as APIs or whatnot. Some base-level things you could do: on the retrieval side, you can easily have retrieval be driven by, okay, how many terms, including my expansions and whatnot, are in this document, along with a top-k approach, and then you can do an OR over those two sets, right? And then from a ranking standpoint, there are many different formulations you can do. We have a pretty deep and complex integration, because our approach is kind of not simply item-level, if that makes sense. When we think about our documents, we think about the richness that backs what a document stands for. It's not just one piece of text. You can't simply chunk a document into multiple paragraphs; you have body content that could be chunked and such, but you also have many other pieces of support on that document that you need to think about. Some of those are on the document, in parsing the richness of it, and some are off the document, like thinking about how documents link to each other. And that classic anchor signal of linkage is also incredibly useful, both in the vector setting and in the sparse setting. So overall I'd say there's a world of ways in which you can marry these two technologies, and you can think about, hey, what kinds of queries do I want to apply a sparse method to more heavily, and what kinds of queries do I want to apply a dense method to more heavily?
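To make the fusion idea concrete, here is a minimal, illustrative sketch of one way to combine the two retrievers along the lines Eddie describes: take an OR over a term-based candidate set and a dense top-k set, then fuse the two rankings. The scorers, embeddings, and constants below are placeholders for illustration, not Glean's actual system.

```python
"""Toy hybrid retrieval: OR over sparse (term-match) and dense (top-k embedding)
candidates, fused with reciprocal rank fusion (RRF). Placeholders only."""
import numpy as np

def sparse_scores(query_terms, docs):
    # Stand-in for BM25 + query expansion: count matching terms per document.
    return {doc_id: sum(term in text.lower().split() for term in query_terms)
            for doc_id, text in docs.items()}

def dense_scores(query_vec, doc_vecs):
    # Cosine similarity against precomputed document embeddings.
    q = np.asarray(query_vec, dtype=float)
    return {doc_id: float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
            for doc_id, v in doc_vecs.items()}

def hybrid_search(query_terms, query_vec, docs, doc_vecs, k=10, rrf_k=60):
    sparse = {d: s for d, s in sparse_scores(query_terms, docs).items() if s > 0}
    dense_all = dense_scores(query_vec, doc_vecs)
    dense_topk = dict(sorted(dense_all.items(), key=lambda kv: kv[1], reverse=True)[:k])
    # OR over the two candidate sets: term matches union dense top-k.
    candidates = set(sparse) | set(dense_topk)
    # Rank candidates under each retriever, then fuse the ranks.
    sparse_rank = {d: r for r, d in enumerate(
        sorted(candidates, key=lambda d: sparse.get(d, 0.0), reverse=True), start=1)}
    dense_rank = {d: r for r, d in enumerate(
        sorted(candidates, key=lambda d: dense_all.get(d, -1.0), reverse=True), start=1)}
    fused = {d: 1.0 / (rrf_k + sparse_rank[d]) + 1.0 / (rrf_k + dense_rank[d])
             for d in candidates}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Reciprocal rank fusion is just one choice; weighted score combination, or a learned ranker over both signals plus the document-side features he mentions, are common alternatives.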

Awesome. So two follow-up questions from that. One is, especially since this is something that users interact with, how do you think about baking user feedback into this process? Or is this something that you just try to tune on some ground-truth dataset? Maybe let's start with that question.

Yeah, that's another great question. You know, everyone who's worked on production ranking systems knows that user feedback is a double-edged sword. You can put in a bootstrapped system with no user feedback, collect a bunch of data, train a model, but then you've entered the kind of feedback loop where you need to make sure you handle things like position de-biasing and understanding how your system is fitting to itself over time. So it is a can of worms, just like it always has been for the past decades for these kinds of systems. But we try to take user feedback in a pretty measured way. There's implicit and explicit feedback, right? Explicit feedback being an actual report from the user saying, hey, you guys just totally missed this document I was looking for, or this question answer you surfaced is not relevant. That's a very powerful report. It doesn't scale, it's not huge volume, but it helps you actually understand what went wrong and which component went wrong. And then you have implicit feedback, which is more the form in which a lot of these large-scale systems are trained: measuring, okay, did someone click on this result after we surfaced it to them? Did they not click on results above it? And using that as training data in the appropriate way. So for us, we definitely do a bit of both. I think for us, the kind of implicit feedback we're talking about around queries and click data is something that can't be a crutch, because our product in an enterprise setting needs to work from day zero. And it does. Of course it will get better over time with this feedback, but you can't give a product to a company that's paying you a lot of money and say, hey, we kind of suck right now, but I promise it'll get better. And so we have all kinds of things we do before there's actually feedback flowing through Glean, all kinds of things we can do to make sure that the models we've trained on day zero are actually of value. And I'm happy to dive into that a little bit later, too.
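For readers unfamiliar with the implicit-feedback pitfalls Eddie mentions, here is a small, hypothetical sketch of one standard mitigation: inverse-propensity weighting of click data to counter position bias. The log format and propensity numbers are made up for illustration, not Glean's pipeline.

```python
"""Sketch: turning click logs into position-debiased training labels
via inverse propensity weighting. Propensity values are illustrative."""

# Estimated probability that a user even examines a result at each position;
# in practice this would be estimated from randomization or interventions.
EXAMINE_PROPENSITY = {1: 1.0, 2: 0.7, 3: 0.5, 4: 0.4, 5: 0.3}

def weighted_click_examples(click_log):
    """click_log: list of dicts like
    {"query": ..., "doc_id": ..., "position": 3, "clicked": True}"""
    examples = []
    for row in click_log:
        prop = EXAMINE_PROPENSITY.get(row["position"], 0.2)
        if row["clicked"]:
            # Up-weight clicks at low positions, which are under-observed.
            examples.append((row["query"], row["doc_id"], 1.0, 1.0 / prop))
        else:
            # Non-clicked impressions can serve as weak negatives.
            examples.append((row["query"], row["doc_id"], 0.0, 1.0))
    return examples  # (query, doc, label, weight) tuples for a ranking model
```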

Yeah, that makes a lot of sense. So, fun fact, I used to work almost exclusively on recommendation systems, actually with, I think, someone who's on your team, back during my Quora days. And so I learned a little bit about the cold-start problem with recommender systems and implicit versus explicit feedback. And so I guess what you're talking about, and correct me if I'm wrong, is that there is some sort of weighting between doing retrieval and search based on semantic matching, with, for instance, similarity search, but also trying to incorporate some of these classic RecSys concepts in there as well.

Yeah, I think there's a bunch of different ways you can model the problem, but as a whole, you have measures of how relevant this document is to a query. We also, at Glean, have very rich mediums of data that pair well with this text data. Actually, I should say there are really two other mediums of data that make our search fundamentally something that just works, because again, that is the goal here. We're not trying to publish benchmarks that show we're better than people; you just want to deliver a product that works, because people are tired of enterprise search that doesn't work at all. So those mediums of data, besides content, which is a given. And content, as I mentioned, is rich content: it's not just a static document, but how that document is formatted, how it's structured, how it links to other documents, building that kind of graph. But there are actually multiple node types in that graph. You also have people. So understanding authorship, which people are authoring which documents, what kind of content people are writing about. Because we crawl everything and unify identity on single sign-on, you have a unified representation, you can train things like user embeddings, you know what kinds of things a person works on and how to map things to the correct people. So that's your second medium of data. The third one is activity, and by activity I mean beyond activity at the kind of in-Glean feedback level. We connect to our data sources and pull activity data, so we know when someone, for example in Google Drive, visits a document. It doesn't have to be within Glean at all; as long as you've connected to this API and built out this really robust crawl infrastructure, you can actually use that data to power things before people are even using Glean. And combining those, those three mediums all link together in a way that forms really good training data, even before you get that implicit click feedback later on.

Awesome. Yeah, that makes a ton of sense. And I think this is a nice segue into the next section. Okay, so you have all this great stuff: you have the vector search and hybrid search capabilities, you have some incorporation of user feedback, and I'm assuming you were preparing these training datasets and training your own models. So now, with the advent of LLMs and the explosion of popularity of generative AI, what are some of the biggest changes, either across the components that you just mentioned or elsewhere, that they've introduced to your stack?

Yeah. We really needed to rethink and examine where the best places to apply these generative models are. And for us, we like to make this distinction between generative LLMs and what now feels like dated, old-school, classic encoder-only LMs, which are, you know, small things. We've been taking a domain adaptation approach across all of our customers. It's referred to in different ways in the literature, like domain-adaptive retraining or domain-adaptive pretraining, but it's basically taking language models, in the masked language modeling and next-sentence-prediction space, and continuing training on raw text from the enterprise. And that basically compresses the enterprise language into these language models while maintaining some world knowledge from the base model. That serves as the foundation for everything we've been fine-tuning, things like semantic search or various other query understanding tasks. For pretty much any NLP task you can imagine that's relevant to our problem space, you now need way fewer examples to train off of, because you have a good foundation model that's built off the language. So that approach has been close to our heart for a while. But on the generative side, it's definitely opened up a bunch of new possibilities. We announced our first generative AI feature on Monday, or actually early yesterday, where we're actually generating answers to questions. Previously, with the data and the models we have, we could run extraction models, mining question-answer pairs offline and surfacing them when people ask questions in search. But generation was always kind of behind, right? We had explored models generating answers to questions pretty much every six months for the past four years, and the quality was just never there, obviously, until this latest suite of models. You had a kind of step function in coherence. So that's the primary, kind of the obvious one: search and question answering are cousins, and we've already done extractive question answering, so now we want to layer on generative question answering. That's the first thing. That being said, there are, I would say, two families of projects we're investing in going forward. One is more generative features, and there we think about what people want from us, what kind of unique advantages we can provide. We have this cross-data-source view, this horizontal view across all these different places people are in, not just within O365 or not just within Slack. So how do we leverage that and bring generative experiences on top of that horizontal view as much as possible? So that's one space, building new features on top of that. The other one is basically figuring out, okay, where are there places where we have so little data that we can't learn an effective model, where we want to zero-shot or few-shot a different path? How can we solve some of our existing tasks that power search better with LLMs, in a more offline or under-the-hood way?
So those are the two families of things we're thinking about: user-facing features that are newly enabled, as well as continuing to improve our foundation, making all the parts of search and our core products better with those generative technologies.
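As a rough illustration of the domain-adaptive pretraining he describes, here is a minimal sketch of continuing masked-language-model training of a small encoder on raw enterprise text using Hugging Face Transformers. The base model, file path, and hyperparameters are placeholders, not Glean's actual setup.

```python
"""Sketch: domain-adaptive pretraining, i.e. continuing MLM training of a small
encoder on raw enterprise text. Paths and hyperparameters are placeholders."""
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "bert-base-uncased"                       # any encoder-only base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# Raw text dumped from the enterprise crawl (placeholder file name).
corpus = load_dataset("text", data_files={"train": "enterprise_corpus.txt"})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="domain_adapted_encoder",
                         per_device_train_batch_size=32,
                         num_train_epochs=1,
                         learning_rate=5e-5)

Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()
```

The adapted encoder would then be fine-tuned with relatively few labeled examples for retrieval or query-understanding tasks, which is the point Eddie makes about needing far fewer examples once the domain language is baked into the foundation.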

That's great. And double-clicking into the generative component a little bit more: is the primary consideration just being able to generate or synthesize a response, given that you've retrieved the right piece of information? Or are you thinking about it like, you surface some features where, hey, we'll just be able to give you these answers; instead of just raw sources or related answers, we'll be able to synthesize this for you as well?

Yeah, and I hate to confuse things with another parallel, but I'll say we're kind of, again, hedging on two approaches to applications of generative technology. The first is what a lot of people are pushing. I was at a Sequoia event a couple of weeks ago, and Sam Altman said this about OpenAI: people need to stop treating these models as knowledge databases and instead treat them as reasoning engines. Jerry, you were there too, so I know you've heard him say it. So in that sense, rather than using the LLM as a retriever, or relying on the implicit world knowledge inside the model, there might be value there, but the retrieval-augmented generation approach uses the model for what it's good at. And reasoning is a loaded word, I will say; in the literature there are formal definitions of reasoning, and certain people, especially on Twitter, will jump at you if you claim that today's models are doing reasoning. But certainly they're more capable than the past generation of models at doing something that resembles reasoning, synthesis, summarization. So how do you pair that with knowledge, and retrieve that knowledge from a separate system? That's the first suite of approaches. That being said, we're still exploring the second space of approaches, which is: how can you actually drive more knowledge into these generative models? What happens when you do that? What happens when you actually push new information into the model directly, via all the different ways you can adapt the weights of that generative model? And when you interact with that generative model, post instruction tuning or RLHF or whatever, what does it look like, what's actually happening? This is honestly an open research question. And so we're just trying to look at both ways and think about what can emerge from both sides.
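For readers newer to the first approach, here is a minimal sketch of the retrieval-augmented generation pattern he references: retrieve from a search system first, then let the generative model synthesize only from what was retrieved. The `search` and `llm_complete` callables and the prompt are hypothetical placeholders, not Glean's implementation.

```python
"""Sketch: retrieval-augmented generation — the retriever supplies the knowledge,
the LLM only synthesizes. `search` and `llm_complete` are hypothetical
placeholders for a search backend and an LLM completion API."""

PROMPT = """Answer the question using only the context below.
If the context does not contain the answer, say you don't know.

Context:
{context}

Question: {question}
Answer:"""

def answer_question(question, search, llm_complete, k=5):
    # 1. Knowledge comes from retrieval, not from the model's weights.
    hits = search(question, k=k)                      # -> list of (doc_id, snippet)
    context = "\n\n".join(snippet for _, snippet in hits)
    # 2. The model reasons/synthesizes over the retrieved snippets only.
    answer = llm_complete(PROMPT.format(context=context, question=question))
    return answer, [doc_id for doc_id, _ in hits]     # keep citations for the UI
```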

How does that second approach you described, baking knowledge into the weights of the network, play with some of the stuff you were just describing, which is distilling these models into smaller models? Are you also trying to bake knowledge in and then distill it, so that it's kind of Glean-specific? Or are you trying to just fine-tune the raw base model?

Yeah, it's a good question. And with how fast all this stuff is moving today, there's honestly just a lot of overlap; there are a lot of different directions we're trying to go. I think right now, you take GPT-4, it has these amazing capabilities, but they're also kind of overkill in many cases, right? And so you will undoubtedly have task-specific models that are smaller, faster, or cheaper that actually perform better for a given task; they're just not generalist models. So for us, we're trying to figure out, okay, in which places is there space for a shared model across tasks, where we can fine-tune a model that does several tasks, or is it worth it to fine-tune on one task specifically? But I think, from an iteration pace standpoint, taking the largest model, the generalist model, which is capable of doing all these things, allows you to build really quickly. And that's super important, too, because at the end of the day, you need to actually put this in a product and see if it works. Then, after that happens, you say, okay, how do we make this better? We've hit some bar where users are getting value; now how do we drive the model size and cost down, or how do we keep pushing on quality? But I think these big generalist models are definitely a totally reasonable starting point for a lot of that.

Makes sense. I want to dive into the whole evaluation piece and talk about that a little more, but before that, one thing about some of the new advances with LLMs: a common paradigm these days for doing basic retrieval augmentation for language models is that you call OpenAI's endpoint to get back an embedding, embed your data and store it in a vector DB, do basic semantic search, like top-k retrieval, fetch the results, and then pass that to the language model to synthesize an answer. So I know at the very beginning you were mentioning how these embeddings don't actually work well on out-of-domain stuff. But how general do you think OpenAI's embeddings are? Do you think the embeddings themselves have advanced such that, hey, semantic search over these newer embeddings is actually good enough, versus you'd still need some notion of hybrid search to do retrieval well?

Yeah, I think it's a good question. I think it boils down to how you actually leverage those embeddings. So if you're thinking about ranking documents, documents, especially in our enterprise setting, are too rich to be boiled down to a single embedding, and it doesn't matter how large it is. Now you say, okay, I want chunks of embeddings, I want paragraph chunks and all of that. But then from that point, and this relates to the broader topic of how we handle search and question answering across dozens of different, effectively, modalities: you have wiki pages, Confluence pages, JIRA tickets, your Slack messages. Each one of them has a different way of coming into our system, and we have a standardized format that we map everything to, figure out a way to encode the various fields on it, and then run our indexing and our multiple retrieval systems on top of that. So, I'm rambling a bit here, but to go back to your original question about OpenAI's embeddings: there's no doubt that, as a general text embedder, their embeddings, or honestly any of these providers', are massively useful, and for many use cases they will probably work well enough. But for us, we have done these benchmarks. It's kind of an unfair benchmark, because obviously we're fine-tuning to our task and comparing to a general text embedder, so we're going to be better at that task. In that sense, once you've built task-specific embeddings, you've literally optimized for the task, so it had better be better. But again, it's a question of how much better you need it to be for your use case and your application, and for us the answer is that we need it to be the best possible.

That makes sense. Now going back to the evaluation piece: how are you thinking about how to evaluate these components? Because in my mind it's an entire system, right? You have a set of models. You have, for instance, embedding-based techniques, whether from a fine-tuned model or a base embedding model, for retrieving the right piece of information. Obviously, before that, you have to structure the data in a certain way, too. And then you have the piece where you synthesize a response. How do you think about evaluation, either within each of these components or across all of them? What metrics do you track, and how do you actually diagnose which components you should improve, given, say, a top-level degradation in the evaluation metric?

Yeah, it's amazing to me: MLOps and telemetry, at least outside of the big players, just started maturing enough, and then we got generative models, which makes it a mess and hard to figure out again. I don't have any silver bullets for you, I don't have any perfect answers. I think things are so early that, in many cases, especially for the generative applications themselves, you don't need a huge amount of granularity to know that you're improving your prompt or improving your model; it should be very clear. But as the low-hanging fruit starts disappearing, I think bringing more rigor to these evals is going to be a lot more appealing. For us, we look at things end to end, and, as any good engineer does, you break it down to understand which part of the system failed. If it's retrieval, you can frame it as a search problem and figure out, okay, when did we not retrieve the right content? And again, it's not just document retrieval but, as we were talking about, content retrieval within documents too, and those can be sequenced in either order; at the end of the day you need actual subsets of content from documents. And then after that, depending on whether you're taking a multi-hop approach or a single-prompt approach, you have to track down where the model broke down in a given application. And none of it scales, frankly; right now none of it is scaling yet. Especially when you have a model API that's fundamentally not making a discrete prediction, like in the old world of predicting a score for a document or a number or whatever, your eval sets and golden labels aren't directly comparable anymore. Now we have to ask: the old text generation metrics like ROUGE and BLEU, are they actually useful or not? And everyone's scrambling, I think, to figure out how to evaluate generated text, because the applications it's being used in are not just translation anymore; they're applications people didn't think they needed to evaluate, because they weren't really possible before. So some things that we're thinking about: the classic one, I'm sure everyone's been through this, is using the LLM itself to evaluate. The more you can scope down a given task for an LLM, the better it does at it, so using a generative model to separately evaluate the output and compare it to either a golden output, or compare two different outputs, is one method of evaluation. Also, using our same domain-adapted encoders to measure text similarity between what's been generated and a golden label is effective. Again, that relies on the premise that you have a good encoder that can distinguish between these things and point out when there's a small nuance between the two that actually makes a big semantic difference in what you're presenting in your generated output.
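Here is a small, illustrative sketch of the two evaluation methods he describes: scoring a generated answer against a golden one with encoder similarity, and with an LLM judge. The `encode` and `llm_complete` callables, the prompt, and the threshold are assumptions for the example, not Glean's eval harness.

```python
"""Sketch: scoring generated answers against golden ones via (a) embedding
similarity with a domain-adapted encoder and (b) an LLM judge.
`encode` and `llm_complete` are hypothetical placeholders."""
import numpy as np

def embedding_similarity(generated, golden, encode):
    a, b = np.asarray(encode(generated)), np.asarray(encode(golden))
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

JUDGE_PROMPT = """You are grading an answer against a reference answer.
Reference: {golden}
Candidate: {generated}
Does the candidate convey the same facts as the reference, with no
contradictions? Reply with a single word: YES or NO."""

def llm_judge(generated, golden, llm_complete):
    verdict = llm_complete(JUDGE_PROMPT.format(golden=golden, generated=generated))
    return verdict.strip().upper().startswith("YES")

def run_eval(eval_set, encode, llm_complete, sim_threshold=0.85):
    # eval_set: list of (generated_answer, golden_answer) pairs
    results = []
    for generated, golden in eval_set:
        results.append({
            "similarity": embedding_similarity(generated, golden, encode),
            "judge_pass": llm_judge(generated, golden, llm_complete),
        })
    pass_rate = sum(r["similarity"] >= sim_threshold and r["judge_pass"]
                    for r in results) / max(len(results), 1)
    return pass_rate, results
```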

Awesome, that makes sense. And yeah, LLM-based evaluation is obviously super interesting; it's kind of like solving its own problem. How do you evaluate unstructured output? Feed it back into a language model and just let it figure it out. So that's super interesting. Awesome. I think we have about five minutes left before we turn to questions. In these five minutes, I want to talk about, in your mind, having done a bunch of search and retrieval and also played around with LLMs, what are some of the biggest challenges and failure modes, things that have yet to be figured out? For instance, certain types of questions you want to ask that just don't work right now. And what lies ahead in the future that you're excited about, that could solve some of these challenges?

Yeah, so, you know, I think, to beat the dead horse of hallucinations, but it's an important dead horse, and someone in the chat already asked about it: in the enterprise, you cannot afford to tell wrong things to people. I think Logan had a question here about ensuring answers are factually correct before being presented. The same data foundations we have to train models, to build search, to surface relevant content, can also be repurposed to train things like entailment models. And again, you could ask an LLM directly: can you entail this information from this other information? Entailment is a well-studied NLP problem. But at some point you're introducing another model call, and the user is waiting for their response, so you have to balance this. The fact that you need to ensure correctness, or run generic post-generation verification as I call it, anytime you want to run anything on top of what's been generated, whether that's another generative call or not, means there's going to be a latency implication for your product. But it might be the thing that's needed. So how can you drive down pre-generation hallucination, set the model up to do well, to minimize the number of times you need to run post-generation checks as well? So I think handling and processing output, there's a whole ecosystem, honestly, around processing output into structured output and sanity-checking it; that's one thing I'd highlight. In terms of what lies ahead, I am very excited about the kind of agent abstractions that have sprung up around having access to tools. And I think these are abstractable beyond doing actions, beyond just hooking into APIs. The way we like to think about it is, hey, if I'm someone at work and I have a question or I'm trying to get something done, right now I have search as a tool: I can issue some searches, I can ask some questions. But I'm effectively doing multiple things there. I might be issuing multiple queries, each one of which is well served, and then I might go read some documents, and I might go ask a co-worker about some things. What these kinds of agent frameworks allow us to do is actually model an LLM-powered agent doing that. So we're really excited about being able to take complex, you can call them queries, but complex things, and figure out how to decompose them into multiple actions, so that when you have a very powerful search behind you, you can interface with it repeatedly and eventually do multiple hops, or combine things in a different way, to actually get to an action or an answer that a human would arrive at after executing similar hops.
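Here is a minimal sketch of the post-generation verification idea: check that each generated sentence is entailed by the retrieved context before showing it, and fall back to sources when it isn't. The `entailment_prob` scorer is a hypothetical placeholder (a fine-tuned NLI model or another LLM call could sit behind it), and, as Eddie notes, any such check adds latency.

```python
"""Sketch: post-generation verification via entailment. Each generated sentence
must be entailed by the retrieved context, or the answer is withheld.
`entailment_prob` is a hypothetical NLI-style scorer."""
import re

def verify_answer(answer, context, entailment_prob, threshold=0.8):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    unsupported = [s for s in sentences
                   if entailment_prob(premise=context, hypothesis=s) < threshold]
    return len(unsupported) == 0, unsupported

def answer_with_guardrail(question, retrieved_context, generated_answer,
                          entailment_prob):
    ok, unsupported = verify_answer(generated_answer, retrieved_context,
                                    entailment_prob)
    if ok:
        return generated_answer
    # Fall back to showing sources only, rather than a possibly wrong answer.
    return ("I couldn't verify an answer from your documents; "
            "here are the most relevant sources instead.")
```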

Awesome, yeah. It's super interesting that you mention that, because that's one of LlamaIndex's goals too. Obviously there's this broader overall structure, but specifically for data retrieval and synthesis: you have this corpus of different types of data; how do you best retrieve the relevant items given these complex tasks or queries, break them down into smaller components, interact with your knowledge sources, and retrieve and synthesize the right pieces of information? That part I'm also super excited about. We actually have some of those components in LlamaIndex, for decomposing queries, breaking them down through multi-step processes. So it's good to hear that you guys are thinking about this too. Obviously there are a few challenges right now in terms of cost and latency: the more calls you chain together and the bigger the model, the higher the latency and the higher the price. So how do you actually scale this to such massive volumes of data?
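As a rough sketch of the query decomposition idea Jerry mentions, here is one way to break a complex question into sub-queries, run each against the search system, and then synthesize. The prompts and the `search` and `llm_complete` callables are hypothetical placeholders, not the actual LlamaIndex API.

```python
"""Sketch: decompose a complex question into sub-queries, answer each against
the index, then synthesize. Prompts and callables are placeholders."""

DECOMPOSE_PROMPT = """Break the question into at most 3 simpler search queries,
one per line, that together would answer it.
Question: {question}
Queries:"""

SYNTHESIZE_PROMPT = """Using the partial findings below, answer the original question.
Question: {question}
Findings:
{findings}
Answer:"""

def decompose_and_answer(question, search, llm_complete, k=3):
    sub_queries = [q.strip() for q in
                   llm_complete(DECOMPOSE_PROMPT.format(question=question)).splitlines()
                   if q.strip()]
    findings = []
    for sq in sub_queries:
        # Each hop is a cheap, well-served query against the search system.
        hits = search(sq, k=k)
        findings.append(f"- {sq}: " + " | ".join(snippet for _, snippet in hits))
    return llm_complete(SYNTHESIZE_PROMPT.format(
        question=question, findings="\n".join(findings)))
```

Each extra hop adds a model call, which is exactly the cost and latency trade-off discussed next.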

Yeah, yeah, I think, you know, you can say, hey, I want to do ten hops of GPT-4, but then you're looking at a user waiting minutes for a response, right? And, setting aside the dollar value, that's its own thing. So, thinking carefully about it: okay, you've explored the headroom, the kinds of tasks that can be done if you let the most powerful thing do every single thing. Which of those can you pull out into existing frameworks you already have? For example, query rewriting: that's a well-studied problem that Google has been working on for a long time. Doing that with a large generative model is probably overkill in a lot of places. So how do you slowly pull out the very classic tasks, et cetera, in a way that quickly makes this something you can put in front of users and get real value out of?

Right. Yeah, and users can't wait 30 seconds to get that, right? Awesome. I think that concludes the main portion of the talk, but let's maybe spend five to ten minutes going over some of the questions from the audience. So, the first question: how do you measure the value created for your customers, and, tied to that, how much do you charge? What was the price discovery process like?

Yeah, that's a great question. And for us, as a company, we strongly subscribe to, you may call it naive, but the philosophy that you build a good product, honestly, and you don't ever want to undercut or cheapen your offering. You put the product in front of people, and our experience is they get really mad if you take it away, and that just drives the value. I think there are all kinds of ROI calculations you can do, but at the end of the day, especially in the face of a tightening macro environment, it's kind of a subjective process in terms of how you take, oh, here are these numbers we're showing, how much time your team is saving, how many actions they're taking, et cetera. And you can baseline against, oh, here's this other SaaS provider you're paying for; we do that and more and charge less. But I don't think there's any sort of canonical process there. I'm also not on our sales team, although, as AI has sped up in the past six months, I've been on sales calls. So I'm not the best person to answer that, but I'm just sharing a few thoughts from my side.

My next question is: how do you handle permissions for enterprise search? Maybe this gets into how data access works.

Sorry, Jerry, my AirPods just cut out. Could you say that again?

Yeah, of course. Alright, so the next question is: how do you handle permissions for enterprise search? I think this gets into data access.

That's a fantastic question. We built the product with that in mind from day one, right? So every data source we connect to has a permissions model, and we mirror that permissions model. Building the technical infrastructure to crawl permissions alongside content, with order-of-seconds freshness, and propagate that down to the index is very challenging; that's why we've had many people working on it for four years. But that just comes as table stakes for search. So at query time you have the user, you know who they are, you know what documents they have access to across all the data sources. And that extends to the generative side; that's the nice thing about RAG. Assuming your base model hasn't been trained with proprietary information, and that's where the domain adaptation approaches get a little bit trickier, but assuming you're using some base model which has no knowledge of private company info, then as long as what you feed into it is only things that a given person has access to, what is generated will also be permissible, and you can further run entailment models or other checks. But basically, we have the infrastructure to crawl the needed permissions and map them into our own unified identity and group membership framework, so that at search time we can enforce them correctly.
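Here is a simplified, hypothetical sketch of the query-time enforcement he describes: documents carry ACLs (users and groups) mirrored from the source system, and retrieval candidates are filtered to the requesting user's identity before anything reaches ranking or the generative model. The data model below is a placeholder, not Glean's framework.

```python
"""Sketch: enforcing crawled permissions at query time. Only documents the user
can already see may be retrieved, ranked, or fed into generation."""

def allowed(user, doc_acl, group_memberships):
    """doc_acl: {"users": {...}, "groups": {...}} mirrored from the source app."""
    if user in doc_acl.get("users", set()):
        return True
    # Unified identity: resolve the user's group memberships once, via SSO.
    return bool(group_memberships.get(user, set()) & doc_acl.get("groups", set()))

def permission_filtered_candidates(user, candidates, acl_index, group_memberships):
    # Filter retrieval candidates before ranking and before building RAG context.
    return [doc_id for doc_id in candidates
            if allowed(user, acl_index.get(doc_id, {}), group_memberships)]
```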

The next question is: do you use your own LLMs for this? I think you might have answered some of that in the previous part.

Yeah, we train our own language models in the non-generative sense right now.

We're also exploring training our own generative models, and we also use the whole suite of API providers out there, kind of just going with whatever works best for our product and our use case at a given time. So for example, our AI answers feature right now, for the pilot customers we've deployed with: there's an option to go with an OpenAI model, or, depending on the security posture, we also have access to Google's PaLM as well. And so we offer different ways of doing that, but we're really not tied to one or another. I think it's not super meaningful to anchor on one, and we've been talking to all your classic providers, Cohere, et cetera.

And then the next question was around evaluation. Are you doing anything special to ensure answers are factually correct? Are you actually implementing these evaluation techniques, or are you mostly experimenting with them?

Yeah, so the last part I think I touched on already: we do have a system that runs on top of generated answers before presenting them to the user. And independent of that, our evaluation also includes, hey, we have some threshold where we say we can't present a wrong or confusing answer to someone more than X percent of the time, and the system helps with that. So both at evaluation and actually at inference time, what we're doing involves solving that problem.

And then Bob asks: are you largely fine-tuning open-source models, or also using external LLM providers? I think the second part is actually pretty interesting and relates to an earlier question as well, which is: are enterprises open to the latter given data concerns? And also, how do you think about protection of customer data, given LLM adoption?

Fantastic question. For the latter, I'll say we definitely are fine-tuning our own models and thinking about how we can host them. The way Glean works is, actually, today, pre-generative models, pre-LLM providers: we have customers either host in their own cloud, or we host for them, depending on what they're comfortable with. And so everything runs within their cloud projects; our language models from older times also run inside of those cloud deployments. We run inference there, do training there, et cetera. So their data never actually leaves, and that's why we've been able to land several security companies themselves as customers, getting through the wringer of their security reviews. LLM providers definitely open up a new can of worms, in the sense that, in many ways, even if the pen-and-paper, actual agreements aren't that different, people are primed to be scared of this generative tech. Especially since OpenAI only, I guess it's been nearly three weeks now, changed the default so API data isn't used for training. That's something that should have been table stakes, but even beyond that, just going through data retention, encryption at rest, all this kind of stuff, is something we expect the whole ecosystem to trend towards, but certainly off the bat many enterprises are not immediately comfortable. That being said, we've been in calls where the customer is like, there's no way I'm sending my data to OpenAI, and then we show them a feature and they say, oh, maybe... that looks pretty damn good to me, maybe we're open to it. And so, you know, in this world, nothing is black and white, and people can always be convinced. But certainly all of those providers becoming more enterprise-ready is to the benefit of everyone here.

Yeah, I think that's a pretty good bet; it's pretty much in everybody's interest. The next question, which has four question marks and has now been asked a second time, is: how would you approach summarization if you need to extract emerging-trend info from thousands of new documents every day, or answer queries that cover hundreds of specific documents? Doing so via closed-source LLMs takes too long and costs too much. So it's basically summarization over a large amount of data. Have you dealt with that, or is it something you've played around with?

Yeah, we've definitely thought about it. I would say, not to hammer on search too much here, but search is a tool and a way to find what subset of your corpus is salient or relevant for a given concept, and concepts evolve quickly as well. So building a good index that can support that, and then running either open- or closed-source LLMs on top of a subset of data that you already strongly believe is relevant, is no different from the classical retrieve-then-rank pattern, right? You have a lightweight retrieval mechanism, and you have a heavyweight model you can run on that subset. So that's hopefully kind of useful, but I think search is a key part of even things like summarization. Yeah.

Yeah, and speaking from the LlamaIndex side, we are actively investing both in ways to make the summarization calls a bit cheaper and more efficient, and also in making this hierarchy basically propagate. What I mean by that is, let's say you have some updates and you want to update the aggregate summary: you basically have some sort of real-time streaming capability where, as document updates come in, you're able to update not just the immediate index but also any summaries that depend on that index. So, for instance, I think the way to think about this is some sort of hierarchical summarization, where you have a summary for each document, a summary for each topic, and a summary for the entire tree of data that you have. Then every time new updates come in, you refresh the parent nodes on some frequency. I think that's probably the best way to think about it, rather than having to rebuild and re-summarize everything for every new update.
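Here is a small, hypothetical sketch of the hierarchical, incrementally refreshed summary tree Jerry describes: leaf summaries per document, parents per topic, and only dirty ancestors re-summarized on update. The `summarize` callable stands in for an LLM summarization call over a list of texts; none of this is the actual LlamaIndex implementation.

```python
"""Sketch: hierarchical summaries with incremental refresh. When a document
changes, only its ancestors are marked dirty and re-summarized later."""

class SummaryNode:
    def __init__(self, node_id, parent=None):
        self.node_id, self.parent = node_id, parent
        self.children, self.summary, self.dirty = [], "", False
        if parent:
            parent.children.append(self)

class SummaryTree:
    def __init__(self, summarize):
        self.summarize = summarize          # hypothetical LLM call: list[str] -> str
        self.root = SummaryNode("root")
        self.leaves = {}                    # doc_id -> leaf node

    def upsert_document(self, doc_id, text, topic_node):
        leaf = self.leaves.get(doc_id) or SummaryNode(doc_id, parent=topic_node)
        self.leaves[doc_id] = leaf
        leaf.summary = self.summarize([text])   # re-summarize just this document
        # Mark ancestors dirty instead of rebuilding the whole tree.
        node = leaf.parent
        while node:
            node.dirty, node = True, node.parent

    def refresh(self, node=None):
        """Re-summarize dirty nodes bottom-up (run on whatever schedule you like)."""
        node = node or self.root
        for child in node.children:
            self.refresh(child)
        if node.dirty and node.children:
            node.summary = self.summarize([c.summary for c in node.children])
            node.dirty = False
```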

That matches, definitely. I mean, for us, we deal with documents changing constantly; customers are obviously writing, creating documents, editing them, at a very fast pace. And so all of our existing pipelines and models, the way they train, have to be carefully set up for incremental learning. You can't just re-run everything over the whole corpus repeatedly. So this is the same abstraction; it just gets a little bit fuzzier if you don't define your APIs for data dependencies well. But it's definitely been front of mind to be able to do this kind of update to a learned artifact in an efficient way.

Yep, exactly. Kairos asks: what do you think about using LLMs to add symbolic filters to vector search? Is it similar in spirit to text-to-SQL, or query rewriting?

Yeah, I think this is definitely an interesting area. For us, we serve users that query in a bunch of different ways, whether that's giving more explicit instruction, saying, you know, I really want this term, or I want results from this person. And so for us, figuring out how we can map back and forth between natural language questions and that space is meaningful. And then also on the technology side, thinking about how some of the hybrid approaches we've been discussing can be leveraged or made better with rewriting via LLMs is definitely an interesting idea. I don't think we've necessarily invested time into it, but it does seem similar to text-to-SQL in spirit, and a lot of the great work being done there is relevant as well.

Speaking from the LlamaIndex side, that's something we're probably thinking about adding too, because it's cool. I mean, if you think about the natural language question and all the parameters that go with it: how do you not just convert the question into a search query against a fixed structure, but actually convert it into these meta-constructions, determine the general parameters or filters that you need in order to interact with your existing knowledge base, whether that's in a vector store or somewhere else? I think that would be super interesting, this idea of some sort of meta-API that allows you to distill the question not just into results given an existing structure, but to figure out the structure and the parameters themselves.
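As an illustration of that idea, here is a hedged sketch of using an LLM to extract structured filters (author, source, time range) from a natural-language question and then applying them alongside the semantic query. The schema, prompt, and the `llm_complete` and `vector_search` callables are illustrative assumptions, not an existing API.

```python
"""Sketch: turning a natural-language question into metadata filters plus a
semantic query. Schema, prompt, and callables are placeholders."""
import json

FILTER_PROMPT = """Extract search filters from the question as JSON with keys
"author", "source", "after" (ISO date), each null if absent, plus
"semantic_query" for the remaining intent.
Question: {question}
JSON:"""

def parse_filters(question, llm_complete):
    raw = llm_complete(FILTER_PROMPT.format(question=question))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fail open to a plain semantic search if the model's JSON is malformed.
        return {"author": None, "source": None, "after": None,
                "semantic_query": question}

def filtered_search(question, llm_complete, vector_search):
    f = parse_filters(question, llm_complete)
    # Metadata filters narrow the candidate set; the semantic query ranks within it.
    return vector_search(f["semantic_query"],
                         filters={k: f[k] for k in ("author", "source", "after")
                                  if f.get(k)})
```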

    Yeah. Yep.

The next question: how do you ensure a customer gets the right answer? I think you might have already answered that; you have basically the guardrails on final outputs. And then, beyond symbolic filters: what do you think about combining knowledge graphs with LLMs, or building knowledge graphs with LLMs? I guess you can maybe speak a little bit to this, since you have graphs of the data.

Yeah, yeah. It's definitely a really, really deep area you can go into. Building a more explicit knowledge graph is incredibly important for a lot of our use cases, especially because you're building one per customer, and each one has a different graph. And so thinking about how you can use LLMs to build KGs, or actually layer them on top, is something we've been thinking about for a bit as well. So it's a great idea.

And then the last question, and we're a little bit over: what are your thoughts on using domain-specific LLMs? Is that overkill given the performance of more mainstream models?

Yeah, so I did touch on this a bit earlier, but I think for us, for our non-generative applications, domain-specific is incredibly important; general models will not understand the semantics that are necessary to actually provide that search foundation. In terms of domain-specific generative models, I think that's an open question, and we're exploring both routes. We're trying to domain-adapt generative models for a specific domain, and again, that domain might be bio, as was mentioned, or finance, or even legal. But I'm not convinced it's going to be strictly necessary. Obviously, everyone knows the kind of strong baseline performance of generalist models, especially when you're applying them for reasoning, and that makes that kind of incremental improvement perhaps less necessary. But if it comes free, or it's cheaper, or it's a smaller model, those are all no-brainers that we would do as well. I would say, even in the non-generative setting, there are cases where base models work well for certain tasks. Take named entity recognition: putting aside the kind of gazetteer stuff you need, the patterns of where named entities show up, and what the models learn, are pretty robust against domain shift; words the model has never seen before still occur in the same contexts. So I think it varies from problem to problem whether domain adaptation is a dealbreaker for a given model.

Awesome. I think that's it in terms of questions, and I know we're a little bit over time, but I want to, again, thank you so much for joining us. I'm going to wrap it up here. We had a lot of people join, and there were a lot of interesting topics covered. So yeah, thanks so much for your time. Of course. Yeah. Great to meet you all.

Don't hesitate to reach out on Twitter, and I think Jerry's on LinkedIn as well. So yeah, happy to have been here. Thanks for the invite.

Awesome. Thank you, and have a good evening in North America, or wherever else you are.