AGI Belongs to the Community in Open Source: Democratizing LLMs in the Service of Life and the Planet with Sri Ambati (CEO, H2O.ai)
5:16PM May 29, 2024
Speakers:
Sri Ambati
Keywords:
ai
agi
gpt
open source
document
customers
labeling
incredible
model
great
supply chain
dow
detect
data
world
predict
cost
stage
small
today
Thank you for having us.
How many of you are here for the first time? So, quite a few alumni here. We started H2O in San Francisco almost five years ago, as one of the first .ai domains. We want AI, like water, everywhere. I'm going to go backstage for the demo, because the computer is not connected here. So please bear with us.
So before Mr. Ambati goes off stage and runs some errands, I just have a little announcement to make. All the guests here, feel free to take pictures and tag us on Instagram or Twitter (X) at Gen AI, and we will pick our three prize winners, who get our free T-shirt as this year's little prize. You may tag us anytime or use the hashtag GenAI Summit 2024, and then we will find the lucky people to get our little prize. Thank you.
Before we welcome our next speaker, let me just do a little quiz with all of you. How many of you know what generative AI is? And how many of you are using AI tools right now? I don't see a lot. Do you know what this event is about, and who our main host is today? Yes, I got a few. So let me just briefly introduce GPT DAO to you, real quick. GPT DAO is a decentralized autonomous organization that is volunteer-led, with participation from a lot of entrepreneurs, students, professors and scholars in the Silicon Valley area who are interested in generative AI content and artificial intelligence. It was initially organized by our founder, Rambo Ren. He is an excellent engineer himself. Many of the students who joined GPT DAO were PhD students from Berkeley and Stanford University. So we formed this little DAO to educate people a little bit more about AI, and we hope that people all over the world can learn more about Gen AI. Last year we held our first annual conference in Santa Clara, where we welcomed more than 10,000 participants. This year, we will be welcoming more than two hundred VC firms and more than 300 exhibitors on our stage. So before we welcome our speakers, if you have any questions about GPT DAO, feel free to search for us on Gen AI. Thank you.
say this way to say this twice to help me out with that to help to do that
Thank you for joining GPT DAO. Thank you so much, and thank you all for waiting patiently. While you wait, I want to share some more information about our speaker, who is coming up here shortly. As I mentioned, H2O.ai has raised a total of $251.1 million in funding over nine rounds. The latest funding was raised on May 12. H2O.ai focuses on ease of use and accessibility, automated machine learning (AutoML), scalability and performance. Currently, H2O.ai has a valuation of $1.6 billion. Thank you.
it looks like our speaker is almost ready to come to the stage. Please give him another hand.
Thank you, thank you so much for your patience. While we're waiting for our guest speaker to get ready, all attendees of our conference may take a short break for five minutes, and we will have the slides ready.
I don't know what the other screen is showing, but it's a fully small LLM on the phone. What the demo is showing in QuickTime is the actual LLM running on the phone. So, AGI. Let's get to the next slide, please. Like music, math is a generational wealth that our civilization has built over generations. And so, dealing like
All the data is code is AI. Why? Let's take a look at our journey today, right? Today, data scientists mostly focus 100% on data, but storytelling is what boards and people really care about: telling good stories. Our story started almost 12 years ago. If there's one thing you want to take away from this in between time is the only normal source. Most of us lived through COVID; BC is "before COVID." We have been democratizing AI over the last 12 years. 20,000 organizations use our products, about 30 products. Customers make our products, because we get obsessed with customers and make that a culture. Our products are almost all open source, and you can try them on the smallest machines you can get your hands on. About five of the top 10 Kagglers and more than 30 members of the world's top AI talent are at H2O, and they come together to build incredible LLMs, like Danube2, the LLM I was demonstrating earlier. A lot of folks have forgotten that, historically, wars, viruses and fake news or superstition are still what gets the world, and AI can help. Every nation will need its own AI. Every person will need their own LLM. And what is IP in this world? It's the prompts. People think that prompt engineering will engineer itself out, but the prompt is the true IP in the generative AI era, and I think every organization will need its own GPT.
this tennis qualities page, which was open source government nation, is actually involving Russia that landmark, not fully, actually trying to find a way, you'll see that. But I think the crux of it is that you will not be able to replace AGI. We're all here. The path to AGI is not just a few. There are open source projects out there that are bringing this together. The GAIA benchmark is one of them, and AutoGen, and several others building general assistants. As we can see in these benchmarks, it's still very early days, even for GPT-4-class architectures. So I think it's still very, very early days, with many years ahead of us. We do want to
because it's still the most scarce resource in the world. This is how he was nominated for, and he is a very valuable resource; love is a very valuable resource. But he knew how to use AI for good. I started the company because my mom had breast cancer, and we want to fight cancer, and that's to do incredible transformation, open source or food source. We have so many master handling supply chain distribution. To predict hurricanes is the most important part. Great responsibility. My daughters inspired me to work on supply chain with the delta I voted with the commander in the way that startup was working on columns and aggregates in supply chain. Supply chain there has been still very familiar.
And today, some of these ways predict which telephone poles are going to be hit by medication, and again first response. You can think of what happened is not healthy for us. Move on to working with Equifax, which is one of the number one for subprime credit score, and tenure record is really related to how can we summarize our generations revelation. Help fight cancer. Join us.
A lot, as it'll bring a lot of abundance. Let me get go create one of the core missions
But it's predictive AI that's going to make orders-of-magnitude better outcomes. If you can predict the future and tell a great story, you can bring a lot of people together to follow you on that journey. So I have to chain mostly in open source, some private, but mostly open source. Labeling is now a bad word. AI is happening in labeling, and labeling with transformers allows you to label at a fraction of the cost that other companies have spent. Our model cost roughly $30,000 for the 1.8 billion. But I think connecting the dots between the generative AI and the predictive AI, that's where you're going to see value, and document form. Enterprises are documents and data: documents are how you document your process, your company, and data is predictive. Bringing them together is actually where the maximum ROI is for your customers. We want to raise as far as not just see what's happening here is the power of open source.
But he was not an open source was the fastest earliest sources, and then by Python, and TensorFlow, open source for the leader. We are currently standing on the shoulders of giants. And BDR, we'll deal with them. But I think, to bring all that together, you need an incredible team. I'm so grateful for the team we managed to build over the years, and as part of that team, executing work. And that community of makers, pastors in the future and the community, a lot of the folks out, Llama 3 and other projects from the earliest times the
technology AI was actually built over the years, forcing everyone willing to be more valuable than building in large companies. The cost of AI: this is exactly where we focused on SFM really, multiplayer and data labeling. Obviously latency matters: small models have latencies of less than four seconds. Validating your models is going to be super important. LLM Studio is a product we have, and it is open source. It allows you to create evals, an eval-first design of your applications. Fine-tuning your LLM is almost super important. When generation is free, curation is more valuable. So the way you rated with three advanced studios metrics. So the foundation maybe leads to twice actually every season. This model is already FDA. We use Mistral, the incredible work in Mistral that preceded us. We use both the Mistral architecture and the tokenizer. We dropped the sliding-window architecture. In terms of building this model, it really cost us less than $35,000 to build the model from start to finish, pretraining to finished model. We released it in February the second. This is the work of Roskamp and the rest of the team. They released it on 3 billion tokens, so a little bit more brain power, right, and also still come less than 50. This model is open source, very open, and is ready to run on your phone.
The most common use of the smaller LLMs is guardrails: guardrails and relevance. When someone asks a question in your RAG system, should you answer that question or not? That's the game. If we can prevent PII from coming out of your system, that's guardrails. And I think it's important to have a level set. Other use cases you've seen: detecting AI-generated content, for example. Some people are submitting RFPs written with LLMs to the government, and the government wants to catch it. Or people want to know what to do.
The second one is easy. It's easier to detect human-generated content than to detect AI-generated content. Traditional machine learning classification methods can, with a high degree of accuracy, detect human-written documents, doctors' documents or lawyers' documents. So that's what they use as a technique.
Prompt recovery: when someone makes a phone call to a call center, you want to know the intent, not just respond to whether it's a complaint or not. You want to know the intent. Why did they actually call? And that's as simple as that, similar to recording the recording accordingly affordable MSM. So I think they're complementary. Many of these are Kaggle contests. Our own folks are using the tool, but we would love for you to try it and let us know what you find.
Small LLMs are better for the environment and better on footprint. They can run on a cell phone, they can run on a computer, right? So they enable you to probably reduce 200x in cost, where it isn't the cost of retraining. We have actually allowed our customers to resign data, so that in the hope that you're not trading costs downstream. Guardrails not as an order to do but of course. Also, when you fine-tune your model, you can introduce new ways to break the guardrails. In some of our workshops, we actually teach our customers and our community members to break them. One of our upcoming meetups is very hands-on: we can have you break in by phishing and make it say things that it shouldn't be saying, and then progressing. LLM Studio is a great open source tool that you can use.
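The remark in this passage, that traditional classification methods can separate human-written from LLM-generated documents, can be illustrated with a minimal sketch. This is not the speaker's or H2O's method. It is a plain multinomial Naive Bayes classifier over word counts, and the six training snippets are invented toy examples chosen only to make the idea visible. Real detectors are trained on far larger labeled corpora.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for tok in tokenize(text):
                counts[tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text):
        """Return the unnormalized log-probability of the text under each label."""
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        scores = {}
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy data (an assumption for illustration, not a real dataset).
texts = [
    "pt c/o chest pain x2 days, no sob, will f/u in 1 wk",
    "spoke w/ client re: lease, she wants out by june, call her back",
    "rx refilled, bp ok today, told him to cut the salt",
    "It is important to note that the patient should consult a healthcare professional.",
    "In conclusion, the lease agreement provides a comprehensive overview of the obligations.",
    "Overall, it is important to consider a comprehensive approach to blood pressure management.",
]
labels = ["human", "human", "human", "llm", "llm", "llm"]

clf = NaiveBayesTextClassifier().fit(texts, labels)
print(clf.predict("pt c/o back pain, will f/u w/ him in 2 wk"))
print(clf.predict("it is important to note that a comprehensive overview is provided"))
```

The same structure carries over to stronger features (character n-grams, stylometric statistics) and stronger classifiers. The point is only that the task is ordinary supervised classification.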
Give yourself an intuition of this thing working. The incredible paper that came out, the paper on the Golden Gate, is a great insight into how revenue functions can be used to understand which features, or which parts of the matrix of weights, are actually impacting the concepts. Agents is actually something you'll be hearing a lot about. Agent architecture is probably one of the doorways to go into AGI at the current stage of contention face conformance. So I think we are going to use automatic machine learning, that was built with AI, and very close to generative AI. Document AI: we've actually been doing document AI for some time now, including using supervised learning as well as machine learning. But what we found is that annotation and labeling are the most expensive parts of any document AI project.
So our LLM zero-shot and few-shot learning, using color levels, has actually matched the performance of traditional supervised machine learning. Using generative AI, so I think that's kind of where from deep learning, which is saving the problem, and expecting what to extract from your document is actually a very powerful tool. It's available in Python. So this again, obviously, all product demos are available on our website. It is still not able to see the document. The idea is, today will be convenient as a scan for fixing copy of what came up from a scan text. But I think we can do better.
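The zero-shot and few-shot document extraction mentioned here can be sketched as a prompt builder plus a strict parser for the reply. This is a minimal illustration under stated assumptions, not H2O's Document AI API: the field names, the sample invoice text and the JSON reply format are all hypothetical, and no model is actually called. Passing an empty example list gives the zero-shot variant; each added example makes it few-shot.

```python
import json


def build_extraction_prompt(fields, examples, document_text):
    """Build an extraction prompt.

    fields: list of field names to extract (hypothetical names).
    examples: list of (document_text, answer_dict) pairs; empty means zero-shot.
    """
    lines = [
        "Extract the following fields from the document and reply with JSON only.",
        "Fields: " + ", ".join(fields),
        "Use null for any field that is not present.",
        "",
    ]
    for text, answer in examples:
        lines += ["Document:", text, "JSON:", json.dumps(answer, sort_keys=True), ""]
    lines += ["Document:", document_text, "JSON:"]
    return "\n".join(lines)


def parse_extraction(reply, fields):
    """Keep only the requested fields; anything missing or unparseable becomes None."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return {f: None for f in fields}
    if not isinstance(data, dict):
        return {f: None for f in fields}
    return {f: data.get(f) for f in fields}


# Hypothetical fields and a single labeled example (one-shot).
FIELDS = ["invoice_number", "total"]
EXAMPLES = [
    ("Invoice No. A-17. Total due: 42.00 USD.", {"invoice_number": "A-17", "total": "42.00"}),
]

prompt = build_extraction_prompt(FIELDS, EXAMPLES, "Invoice No. B-03. Total due: 19.50 USD.")
print(prompt)

# A model reply would be parsed like this; the string below is a stand-in, not real output.
print(parse_extraction('{"invoice_number": "B-03", "total": "19.50"}', FIELDS))
```

The labeling saving described in the talk comes from the fact that the handful of examples in the prompt replaces the large annotated training set a supervised extractor would need.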
I think the previous slide kind of said it: data is actually a Sanskrit word, "to give." And I think it is important to remember that AI is actually there, and it's going to share its abundance, space, as well as outer space, and in that space is not mine. So if we can actually capture the core of what we want to do on the planet, to eliminate something great for the future. That means you need to build your own way of building AI. It's also evidence of manual energy, where we go down space today misusing the same materials of the today. The new materials will let us go even farther forward, with great responsibilities. And we think that AGI is going to be coming from the community. So empowering the makers is going to be the crux of it.
A long time ago, there were viruses, wars and superstitions. Today we call them viruses, wars and fake news. We still have the same problems. AI can make a difference; we can make a difference. And we are always sitting on or busy dying. And I think to some degree, we usually usually at night. This will be fun, right? Do have gratitude for the audience and the world. The universe has opened its way to make AI possible, for me and many others before me, all the way from Alan Turing, right, and before. In that long journey, we've come a long way. But it's still just the beginning. By 2030 $100 billion economy with a US income. I think we will be recreated. Some of our customers supported us over the years, and I think it's just the beginning of building a very rich tapestry of customers globally. I think it's still early days for all purposes these boys; that's why it's important for relevance to really be at the forefront. If you leave with one thought from this conference: make data and AI first-class assets for your business, for your organization. Thank you to all of the open source ecosystem that brought us this far, Yann LeCun and others. This is just the beginning. I'm going to leave with one final thought, which is: use AI to do AI. But I left my phone there with the demo, so I'm going to borrow one of your phones to take a selfie. Can I borrow your phone?
Thank you so much, Mr. Ambati. And might I just say, your vision of bringing AI to open source is a really good idea. And I think that everyone deserves AI and SSL
Selfie time.
Thank you. Thank you so much. Give it up for Mr. Ambati.
As I was saying, Mr. Ambati's vision of bringing AI to open source is a really great idea for everyone, because as the technology advances, even more people are going to be left behind, not really sure what's going on in the world of AI. So it really aligns with our mission here at Gen AI Summit San Francisco 2024 and GPT DAO. Thank you. Thank you so much, Mr. Ambati. I hope you're enjoying this really tech-oriented and domain-focused keynote sharing this morning, and I hope
it's not too dull for you. Actually, we designed a little game for you, but since our