Innovating Infrastructure for LLMs
By Monica Zou (CMO, Dekube.ai), Denis Skrinnikoff (CTO, IREN), Xin (Bill) Yao (serial entrepreneur, AI investor, Novita.ai), Paroma Varma (co-founder of Snorkel)
11:25PM May 29, 2024
Speakers:
Keywords: gpu, ai, infrastructure, decentralized, run, models, gpt, decentralized approach, company, side, year, today, day, utilized, compute power, cost, inference, enterprise, api, consumer
Today's questions will focus on innovation and challenges in infrastructure for large language models: for example, the development of new chip designs and computing architectures, recent breakthroughs, and so on. So let's welcome today's moderator, Alex Wies, and the following panelists: Monica Zou, Denis Skrinnikoff, and Bill Yao. Welcome.
So when you're done, don't turn it off. Oh yeah. All right, hello everybody. We have a really interesting panel on innovating LLM infrastructure. To me, infrastructure is very exciting because it gives us a window into what is possible in the future; it's the base layer that future applications will be built on. And I have with me two esteemed panelists to help share their thoughts on what's leading edge. So without further ado, let's start with quick intros. Monica?
Thank you. Hi everyone. My name is Monica Zou. I am the co-founder and CMO at Dekube.ai. We are a decentralized GPU network that transforms consumer-level GPUs into enterprise-grade GPUs that can be used for LLM training and fine-tuning. Very glad to be here today, and I can't wait to share.
Okay, hello, everyone. My name is Bill Yao. I'm the co-founder of Novita.AI. Novita.AI is a model-hosting API platform: we have integrated most of the famous open-source projects, and we provide easy-to-use APIs for AI application developers. We are hosting Stable Diffusion models, Mixtral, Llama 3, and others, and we mainly focus on reducing inference cost so that everyone can easily use AI. Before starting this company, I was a serial entrepreneur: I founded the largest peer-to-peer video website in China, called PPTV, around 20 years ago. I also started angel investing, at BlueRun Ventures, around 10 years ago, focusing on early-stage technology innovation. Over to you.
I am a General Partner at Verda Ventures, and we invest at the intersection of tokenized infrastructure and AI. So, first question: what are some of the key infrastructure challenges, and how do we break through them? How do we innovate out of the current bottlenecks?
Okay, yeah. I think the most important thing is that AI is too expensive for most use cases. For example, for the latest update, GPT-4o, we built a calculator based on today's prices. Suppose we have 1 billion users worldwide; I think 1 billion users is a normal number eventually. If everyone uses at least 7,000 tokens per day, we calculated the inference cost at nearly 40 million US dollars per day. That means nearly 14 billion for one year. You know, last year OpenAI's revenue was only about two billion. It is too expensive for most use cases to use these APIs. Although I think generative AI's performance and abilities can solve a lot of problems, we have to reduce the cost at least 10 times, even 1,000 times, so we can unlock more and more scenarios. Yeah, this is the most challenging problem we think we face.
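As a rough sanity check of that arithmetic, here is the calculation in a few lines of Python. The per-token price is an assumption based on publicly listed GPT-4o rates at the time (about $5 per million input tokens), not a figure stated on the panel:

```python
# Back-of-envelope check of the inference-cost estimate (illustrative).
users = 1_000_000_000            # 1 billion users
tokens_per_user_per_day = 7_000
price_per_million_tokens = 5.0   # USD; assumed blended GPT-4o-class rate

daily_tokens = users * tokens_per_user_per_day          # 7e12 tokens/day
daily_cost = daily_tokens / 1e6 * price_per_million_tokens
yearly_cost = daily_cost * 365

print(f"daily cost:  ${daily_cost / 1e6:,.0f}M")   # ~$35M per day
print(f"yearly cost: ${yearly_cost / 1e9:,.1f}B")  # ~$12.8B per year
```

That lands in the same ballpark as the roughly $14 billion per year cited above.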
Yeah, I totally agree on that. With GPU power right now, well, we all know the term "GPU poor," right? A lot of AI startups and individual developers are running into this GPU-poor problem, and not just AI startups. I was reading a piece of news a couple of months back saying that a top-tier research institute within Stanford University got about 65 GPUs for an entire year of work. You heard it right: 65, not 65,000, while Meta and Microsoft are getting tens of thousands. So I think that creates a huge gap between those companies and individual projects, or even research institutions. That is something I hope we can fix with some innovative infrastructure.
Sorry, some technical difficulties. Monica, I know your company has taken a decentralized approach to help with some of those scaling efforts. Would you help us understand: what are the advantages, what are the benefits of this decentralized approach?
Happy to. I've already explained this probably 100 times today at our booth; by the way, we have a booth at 839, so feel free to come up, have a chat with us, and grab some giveaways. So Dekube is a decentralized GPU network; we utilize everybody's compute power. For example, everybody sitting out there: you have your computer asleep at your place, doing nothing. What you can actually do is download a client and connect your GPU to our network, and at Dekube we will be able to transform those consumer-level GPUs into a more powerful, enterprise-level GPU to be used for large language model fine-tuning or training. I think that's a very innovative way to utilize idle GPU power. And, you know, as we were discussing, a lot of AI startups or individual developers are just not financially ready to equip their team with the GPUs they actually need for their work. So I think this is a way they can get GPUs very cost-effectively, and also we, as consumers, can have our idle compute power utilized, right? I think it benefits us in a lot of ways, not just cost-wise but also energy-wise: we don't have to get in line and wait for GPU products, we don't have to rely on some very centralized provider. We can just utilize our own GPUs, your laptop, my laptop, at home, and support AI innovation. Yeah.
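Dekube's client internals weren't described on the panel, so purely as a hypothetical sketch of the worker loop Monica outlines (every endpoint, payload, and function name below is invented for illustration):

```python
# Hypothetical worker client for a decentralized GPU network.
# None of these endpoints or names are Dekube's actual API.
import time
import requests  # third-party: pip install requests

COORDINATOR = "https://coordinator.example.com"  # placeholder URL

def gpu_specs() -> dict:
    """Report what this machine can contribute (stubbed values)."""
    return {"model": "RTX 4090", "vram_gb": 24}

def run_training_shard(task: dict) -> dict:
    """Placeholder for the actual local fine-tuning step."""
    return {"task_id": task["id"], "status": "done"}

def main() -> None:
    # 1. Register this consumer GPU with the network.
    node = requests.post(f"{COORDINATOR}/register", json=gpu_specs()).json()

    # 2. Poll for fine-tuning shards sized to fit this card, run them,
    #    and report results back; idle when no work is available.
    while True:
        task = requests.get(f"{COORDINATOR}/tasks/{node['id']}").json()
        if task:
            result = run_training_shard(task)
            requests.post(f"{COORDINATOR}/results/{task['id']}", json=result)
        else:
            time.sleep(30)

if __name__ == "__main__":
    main()
```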
Maybe I can give you some more information about decentralization, because my previous company, PPTV, also used a decentralized approach, to aggregate bandwidth. We realized that every PC has bandwidth to connect to the internet, but most usage is download bandwidth, not upload bandwidth. So at my first company we built a peer-to-peer streaming network, which means everyone can get a stream not only from the server side but from other peers. It was much like a sharing economy for bandwidth. I think the same applies today: because of the development of AI we lack computing resources, but we know a lot of people play games, have graphics cards, and have invested a lot on the GPU side, yet those GPUs are not fully used. For example, when we created Novita last year, we found the most common scenario was generating images using Stable Diffusion, and Stable Diffusion is not a huge model; it is less than 10 billion parameters. That means we can run Stable Diffusion on graphics cards such as the NVIDIA RTX 3090 and, in the last few years, the 4090; they have 24 gigabytes and can easily run most modern language models as well as the Stable Diffusion models. So we also want to aggregate more and more decentralized GPU networks. Maybe the decentralized approach is harder to use for training, because training is much more centralized, with hundreds or thousands of GPUs working together on tightly coupled computation. But on the inference side, I think workloads can run in a distributed way, so we can fully use decentralized GPU networks. That can help reduce AI costs a lot.
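As a rough illustration of why a 24 GB card covers these models, here is a weights-only memory estimate, assuming fp16 weights (2 bytes per parameter) and ignoring activation and KV-cache overhead:

```python
# Will a model's weights fit in GPU VRAM? (rough fp16 estimate)
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone; real usage adds activations/KV cache."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

VRAM_GB = 24  # RTX 3090 / RTX 4090

for name, params_b in [("Stable Diffusion (~1B params)", 1.0),
                       ("Llama 3 8B", 8.0),
                       ("Llama 3 70B", 70.0)]:
    need = weight_memory_gb(params_b)
    verdict = "fits" if need < VRAM_GB else "needs multi-GPU or quantization"
    print(f"{name}: ~{need:.1f} GB -> {verdict}")
```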
Well, 100%. For decentralized compute power, something we can definitely do is fine-tuning work for domain-specific large language models. For example, in our task pool we cluster similar tasks to deploy on GPUs of the same size and power, just to make it stable and reliable for developers. And how funny that you mentioned the gaming community, right? We all know gamers have the best computers in the world. But not just the gaming community, not just the gamers: I believe every one of us sitting here has a laptop, has GPU power that can be utilized for AI innovation. And the best part is, I think each one of us should get rewarded for that. So that's what we do: we want to reward our community for that.
All right, you've heard it from Monica: everybody can use their GPU to earn money, and we can leverage all that compute power to solve our infrastructure bottleneck. So all of you have homework to do tonight. Tell us, Bill, how do we balance cost with maintaining the accuracy of the LLMs? I know your company, Novita, has thousands of pre-trained model APIs. How do you manage those costs while still delivering very accurate results from the LLMs?
Yeah, I think we have embraced open source and the community. Today, for AI, there are two paths. One is closed AI, but we much more embrace the open-source projects. For example, around this time last year Llama 1 was released, and at that stage we evaluated its performance: it was below GPT-3.5. But around three months later Llama 2 launched, and we found we could do a lot more based on Llama 2. Recently, some of our customers moved their applications, previously built on closed AI, over to Llama 3. Especially with the recent Llama 3, there are two versions, and for the large version, with 70 billion parameters, we have found that for most use cases Llama 3 70B performs much like GPT-4. If application developers build on the open-source side, they only need to pay the cost of the GPUs and infrastructure, not for the algorithm and its operating costs, so it can be much cheaper than using closed AI as before. That's why we embrace it. And on our platform, a lot of the fine-tuned LoRA models and other models are contributed by the community. We have cooperated with many model creators; they have created a lot of their own characters for chatbots and other things, which are easy to use. We also do revenue sharing with all these creators. So that builds a good business model that benefits every side, and that is why we run as a platform, rather than doing everything in one company. Yeah.
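To make the cost argument concrete, here is an illustrative comparison. Both per-token prices are assumptions for the sketch (roughly GPT-4-class API rates versus a self-hosted open model on rented GPUs), not figures from the panel or from Novita.AI:

```python
# Illustrative open-vs-closed serving cost comparison (assumed prices).
tokens_per_month = 5_000_000_000  # 5B tokens served by an application

closed_api_price = 5.0  # USD per 1M tokens (assumed GPT-4-class rate)
open_model_price = 0.6  # USD per 1M tokens (assumed self-hosted Llama 3 70B)

closed_cost = tokens_per_month / 1e6 * closed_api_price
open_cost = tokens_per_month / 1e6 * open_model_price

print(f"closed API: ${closed_cost:,.0f}/month")  # $25,000
print(f"open model: ${open_cost:,.0f}/month")    # $3,000
print(f"savings:    {closed_cost / open_cost:.0f}x")
```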
Monica, did you want to add anything? So, how do you think community and transparency can potentially resolve some of the ethical issues that come from the output of these LLMs? Does the decentralized approach you take help resolve some of those potential ethical concerns, or how should we approach this issue?
I think that's definitely one of the top-priority issues we are addressing, especially with a decentralized network, because everybody's talking about security, right? Data harvesting, and how can we tackle that? This is where we can introduce a decentralized way to do it: run the data layer on the blockchain, so that everybody has a decentralized ledger to control their own data. For example, at Dekube everything is stored on storage nodes; not even our administrators have access to it. Everybody has full control, and only the user has full access to their own data. This is how we want to make sure our users are fully safe with their own data. So I think the number one ethical issue we're talking about is security, data, privacy, and such.
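As a hypothetical illustration of the "only the user can read their data" property Monica describes: if data is encrypted client-side before upload, storage-node operators only ever see ciphertext. This sketch uses the `cryptography` package and is not Dekube's actual mechanism:

```python
# Client-side encryption sketch: the key never leaves the user's machine.
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()   # generated and kept locally by the user
cipher = Fernet(key)

payload = b"user fine-tuning dataset"
ciphertext = cipher.encrypt(payload)  # this is all a storage node stores

# Without `key`, a node operator or administrator cannot recover the data.
assert cipher.decrypt(ciphertext) == payload
```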
So it sounds like decentralization can be a solution to some of the ethical concerns. Final question: what are the emerging trends you're seeing that will affect or impact infrastructure going forward? What are the most exciting future trends that everybody in the audience should be looking out for?
Ultimately, video, but it's a long way off. Nowadays, we look at the traffic, at AI inference usage across most use cases. We think if you want to earn money, the best way is not just the consumer side but helping the enterprise side improve their internal efficiency, for example in sales and marketing, service management, and other things; that is where it's easier to make money. But on the consumer side, we recently monitored the traffic, and we found that chatbot companies grow very fast, basically worldwide, and not only general chatbots like Character.AI, but specialized, domain-specific chatbots, for example for ACG. Also, we can see that in Asia a lot of users want to do role play and want an AI girlfriend as a companion; this kind of chatbot grows very fast. But finally, I think video will be the ultimate direction. When video-generation AI becomes much more affordable, it can disrupt most of social media and the video platforms. That will be a huge market with huge potential. Yeah.
Absolutely. And I personally am looking forward to seeing more applications integrated into mid-size and small businesses. Like everybody, they need their own AI systems. What I'm looking forward to seeing is full-suite solutions integrated and implemented in a lot of small businesses, to help them run their business more efficiently and less expensively. I think there's huge potential there, both on the research side and on the product side. So this is something I personally would love to see in the near future, and I believe it will come really quickly.
I think we have time for one or two questions from the audience. Does anybody have a question? Right?
Great talk, thank you very much. My question is about the gap between enterprise and consumer-facing infrastructure. Especially for consumer-facing LLM infrastructure, what's special about it? And are there any innovative ways today to do hosting for consumer-facing apps such as digital companions or personal agents?
Sorry, maybe you couldn't hear clearly up there. So my question is: between enterprise and consumer-facing applications, what's special about consumer-facing LLM infrastructure, especially for digital companions and personal agents? And what innovative ways are there today to host them?
Yep, sorry, a bit of additional technical difficulties. So we're looking at the difference between enterprise-grade solutions and individual consumer-level solutions. What are the differences in how we support that infrastructure, especially with regard to personal agents? And is the infrastructure today able to support those personal agents going forward?
I think the enterprise side and the consumer side are different. On the enterprise side, normally we use the enterprise's own data to do fine-tuning, and also RAG, retrieval-augmented generation. That usage is much more like the examples of sales, marketing, and service management. But the consumer side is much more general-purpose, so we have to further enhance the LLM's abilities. That is why I mentioned that GPT-4-level models can now serve the consumer side, for example; that's why we are hosting Llama 3 nowadays. Previously these were just demos, not intended for commercial use. Yeah.
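For reference, a minimal sketch of the RAG pattern Bill mentions: retrieve the most relevant pieces of an enterprise's own documents and ground the model's answer in them. The `embed` and `llm` callables are stand-ins for a real embedding model and LLM API:

```python
# Minimal retrieval-augmented generation (RAG) sketch.
from typing import Callable, List

def retrieve(query: str, docs: List[str],
             embed: Callable[[str], List[float]], k: int = 3) -> List[str]:
    """Rank the enterprise's own documents by similarity to the query."""
    def dot(a, b): return sum(x * y for x, y in zip(a, b))
    q = embed(query)
    return sorted(docs, key=lambda d: dot(embed(d), q), reverse=True)[:k]

def answer(query: str, docs: List[str],
           embed: Callable[[str], List[float]],
           llm: Callable[[str], str]) -> str:
    context = "\n".join(retrieve(query, docs, embed))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)  # the model grounds its answer in retrieved text
```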
Yeah, I was going to mention GPT-4o. I think everybody was in awe when we saw it come out that day: it can interpret your facial expression, your tone, and your emotions. I think that will definitely require more data, more training, and more customized and personalized fine-tuning for an enterprise-level model. Yeah.
I think we have time for one more question.
Sure. My question is: there's just so much talk about LLMs and gen AI in general. For a CIO or a CTO, what are some of the things they should be considering as they think through the decision-making process to set up AI infrastructure for their enterprise?
Okay, so from the perspective of a CIO, a Chief Information Officer, or a CTO, what are some of the key decision-making factors that go into deciding the infrastructure they want to adopt?
I think cost is maybe the most important one. Nowadays especially, if you invest on the infrastructure side, you face a lot of problems. One problem is the rapid acceleration of the technology. You have to determine what kind of GPU to use; for example, for NVIDIA and other GPUs, just one year later the price may have dropped by half. So if you make the investment and want to use it long-term, your capex investment may take a loss. That is why I guess most enterprises will choose to run on a cloud service provider rather than build their own; they want to use cloud services more, unlike the traditional approach. If you only want to fine-tune or just train a model, you might use on-premise infrastructure, but on the inference side, I think more and more will be rented from the public cloud.
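An illustrative rent-versus-buy comparison of the depreciation point, with every price an assumption for the sketch rather than a figure from the panel:

```python
# Rent vs. buy for a single GPU (all prices assumed for illustration).
buy_price = 30_000         # USD, assumed H100-class card
annual_depreciation = 0.5  # value halves in a year, per the point above
rent_per_hour = 3.0        # USD, assumed cloud rate
utilization = 0.4          # fraction of hours doing real work

year1_own_cost = buy_price * annual_depreciation          # $15,000 lost value
year1_rent_cost = rent_per_hour * 24 * 365 * utilization  # ~$10,500

print(f"own:  ~${year1_own_cost:,.0f} depreciation in year 1")
print(f"rent: ~${year1_rent_cost:,.0f} at {utilization:.0%} utilization")
# At low utilization renting wins; near-100% utilization can favor owning.
```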
Yeah, and I think there are going to be more alternative ways in the market to provide compute power, beyond the traditional way. I'm also very much looking forward to that.
All right, thank you everybody for your time. The panelists and myself will be off stage if you have any further questions. Okay.