This is Ben Goertzel, the CEO of SingularityNET, TrueAGI, and the Artificial Superintelligence Alliance. A lot of people have been asking me in the last week or so about DeepSeek, the new large language model out there, so I thought I'd take a few minutes to share some of my thoughts. I've played with DeepSeek, just like a lot of other people, and I've also read the research paper behind it. It's cool, it's quite interesting, and I think it's a significant advance. It's not an incredible, overwhelming revolution in AI; it's not even as big an advance as, say, the invention of transformer neural nets by Google Brain way back when. But it's certainly a big step forward, and I want to talk a little bit about why, and about what kind of step forward it is, because a lot of the reactions I'm seeing out there are quite a bit off base. DeepSeek is a significant efficiency gain in the LLM space, and that's cool; it will have a big impact on the nature and the economics of LLM applications. But this efficiency gain is not a fundamental breakthrough toward AGI, and it's not a fundamental shift in the center of gravity of AI innovation. It's more like a faster-than-expected leap along a trajectory we knew we were following anyway, rather than some sort of
disruptive paradigm shift, right? It's much like phenomena we've seen many times before. I was doing computer graphics in the early 1990s, and if you wanted to render an image in the early '90s, it took a high-end supercomputer; now you can render an image on your phone. Face recognition used to be an expensive niche application; now it's a commodity feature on a low-end smartphone. So you knew the same thing was going to happen with large language models: they start out expensive, needing huge amounts of compute power, and they wind up cheap and commoditized. None of that's surprising. What's interesting is just how fast it happened, and how suddenly. But wait a minute, hold on. We're supposed to be at the dawn of the technological Singularity, right? Ray Kurzweil foresaw the Singularity coming in 2029. Well, the notion of the Singularity is that as you approach it, technological advances happen faster and faster and faster, until things seem to be happening almost incomprehensibly fast to the human eye. This is exactly the kind of thing you would expect to see as you get closer and closer to the Singularity: advances happening way faster than your intuition would lead you to expect, because our intuitions are tuned by evolution for linear thinking, not for exponential thinking. What we're seeing now is the kind of exponential advance you see as you get close to the Singularity, and DeepSeek is just one of many, many interesting moments like this that we're going to see as the last years before the Singularity unfold. So what's going on under the hood in DeepSeek?
The main achievement here is some very clever techniques for optimizing efficiency, rather than some sort of redefinition of the basic transformer architecture underlying language models. DeepSeek uses a mixture-of-experts model, which is a well-established kind of ensemble learning technique that's been used in machine learning for years. But DeepSeek uses mixture of experts alongside some other efficiency tricks to minimize computational cost in a quite clever way. In a mixture of experts, you have a whole bunch of different internal agencies, each of which is good at a different thing, and you combine their results to get an answer. This has been used in transformers and language models for a while, but the way DeepSeek uses it allows just a small percentage of a large network to be active at any given time, to give a particular answer. So if you have 671 billion parameters, maybe 37 billion are needed to answer a given question; only about 1/18 of the compute power is needed to answer it. You're wasting a lot less compute by not using the parts of the neural network that aren't actually needed for answering the question. DeepSeek also makes much heavier use of a learning technique called reinforcement learning, where you reward a network for doing what you wanted it to do, and sort of punish it for not doing so, and this reward and punishment propagates back through the whole network to modify all the weights between the neurons. All large language models are trained partly using this sort of reward-based reinforcement learning and partly using other sorts of supervised learning and training on datasets. DeepSeek sort of doubles down on reinforcement learning and uses mostly reinforcement learning.
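To make the mixture-of-experts idea concrete, here's a minimal toy sketch of top-k expert routing in Python. Everything in it is illustrative, not DeepSeek's actual architecture: the "experts" are just linear maps, the gating is a single matrix, and the function names are made up for this sketch. The point it shows is that only k of the experts ever run for a given input, so compute scales with k rather than with the total expert count.

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Route input x to the top-k experts and mix their outputs.

    Only k experts are evaluated, so most of the network's parameters
    sit idle for any one input. (DeepSeek-style ratio: ~37B of 671B
    parameters active, roughly 1/18 of the compute.)
    """
    logits = x @ gate_w                       # gating score for each expert
    top = np.argsort(logits)[-k:]             # indices of the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over just the selected experts
    # only the selected experts are actually run
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
dim, n_experts = 8, 16
# each toy "expert" is a small linear map; real experts are feed-forward blocks
mats = [rng.standard_normal((dim, dim)) for _ in range(n_experts)]
experts = [lambda x, m=m: x @ m for m in mats]
gate_w = rng.standard_normal((dim, n_experts))

x = rng.standard_normal(dim)
y = moe_forward(x, experts, gate_w, k=2)      # 2 of 16 experts evaluated
print(y.shape)
```

The gating network learns, during training, to send each input to the experts best suited for it; here the gate is random just to show the mechanics.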
In particular, they use reinforcement learning to train on reasoning, much more so than any previous LLM approach seemed to, and that seemed to work super well. They also used multi-token training, training DeepSeek to predict multiple pieces of text at one time, which increases training efficiency. Putting all these optimizations together, you get DeepSeek to be like an order of magnitude cheaper than OpenAI and Anthropic and so forth, for training and for inference. So this is a really powerful and meaningful engineering refinement. It's not a conceptual leap toward AGI; it's not a whole different way of doing things. But speeding things up and making things cheaper by a factor of ten or twenty is pretty amazing, and it lets you rethink the whole economics of what you can do with LLMs. Another really interesting thing about DeepSeek is its embrace of the open source approach, which is quite a contrast to the walled-garden strategies of, for example, OpenAI, which, in spite of having "open" in the name, has never really followed a terribly open strategy. In fact, if you looked at OpenAI's website back in the very beginning, when they first launched, you could see them hemming and hawing, saying, well, we'll be mainly open, but when we judge it's most appropriate not to be open, maybe we won't be. It turned out they judged it was almost never appropriate to be open. Anthropic and most others in the US AI scene have followed suit, with the notable exception of Facebook, which, under the direction of Yann LeCun, has opened up their Llama AI models and a whole bunch of other amazing stuff. Well, DeepSeek has opened their model and published a research paper describing what they did, and I think this is an amazingly positive thing.
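The multi-token training idea can be sketched in a few lines too. This is a toy version under stated assumptions: each "head" here is just a linear map over the vocabulary (the paper's prediction modules are small transformer blocks), and the dimensions and token IDs are invented for the example. What it shows is the core trick: from one position's hidden state you predict the next n tokens rather than just one, so a single forward pass yields several training signals.

```python
import numpy as np

def multi_token_loss(hidden, heads, targets):
    """Toy multi-token prediction loss.

    heads[i] predicts the token i+1 steps ahead; we sum a cross-entropy
    term per head, so one position contributes len(heads) learning
    signals instead of one.
    """
    total = 0.0
    for head_w, target in zip(heads, targets):
        logits = hidden @ head_w              # scores over the vocabulary
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                  # softmax
        total += -np.log(probs[target])       # cross-entropy for this offset
    return total / len(heads)

rng = np.random.default_rng(1)
dim, vocab, n_future = 16, 100, 2
heads = [rng.standard_normal((dim, vocab)) for _ in range(n_future)]
hidden = rng.standard_normal(dim)             # one position's hidden state
targets = [7, 42]                             # the next token and the one after it
loss = multi_token_loss(hidden, heads, targets)
print(loss > 0)
```

Denser supervision per forward pass is where the training-efficiency gain comes from; at inference time the extra heads can be dropped or reused for speculative decoding.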
You know, open source AI fosters rapid innovation, broader adoption, and collective improvement. Business-wise, proprietary models do let companies capture more direct revenue in some ways, but the open approach leaves plenty of business avenues open, and it allows a much broader community to contribute to development. It makes tools available to more researchers, companies, and independent developers. The hedge fund High-Flyer that developed DeepSeek is not a charity, right? They know open source AI is not just about philosophy and doing good for the world; they know there are quite valid, solid business models that come along with open source. You can do services, enterprise integration, hosting, and you can use your own model to make money in the markets. So I think it's fantastic that this huge step forward in the efficiency of LLMs has been rolled out open source. And,
you know, I think it's quite cool that this big open source thing has come out of China as well. China has been doing a lot in AI for the last couple of decades, but has not been such a big mover in open source, so it's really quite interesting to see China moving in a big way in an open source direction, via private enterprise rather than governmental action, but still bubbling up out of the whole Chinese AI ecosystem. And while the combination of open source and China isn't all that famous or common, I'm not at all surprised that a breakthrough like this on the science level came out of China. I spent ten years living in Hong Kong and spent a lot of time in and out of mainland China, so I saw firsthand the huge scale of investment in AI research in mainland China, the huge number of brilliant AI PhDs there, and the intense focus on making AI powerful and cost-efficient and rolling it out at huge scale across the whole country. And obviously, beyond just AI, this is not the first time China has taken Western innovations, optimized them for efficiency, and scaled them up in an unprecedented way; China is actually often very good at that kind of efficient scaling-up. You could look at this in a geopolitical-rivalry sort of way: okay, China is not going to be left behind in the AI race. But I would look at it more as a step toward a more globally integrated AI landscape.
I think beneficial AGI, leading on to beneficial superintelligence, is just way more likely to come out of open collaboration and open source tools from parties all over the globe than out of proprietary work done in nationalistic silos. A decentralized, globally distributed AGI development effort, rather than a monopoly by a single country or company, is going to give us a way, way better shot at making AGI systems that serve humanity and all sentient beings. What are the broader implications here? Well, first of all, I think LLMs, while powerful, are not the future of AGI anyway, so the implications of faster, better, cheaper LLMs for AGI are somewhat limited. As Gary Marcus and others have pointed out at great length, transformer neural nets, which are the key technology underlying large language models like DeepSeek, ChatGPT, Llama, and so on, lack core cognitive capabilities. They lack the ability to direct their own reasoning based on an understanding of themselves and their relation to the world, and they lack the capacity for grounded compositional abstraction. They're not going to really grow into autonomous, human-like minds that can take big leaps beyond their experience. LLMs appear to be general because their training data is so general, but they can't actually leap very far beyond their training data the way people can take wild, crazy, imaginative leaps. That said, LLMs can automate a huge number of economic tasks and transform maybe every industry, and lower-cost, faster LLMs can speed up that process, which is fantastic. But in terms of the pathway to AGI, I don't particularly think faster, cheaper LLMs help that much. They may help a little, because LLMs may be part of the scaffolding that helps you get to AGI.
If we look beyond AGI, though, just in terms of AI investment and the impact of specialized AI on different industries, I think the commoditization of LLMs, which DeepSeek exemplifies, has tremendous potential to shift AI investment in various ways. DeepSeek's efficiency gains accelerate the trend of LLMs becoming a commodity, and as LLM costs drop, first of all, investors will start looking toward the next frontier of AI innovation. Okay, LLMs are a commodity; well, maybe we go to neuromorphic chips, maybe we go to humanoid robots, maybe we go to neural-symbolic or evolutionary AI, different architectures beyond transformers. But also, now that we have smaller, faster, cheaper LLMs, we can deploy them on decentralized networks; we can put them on the edge. It becomes much easier to deploy models in decentralized networks like we have at SingularityNET and the Artificial Superintelligence Alliance. So I think DeepSeek, and the other models like it that will surely soon follow, are really good for the decentralized AI ecosystem, for every AI blockchain project, because they make it just so much easier to roll out big decentralized networks with LLMs in the nodes. Putting these points together, overall I think DeepSeek is an interesting milestone on the path to what I think of as a kind of AI Cambrian explosion. The Cambrian explosion was a point in the history of life on Earth when you had a huge proliferation of different forms of life in all different shapes and sizes, and I think having faster and cheaper LLMs can do that in a way, right? It makes high-quality AI more accessible
and more affordable. It reinforces the inevitability of exponential progress in AI and the international scope of cutting-edge AI development. Not that long ago, Sam Altman was saying, well, it's kind of hopeless for startups to do anything with merely ten million. Well, now it's pretty clear that's not so true anymore. We can have a much greater diversity of startups, research projects, and nonprofit efforts making real advances in AI. For my own research projects, with OpenCog Hyperon, with SingularityNET, with the ASI Alliance, previously people were saying to me, well, sure, you're a reasonably big project, but you don't have hundreds of billions of dollars. Even if you have better ideas and algorithms for how to build AGI, how can you hope to compete when you don't have billions and billions of dollars of centralized servers at your disposal? Well, since DeepSeek, that idea doesn't seem so sensible to people. People consider it a lot more plausible that, with the relatively modest amount of wealth at our disposal, a crypto project with merely a few billion dollars in market cap could be able to compete with the big boys and make the breakthrough to AGI and superintelligence. So there's a practical role here, in that DeepSeek makes LLMs cheaper and better and more widely accessible, and there's also a psychological role, in that DeepSeek indicates, yo, there's not just a couple of companies that can be at the cutting edge of AI. It's a much more open landscape than that, and I think that's a very powerful message. I always knew it was an open landscape; that's why I'm working as hard as I can to build a beneficial thinking machine with my colleagues at the ASI Alliance. But I think that point is much clearer to a much broader variety of people post-DeepSeek.
I think the sort of market crash in AI stocks and tokens after DeepSeek was announced is, in a way, a little bit silly, though I can understand it from a market-dynamics standpoint. Nvidia stock had gone way up because people thought there would be a huge shortage of GPUs, since all these big companies would need to buy all the GPUs and would be vying to buy them. Then Nvidia stock went down when people realized that shortage might not be as bad as they thought. Many of the same investors who invest in other AI companies and token projects also had money in Nvidia; they lost some money, so they pulled money out of other AI projects. You can see how the market dynamics would lead to a crash like that post-DeepSeek. But fundamentally, the rational impact of DeepSeek on the equity and crypto markets for AI is great: you can do more AI with less money now, which is tremendously good for everyone in the AI market. It's just one more piece of evidence that the Singularity is coming fast. And if we want the Singularity to be beneficial, we need to ensure it remains decentralized, global, and open. The fact that DeepSeek came from China, from a different side of the world than most of the recent AI innovations, and that it's open source, is a beautiful piece of evidence that the Singularity may well be beneficial, because AGI may well be decentralized, global, and open. So DeepSeek certainly isn't AGI. It's not a revolution; it's an efficiency optimization. It's a step along the way, but it is an exciting step in this overall dance we're all doing in these final few years before the Singularity, which is going to bring us AGI and ASI and a whole bunch of other amazing consequences.