And now, if we divide these three eras along four aspects, the data, the learning, the capability, and the model, we get this table. In the classical era we use little to almost no data, and there is no learning; these systems are mostly engineered objects. The capability is very brittle: anything outside the templates, good luck. And the models are at most shallow models, or no models at all.

Then in the neural era, we have constrained benchmarks and small-scale datasets; maybe ImageNet, at 1.2 million images, is already the largest scale. The learning is mostly supervised, where you have labeled data. The capability is more like a domain specialist, and the models are deep models.

And the next era, the generative era: we open up the data to open vocabulary. Any words, not just 1,000 categories; anything is fair game. And we scale to internet scale using internet text. All of Wikipedia is too small, so all of YouTube, all of everything. The learning is now self-supervised: transformers predict the next word, and you don't need a human to label the next word, because it's already in the sentence. Diffusion models just need a collection of images. The capability is domain-general prompting: you can tell the model what you want in natural language. And we call them foundation models.

Now, what is the next era? I know you're attending an AI summit, but let's talk about what's beyond the summit, and that is the agentic era. On the data side, we want infinite data now, because the internet is too small for us; the internet is not enough data. We get infinite data by using worlds, interactive environments that simulate the real world, and we use self-exploration and self-bootstrapping methods to learn autonomously in these worlds. So humans don't curate data and send it to GPT; the agent collects data itself. The core capability here is decision making. And what is the model? That's the next chapter of the talk.

So this is the agentic era, characterized by a world and an agent: the world sends the agent a bunch of tokens as sensory input, as perception, and the agent sends back a bunch of tokens as actions (a minimal code sketch of this loop follows below).

For me personally, my journey in the agentic era started in 2016, when I was taking a class at Columbia University. I wasn't paying attention to the lecture; instead I was watching a board game on my laptop. And it wasn't just any game, it was a very, very special one. This screenshot right here is the moment when DeepMind's AlphaGo beat Lee Sedol, the reigning champion, at the game of Go. The AI had just won three matches out of five, becoming the first ever to make this achievement. Unlike chess, Go has a much larger search space and is more intuitive than rule-based, so it's impossible to use classical methods; AlphaGo is an end-to-end neural network that plays Go.

I still remember the adrenaline that day, of seeing history unfold, of seeing AI agents finally coming to the mainstream. But when the excitement faded, I realized that, as mighty as AlphaGo was, it wasn't that different from Deep Blue 20 years ago: it can only do one thing, and one thing alone. It is an agent, but it's not able to play any other games like Minecraft or Dota, and it certainly cannot do your dirty laundry or dishes.
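To make that world-agent token loop concrete, here is a minimal, hypothetical sketch in Python: the world emits observation tokens and the agent returns action tokens. The interface names are not from any real library; they are assumptions purely for illustration.

```python
# Hypothetical sketch of the agentic loop: the world streams observation
# tokens to the agent, and the agent streams action tokens back.
from typing import Protocol, Sequence

Token = int  # observations and actions share one token vocabulary

class World(Protocol):
    def observe(self) -> Sequence[Token]: ...        # sensory tokens (perception)
    def act(self, actions: Sequence[Token]) -> None: ...

class Agent(Protocol):
    def step(self, observation: Sequence[Token]) -> Sequence[Token]: ...

def run_episode(world: World, agent: Agent, max_steps: int = 1000) -> None:
    """Run one perception-action loop between a world and an agent."""
    for _ in range(max_steps):
        obs_tokens = world.observe()          # world -> agent: sensory tokens
        action_tokens = agent.step(obs_tokens)
        world.act(action_tokens)              # agent -> world: action tokens
```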
But what I truly want are AI agents as versatile as WALL-E, as diverse as all the robot forms, or what we call embodiments, in Star Wars, and that work across infinite worlds, virtual or real, as in Ready Player One. So how do we get there, possibly in the near future? This is your hitchhiker's guide to the agentic era. Most of the ongoing research efforts can be laid out along three axes: the number of skills an agent can do, the body forms, or embodiments, it can control, and the realities it can master. Here is where AlphaGo is, but the upper right corner is where we want to go.

I've been thinking for most of my career about how to travel across this galaxy of challenges toward the upper right corner. Earlier this year I had the great fortune to establish the GEAR lab with Jensen's blessing, and I'm very proud of the name: GEAR stands for Generalist Embodied Agent Research. I also had the honor to co-found the GEAR lab with my longtime friend and collaborator Yuke Zhu. This is a picture we took almost eight years ago at Stanford, when we were still PhD students in Fei-Fei's lab. At that time we did hackathons all the time, especially before deadlines, when we were most productive. And just look at how young we were. What did the PhD do to me? The pursuit of AGI is all consuming; it's pain and suffering.

All right, so let's go back to first principles: what essential features does a generalist agent have? First, it needs to be able to survive, navigate, and explore an open-ended world. AlphaGo has a singular goal, to beat the opponent, and it's not open-ended. Second, the agent needs to have a large amount of pre-trained knowledge, instead of knowing just a few things for a specific task. And third, a generalist agent, as the name implies, must be able to do more than a few tasks, and ideally it should be infinitely multi-task: you give it a reasonable instruction, and the agent should be able to do whatever you want.

Now, what does it take? Correspondingly, there are three things that are required. First, the environment needs to be open-ended enough, because the agent's capability will ultimately be upper-bounded by the environment's complexity. The planet Earth we live on is a perfect example, because Earth is so complex that it allowed an algorithm called natural evolution, over billions of years, to create all the humans in this room. So can we have a simulator that is essentially a lo-fi Earth, but that still runs on our lab computers? Second, we need to provide the agent with massive data, because it's not possible to explore from scratch; you need some common sense to bootstrap the learning. And third, once we have the environment and the data, we need a foundation model powerful enough to learn from these sources.

This train of thought lands us on Minecraft, the best-selling game of all time. For those who are unfamiliar, Minecraft is a procedurally generated 3D voxel world, and in this game you can do whatever your heart desires. What's special about this game is that Minecraft defines no score to optimize and no storyline to follow, and that makes it very well suited as a truly open-ended AI playground. As a result, we see some very impressive things, like in this one, where someone built the Hogwarts castle block by block in Minecraft and posted it on YouTube. And then someone else, who apparently had nothing better to do,
built a functional neural network inside Minecraft, because Minecraft supports logic gates and is actually a Turing-complete game. And I want to highlight a number: Minecraft has 140 million active players. To put this number in perspective, that is more than twice the population of the UK. And it just so happens that gamers are generally happier than PhDs, so they really like to stream online and share, and this huge player base produces an enormous amount of online knowledge every day. So can we tap into this big treasure trove of data?

We introduce MineDojo, a new open framework to help the community develop general-purpose agents, using Minecraft as a kind of primordial soup. MineDojo has three parts: a simulator, a database, and an agent. We designed the simulator API to unlock the full potential of the game for AI research: we support RGB frames and voxels as sensory input, and keyboard and mouse actions as output. MineDojo can be customized in every detail: you can control the different terrains, the weather, and monster spawning, and it supports creative tasks that are free-form and open-ended (a minimal usage sketch follows below). For example, we want the agent to build a house, but then the question is: what makes a house a house? It's impossible to define in simple Python code that you have achieved building a house, and the best way to do it is to learn from data, so that the actual concept of a house can be captured.
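As a rough illustration of that simulator API, here is a minimal sketch of a MineDojo interaction loop following its gym-style reset/step interface. The task id and keyword arguments are illustrative assumptions; check the MineDojo documentation for the actual task catalog and configuration options.

```python
# Minimal sketch of a MineDojo-style interaction loop (gym-like reset/step).
# Task id and kwargs are illustrative assumptions, not a verbatim API reference.
import minedojo

env = minedojo.make(
    task_id="harvest_milk",   # assumed benchmark task id
    image_size=(160, 256),    # resolution of the RGB observation
)

obs = env.reset()                 # observation carries RGB frames, voxels, etc.
for _ in range(100):
    action = env.action_space.no_op()           # placeholder keyboard/mouse action
    obs, reward, done, info = env.step(action)
    if done:
        break
env.close()
```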