Everything Everywhere XR #08 - Generative AI in Augmented Reality
1:00AM Apr 13, 2023
Speakers:
Alessio Grancini
Keywords:
ar
ai
diffusion
generative
image
stable
glasses
called
model
question
bunch
hololens
day
prompt
xr
hearing
creating
based
3d
bit
…my Discord, so we'll try to give time to everyone, and maybe we can start if you guys are down. Sure. All right, so I'm going to start recording, and then the results of the Otter job... I'm going to keep it as a meeting; there is probably a better way to do this where you mute everyone at once, but I'm just going to regulate it myself in case someone starts to share stuff, and
so the hammer bring the hammer.
Hammer, bring it in. So hello everyone. This is an event organized by me through my Discord, which is called Everything Everywhere XR, and also Norman's Discord, which is called RMR and XR. This is an event where we feature speakers who are working in the industry. Because we will have a limited amount of time, I just want to let you know that there is additional information on the meetup event, and you can continue the conversation and ask the speakers questions on the Discord. This is also educational content; it's not related to my job and I'm not promoting anyone's activity. I'm just doing it for the sake of having fun and creating a community around amazing work, speaking about topics that are current, and informing everyone of things, because things are moving very fast in the industry. I mean, they always did, but especially now with this AI wave it feels very tough to keep up with things every day. So this is also about informing everyone of what's going on. If there are questions, let's keep them for the end, because we have a limited amount of time and I would like to give every speaker the possibility to describe their work and what they think about certain topics; then, if we have extra time, I'm also down to stay a little longer and go through the different questions. Let's see if we can have everyone speaking in this time. So yeah, let's get started. The event is also on the meetup page, and there is just an introduction now; first up is Bilawal Sidhu with reality capture and generative AI: merging reality and imagination with NeRFs, ControlNet, and Stable Diffusion. Just a very quick introduction: I think you all know Bilawal from his amazing tweets and amazing content. I think it's very cool that we now have these developer, creator, engineer, designer figures coming into the industry, doing quite a lot of the work, and it's very inspiring to see, because indirectly they're kind of trend setters for the industry; the companies also look at all of these tweets and what this technology can deliver. Also, as a solo developer, I think everything starts to move from there. So please go ahead, introduce yourself as you prefer, and start your presentation. Take the stage.
Awesome. Cool. Alessio, thank you so much for having me, and for that kind introduction. Let me jump into the 2D metaverse and present my screen. All right, so hi everyone. I'm Bilawal Sidhu. I'm an AI creator and entrepreneur; I previously spent six years at Google and about a decade exclusively in the AR, VR, and 3D maps space, and along the way I've built a following of a little over a million folks across YouTube and TikTok. So I love all things creative technology, and when you were describing the multi-hyphenate nature of the field, that definitely resonated a lot with me. All right, so the topic for today: in about 10 minutes I'm going to talk to you about reality capture and generative AI, why it's a very powerful combination that hasn't been explored as much yet, and how, once you can perceive and model the complexity of reality, generative AI gives you this amazing ability to reskin and augment it. All the things we've been talking about, the AI plus AR overlay future of the world, are now suddenly within reach. I'll walk you through a couple of my experiments, and along the way I'll keep peppering in key takeaways for how you can start integrating this into your own work. All right, so first and foremost, let's talk about NeRFs. Unless you've been living under a rock, you've probably heard the term neural radiance fields. What the heck is it? Well, before we answer that question, let's look at some pretty visuals. They look amazing, and certainly they do an amazing job of modeling the complexity of reality, as I alluded to. But what the heck is a NeRF? To answer that question we need to go a level up and ask what reality capture and photogrammetry are. Photogrammetry, very simply put, is the art and science of measuring things in the real world using photos (that's the "photo" in photogrammetry), but also other sensors like LiDAR. What's cool about NeRFs is that it's an ML, learning-based approach to doing reality capture. Essentially, you're asking this AI to take about 100 photos of a static object or scene and essentially learn the scene through ray tracing. So this is what it looks like when you train a NeRF. It's creating a true volumetric representation where, for every voxel in 3D space, you've got an RGB color value but also a transparency associated with it, so you can model a lot of interesting and complex effects. I'll skip over the history of how we got here; I encourage you to read the blog later on the history leading up to NeRFs. But the unique thing is that instead of ending up with a 3D mesh or model (which you can certainly eventually make with NeRFs too), you're basically modeling the complexity of reality in the weights of a multilayer perceptron. And that's kind of amazing. Why is that amazing? All right, so how does this work? NeRFs and photogrammetry sort of start off the same way: you pose some imagery, and then that's where the similarities end. What do I mean by posing imagery? Let's say you've got about 100 photos of this static object, a Buddha statue in Chandigarh in north India, beautiful place. By the way, this was taken with an iPhone 7 Plus, so this is really, really old imagery. You can pose it, basically figure out the relative positions from which these images were taken, by triangulating those positions, assuming that the environment is static.
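To make the idea of "the complexity of reality stored in the weights of an MLP" concrete, here is a minimal, untrained sketch of the core NeRF formulation in PyTorch: a tiny MLP maps a 3D position plus view direction to color and density, and a single ray is rendered by alpha compositing samples along it. This is an illustrative toy, not Instant-NGP, Luma, or the speaker's pipeline; real NeRFs add positional encoding, hierarchical sampling, and training against posed photos.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Toy NeRF: (xyz, view direction) -> (RGB, density)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),   # input: xyz + view direction
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),              # output: RGB + density (sigma)
        )

    def forward(self, xyz, view_dir):
        out = self.mlp(torch.cat([xyz, view_dir], dim=-1))
        rgb = torch.sigmoid(out[..., :3])      # colors in [0, 1]
        sigma = torch.relu(out[..., 3])        # non-negative density
        return rgb, sigma

def render_ray(model, origin, direction, near=0.1, far=4.0, n_samples=64):
    """Volume-render one ray by sampling points and alpha-compositing them."""
    t = torch.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction           # (n_samples, 3)
    dirs = direction.expand_as(points)
    rgb, sigma = model(points, dirs)
    delta = t[1] - t[0]                                 # uniform step size
    alpha = 1.0 - torch.exp(-sigma * delta)              # opacity per sample
    # Accumulated transmittance: how much light survives up to each sample.
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(dim=0)           # composited pixel color

model = TinyNeRF()
pixel = render_ray(model, torch.zeros(3), torch.tensor([0.0, 0.0, 1.0]))
print(pixel)  # untrained, so this is noise; training fits it to posed photos
```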
So once you've posed the imagery, you end up with something that looks a lot like this. Let's compare it with what the photogrammetry version looks like. If you run these exact images through a photogrammetry pipeline, the optimized model looks like this, right? It's really good, very detailed; you can drop this into a 3D engine. Certainly I baked down all the high-frequency details into a normal map too, so you can see all the nuances. But it's still an abstraction of reality, and I've also lost all the background context. Let's jump to a point cloud representation, which is an earlier stage in a classical photogrammetry pipeline, and here you can really start illustrating the differences between NeRFs and photogrammetry itself. So this is a dense point cloud: you did the posing, you created these depth maps, and then you created this dense point cloud. As you can see, anywhere there are reflections or no textures on the surface, you don't really have all that information. It's picking up the staircase here in the back, it's picking up some of the dimensionality and the structure there. Again, this is amazing if you want to measure stuff, like how tall the statue is, answering factual questions about measurement. But if you want to encapsulate the vibe of this place, all that background context is lost and gone. And sure, there's a lot of trickery and hackery you could do manually in a game engine, like taking a 360 pano and filling in a bunch of these holes, but you get all of that out of the box with NeRFs. Because unlike solving the problem of measuring stuff, the thing NeRFs are solving for is view synthesis: given that bag of about 100 images, how do you create a volumetric representation that lets you interpolate between all of those viewpoints, but also go outside of the capture volume? I love how the reflections, refractions, all that type of stuff is nicely modeled here. So, as Alessio mentioned, this industry is moving fast. This was about 10 months ago, and at that point the state of the art was using NVIDIA Instant-NGP; you could train something like this four-and-a-half-gig NeRF, but you had to wrangle some Colab and COLMAP scripts, all that type of stuff. Fast forward to today and you can basically do this all with the iPhone in your pocket using Luma AI. I made this capture on an iPhone, pushed off the processing job on an iPhone, did the camera animation on an iPhone, and you end up with phenomenal results. It's amazing how accessible this tech has become in just 10 months; you're seeing these timelines compress. Obviously you can do a bunch of cool things with NeRFs: night-time captures and all these scenarios that are hard for photogrammetry are suddenly unlocked, even things like reframing stock footage. And one more thing: compared to photogrammetry, you don't need that many images. With about 150 images you can get really, really good results. So that's the reality capture portion. Let's talk about reskinning reality. You've probably heard about Stable Diffusion, and you've probably also heard about image-to-image. At a very high level, these diffusion models work by denoising the image and imbuing structure and meaning into it based on the text prompt that you give it.
So the most naive thing you could do is just run it through image-to-image, but you end up with something that looks kind of like this: a little bit of a flickery mess, unless maybe you're leaning into that aesthetic quality. So how do you get something that's more temporally coherent, that respects the 3D structure of the scan we're giving it? Stable Diffusion actually launched something late last year, in November, called depth-to-image, as part of Stable Diffusion 2.0, and basically what that lets you do is provide a depth map in addition to the RGB image to guide the diffusion process. Think of it like gray-boxing in game development: you've got the core structure of a scene, like you might box things out, but now you can suddenly type in a text prompt and get something far more realistic. To really illustrate this, let's take this example of the classic holodeck. This is me just toggling through every possible style combination, so you can see where I'm probably going with this. If you've got a more complex composition, we can try a Lego-style aesthetic, or easily type in "Large Hadron Collider"; suddenly, with the same geometric structure, you can start stylizing and exploring different things that respect the underlying depth map. Fast forward to early February, just a couple of months ago, and a seminal paper called ControlNet came out, which not only was faster and cheaper to train, it also allows you to use any checkpoint of Stable Diffusion. So you're no longer restricted to Stable Diffusion 2.0; you can use 1.5 or whatever variation you want, and then use a bunch of these purpose-built models, like depth estimation and pose estimation, in concert with a frozen checkpoint of Stable Diffusion to guide the diffusion process. To illustrate: let's say you want this exact shot of an anime character sitting in this chair. Rather than spending a hundred iterations trying to get that exact shot out of something like Midjourney, you can decompose this into the depth map itself, which nicely represents all the structure, plus the full-body pose of the character, and start combining these things together in interesting ways. So I used that to create this interior redesign experiment. My workflow was roughly: do a photogrammetry scan, use the depth map, then use ControlNet to figure out different style keyframes, basically restyling this interior, this drawing room, effectively, and then use something called EbSynth to smooth out the transitions. This is what the final result looks like: essentially me toggling through a bunch of different styles.
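For anyone who wants to try the depth-guided generation step described above, here is a hedged sketch using the open-source diffusers library rather than the speaker's exact tooling. The checkpoint IDs ("lllyasviel/sd-controlnet-depth", "runwayml/stable-diffusion-v1-5") are commonly used public models, and the depth map file name and prompt are placeholders.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Load a depth-conditioned ControlNet and attach it to a frozen SD 1.5 checkpoint.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")  # assumes a CUDA GPU is available

# A depth render exported from your scan or NeRF (placeholder file name).
depth_map = Image.open("scan_depth.png")

result = pipe(
    prompt="cozy mid-century modern living room, warm lighting, photorealistic",
    image=depth_map,            # the ControlNet conditioning image
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
result.save("restyled_keyframe.png")
```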
Here's what's happening under the hood of that interior redesign. That's the input scene, literally a photogrammetry scan on Sketchfab, and then you're seeing the various restyling results on top. What's cool is that with ControlNet you can use multiple methods of control. The depth map encodes all of this unique Indian furniture into the diffusion process, but what about the wall in the back, the paintings, all that type of stuff? There, with ControlNet, you can use something called Canny (a bunch of other models exist) to do edge detection, and that'll pick out all the nuanced details of the flat surfaces in the scene. You can see where I'm probably going with this: when you start composing these things together, you can do some very, very interesting things. In fact, again a testament to how fast things are moving, some of these requests were made to the developer and the developer implemented them, so now you don't even have to go through all of the hackery I just walked you through, and you can end up with results that look kind of like this. All the structures, the graffiti on the wall, all of that stuff is retained, and you've got the beginnings of this awesome video-to-video pipeline. Here's a little peek under the hood: NeRF gives you these immaculate depth maps that you can exploit for all sorts of purposes. Of course you can apply this to video as well, and that works really well; there are tools like Gen-1 that are making this even easier. And again, a testament to how fast things move, I'm talking about just six or seven months of advances in video-to-video. Kaiber is a startup that's made that entire process even faster, in one click, and that's going to be coming out of beta shortly as well. So there we go, bringing this all back: I'm at time here. Hopefully that gave you a flavor for how reality capture and generative AI make a really powerful combination, and how you can start capturing the complexity of reality. Really, just think of it like kitbashing reality together. You don't have to start from scratch: you can take advantage of these democratized techniques for scanning objects, spaces, people, and places that you care about, bring that into your artistic creations, and take them to the next level. Oh, I still have five minutes. Oh my god, I'm way ahead of time then. Honestly, I don't have much more to add other than that you can do a lot of dope stuff with this. Maybe one other thing to add is that everything we talked about is essentially post-capture processing, and we're talking about two-and-a-half-D manipulation: you're encoding the structure of this 3D scene from a specific viewpoint, and that's what's captured in the depth map. There are some really interesting papers coming out recently about going back into the input imagery itself. What if you could capture this NeRF and then take a technique like InstructPix2Pix to type in a text prompt and restyle it? Say you want to explore seasonality or time of day, or you want to reskin the statue to look like different animals. You can do that by going back to the input images, and the researchers are doing this in an iterative fashion, which over time enforces multi-view consistency.
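Building on the previous sketch, the depth plus Canny composition mentioned above can be expressed with diffusers' multi-ControlNet support. Again, this is an illustrative sketch under assumptions, not the speaker's actual pipeline; the conditioning image paths, prompt, and per-model weights are placeholders.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Two conditions: depth for the room's structure, Canny edges for flat-surface detail.
controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnets, torch_dtype=torch.float16
).to("cuda")  # assumes a CUDA GPU is available

result = pipe(
    prompt="the same drawing room as a neon cyberpunk arcade, volumetric light",
    image=[Image.open("depth.png"), Image.open("canny_edges.png")],
    controlnet_conditioning_scale=[1.0, 0.6],  # depth dominates, edges fill in detail
    num_inference_steps=30,
).images[0]
result.save("multi_controlnet_keyframe.png")
```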
Coming back to that input-imagery restyling: you can see that after every iteration the bear looks far more consistent from these different viewpoints. Then you go about processing the NeRF as usual, so you end up with a volumetric result that's useful for all sorts of things, but you're now doing it on the input imagery. All sorts of fun stuff, like costume redesigns, is completely possible. So all this to say: six months ago you had to wrangle COLMAP, Instant-NGP, FFmpeg, Stable Diffusion 1.5, depth-to-image, EbSynth, all this type of stuff. Now there are a bunch of tools that let you do this in a one-click process. Just take Luma and Kaiber and you can replicate all of what I'm doing with just a couple of clicks, and that's frickin' awesome. And of course where this will go next is more native 3D representations. But yeah, I'll stop there. If this was interesting to you all, feel free to check out my Substack, Creative Technology Digest, and of course all the things I walked through are also available on my Twitter, so feel free to follow me. I've got a bunch of these threads broken down in great detail. And with that I will hand it back to Alessio.
Thank you so much. This is so great to see, and I just want to bring up something: I personally always kind of did the work of prototyping, and I didn't know there was a title for it until now, but it's exactly this: taking a lot of different tools, a lot of different things, and putting them together. Every time, you're kind of creating a concept for a product; what you just showed might be, in maybe one or two years, probably less, a product that you see out there. So I feel it's an exciting moment, because you see all of these little components, and people assemble them and come up with different solutions. This space is very exciting because you're basically creating your own recipe for your app, and every time it's mind-blowing to see someone who did it slightly differently and got different results. It's very valuable that you release it, that you're very present, that you share it. I'm impressed, for example, by that manipulation of the 3D model based on depth, because it kind of solves the issue of having just a skybox. I see now, for example, a lot of skybox generation with Stable Diffusion, which is very handy because you're creating skyboxes on the go, but this one is basically the same thing, just a bit more 3D-accurate: you can create different variations of 3D models.
If I may just add to that: I think you're bringing up another point, which is that this AI art and creative tech community is so open about sharing stuff, which means we all learn from each other and then product companies integrate this stuff, which is a lot of fun. But also, from an AR perspective, it's interesting to think about how this sidesteps the classic AR pipeline. To do this use case of interior reskinning, classically you'd want to create an explicit 3D mesh representation, then you're figuring out how to have parametric 3D models that line up perfectly, then you're doing lighting estimation. And now you can kind of just do that with a frickin' depth map and a photo in a web UI. And as you start fine-tuning these models to include actual furniture, perhaps, let's say, an IKEA or Wayfair thing, you can start imbuing specific annotations of things you could go and buy. So it's fascinating to me how this is all going to bleed into the XR stuff too. And yeah, it's awesome that it's happening in the open.
Amazing, thank you so much. Stay tuned, and if anyone has a question, please keep it, and we will be coming back to Bilawal. Sorry, is that the right pronunciation? There you go.
You got it.
Thank you. So let's go next, and I'm just going to introduce the next guest, who is Dan Scarfe. Thanks for joining, Dan, and thanks for telling us about your journey with XRAI Glass, which, if I'm not mistaken, brings XR and AI together to create live subtitles. I just want to expand on that: if there is something I always give a lot of value to in XR, since the moment I started to play with this technology, it's text. I love to see text in AR; it's impressive to me to see information without any background. We spend all day looking through news and social media; we are consuming text all day. It can be audio, it can be video, but there is a lot of text in all the interfaces we use. So I think one of the first main products to really break open the bubble of AR would definitely be a consistent text product that gives you value. So please go ahead, the stage is yours, and share your amazing project with everyone.
Excellent. Well, thank you so much for having me. And if I felt out of my depth before I joined the call, then I most certainly feel out of my depth now; I think I understood at least three things of what Bilawal said in that previous section. So I'm not from the AR world. I spent 20 years in enterprise IT, but I'm now out of that world and wanted to do something a little more exciting. And our marketing clearly failed, because you didn't catch the fact that it's XRAI, as in x-ray glasses, and who didn't want a pair of x-ray glasses as a kid? There's a personal story behind this; in fact there are two personal stories. My granddad and one of the other founders' dads both suffer from hearing loss, and we observed first-hand how isolating it can be to watch a family member not able to engage in the conversation. And this is a big problem: depending on what stat you look at, maybe one in five people in the world have some kind of hearing loss, and when you expand that to those with neurodiverse conditions as well, you're looking at maybe one in three people around the world. So this is a huge, huge challenge in accessibility in general. So we set off on our mission to subtitle the world through easy-to-use and empowering technology: how can we actually make this technology available to everyone? Hopefully you'll be able to hear the audio from this video; if you don't hear any audio, maybe someone just shout.
We're living in a world that's more connected than ever. But for many people who are deaf
You guys seeing the video share?
No video, no video.
I just want to be sure. I can try to share it myself if there are any issues, but
let me just try doing it this way instead. So if I just go and share this one, let's try this again.
Yep, perfect.
We're living in a world that's more connected than ever. But for many people who are deaf or hard of hearing, it hasn't always felt that way. Even with assistive devices, everyday life can still be challenging for those with varying levels of hearing loss. So when we launched XRAI Glass, AR software that uses smart glasses to translate audio into subtitles in real time, we knew it would be so much more than just revolutionary tech. Introducing: life, subtitled. Rather than solely showcasing impressive technical features, we positioned XRAI Glass as a lifestyle brand that genuinely augments lives for the better. We established a team of brand ambassadors from different walks of life who were deaf or hard of hearing and invited them to experience XRAI Glass for the first time. "It literally writes everything I'm saying. Man, that's amazing." "Yeah, that was crazy." Their real reactions met with a real response: over 6,500 pieces of media coverage. And it wasn't just those who are deaf or hard of hearing that saw the benefits of XRAI Glass,
resulting in 820 million in combined reach. We saw a 1,000% increase in online searches, and the glasses sold out on Amazon. And our software was noticed. We showed how tech can be human by giving people more access to the world they deserve to be a part of, one where we're all connected. Life, subtitled.
So that gives you a little idea of what we got up to. Hopefully, if I quickly go back to here, and I'm conscious I'm running slightly behind, but I did only get five minutes, so maybe I can steal a couple of extra minutes. As you saw in the video, we've got real-time subtitles across, today, 11 different languages, soon to be 76, and we can do real-time translation in and out of any of those languages, so you can engage in conversation with people who don't speak the same language. And you're probably wondering why I'm on a call about generative AI, and the answer of course is ChatGPT. We've effectively taken your entire conversation history and turned that into a prompt for ChatGPT, so you can now ask questions about the conversation you're in. Like, "Hey XRAI, what was it we were just talking about?" or "Hey XRAI, what's the answer to the pub quiz question you just heard?", anything you like. Effectively, we're using that generative AI to help people in their everyday life, whether that's making suggestions of things they might say, giving them feedback, or whatever it might be. Now, in the video you obviously saw the Nreal Air, for those of you in the know; those were the glasses we launched with. We also have support for the TCL NXTWEAR S, we have support for the ThinkReality A3, we've demonstrated our software running on the new reference design from Qualcomm at Mobile World Congress, and we are in the process of working with TCL to get it onto the RayNeo X2 as well. I won't even begin to try to describe how this works behind the covers, but if any of you are interested, you can take a screenshot and look through: this is how our translation and transcription engine works, and this is our OpenAI assistant. So that is about all I wanted to say; I think I only have one minute left over. Of course, anyone who has followed anything we've done will know that there is one notable exception to our product lineup, and I won't say anything more other than it is our first birthday on the fifth of May, and who knows what we might announce. So there we go, that's it for me.
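To make the "conversation history as a prompt" pattern concrete, here is a minimal, hypothetical sketch, not XRAI's actual implementation, using the 2023-era openai Python client (pre-1.0 API). The transcript buffer, system prompt, and model choice are assumptions for illustration, and an API key is expected in the OPENAI_API_KEY environment variable.

```python
import openai  # assumes the pre-1.0 openai client and OPENAI_API_KEY set in the environment

# A rolling buffer of recently transcribed speech (made-up example lines).
transcript_buffer = [
    "Sam: The pub quiz question was about the tallest mountain in Europe.",
    "Priya: Let's meet again on Thursday at 3pm to go over the plan.",
]

def ask_assistant(question: str) -> str:
    """Answer a question about the recent conversation using the transcript as context."""
    messages = [
        {"role": "system",
         "content": "You answer questions about the user's recent conversation. "
                    "Use only the transcript provided."},
        {"role": "user",
         "content": "Transcript:\n" + "\n".join(transcript_buffer)
                    + "\n\nQuestion: " + question},
    ]
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    return response["choices"][0]["message"]["content"]

print(ask_assistant("What were we just talking about, and is anything scheduled?"))
```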
That was incredible in such a short amount of time; I was not aware it was just five minutes. If you want to add anything, you can also take another two minutes, it's not a problem. Working in the world of AR wearables and hardware, I think it's super cool. I see that you have the idea of translation, but also the integration of ChatGPT. Do you see ChatGPT as a way of distilling the context and understanding of a conversation, or do you see it more as an extension toward a kind of personal communication assistant for the user?
Oh, honestly, we can use it for all kinds of things; that's why it's so exciting. Initially, when we started out, we were looking to help people who are deaf or hard of hearing, so it was very much about using the technology to help people to hear. But when we integrated ChatGPT, you actually go a step beyond that: now it's helping people to understand, and that's a broad phrase, of course, but you start to look, as I mentioned, at neurodiversity, dyslexia, ADHD: the ability to be in a lecture theatre using something like this and to ask clarifying questions about something they missed, or to be reminded of something they heard yesterday that they were supposed to do. This kind of properly personal assistant, that's the bit that excites us the most. The transcription is almost a Trojan horse to get access to the conversation data in order to provide this assistant on top of it, effectively. So that's what we've been up to.
Nice, thank you so much for the answer, and we will get back to it later if there are any questions, but thanks so much for showing us this. Time just keeps running, so I'm going to introduce the next guest; every one of you would deserve a full-day presentation for what you do, so sorry to cut it so short. Our next guest is Bart, and the last name is very hard. I'm going to try to pronounce it correctly, but if I'm mistaken, sorry: Trzynadlowski?
All right, don't worry.
So I have it here as AR independent developer, ex-Apple AR prototyper, and CTO of Rec, which is building sport in AR, kind of bringing AR to sports and seeing what's possible in that domain. Please go ahead and tell us about your journey and what you are up to now.
Yeah, let me just see if I can share my slide deck. I have a boring old
Can you pronounce your last name for me? Oh, yeah.
Trzynadlowski; it starts like "chin." Let me just see if this is going to play. I'm hoping everybody can still see this. Is this still coming across? Yes. Okay, great. Yeah, so thanks for having me. And by the way, those two presentations before were just amazing. And Dan, I definitely think we should talk; your applications are very near and dear to my heart, for reasons I can't fully talk about here publicly. So, I want to talk about AR as the ultimate interface for AI. A little bit more about me: you can follow me on social media, though I'm not a super active poster. I've been doing AR since 2016, when I got the HoloLens and really fell in love with the medium, so you can see what I've been up to across these links here. From 2018 to 2021 I was at Apple as an AR prototyper, doing prototypes to investigate what future product directions might look like for the company in that space. Since 2021 I've been on my own: I've been working on Rec as the CTO, a company developing an MR fitness application, but I'm also doing a lot of exploration with the exciting stuff that's been happening in the last few years. So, without further ado, I'm going to wade into the nomenclature wars and clarify what I mean by AR as distinct from mixed reality. Mixed reality headsets, as I'm defining them here, can be
slides aren't changing for me. I don't know if it's the same for everyone else too. Oh,
okay. Darn. Yeah, it's
not changing. Man.
I should have known better than to do this on a Macintosh. All right, whatever, I'll just have it in this window; I don't think there's any other way to do it on a Mac, unfortunately. So yeah, these are my social media links; I'll have them at the end, so we can go back to that later. So, mixed reality headsets can be optical see-through (HoloLens and Magic Leap are great examples), but lately there's been a lot of talk about passthrough headsets; great examples are of course the entire Quest line and the Varjo XR series. What distinguishes mixed reality from augmented reality in my mind is that it really emphasizes high-fidelity, spatialized 3D visuals: really trying to blend in with your environment, really having everything integrated as if it were a real object. Unfortunately, I don't know if we can play this video in this view, but let's try. This is actually a video of Rec, which we're working on, running on Quest in passthrough, so you can see it's really going for that mixed reality feel, where the objects are grounded in your space. There are all these gee-whiz effects, like breaking through the ceiling here, which are really designed to seamlessly blend the real and virtual worlds. This kind of tech has actually been around for a while: this demo here is on HoloLens, which is an optical see-through device, and it's from 2018; it's actually a game that I published on the app store there. If you look closely, you can see that objects are occluded by the bed and so forth. So again, really going for that high-fidelity, virtual-things-in-your-environment kind of aesthetic. And I think the other really important thing to mention is that mixed reality devices, as I see them at least, are task-oriented, session-based devices. They're not designed to be all-day wearable devices: you want to do something, you get in there, you do it, and then you get out, whether that's content consumption or content creation, perhaps using generative AI. By contrast, augmented reality refers to the glasses form factor, and specifically to being all-day, or at least long-duration, wearable. The actual interactions you're having would be short, but maybe there are multiple interactions throughout the day. The form factor constraints (not having a lot of power in such a small package) would necessitate augmentations being minimalistic, which is not just about the limited hardware; I think it's actually what people would want in an all-day product. Nobody really wants realistic characters running around all day, even if it were possible; it's just not something you want. You want fewer pixels, not more, and you want to be present in your world; that's where I think a lot of this is going. Perhaps sometimes spatialized, but not always. These are almost exclusively optical see-through, glasses-form-factor type devices.
There's not a lot out there right now, and the vision is far from being realized yet, but some early hints at it are things like the North Focals, and the Vuzix Blade has a similar form factor, and there are a bunch of other glasses. I mean, here are the Snap Spectacles (I probably shouldn't be wearing these), although they only get about 10 minutes of battery life at a time. I think the real use cases will be phone-like, smartwatch-like use cases: contextual assistance and so forth. And this is really the form factor the industry is betting the big dollars on. This is what they really want to achieve, and the idea that this could have some impact on the smartphone market, perhaps even rival that market, is the hope, in the very distant future. Now, whether that's money well spent or a money pit is a discussion for another panel, probably, but it is where the long-term vision of the industry is. And so, of course, generative AI: using existing content to create new output, where output is not necessarily restricted just to art. Why is this useful, or why is AR the ultimate interface for it? Well, AI is designed to serve us, but we don't exist in a browser window. A lot of these AI applications are tied down to form factors that are not naturally connected to us: you're either on a smartphone or on a computer or whatever. AR promises to be there when we need it; it's supposed to always be there. Like right now, I can go outside, and where are these things? Our perspective, the things we're seeing or experiencing, can be fed into AI, which we know is multimodal, and then its output can be there for us when we actually need it, in the situations where we need it. You can't read the text in this ChatGPT window, but it was actually a workout plan that someone asked for. And you can imagine that in the old days, with MapQuest, if you're old enough to remember, you would get your directions, print them out on paper, and go drive with them; now you just have it on your phone. It's kind of the same thing here: the workout plan really would make a lot more sense if it was there when I needed it at the gym, so I don't have to remember it or bring a notebook or pull out my phone and use some app. I just ask for it. And there are countless situations like that. So, regarding generative AI for AR, I think one of the big things, of course, is natural language interfaces, and Dan really covered this topic pretty well: being able to have that context about what you're looking at and what you're doing. I'm including the concept of a deictic reference: being able to refer to something without explicitly saying what it is, because it's clear from the context you're in what you're referring to. This is a demo I did called ChatARKit; I'll just play it. It was a really crude demo.
It's open source on my GitHub. It shows a little bit of this, because you'll notice that I said "nearest plane" (I could have also just said "table") and it understands: put the sports car on the nearest table, then rotate it by 90 degrees and make it move back and forth. Hopefully you can hear the audio from the window; I'm not sure, but if not, it showed here: I had the sports car placed on the table and told it to drive back and forth, and ChatGPT generates the code to do that. Perfect. Now move it to the floor; I ask it to go to the floor, and boom, there it is. Now, that's not a great example of what I think an application of AR glasses would be; this is definitely more in the mixed reality domain, and it was more of a tech demo, but it hints at how natural this is. It was just amazing to me how easily that kind of natural-language approach can be integrated into an app with just a few lines of code using generative AI. Of course, generative AI is also good at (again, hitting on use cases Dan was talking about) information ingress and comprehension: understanding speech and text and being able to do transformations on them. And scene understanding is another big topic. This picture is from a presentation Facebook gave on their new Segment Anything model, which uses concepts from generative AI to produce a better segmentation model than has been seen before, really pushing the state of the art. So the real potential the industry sees in AR glasses is this idea of a seamless, context-aware computing interface, I think. And this picture here is by Lauren Cason; if you go to her website, laurencason.com, she has an amazing cooking demo that she did on this device, actually, that really hints at what the future could look like. Coincidentally, Lauren and I worked together on the same team at Apple, and she's been doing some awesome stuff since then. Watching that video is a really cool example of where AR could hopefully go one day, and I mean AR, not mixed reality. So AR is all about computing out in the world. AR glasses should be able to see and hear the world from our perspective; that's really powerful and obviously provides an amazing, rich input for generative AI. They should be able to present contextually relevant information (again, I'm using pictures from Lauren's demo because it's such a great example of this) and provide situation-appropriate interfaces. That's the promise of what AR might be. And lastly, this is important: content integrating into the many threads of our lives as they weave in and out of the foreground. The reason I have this picture of someone snacking is that there are lots of examples of this, but one good example is the dieting and nutrition use case. We're all aware there are calorie and nutrition trackers out there, but it's a little hard to actually be honest with them: every time you snack or go to the pantry between meetings, or if you're out at a restaurant, you're probably not going to record honestly what you've been doing. It's just not a very good user flow.
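Bart's ChatARKit itself is open source on his GitHub; purely to illustrate the prompt-to-code loop he describes (describe a small scene API to the model, let it write code against that API, then execute the result), here is a minimal Python sketch with a stub Scene class. The API spec, the stub, and the model choice are assumptions for illustration, not his implementation.

```python
import openai  # assumes the pre-1.0 openai client and OPENAI_API_KEY set in the environment

API_SPEC = """
You control an AR scene through this Python API:
  scene.place(model_name: str, anchor: str)    # anchor is 'nearest_table' or 'floor'
  scene.rotate(model_name: str, degrees: float)
  scene.animate(model_name: str, motion: str)  # e.g. 'back_and_forth'
Reply with Python code only, no explanations and no code fences.
"""

class Scene:
    """Stub standing in for real AR engine bindings; it just logs the calls."""
    def place(self, model_name, anchor): print(f"place {model_name} on {anchor}")
    def rotate(self, model_name, degrees): print(f"rotate {model_name} by {degrees} degrees")
    def animate(self, model_name, motion): print(f"animate {model_name}: {motion}")

def run_command(user_request: str, scene: Scene) -> None:
    """Ask the model to write code against API_SPEC, then run it against the scene."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": API_SPEC},
                  {"role": "user", "content": user_request}],
    )
    code = response["choices"][0]["message"]["content"]
    exec(code, {"scene": scene})  # fine for a toy demo; sandbox this in anything real

run_command("Put a sports car on the nearest table, rotate it 90 degrees, "
            "and make it drive back and forth.", Scene())
```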
You can imagine, with that nutrition example, that this is something that happens in discrete events throughout the day; if you were able to wear glasses all day that could detect it, that's just an absolutely killer use case. And there's a lot of that: in terms of our work and so forth, we don't usually conduct things in a session-based way; we do them across different locations and times throughout the day, or over multiple days. And I think the promise there is having that computing assistant able to understand and integrate all that information. Then I want to go through some potential specific use cases that I think would be amazing (whether they're doable or not is another question), and a way to think about this is viewing AR as a superpower. So: super memory. This is an example of Otter AI, which we're using right now to transcribe this meeting. It works on a PC or on a phone, but imagine if you could have this all the time and be able to distill information from your conversations, extract action items, and then surface that information when you actually need it. Super perception: this idea of having sight beyond sight, being able to help you see, whether it's magnifying things or translating things, as in this famous Google demo.
Being able to communicate with people in their languages. Helping you hear; again, going back to Dan, who basically just did a whole talk on that incredibly important application. Navigation, a very classical AR use case, perhaps done better than what we have right now on the phone. I mean, who hasn't been in the Bay Area, or somewhere with those five-lane roads, where you make a turn and then you've got to turn left and you're like, oh crap, I've been in the wrong lane the whole time; I wish I had known ahead of time. That context is really important. Then there's the superpower I call being super smooth at communication. There's this great demo on Bryan Chiang's Twitter, using the Monocle AR device, where they're using ChatGPT for what they call "charisma as a service": maybe helping you communicate better, suggesting things you might want to say, or at least giving you feedback on how you are speaking, giving you the power to communicate more effectively. And lastly, I put omniscience. There's that famous scene from The Matrix where he says, I don't just see code, I know that's a blonde woman walking down the street, and so forth. Imagine being able to collectively have this knowledge of everything that's been going on and is happening right now. With all these devices gathering data, you can imagine being able to query information you want to know right now, such as: hey, how long is the line at that restaurant right now? Or how long is it typically at this time if I leave now? Being able to make these really specific queries about the world, transcending space and time. So yeah, this is the conclusion, basically: there's a symbiosis between AR and AI. AR needs AI to understand our world and provide utility, but at the same time AI benefits from the rich, voluminous, and multimodal data that we capture from our perspective, and the two strongly reinforce each other. AI makes AR more useful, and AR lets AI do its best work serving us, while also becoming better in the process. So that's pretty much all I wanted to talk about. These are my social media links, and thanks again for having me on here.
Thanks to you for showing all of this here and compiling it into a very short time; once again, sorry, but don't worry about it. This is so great, and I really appreciate that you put a lot of real-life use cases at the end, any one of which probably has a whole group of startups concentrating on that problem. I feel like the whole landscape of AR really changed and evolved around some niche use cases, so I like to see that. Now, with AI, it seems like we're really re-discussing how this is going to work when it all comes together. And speaking of that, you mentioned that AR and AI are sort of complementary entities. Just taking a look, for example, at the segmentation model that Facebook released, Segment Anything: it's basically understanding the surroundings, and there's a new model of interface where you can interact indirectly. For example, you're just looking at something, you're identifying something; you're not necessarily typing or pushing buttons. Buttons lose a bit of importance whenever you can achieve the same goals with your voice. And there is a lot of speculation now about what's going to happen next: maybe not for the most complex tasks, but for very repetitive and very common ones, it seems like voice might be enough, or text might be enough. So with that note, I want to introduce Chris, who is focusing on making an app called Shader, and correct me if I'm making any mistake in this introduction, but it seems like an AR application that generates AR filters and more just with the use of voice commands, like prompts, a prompt-based interface. So it's something that goes beyond the typical applications we see, for example, with Spark AR or Unity or other tools; this app starts from a prompt-based interaction. Chris in the past worked at companies like Meta and Microsoft and did a lot of work on XR prototyping; I feel like you were all prototypers, which helps you navigate a very broad spectrum of different features, picking anything from everywhere because you just want to put it together to make the best product. So please go ahead. This is the last guest, and afterwards we will reserve some time for the audience's questions.
Cool. Yeah, I think we might run out of time with the video; it said it was going to be up in 10 minutes. But will I
extend it? I extended it so you're good. All right. I'm gonna share this one window
So some of this is going to be a repeat, because we already heard a bit about generative AI and ControlNet and all that stuff. Let's just start; can you see my screen? Yep. So yeah, I'll start with: generative AI is very hot right now. I made this little animation just trying to get it to grow some text out of moss: give it a prompt and generate some text out of growing moss. You can do a lot of really fun, super amazing things with Stable Diffusion, but I'll really quickly give you a little bit of my background: Oculus/Meta, worked on AR glasses there (there are no photos of it because it doesn't exist in the wild yet), made a bunch of hardware and software prototypes, as well as a lot of machine-learning-based input and other AI tools. But I don't work there anymore; I started a company called Shader, and like you said earlier, we are making AR filters via prompts, so here you have a little video showing that. I'm going to go pretty quickly through this. There's a ton of generative AI companies; this chart is already very, very outdated, it's from November 17th. It's kind of crazy how many have popped up in the last six months; Shader also started less than six months ago. And generative AI has just been moving at an incredible pace: you can see with Midjourney, from V1 to V5, pretty drastic changes, and the same with Stable Diffusion and DALL-E. But Stable Diffusion is kind of the big one that I think everyone who is a developer and prototyper really cares about, because the source is available, it has a super active community, lots of extensions, and you're able to easily swap models. You can also use it via a web UI, and there are APIs that are constantly being added to. You can run it locally; you don't have to run it in the cloud, and you don't have to pay for it like with Midjourney; you can just run it on your own machine. It also runs on iOS and soon Android, and you can integrate it into a bunch of stuff. So, people have made a couple of different UIs, interfaces for Stable Diffusion. The most popular one is AUTOMATIC1111, and a new one that just came out is called ComfyUI; it's all node-based, which is pretty rad, and you can check out a video on that from Olivio. But yeah, AUTOMATIC1111 is kind of the standard right now, it seems, so that's what I'm going to be talking about. Most people know text-to-image from Midjourney, and this is the interface for AUTOMATIC1111: you just type in a prompt and hit generate. But there's also the negative prompt, which is not very apparent in Midjourney; you can use negative prompts there, but it's a little more hidden. And then the big difference between Midjourney and Stable Diffusion is that it has checkpoints: you can change the base model, the set of images it's been trained on. That's a pretty big deal, because you can go for something very realistic, something more cartoonish, something specific to RPG-type art, so you can have more finely controlled output of your images. With these different checkpoints, you can give the exact same prompt and the exact same seed, and you can see here the differences with Protogen, an anime model, Dreamlike Diffusion, OpenJourney, Synthwave, and Deliberate. It does change the image quite a bit, even if you give it a very specific pose and everything. The other thing that is kind of new-ish to Midjourney is image-to-image; they've added it there, but it's been in Stable Diffusion forever.
And that allows you to take an image and make a new image from it, and you can diffuse it to whatever amount you want. Here's an example of going from the original image, from a young Bill Gates in a coffee shop in Seattle to an older, grumpier Bill Gates in a coffee shop in Seattle, and seeing the weight go from a denoising strength of 0 to 0.5 to 0.9. And likewise here, this one is going from an older Bill Gates to a Bill Gates who has been attacked by zombies in a post-apocalyptic horror film, and you can see the dial being cranked up to almost 100% on the last one. But I think the thing that is most impressive about Stable Diffusion right now, with a lot of these tools, is being able to train your own model. We saw Lensa come out, and that's basically just an implementation of Stable Diffusion with your own model generated, so you can train a model on specific people or places or objects or styles or clothing, or even locations. You can download a bunch of these different trained models, either at Hugging Face or at Civitai, but warning: a bunch of these are not safe for work; people tend to train models that are a little more risqué there. One of the best parts about Stable Diffusion is this wonderful little button in AUTOMATIC1111, which is kind of hidden: it shows you a preview of all the different LoRAs or textual inversions or models you have trained, so you can just click on one of them and say, I want that style, I want to use this one, and that will drastically change your image. You can also upsample quite a lot; my GPU will let me go to about 8K-resolution images, but if you have cloud compute or a more powerful GPU with more RAM, you could probably go even higher. I could walk through a bunch of the steps of that, but we're going to jump into some other things. There's another nice feature of Stable Diffusion that's better than Midjourney: you don't have this guessing game of "what was my prompt and what did I do"; honestly, you have a lot more control. You can use the image browser, which lets you look at all the images you've created; you can see the prompt, you can see all the settings, and then you can load it into your text-to-image or image-to-image or inpaint or whatever other area you want to work with the image in. There's also a ton of extensions you can download: if you go to the extensions tab in Stable Diffusion, you can go to "available" and then "load from", and it will list all of them and you can install them right away. There's a 3D model loader, so if you want to drop that in and use it with ControlNet, that's very useful. ControlNet, like Bilawal said, is the hot new thing as of a couple of months ago. It allows you to really control your images, and this is a really big deal for Stable Diffusion; it lets you go from your original image to Gollum, to Lord of the Rings-style characters. And we get all the different maps he mentioned before: depth, Canny, normals, HED, segmentation, OpenPose, and so on, and these let you very easily fine-tune or control your image. Another fun one that came out a while back is the scribble mode. It also uses ControlNet, and you can just give it any scribble and then adjust the weights.
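As a hedged illustration of that image-to-image denoising-strength dial, here is a minimal sketch using the diffusers library rather than the AUTOMATIC1111 web UI shown in the talk; the source photo path, prompt, and checkpoint are placeholders.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # assumes a CUDA GPU is available

source = Image.open("coffee_shop_photo.png").convert("RGB")  # placeholder input photo

# Low strength barely touches the photo; high strength mostly repaints it from the prompt.
for strength in (0.3, 0.5, 0.9):
    out = pipe(
        prompt="an old, grumpy man in a coffee shop in Seattle, film photo",
        image=source,
        strength=strength,
        guidance_scale=7.5,
    ).images[0]
    out.save(f"img2img_strength_{strength}.png")
```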
So with that scribble example, here you can see, at a weight of 1.0 (again, this is young Bill Gates in a coffee shop; I'm from Seattle, so he's my victim for today's talk), his shoulders match the illustration, which is not very proportionate, and then as you scale the weight back he becomes more human and more realistic looking, following the prompt more. So this is super useful for doing a rough sketch and getting a nice composition; definitely useful in AR as well.
This image that I showed earlier is also just messing with the weights: it creates a video by literally changing the weight from zero to one slowly over time, giving it more and more input from ControlNet and less from the prompt. You can also use this to transfer styles. I trained a model of myself, and I trained one of Pedro Pascal from the show The Last of Us, just because it was amusing; I wanted to try training on somebody. You can see it also took the flannel shirt he wears in the show and applied it to the scene. You can also do color correction with Stable Diffusion: just give it a base image. This is where we have some overlap with AR: you can use Stable Diffusion within Unity. There are a couple of different implementations. I like Keijiro's; it uses a local Stable Diffusion build, and it can run on iOS and on a Mac; it targets Apple Silicon. So this runs locally, which means if you build an app with it, you don't actually have to pay for anything like cloud compute, which is very nice for users, but it is slower, so that's the downside. There's another one too, dobrado's Unity integration, that lets you generate materials (tileable materials) as well as regular images, and it talks to a local Stable Diffusion build, so if you have it running locally you can run it for yourself, or if you set it up in the cloud, you'll have to connect it to your cloud server and make that work. Tons and tons of APIs; I don't think we have a lot of time to go into this today, but you can basically do everything that you can do in the UI through the APIs. If you scroll down to the very bottom of the AUTOMATIC1111 interface, you'll see a little API link that shows you all the different APIs, and it updates as you add more extensions and modules, so it's very helpful documentation; you can also try it out live in the documentation and just start typing in prompts and making stuff. Very, very useful. And here's just a little preview of one piece of the app I'm building, called Shader: being able to generate a bunch of 3D AR scenes, compose them, and then generate imagery that way. That's just one little piece of it, but here you have these fancy cars on the right; really it's just a cheap, free 3D model from the Unity Asset Store placed in my kitchen next to a backsplash, and then I just gave it a prompt that it's an ad for some fancy supercar. There are a bunch of issues that we kind of gloss over, like generative AI ethics. Stable Diffusion is based off of the LAION dataset, which uses copyrighted images. Hopefully as an industry we can get past this; I feel like Adobe is doing a very good job with Firefly, where they are using their stock imagery to build a new base model, and I feel like a lot of companies are going to have to do this in the future, and be pretty open about where they got their images, for this to continue to grow as an industry. Otherwise it's just going to be shamed, and it's going to be harder for companies to grow without a clean base model that generates images which aren't just scraped off the internet, but actually come from stock or some other legitimate source. So if you have any questions, you can scan this QR code.
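Since the talk points at the AUTOMATIC1111 API docs without walking through them, here is a hedged sketch of calling a local instance (started with the --api flag) through its documented /sdapi/v1/txt2img route. The prompt, seed, and the checkpoint name passed via override_settings are placeholders; use whatever model you actually have installed.

```python
import base64
import requests

URL = "http://127.0.0.1:7860"  # default local address of the AUTOMATIC1111 web UI

payload = {
    "prompt": "ad for a fancy supercar parked in a kitchen, studio lighting",
    "negative_prompt": "blurry, low quality, watermark",
    "steps": 25,
    "width": 768,
    "height": 512,
    "cfg_scale": 7,
    "seed": 42,
    # Optionally switch checkpoints per request instead of via the UI dropdown.
    "override_settings": {"sd_model_checkpoint": "deliberate_v2"},  # placeholder name
}

response = requests.post(f"{URL}/sdapi/v1/txt2img", json=payload)
response.raise_for_status()
image_b64 = response.json()["images"][0]  # images come back base64-encoded
with open("txt2img_result.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```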
It links to this deck, and you have all these links here you can click on, and here are my socials if you'd like. And that's it.
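As a rough illustration of the API point from the talk above, here is a minimal sketch of calling the AUTOMATIC1111 web UI's txt2img endpoint from Python; it assumes the web UI is running locally and was launched with the --api flag, and the prompt and sampling settings are placeholder values.

```python
# Minimal sketch: hitting the AUTOMATIC1111 web UI API instead of the browser UI.
# Assumes the web UI is running locally on the default port with --api enabled;
# the prompt and parameters below are placeholders.
import base64
import requests

payload = {
    "prompt": "an ad for a fancy supercar parked in a kitchen",
    "negative_prompt": "blurry, low quality",
    "steps": 25,
    "width": 768,
    "height": 512,
}

resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
resp.raise_for_status()

# The response carries base64-encoded PNGs in the "images" list.
for i, img_b64 in enumerate(resp.json()["images"]):
    with open(f"txt2img_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```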
I just want to say, you guys gave amazing presentations. Is there a way we can share maybe a PDF or some extract of them on the Discords? I don't know what you can and can't share, but it would be super cool to have it in Norman's Discord or mine, or both, as you prefer. You can also just send it to me, whatever you'd like to share. I think this is super valuable because, Bilawal, like you said, this presentation would have been an amazing introduction, since it presents a large variety of the tools everyone is using. This is the stuff you just run into on the internet if you're trying to make something. Hugging Face is something I've been using a lot too, and I find it super valuable because it abstracts a lot of the work for you and makes it very easy to implement things in your project. So that's very nice. So yeah, we are at the end of the talks. I just want to say, first of all, a round of applause to all of the speakers. You guys are just amazing. I'll leave it to the audience now. I know the meeting ends at 7:30, and I'm down to extend a little bit, also out of respect for you guys giving exhaustive answers and explaining more. So please, if anyone has a question, raise your hand now and let's get into it. Otherwise I have a lot of questions. Anyone? You can also write the question in the chat if you're shy.
There's a question that Bart had: is there a version of ControlNet that is usable on iOS? Somebody made an app called Draw Things, if you guys have seen it. It runs on iOS, and it's basically everything you see in Stable Diffusion packed into an app, and it runs locally, so there's no charge; it's just a free app. So if you want to try different checkpoints, you want to try ControlNet, you want to try a bunch of things, you can do it right on your phone, and it's free.
I've got a question for Bart, on your slide where you were talking about augmented reality glasses with optical pass-through. You mentioned the Vuzix, I think, and one other, but you didn't mention the Nreal, you didn't mention the TCL, you didn't mention the Rokid or any of that class of glasses. I always feel like they're the ugly stepchild, this class of glasses, and I just wonder if you deliberately left them off or whether you don't classify them as augmented reality glasses.
I mean, yeah, I don't know what to think of them, to be honest. I guess they could pass for both. I've got to choose my words carefully, because people get very sensitive about this stuff and I'm sometimes known for being blunt, but I don't really see phone-connected glasses as having a future, personally, so I've just never been very interested in them as a product category. I think what you're doing is great, but it's funny, because when I hear these products pitched, it's like, what are they for? It's usually, oh yeah, it's going to be able to do all this stuff, your banking apps will be extended into AR, and blah blah blah. Now in your case, you're doing something very specific with a really well-defined user base, which I think is awesome and incredibly valuable. So in some ways you're actually a little less constrained by form factor, because for people who are, say, hard of hearing, or if you wanted to use this for running meetings and things like that, you're going to be willing to put up with, say, a tethered connection, or a form factor like Nreal that I wouldn't really call a glasses form factor yet. They look funny when you actually see them in real life; they're kind of pushed up on your nose and they don't look quite right. But I'd be willing to deal with that if I needed the hearing assistance, for example. I just don't think they're anywhere near, and because they depend on the phone, I don't think they're going to hit that all-day wearable use case anytime soon. I personally think they're going to have to integrate more onto the devices themselves, and they're not quite there yet, quite far from hitting that use case. So yeah, I don't really know what to think of them. They could very well be AR glasses, and I guess it doesn't really matter. But I still don't think the real potential is anywhere close, and I think they sit in this odd middle territory where, especially in the case of Nreal and the like, they're still trying to go for "we want to render lots of stuff in your face, so we're going to connect it to a phone to get that extra power," as opposed to embracing what the medium probably should be, which is something lighter weight. I've seen a lot of demos with the Snapdragon stuff, and it's always, oh look, here's a whole giant rendering of a room, but with a small field of view, and it looks really weird and I don't know why it exists. I just don't know if they really know what they're going to do with those yet.
So yeah, I'm not really sure what the long-term vision for these products is, which is why I didn't really mention them. I think North Focals, for example, really pushed lightweight AR hard. It wasn't a super successful product for various reasons, but I've known people who wore them for a while, and they said this was the closest they'd gotten to feeling there was some real utility here, real value, and it wasn't about putting pixels on their face. So that's, I guess, why I didn't really know where to categorize things like Nreal and those other glasses, personally. But that's just my opinion. I could be totally wrong and might have to eat crow in a couple of years. We'll see.
Yeah, I can give you a very lengthy discussion on the future of lightweight AR glasses at a later date.
Do it for sure. I'm down. I'd love to hear your thoughts.
So we have Jared next.
Yeah, hi everyone. I'm actually really interested in what you were saying, Bart, because in my experience, I've worked in a non-XR industry, and over the last few months I've been getting a little burnt out, because over three years I've produced about 12 different prototypes, in both virtual reality and augmented reality, trying to find ways to use and leverage these technologies for a global corporation, a corporation that is not entertainment-focused and not inherently XR-focused. What I've noticed is that so much of the issue is the form factor: if people don't want to wear it, they're not going to use it. I actually looked into a projector system called LightGuide that projects augmented reality visuals down onto a surface, and that was one of the few things I got people interested in. So my question to you, Bart, since you were the one just talking about this, though anyone else who was presenting can chime in: what do you see as the next step in providing value outside the world of digital entertainment and video games? I'm thinking more of customer service, retail, employees, things like that.
That's a really good question, and I did kind of neglect those use cases in my presentation. They're very valuable ones. Actually, Alessio might be able to speak more to this, because Magic Leap is, I think, really the dominant player in transparent AR for serious, session-based use cases. To be honest, I haven't given a lot of thought to non-consumer use cases. I know they're there. What's going to deliver value first is a really hard question to answer, because since the HoloLens in 2016 they've pushed the enterprise use cases pretty hard, and it's kind of odd, because I just don't have a good lay of the land there. There have been all these fancy announcements and demos and companies claiming they're using them. You had, what was it, Japan Airlines or Toyota or both, that ordered a bunch of HoloLenses and claimed they were actually using them for maintenance and so forth and that this was here to stay, but then the program kind of collapsed. So I guess that wasn't enough to keep the HoloLens afloat. So you kind of have to wait and see how useful it really is. I do think some of these factory and maintenance use cases make a lot of sense, but again, form factor, right? First of all, I'd hate to be an Amazon warehouse worker, it sounds awful, but imagine doing something like that with a HoloLens on your face. I think Magic Leap now has a much better form factor, and it's probably getting even better if they do the next iteration. Vuzix has been around forever. But again, some of these things have been out for a very long time, and I don't work in that kind of setting, I don't work on factory floors, I have very little understanding of that space, to be honest. Just anecdotally, it doesn't seem like there's been an explosion yet, and I don't know exactly what the problem is. I suspect it has a lot to do with the screen, the actual resolution, or the Vuzix being monocular, one eye, right, that's their compromise; and with having really good computer vision accessible in that form factor, with stable tracking. There are a lot of little things to get right. Maybe someone else can chime in, maybe Alessio or someone who knows a bit more about these non-consumer applications. What do you think is going to be the big one to be disrupted? Is there something that's almost ready to be disrupted that we haven't talked about?
I just want to moderate and bring us to the next question. I think it really depends on your point of view in defining something as disruptive. For me, disruption is also, for example, when you go to the optician and do an exam for your eyes and you have to go through a lot of different steps. This isn't closely related to Magic Leap, but if you've ever worked with that headset, even the Magic Leap 1 would let you get a lot of data out of your eyes, for example their position. It can provide a lot of data about the user, and that becomes valuable in that specific setting. Now Magic Leap is doing some remote rendering work, and they're also targeting dentistry, so healthcare providers, and you see this happening right now in specific fields, very slowly, but disrupting those industries first. As much as I want to see these things happen more often, it seems like things are moving very, very slowly, and in very specific niches that are being defined now. The form factor, unfortunately, is very much the determining factor, because you cannot wear these every day, and we all know it, but the things you can do with them are pretty incredible. And the fact that the latest AI developments let us do things with less power than before, or that cloud solutions could help reduce form factors, could also be a direction that's being pursued more than before. I just want to give space to all the other questions, so if there is any follow-up, keep it going on Discord; it's very active, with 300 more members now who all work or want to work in XR. So, Marianne asks: what is your take on the current ability to create a realistic-looking 3D model of a real-life object using AI prompts? Who wants to take this one?
I can start, and I'm sure people will chime in. I think there are a couple of paths. The NeRF path is probably the best proven path right now for generating high-quality meshes from real objects, but, as was shown earlier, you can start to add Stable Diffusion on top of that. There's this gray zone where you need some sort of base to start generating good-quality content. If you're just trying to make a bunch of boxes that look really detailed, we've got you, that's easy, because you can texture-map those, and you can generate meshes or textures; that's pretty straightforward. But when you start doing really complex geometry with a lot of intricate textures, it's going to be much, much harder to generate from a prompt. This paper I just shared in the chat is called differentiable signed distance function rendering. It's kind of interesting: it's sort of growing out the object with signed distance fields, I imagine, or something similar, but combined with more Stable Diffusion-like methods. So there are some funky cross-blends between different types of technology. Bilawal, I feel like you have some expertise here.
Yeah, for sure. To add to what you said, I think there's a lot of potential to do your base modeling, whatever you're doing, and then use things like the depth ControlNet stuff we talked about to project textures from different viewpoints and mask that stuff together. There are some really cool plugins for Blender that people are using to do exactly that, almost like kitbashing textures together. Then on the text-to-3D side there's this paper called DreamFusion, and there's a new iteration of it that came out too. The closest thing I've seen to a product you can actually get access to is what Luma has done; they've basically implemented DreamFusion. It's okay for certain types of results. For objects it kind of works, but these models don't understand the semantic context of what they're recreating, so you get really funky, weird errors. Clearly Elon has got some long-ass legs here, trying to do this GI Joe-type character. Okay, this is pretty usable, but is this mesh something that, maybe for a prototype, you throw into Mixamo, rig, and start doing some fun stuff with? Sure, but it's not production-ready just yet, right? You can get some really cool results. I enjoyed the Steve Jobs one, it's kind of fun, maybe I'll end on this one: he's holding a frickin' phone up and has the glasses and everything on. So if you want to make a monster, I think you'll get some good results out of it, but the moment you start getting into the details of the digits and all of that, best of luck. So again, I expect text-to-3D to advance, especially for these outside-in recreations of objects where they didn't even use any 3D training data; this is all just using Stable Diffusion, effectively. It'll be cool to see where this goes, but I'd say it's a nice toy right now, and the real fun is happening in the image space. And if you think about modalities, images are the most important, then video, because video is ubiquitous, and then you'll start seeing 3D stuff showing up. A lot of the problems for video may get solved by needing some sort of intermediate 3D representation too. But photo, video, and 3D, in that order, is kind of how I'm expecting this to get more production-ready, if you will.
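To ground the depth ControlNet texture-projection idea mentioned above, here is a minimal sketch, again using the diffusers library rather than the Blender plugins the speaker refers to; the depth checkpoint and the "viewpoint_depth.png" file are assumptions for illustration, standing in for a depth render exported from your 3D tool.

```python
# Minimal sketch: texturing a base render by conditioning on its depth map.
# Assumes the diffusers library and a public depth ControlNet checkpoint;
# "viewpoint_depth.png" is a hypothetical depth render of the base model
# from one camera, exported from your DCC tool.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

depth_map = load_image("viewpoint_depth.png")

# The generated image follows the base geometry, so it can be projected back
# onto the mesh from the same camera; repeat per viewpoint and mask/blend.
textured_view = pipe(
    "weathered sci-fi armor, studio lighting, photorealistic",
    image=depth_map,
).images[0]
textured_view.save("textured_view_0.png")
```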
Thank you. Like always, if you guys want to follow up with more detailed questions or resource sharing, Discord is the right place to be. Luis, please go ahead.
I had a question: I know Magic Leap created a character called Mica, which used AI, and I was wondering if you see, in the future, through AR glasses, a character like Mica being with us daily, answering our questions. Is there a way we can do that?
I mean, I can't speak for Magic Leap, because I was not there at that time, so I didn't work on that project at all. And it was done at a moment when ChatGPT was not released, so I doubt they used anything similar to that. I did experience an assistant at the MIT Reality Hackathon, though. It was done very, very well, with ChatGPT and everything, and I had a 20-minute conversation with this thing; if I hadn't known it was an agent, I would have just kept going. I see other follow-up questions, but I want to give space to the speakers: what do you guys think about the use of AI assistants in this era? Please go ahead.
There's already a bunch of companies doing this. I feel like it's already here; we're here now. So it's just a matter of which one you want: is it a service, is it integrated into your specific thing? They're out there. Unfortunately, I don't know them by name, but there are definitely a bunch of companies doing this right now.
Yeah, I think it's especially big in games. Not technically an assistant, maybe, but if you were at GDC this year you'd have seen at least a couple of companies on the expo floor doing this kind of stuff. One had a demo and it was pretty fun, actually. So yeah, it's kind of doable now. But specifically regarding AR glasses, I'm not sure. To be honest, I haven't really ever tried a demo of a humanized kind of assistant to know what that would feel like. On the MR-type headsets it would be absolutely doable even today; you could definitely prototype this on the Quest Pro today. But for AR it depends on how AR evolves. My bet, though, is that because of the power limitations of an all-day, or at least multi-hour, AR wearable, the answer would probably be no, for a couple of reasons. Number one, even if it is an interface you want, the power requirements of rendering an object like that, properly spatialized and all that, are pretty big; I just don't think it's an efficient use of the headset. But secondly, as cool as it would be, you have to think about whether you'd actually want that all day long. Do you really want a holographic character with you? Maybe some people do, but all these AR demos, we've all tried them for minutes at a time; basically none of us have really done AR for hours continuously. We think we know what we want, but in reality, for going about your day, it's kind of a less-is-more thing. It's like with communication: sci-fi authors predicted video phones and all that, and that stuff exists right now, like FaceTime, but for most of our communication we still just want to send a text or get an audio message. Imagine an assistant that maybe ends up as a visual but not as a humanoid, just some light augmentation; I think that would probably be less distracting. Because there's also that distraction issue. I remember once doing an experiment with something really simple, just rendering a little sphere or something, and there's just something about that AR visual that you can't take your eyes off; when I'm walking around it's actually pretty distracting. No matter how minimalistic I tried to make it, it was still kind of annoying to have it there, or maybe not annoying, but really tough not to notice it and focus on it for some reason. So just because of the power constraints, and because what you might want is to be present and not be overloaded with sensory stuff, especially when you need assistance in the moment, you might not want visuals cluttering your information space. So I'd be skeptical you'll see it on AR glasses, but again, you could definitely do it on MR devices, and there could very well be utility in having that sort of humanoid character to talk to.
And I'm excited to see it happen on the current and upcoming MR headsets, because they're totally able to do it right now, today.
Yeah, I'd just add that, again, I agree the avatar bit is probably a bit far out right now. But assistants in general, and I'm tooting my own horn here, of course, I think are going to be super, super interesting. And a lot of the interest, again, we were talking about enterprise scenarios for this: we have tons and tons of companies reaching out to us to ask, can we make a personalized version of this for our company? Like the well-known airline we're working with at the moment that wants to give this to their passengers, for instance. And these things, by the way, you can wear for hours and hours at a time. So this ability to call up information, like where am I supposed to be going, when's the flight, when's this, when's that, that kind of assistance I think will be super interesting. It wouldn't necessarily be an avatar, it might just be text, but the ability to call on it anytime is super interesting. Thank you, guys.
There was something like this with Cortana on the HoloLens, but I don't think it made it, you know.
And who wore those for hours on end, back to back? We've got to fix that first.
Yeah, that's the priority if your headset is over 100 grams.
Sometimes what you think is the best solution turns out not to be, once you're testing it every day and discovering all of the nuances. Usually, in my experience, when such a radical idea comes out, it's probably not the final one; it's a good starter, a good lead, but it goes through a lot of differentiation before things stabilize, and that takes more time. Douglas, please go ahead.
Yeah, so I was at Magic Leap starting from January of 2014, and I actually helped build the deep learning team as well. Now I'm currently leading a generative AI research team at Magic Leap, so I can tell you a bit. I didn't work directly on Mica, but I did work on essentially all of the deep learning optimization for the ML1 that made it possible. Those models are literally three orders of magnitude more efficient than the state-of-the-art models from circa 2017. If generative AI had existed in anything close to its current form back then, Mica would have been incredible. It was built by an amazing team in Culver City, led by someone with an Academy Award background in motion capture and human modeling. It was an amazing piece of technology; they spent years building it. It's really kind of a shame, because basically her mind was a mirage, it didn't exist, but her physical form was phenomenal. And frankly, because we've sort of pivoted away from that, we probably couldn't resuscitate Mica if we wanted to. But it was a technological tour de force, and I have to say it's really a shame that it didn't coincide with generative AI, because it would have been a mind-blowing demo.
Yeah, I've seen some videos of it. I feel like Magic Leap has a history of amazing art built in the past, and seeing things that move very naturally and all of that, the content they made for the platform was very, very high quality.
It was awesome because they had some of the best luminaries from video games, comic books, 3D, and visual effects. It was an amazing combination of experts from so many different fields, including entertainment, things you don't normally find in extremely technology-focused companies, maybe more in something like Pixar, where you have experts spanning medical sciences, visual effects, and movies. We had John Gaeta, right, who did the special effects for The Matrix. So we had this eclectic group of luminaries across science fiction and beyond. It was pretty amazing, but it was way before its time. Rest in peace, Mica.
Anyone else who wants to chime in with a question? Please go ahead if you have one now. Maybe I lost some in the chat.
Okay, I see here Marco asks, rolling with that conversation: does anyone feel this is a threat to public health, since you can generate realistic characters in AR? I think this is related to the conversation we just had. It's already there, like Chris pointed out, and it's just up to us how we use it. We will see. I feel like we can already claim that ChatGPT is in everyday use. I don't know if you guys have had the chance to use it in your workflow, but I use it outside of my work for some side projects, and I have to say it's in my bookmarks, and I think that's a condition a lot of other people share; it just speeds up your development, definitely. What I do want to mention is that it also creates a sort of addiction to what you want to accomplish, because you keep finding solutions based on its constant answers and understanding more about the thing, which could be dangerous. But it saves you time, and for more productivity people will probably always choose that path. So it reflects our current situation, but there are still problems, for now. Anyone else who wants to chime in with other points? Otherwise I have a personal question, if no one has one.