Everything Everywhere XR #10 - Francesco Colonnese
8:57AM Jun 17, 2023
Speakers:
Alessio Grancini
Francesco Colonnese
Keywords:
ai
image
models
bit
neural network
problem
guess
space
nonlinear
big
learn
activations
signal
cat
started
interaction
xr
headset
francesco
type
Okay, with that, guys, here we are at Everything Everywhere XR, our podcast. This time I wanted to focus on a full episode about AI, because it's such a big topic for everyone right now, and I really wanted to dig deep into it and include someone who really works with AI, not even just alongside XR, but purely in AI. I'm lucky that one of my best friends is an AI engineer, so I thought this would be a very insightful conversation to have. He tells me a lot about his work, so I wanted to include him in this series of episodes that I do every one or two weeks, or sometimes once a month. Before I start, I just want to mention that this is not related to my job, and neither of us is getting paid for doing this. It's on my personal Vimeo, so it's not even about views; it's really just for educational purposes and to keep up with the community. Okay, so welcome, Francesco.
Thank you for having me.
And if you want to introduce yourself a little bit before we get straight into the four or five questions that I have, feel free to.
Yeah, so I'm Francesco, a software engineer at an AI startup here in West LA. We're in medtech, and we're trying to disrupt histopathology using AI. Histopathology is the process used to understand how a tissue slide looks under the microscope, and very often that process is chemical. We're trying to replace it using AI, basically; that's the short story. I've been in AI for a while, and we embarked on this journey in 2020. Since then we've been trying to grow and really disrupt the space, I guess.
Cool. So you've been in AI for quite a long time, and now it's 2023 and you go on Twitter and see all of these hashtags. Something similar has happened with VR and AR in recent years: before, it was almost a sub topic, and then it became very big, because a lot of technologies go through that kind of process, where there's enthusiasm at the beginning and then it becomes mainstream, right? So when did you start working in AI, and why did you pick AI over other topics specifically?
Yeah, so in 2018 I was in college and I was part of a club whose focus was to teach people AI. Coming from a full-stack web application type of background, I guess I had never heard about the topic; it wasn't really popular at that time. We started to approach it from the mathematical perspective first, and soon I realized, wow, this is potentially a breakthrough technology, loosely based on our brain, which is something we still don't understand to this day. I just thought it was extremely interesting, and from there I started doing a few personal projects where I tried to train a few neural networks. But it really picked up with my first internship, at the Huntington Medical Research Institute. What they were doing there was time-series prediction on MRI data. The technical term is BOLD, the blood-oxygen-level-dependent signal, which basically shows how the brain functions, whatever activation is happening in the brain. What we were trying to do was take that signal and predict it at time t plus one. There are classical approaches to time-series prediction, but at that time an architecture called long short-term memory, the LSTM, was starting to pick up. I applied that, and we got pretty good results. For the first time I was seeing real results doing AI research, so I got more involved. Later on I started as an undergraduate researcher at UCLA, and we did some work with super resolution. Super resolution is basically the problem where you input an image and you try to learn a representation of an upsampled version of that image. So I was switching from time-series prediction to a problem that really translates into image-to-image translation. I started training the neural networks and thought, wow, this is actually working. Are people even aware that this problem is feasible for somebody with pretty standard Python and computer science knowledge? It was still something a little bit under the hood within PhD research. The work at this lab was really focused on histopathology, and there I was trying to build a web application, and the infrastructure around it, that would let this image-to-image translation problem run at scale. And from there I became a founding engineer for what is today Pictor Labs.
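As a rough sketch of the kind of next-step prediction Francesco describes, not his actual research code, here is a minimal PyTorch LSTM that learns to predict a one-dimensional signal at time t plus one from a window of previous values. The sine wave standing in for the BOLD signal, the window size, and the hidden size are all made up for illustration.

```python
# Minimal sketch (not the actual research code): predicting a 1-D signal
# at time t+1 from a window of previous values, using an LSTM in PyTorch.
import torch
import torch.nn as nn

class NextStepLSTM(nn.Module):
    def __init__(self, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):             # x: (batch, window, 1)
        out, _ = self.lstm(x)         # out: (batch, window, hidden)
        return self.head(out[:, -1])  # predict the value at t+1

# Toy training loop on a synthetic sine wave standing in for the BOLD signal.
t = torch.linspace(0, 20, 500)
signal = torch.sin(t)
window = 30
X = torch.stack([signal[i:i+window] for i in range(len(signal) - window - 1)]).unsqueeze(-1)
y = torch.stack([signal[i+window] for i in range(len(signal) - window - 1)]).unsqueeze(-1)

model = NextStepLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(5):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)  # difference between prediction and the true next value
    loss.backward()
    opt.step()
    print(epoch, loss.item())
```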
That's cool that you started working in medtech; it seems like it serves a big purpose. A lot of use cases for technologies like AR and VR also target healthcare first; it seems like the entry point for a lot of experimental technology. Keeping in mind that this is educational content and might help whoever is watching or listening: what are the papers, or the research, without going too specific, that in your opinion trace the timeline of events that brought us to what we have today?
Yeah, so the algorithm that is used for learning is really backpropagation, right, which dates back, I believe, 40 to 50 years.
Backpropagation? What is backpropagation?
It's basically the algorithm used to propagate the error back through the network when you pass an input into it. A neural network is really just a bunch of matrices: we pass an input in, we run a bunch of matrix multiplications, and we get an output. That output might differ from what we're trying to learn, so we compute the difference, and then we have to understand how to update the weights to minimize a loss. The loss function is very often the difference between whatever you're trying to learn and what the neural network produced. Pioneers of AI, like Geoffrey Hinton, developed this method so that it's possible to minimize this loss through that series of matrix multiplications and the resulting computation of the loss. I think another very big breakthrough, a little more specific to computer vision, is convolutional neural networks. The idea there is that convolution is a very specific operator that you apply to a signal. In computer vision you pass an image in, and you apply the convolution using a kernel, which is just another matrix; it can be a three-by-three, a five-by-five, a seven-by-seven matrix with some numbers in it. These kernels are initialized when the neural network starts off, and then the numbers in the kernels are learned throughout the training process. This has reached state of the art in problems like classification: if you're trying to predict whether an image of an animal is a cat or a dog, this has been shown to work really well. It's interesting because in the past, the way people would approach this problem was to use an algorithm to find features in an image and then classify based on which features relate to a dog and which relate to a cat. The interesting fact is that if you look at the activations of an image passing through a convolutional neural network, the features learned by the network do activate in correlation with what makes a dog or a cat. For instance, in the higher layers of the network you can see the network focusing on what type of ear the animal has. It's quite crazy how metaphorically similar it is to how the human brain works: the ear of a cat differs from the ear of a dog, and if I showed you the ear of a cat you would most likely instantly recognize that it's a cat. It's been shown that the same happens for an artificial neural network. So that was another, I think, kind of crazy moment, and that research was pioneered by Yann LeCun, who is very active in AI today, a very popular Twitter figure, and definitely another pioneer of AI. And then I think the most recent breakthrough, the one trending right now on Twitter because this architecture is behind all of these large language models, is the transformer.
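To make the two ideas above concrete, here is a small hedged illustration in PyTorch: a learnable 3x3 convolution kernel applied to an image, followed by one backpropagation step that updates the kernel's weights to reduce the loss. The image, the target, and the learning rate are arbitrary placeholders, not anything from the research described.

```python
# Hedged illustration: a learnable 3x3 convolution and one backprop step.
import torch
import torch.nn as nn

image = torch.rand(1, 1, 28, 28)   # (batch, channels, height, width): a made-up input image
target = torch.rand(1, 1, 28, 28)  # what we would like the output to look like

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)

output = conv(image)                    # forward pass: slide the 3x3 kernel over the image
loss = ((output - target) ** 2).mean()  # loss = difference between output and target

loss.backward()                         # backpropagation: compute d(loss)/d(weights)
with torch.no_grad():
    for p in conv.parameters():
        p -= 0.1 * p.grad               # gradient descent: nudge the kernel's numbers to lower the loss
```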
And that's a paper by Google called "Attention Is All You Need," where they develop this mechanism called attention, which is actually very interesting. I think there are probably other architectures out there in the future that will surpass the transformer, but what's really interesting about the transformer is that it introduces some explainability to the problem. Some people might have heard the term "AI black box": AI is really difficult to interpret, just like it's difficult to understand what's going on inside a human brain. But the attention mechanism provides a way for researchers to see where the neural network focuses, a little more analytically. Aside from that, these architectures are behind the state of the art for most problems in the machine learning field, including natural language processing and all the trendy stuff with large language models.
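For readers who want to see the mechanism itself, this is a minimal sketch of the scaled dot-product attention at the heart of "Attention Is All You Need." The tensor sizes are arbitrary, and real transformers add multiple heads, learned projections, and masking on top of this.

```python
# Minimal scaled dot-product attention; the attention weights are the part
# researchers inspect when they talk about explainability.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # how much each token attends to every other token
    weights = F.softmax(scores, dim=-1)            # the attention map
    return weights @ V, weights

Q = torch.rand(1, 5, 16)  # (batch, tokens, dim) queries
K = torch.rand(1, 5, 16)  # keys
V = torch.rand(1, 5, 16)  # values
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)  # (1, 5, 16) and (1, 5, 5)
```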
Another thing is that, for every beginner, me included, when I started to pick up a little bit of this stuff, I took a class on machine learning. And you also told me, for certain problems, in my case for example the user test data I was working on, I thought, why don't we just use AI to start getting some answers? I've worked on user testing in a lot of different parts of my career, and for companies it's a very common practice; everyone needs data, and it's a very important part of a lot of companies' missions. So I think it would be very helpful to explain a crucial difference between machine learning and AI, also in the context of the modern use cases you're seeing in front of you.
Yeah, so, some people might disagree, but I think of AI as a subset of machine learning, in that a neural network is just a bunch of linear regressions. You basically have some data, some independent variable on the horizontal axis and some dependent variable on the vertical axis, and the idea in a linear regression problem is that you plug in an x and try to get a y. For instance, what's a problem that can have a very linear correlation? If I go to bed at 2am, there's a certain probability that I will be tired. So I might record a bunch of values of when I go to bed, say 12am, 1am, 2am, and map those to whether I'm tired or not. Then I can basically plot all of these values on a graph, and the idea is that the later I go to bed, the more tired I will be: as one variable increases, the other variable increases. That's the very standard machine learning problem, y equals mx plus q, where the coefficients m and q are learned. But you can do this at scale, and it's been shown in the AI field that when you do this at scale, all of these little weights really learn a nonlinear curve. Problems that are harder to solve, such as what is a cat or what is a dog, are extremely nonlinear. It's not like I give you a pixel representation and you can just say that a dog is a dog or a cat is a cat in that image; there might be different lighting, and there are so many different factors that imply that something is a dog or a cat. Neural networks have been shown to do really well in these types of problems. So, tying back to what you said about why you should use AI for a problem: you should use AI if you see a very strong nonlinear correlation. For instance, if you take a problem like classifying whether a stock is going to go up or down, it might be that the dataset you're working with doesn't really have strong correlations at all; in fact, because of Brownian motion it's probably random more than anything. But depending on the data representation you're working with, the best thing you can do is try to understand what kind of correlation you have between those variables, and especially between those variables and the variable you're trying to predict. If you see a pretty nonlinear correlation between those variables, then AI is fairly well suited to that problem.
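Here is a toy version of the bedtime example, fitting y = mx + q with gradient descent. The data points are made up so that the true line is y = 2x + 3, purely for illustration.

```python
# Toy linear regression: learn m and q in y = m*x + q from made-up data.
import torch

x = torch.tensor([0.0, 1.0, 2.0, 3.0])  # hours past midnight when going to bed
y = torch.tensor([3.0, 5.0, 7.0, 9.0])  # made-up "tiredness" scores (exactly y = 2x + 3)

m = torch.zeros(1, requires_grad=True)
q = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([m, q], lr=0.05)

for step in range(500):
    opt.zero_grad()
    loss = ((m * x + q - y) ** 2).mean()  # mean squared error
    loss.backward()
    opt.step()

print(m.item(), q.item())  # should approach m ≈ 2, q ≈ 3 on this synthetic data
```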
So another question I have is: you work with this technology daily, and I think it's very important that the audience understands the challenges that come with it, and what it means to have a production pipeline instead of just an experiment or some open source material that you easily find online. When it gets to implementing serious solutions, there's a whole different workflow you need to go through. So maybe you can expand a little more on what you do at Pictor, and what you think the audience should know about it.
Yeah, so at Pictor Labs we basically deal with an image-to-image translation problem. We ingest a digitized version of a microscope slide and we basically colorize it, so that a pathologist can take a look at it and make assertions about whether a patient has a disease, whether a drug works on a specific animal, and so on. The way this works is we scan the slide with something called an autofluorescence microscope: we put the slide in and take a scan of it, so all the different tiles are imaged one by one. These images are big, hundreds of thousands of pixels, and they're imaged in the autofluorescence space, which is parallel to but different from the RGB space. An RGB image is just a 3D array: it's got a width, it's got a height, and it's got three channels, red, green, and blue. The autofluorescence signal, on the other hand, can be captured at different wavelengths, and each wavelength basically represents a different type of signal. So we take this image and we learn a mapping to the chemically stained counterpart of that image, and the chemically stained counterpart is in the RGB space; that's why we go from the autofluorescence space to the RGB space. And we have to do this at scale, because these images are humongous and it's impossible to fit them into GPU memory. I'm not sure how familiar the audience is with GPUs, but their memory averages around 16 to 24 gigs for commercially available GPUs, so it's impossible to put a 60,000 by 60,000 image on a GPU. Software engineering wise it's really fun, because you have to crop the image, pass it to the models, stitch it back together, and perhaps provide it to the user in a web application. And our AI problem is a little different from all the trendy stuff you see on Twitter. On Twitter you see a lot of natural language processing these days, thanks to ChatGPT and so on, and a lot of diffusion models, thanks to image generation with DALL-E and all that. Those models are very random: there's a lot of randomness in them, which means that when you pass a text prompt into a stable diffusion model or a large language model, what you get is always a little different, depending on how you tweak the temperature of that model. We can't afford that, and the reason we can't afford it is that lives depend on it; lives depend on a diagnosis, on approving a drug. So the output of our models has to be 100% deterministic. We have to be able to replicate it, and we can't have any randomness. I guess that's a very common approach in the medical field, because we cannot afford any skew between one output and another.
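As a hedged sketch of the two production concerns Francesco mentions, cropping a slide that is too large for GPU memory into tiles and removing randomness for reproducibility, here is roughly what such a loop might look like. The tile size, the fake slide, and the identity "model" are placeholders for illustration, not Pictor's actual pipeline.

```python
# Placeholder sketch: tile a huge image through a model and fix seeds for determinism.
import torch

def set_deterministic(seed=0):
    # Remove randomness so the same input always yields the same output.
    torch.manual_seed(seed)
    torch.use_deterministic_algorithms(True)

def run_in_tiles(model, slide, tile=1024):
    # slide: (channels, H, W), far too large to push through a GPU in one shot.
    _, H, W = slide.shape
    out = torch.empty_like(slide)
    with torch.no_grad():
        for y in range(0, H, tile):
            for x in range(0, W, tile):
                patch = slide[:, y:y+tile, x:x+tile]
                out[:, y:y+tile, x:x+tile] = model(patch.unsqueeze(0)).squeeze(0)
    return out  # stitched back together tile by tile

set_deterministic()
slide = torch.rand(3, 4096, 4096)  # stand-in for a scanned whole-slide image
identity = torch.nn.Identity()     # placeholder for the real image-to-image model
stained = run_in_tiles(identity, slide)
```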
Yeah, that's great to hear. This is common in a lot of disciplines, I feel like: when you get to the production side of things, a whole bunch of problems come in, and you need people working on those specific problems. It's not just "AI works" and that's done; there's a whole series of challenges, in this case the size of the files you're processing, which is a big one.
Yeah, and I think the fun part of this problem, for whoever is interested in taking part in it, is that we're building this now: this is the first time in history that people are building AI products. That's very exciting, and I think it requires a lot of creativity and thinking outside the box. How do you deploy models? If you have a set of models, how do you switch back and forth? How do you serve them? How do you provide an API for them? It really never ends. The field of MLOps, and specifically large language model ops if somebody is working in natural language processing, has just started. You see projects like LangChain and AutoGPT really just starting right now; we never had anything like that before. So I think that's very exciting.
So, yeah. Do you think the audience should reach out to you if they're interested in Pictor?
Yes, please, please. We're always looking for talent, so if you're interested, please shoot me an email.
That's great.
And now I'm going to move toward the end of the interview, and I think we can re-embrace XR topics a little bit. It's cool that I'm interviewing someone who never really had deep experience with XR, because it's good to take one topic at a time instead of mixing too many things together. I feel like there's always that space for exploration, but I'm also interested in products, making products and seeing products. You've seen Apple recently dropping the bomb of the Vision Pro, and you know I work at Magic Leap, so you're a little familiar with the work that I do. So what do you think AI could bring to a new model of interaction for future headsets? What do you think AI could be used for to leverage that technology?
Yeah, I think if you look at Magic Leap or the Oculus, you always have some form of controller, right? That was, I guess, version one of interacting with an alternate reality. Now you're seeing the Vision Pro, and, this is just a hypothesis, but in the future I believe people are not going to interact with controllers but rather with their hands. Why is that? Probably just because it's a simpler way to interact with a system; look at the touchscreen versus a keyboard, it's a much more natural way to interact with that environment. And language really is one of the ultimate forms of interaction, in my opinion, in the sense that you could just talk to your XR assistant within your headset and make things happen at a much faster pace than you can touch or type. That's very powerful. What if I told a large language model one day, "Hey, open Netflix and show me the latest TV series that have been released"? That's much faster than having to go to my dashboard, click on Netflix, and select. Additionally, my AI assistant can be fine-tuned on what I like, so without even opening the Netflix application it already knows what I like and what I dislike, and that interaction takes even less time. And then we can get into the far-fetched future with systems like Kernel, this company in Culver City that uses a noninvasive headset to read your brain activations, or, more popularly, Neuralink. That's probably the ultimate interface; we're getting into the Black Mirror type of space. At that point you don't even have to...
Tell it, yeah. Speaking of Black Mirror, season six came out yesterday.
Oh, nice. I'm definitely binge watching it over the weekend. But yeah, that's really where we want to get: minimum lag between input and output through whatever system I'm using. And I think there's a lot of power in using AI in extended reality. Look at Meta: Meta definitely has an advantage in that space, because if you look at the research they've put out, they're definitely one of the biggest contributors to the AI, and I guess open source, space, thanks to large language models like LLaMA and other foundation models. So what I would recommend to the audience is to experiment with what we call foundation models, which are trained on large corpora of data, and try to fine-tune them for their own specific interactions within the extended reality space, because there's a lot of power in that. And like I said, within the large language model space we're building this right now, so there's a lot to be built and a lot to optimize, and I think XR also fits into that equation.
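For anyone who wants to follow that advice, a minimal starting point might look like the following, using the Hugging Face transformers library. The "gpt2" checkpoint is just a small stand-in, not a recommendation of a specific foundation model, and fine-tuning on your own XR interaction data would come after this first experiment.

```python
# Load a small open foundation model and generate text from an XR-style prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Open Netflix and show me the latest series"  # the kind of voice command discussed above
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```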
This is very helpful; I think that was a very sincere and sober view of the space. I'm going to try to do more of these, so please, guys, keep in touch for the next ones. See you next time. Yes, thank you, Francesco. Thank you.