20250313 KubeVirt in the Enterprise
Rob Hirschfeld · Mar 14 at 6:47 pm · 34 min
00:05 Rob Hirschfeld
This is one of those fun conversations where we're really diving not just into the tech, but into the enterprise consumption of the tech: how people are thinking about it, and how, as a technology like Kubernetes evolves, it can get used in ways the community is not anticipating and then find a whole new path for adoption and commercialization. I think we're exactly there with KubeVirt. If this is going on in your organization, we want to hear from you. We want you to be part of the conversation, because this is a really important transition point for the industry, for people questioning their VMware consumption and for people looking to expand their Kubernetes footprints. So please be part of the Cloud 2030 group. Let us know what you're thinking. We want to hear from you.
01:06 Rob Hirschfeld
Oh, I've been asking the team to do some videos of the ClickOps side of the VM management, but this is what we were talking about last week and what I've been doing research on and getting the team ready for from a go-to-market perspective. I'm trying to think of how much recap to do; I live in this every day, so let me do a little bit of recap. We are working on helping enterprise customers use, ultimately, KubeVirt. But there's a problem with KubeVirt, per se, as a VMware exit. Most of the conversations in the industry about Kubernetes virtualization, and I'm going to say OpenShift Virtualization a whole bunch, because we actually think OpenShift Virtualization is the right answer for our customers and prospects here and it's what we're going to be talking about in market a lot, come from people who are, by and large, Kubernetes developers taking a Kubernetes mixed-mode operation strategy. So generally, when we see people talking about KubeVirt, they're positioning it for a Kubernetes developer or a platform engineering team rather than a virtualization team. They're saying, oh yeah, we're going to create clusters, we're going to do all this work, and it's going to be very Kubernetes-typical: here's the CLI, here's my spreadsheet, here's my GitOps controls, all of that Kubernetes lingo. But what we see coming out of the enterprise is not that. For people who are replacing VMware, their first concern is: I need a virtualization platform. So this idea that they need to learn Kubernetes in any real way is something I've been questioning. If you're trying to replace VMware with another virtualization platform and you want to use KubeVirt, KubeVirt isn't actually a virtualization platform. It is the equivalent of ESX, and some would argue maybe not even all the way to ESX, but in this analogy it's not vCenter, it's ESX. What these customers want in a VMware replacement are two things. One, they need the vCenter equivalent: the management plane overlay, the ClickOps, the controls, all those pieces. Two, they don't want to commingle Kubernetes virtualization with any other workloads. They're going to buy servers to do virtualization, and then they're going to run one app, one workload, one cluster across all those machines, which is not very Kubernetes, because all they care about is virtualization. Does that make sense to you?
04:47 Speaker 1
Are they caring only about virtualization, or virtualization plus network management?
04:56 Rob Hirschfeld
They care about the bigger pieces they will need: network segmentation and network management is going to be a piece, and they're also going to care about storage and VM storage from that perspective. So there's a whole bunch of VM management pieces, like where the VM images come from and how you initialize the VMs. There's infrastructure work you need to do when you're managing VMs the way an enterprise manages VMs.
05:33 Speaker 1
On the network segmentation and network management part: in theory you can do network segmentation, but you're going to need to understand network policies, and it's going to be policy-based segmentation, not subnet-based segmentation, because you don't get to control the IP of your VM. That's done by the CNI. The other part is that there are certain things you just won't be able to do on Kubernetes, like jumbo frames, or mixing jumbo frames and VLANs, again because that is all abstracted away by the CNI. So if, for example, you want to run a VM and attach it to a SAN to extend the storage, that's not going to work exactly like it does on VMware, or even on any other virtualization platform or raw KVM, because it is fundamentally a different model of managing assets.
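For reference on what "policy-based, not subnet-based" segmentation looks like: KubeVirt VMs run inside ordinary pods (virt-launcher), so they are segmented with the same NetworkPolicy objects as any other pod, selected by labels rather than by IP ranges you control. A minimal sketch using the Python Kubernetes client; the namespace, labels, and port are illustrative assumptions, not anything the speakers specified:

```python
# Minimal sketch: label-based segmentation for VM launcher pods.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a cluster

policy = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "db-vms-only-from-app-vms"},
    "spec": {
        # Applies to any pod (including VM launcher pods) labeled tier=database
        "podSelector": {"matchLabels": {"tier": "database"}},
        "policyTypes": ["Ingress"],
        "ingress": [{
            # Allow traffic only from pods labeled tier=app, regardless of IP
            "from": [{"podSelector": {"matchLabels": {"tier": "app"}}}],
            "ports": [{"protocol": "TCP", "port": 5432}],
        }],
    },
}

client.NetworkingV1Api().create_namespaced_network_policy(
    namespace="vm-workloads", body=policy
)
```

Whether that policy is actually enforced still depends on the CNI plugin in use, which is exactly the abstraction the speaker is pointing at.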
07:08 Speaker 2
I don't know that you're ever going to get an exact match, well, within degrees. In my little brain, I can't imagine that you're ever going to get a direct apples-to-apples match. So what is the issue with KubeVirt? That's what I want to know.
07:31 Rob Hirschfeld
What is the issue with KubeVirt? KubeVirt is ultimately a wrapper for KVM. My issue with KubeVirt is that it is just a way to create VMs using the Kubernetes control plane, but that's not how virtualization teams manage virtualization. What they need is a virtualization management plane where they can see the VMs, click into the VMs, create them and stop them. They need to be able to manage their VMs through a GUI. It's not the only way they're going to manage VMs, but they need it, and it's missing from this graphic: they need a manager. This is something I believe vendors are going to build. Red Hat is building a virtual machine manager into the OpenShift Virtualization product; the product is not KubeVirt, the product is the virtual machine manager on top. This is where I think things are going to end up being more vendored. It's not a matter of an enterprise saying, oh, I'm going to use KubeVirt and just add it into whatever distro they're already using, because it's not the Kubernetes team buying, it's the virtualization team buying. They're going to need to know: how does the infrastructure work, how do I see what VMs I have, how do I set the networking? And I believe some of the networking questions Klaus was talking about, we're going to end up needing to address or have better solutions for. I'm watching better networking controls come out of Kubernetes, not just for this use case, but for AI use cases and some others.
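To make concrete what "just a way to create VMs using the Kubernetes control plane" looks like, here is a minimal sketch that submits a KubeVirt VirtualMachine object through the standard Kubernetes API using the Python client. The VM name, namespace, memory size, and disk image are illustrative assumptions, and it presumes KubeVirt is already installed in the cluster:

```python
from kubernetes import client, config

config.load_kube_config()

# A KubeVirt VirtualMachine is just another custom resource in the cluster.
vm = {
    "apiVersion": "kubevirt.io/v1",
    "kind": "VirtualMachine",
    "metadata": {"name": "demo-vm"},
    "spec": {
        "running": True,
        "template": {
            "metadata": {"labels": {"kubevirt.io/vm": "demo-vm"}},
            "spec": {
                "domain": {
                    "devices": {"disks": [{"name": "rootdisk", "disk": {"bus": "virtio"}}]},
                    "resources": {"requests": {"memory": "2Gi"}},
                },
                "volumes": [{
                    "name": "rootdisk",
                    # Container-disk image is a placeholder; swap in whatever image you use.
                    "containerDisk": {"image": "quay.io/containerdisks/fedora:latest"},
                }],
            },
        },
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubevirt.io", version="v1", namespace="default",
    plural="virtualmachines", body=vm,
)
```

Note there is no console or management plane anywhere in that flow; the VM is just another API object, which is the gap the vCenter-style manager on top is meant to fill.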
09:51 Speaker 2
Yeah, but here's where these charts confuse me. In one respect, it's a VM off-ramp. In another respect, it's not; the VMs are still there. So what's substituting for the virtual machines in a non-VMware environment? Is there anything in the config, the setup, any part of it, that differs if you were to take those virtual machines out of VMware and use this format going forward in Kubernetes?
10:36 Rob Hirschfeld
Go ahead, Klaus. I thought you were going to say something; I thought I heard a click. Okay, my belief here is that it's a two-teams issue. If I'm a Kubernetes platform team using VMs today on VMware, I intend to keep consuming VMs in the future, and I don't actually care if it's VMware or Proxmox or OpenShift Virtualization. There might be some concerns, but for the most part the people doing Kubernetes workloads don't need a lot of the network segmentation or storage or HA capabilities that VMware has. VMware is just the answer for virtualization in an organization; actually, the answer for infrastructure in most organizations is put it in a VM, and put it in VMware more generally, because they've standardized. So the idea here is that those workloads, those primarily dynamic Kubernetes workloads, don't need or want to change out of virtualization.
11:53 Speaker 2
Okay, I was just asking because when we're spinning up the microservices for the agents, we're using Kubernetes and Docker, and we do find that there are slight differences when you pull them out of one setup and put them in another. We just pulled three or four agents out of what was a test bed of VMware versus Kubernetes, because we didn't want to be bothered with VMware, and we found very interesting differences. Now, this is just a little test, nothing enterprise-ready, but because we found some differences we felt this is going to make a difference, and we're also looking at Red Hat stuff because we have to anyway. We did find the performance was different; the way the agents, the way the microservices, had to be written, everything had just enough variation to be annoying. So that's the reason behind my question, because you see that not only on the agentic AI stuff, but on all the security around the AI and the agentic AI stuff.
13:25 Rob Hirschfeld
It's funny, you're right. This stuff is not entirely portable, and this is why VMware is so hard to replace. It's not a straightforward one-to-one migration. The Linux stuff is a little bit easier, but some of the Windows stuff definitely isn't. And one of the comments we've made in the past is that there's an ecosystem of vendors who certified on VMware that aren't ready to certify on other systems for a whole bunch of performance reasons. How do you guarantee the appropriate resource reservations so you don't have noisy neighbors? There's a lot of stuff that has to be managed.
14:16 Speaker 2
Well, in part this is why we decided to create agents that spin off agents, like empty agents, that kind of thing. You've got your framework, you've got your Kubernetes or your Docker container, and you've got all that stuff in there. Then you can start, as long as you can get the containers to play nicely together in the same sandbox, because the idea is that it's a choreography. You also need not only the blank ones to spin up new ones, but you have to be able to put them up and take them down very quickly. Otherwise you're collecting a huge data and storage requirement and a huge memory requirement. A lot of agentic AI is being built that way, just FYI.
15:18 Rob Hirschfeld
So does that... I mean, for them, they would, and this is, to me, part of —
Unknown Speaker
We lost Klaus.
Unknown Speaker
I guess we're boring.
15:31 Rob Hirschfeld
I think it's interesting. He gets pulled into other stuff sometimes. And his needs are slightly different, because he's not running these pieces this way. This is weird, because it's part of my evolution on this. Previously I would have answered from where you're coming from: you don't need virtualization at all, just put the stuff on bare metal and run the clusters without the overhead, and then you have more ability to tune the overall environment. So I would have been more in this idea of, hey, just run Kubernetes on bare metal and we can get rid of VMs. And I don't think that's wrong, except the more time I spend with it, the more it feels like the bare metal Kubernetes workloads are going to be like this OpenShift workload: much more static, like a special-use cluster. So to the extent you're describing, I have an AI use case and it needs specific resources with specific capacity and proximity, you might say, you know what, I'm just going to buy machines to run Kubernetes for this use case, and that's all I need. This is the thing I'm not entirely sure of from a market perspective, because this graphic has an assumption that you're going to use a cluster manager to manage all your other Kubernetes clusters, which is what Red Hat's telling you to do, and that's very expensive, actually extremely expensive.
17:27 Speaker 2
Especially when you think about the fact that, let's assume for example, the customer I'm thinking of is a global manufacturer with 20 facilities worldwide. Managing that cluster, or a cluster per facility, would be extremely expensive. So I'm looking at it from the point of view of: you can spin up agents and kill them after you've finished that particular task, because it's a choreographed event. I use the word choreographed because it is many agents that can run at one time, in parallel, and/or agents that can tap other agents to get the same result. As long as you get the result, how you've broken the tasks down and reverse-engineered them doesn't really matter, as long as you're getting the data out. You may not need the cluster management, but from the distributed point of view, I don't know how you would not have chaos if you didn't have it.
18:47 Rob Hirschfeld
Well, this is the challenge with how people have been deploying Kubernetes. It's not just that you have ten sites; we could easily help somebody build a Kubernetes cluster on each site. But that's not what people are doing. What people are doing is saying, I'm going to have a hundred clusters on each site, and they're going to be dynamically changing and all that. So the assessment we ended up with was: I'm not going to bare-metal that. I'm going to put a virtualization cluster in, treat that like infrastructure, and then subdivide it, with the cluster management running on top. I'm actually starting to question whether this idea of a cluster manager going down to specialized bare metal and back like this is even that practical.
Unknown Speaker
I would say, just from my perspective, it's not.
19:45 Rob Hirschfeld
So what you're suggesting is: if we can do this effectively, which we can, right, yay, I'm building the cluster manager and all that stuff, great. But I'm not going to do this. I'm going to come back and say, and I'm not sure how much I would change this graphic to do it, but what you're suggesting is: I don't need that. I'm just going to go here and replicate. Basically you're saying, just run the cluster for me, please. I don't want to need the cluster manager on the bare metal side.
20:28 Speaker 2
Well, you are correct. And what I would do as well, from the agentic AI point of view, because agents are built to run autonomously, is just have it constantly executing against the bare metal without all the workload clusters and whatever.
20:57 Rob Hirschfeld
Correct, you could. And conceivably, you don't even need Kubernetes for that.
21:04 Speaker 2
Well, I think there are two points of view when it comes to that. It's really about, I think, where the issues of the network kick in: how much latency do you have in either situation? You can build a bunch of different use cases for that to test.
21:32 Rob Hirschfeld
And in those cases, do you still need a manager or control plane? What's your thinking?
Unknown Speaker
You need a choreographer.
21:44 Rob Hirschfeld
From my perspective, I wouldn't fight that. I would probably tell people to use core Kubernetes as that choreographer rather than trying something else. You could use other stuff; using us for that, while possible, just doesn't seem right, it's not what we do. We're dealing with whole machines; we're not distributing workloads across machines. So just use Kubernetes. But you and I keep coming back to the same idea: there's a cluster manager pattern that people are using, but it's expensive, it has a lot of overhead, and it requires you to do all this work. If you're saying, well, I just have this one workload, I want it to be high performance, I don't want any other concerns, then just make the workload run on Kubernetes directly. This is the fight, and fight may be too strong a word, but this is the thing I think we're watching. The Red Hat OpenShift teams, and Kubernetes more generally, keep going back to: we want Kubernetes to run Kubernetes, and yay, that's all great. And we're saying: people who just want one workload doing one thing, really high performance, don't want to commingle it with a whole bunch of other stuff. They're just going to run that one thing. They're different use cases.
23:13 Speaker 2
Yeah, I don't know that I agree the trend is going that way, though. I think people are going to want to cohabitate, if you will, because as AI continues to grow and expand, and you look at the diversity of the data sources from which you're going to be cherry-picking information or data points to contextualize in a semantic layer, it would be much more difficult and isolating, I think, and I'm thinking out loud here, I haven't proven this yet, to do them on individual workloads or an individual cluster, in a segmented and siloed way, than to do them as commingled and cohabitant.
24:08 Rob Hirschfeld
The challenge that I see with a multi-tenant cluster, if you'll accept me calling it that for the moment...
Unknown Speaker
Okay, okay.
24:19 Rob Hirschfeld
Well, the challenge is that what I've seen in the Kubernetes marketplaces, and the way people are doing Kubernetes, is they end up spinning up a whole bunch of CRDs and other services that are version-specific and also locked to an application workload. The challenge has been that Kubernetes multi-tenancy doesn't overlay with CRDs or add-ons or things like that. So you end up with: my Kubernetes cluster requires these auxiliary services and capabilities, and I need these specific versions inside of this cluster, and there is no mechanism by which all of that ecosystem can coexist for workloads that need different versions of it or different lifecycles for it.
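One concrete way to see why this bites: CustomResourceDefinitions are cluster-scoped, so a cluster serves exactly one definition and one set of versions for each CRD no matter how many tenants share it. A small sketch with the Python Kubernetes client that lists the CRDs a cluster is serving and which versions are active:

```python
from kubernetes import client, config

config.load_kube_config()

# CRDs are cluster-scoped: one definition, one set of served versions,
# shared by every namespace and every tenant in the cluster.
for crd in client.ApiextensionsV1Api().list_custom_resource_definition().items:
    served = [v.name for v in crd.spec.versions if v.served]
    print(f"{crd.metadata.name}: serving {served}")
```

If one workload's operator needs a CRD version that another workload's operator can't tolerate, the usual resolution ends up being a separate cluster, which is part of the cluster-sprawl pressure being discussed here.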
25:19 Speaker 2
Right, no, I hear what you're saying. This is part of the challenge of coming at it from a very different perspective, which is AI and agentic AI, because there it's kind of like an image of free radicals. They can swarm together, they can swarm in parallel, they can recreate themselves almost in a lather-rinse-repeat, where each repetition or iteration has a slight difference. And you can put a lot of agents together; as long as you have what would be tooling agents and routing agents and so on, as long as you have those basics, you can pretty much think about an infinite number of combinations and permutations. Translating that into what I'm looking at on the screen: you have infinite at one end of the spectrum, no pun intended, well, pun intended, and you have five at the lower end.
26:45 Rob Hirschfeld
And now you're hitting the core thing that, to me, is the mismatch. The Kubernetes teams, the platform teams, say, yeah, we're going to spin up clusters, they're cheap, go crazy, lots of clusters, yay. If you went to KubeCon, that's the assumption. But if you go to a virtualization team and say, you're going to have a whole bunch of dynamic clusters and all that, it's no, no, no: we're going to have a virtualization cluster and we're going to have VMs. Those VMs might come and go pretty quickly, but the cluster, the boundaries of the cluster, the equipment for the cluster, and the configuration of that cluster ain't changing very fast, buddy.
27:30 Speaker 2
Right, and that's where the notion of guardrails in agentic AI starts coming into play. I created 30 different agents for our AI, and one person said to me, you're out of your mind, that's so much to manage and so many patterns you could put together. And I said, well, not really, because you wouldn't be mixing and matching all of them; you would be taking the core and adding two here and three there. And that's a very dynamic pattern association that the AI is going to figure out for you anyway in this scenario.
28:19 Rob Hirschfeld
Right, well, that's part of the thing I think gets interesting: the agents aren't going to be spinning up whole new clusters. They're just going to be spinning up new agents, new workers. You're going to have a bounded system.
28:33 Speaker 2
Well, actually, you strike a chord, because technically an agent could spin up a cluster. It's all about what you put inside of it.
28:53 Rob Hirschfeld
That's true. Sure, it could spin up a cluster in order to get more resources back for additional computational needs.
29:04 Speaker 2
Yeah. That's why the hardest part was actually creating the blank agent that spins up other agents. Every time you create an agent, the first three lines of the microservice are basically the same, and then it's the programmatics, the data sources it's hitting, whether you call another agent that's dedicated to a particular data source or something else. But technically, yes, you could spin up new clusters and kill them off and bring them back.
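The speakers don't share their implementation, but as a purely hypothetical sketch of the "blank agent spins up worker agents" pattern on Kubernetes, a parent agent could launch each worker as a short-lived Job and let the cluster clean it up. The image name, namespace, and flags below are invented for illustration:

```python
from kubernetes import client, config

config.load_kube_config()

def spawn_agent(task: str, data_source: str, namespace: str = "agents"):
    """Launch a short-lived worker agent as a Kubernetes Job.

    Every worker starts from the same 'blank' template; only the task and
    data-source parameters differ, matching the pattern described above.
    """
    job = client.V1Job(
        metadata=client.V1ObjectMeta(generate_name="agent-"),
        spec=client.V1JobSpec(
            ttl_seconds_after_finished=120,  # take them down quickly, as discussed
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(
                        name="agent",
                        image="registry.example.com/blank-agent:latest",  # hypothetical image
                        args=["--task", task, "--data-source", data_source],
                    )],
                )
            ),
        ),
    )
    return client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)

# Example: a routing agent delegating one task to a dedicated data-source agent.
spawn_agent(task="summarize-orders", data_source="erp")
```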
29:42 Rob Hirschfeld
That sounds vaguely like a paperclip problem to me, but yes, I think you're right: the agent consuming all your available resources to solve a problem, because it needs to solve the problem, so it's going to use all the resources it can get to solve it.
30:04 Speaker 2
Well, we've limited what the data sources are, and we've put guardrails around the, I don't want to say memory, call it tooling, that would require a new agent to be spun off, or a new cluster, or another virtual machine. So it would be like, I need X, and you have to talk to four things to get Y.
30:43 Rob Hirschfeld
Right, it's got to have free rein to go do those pieces. It's a funny component, but just for the base system you're going to need a pool of resources for it to be able to go and start doing its thing.
31:03 Speaker 2
Yeah, and you need a pool of resources that is generic, like a blank sheet of paper, because the data coming from multiple sources will be in all sorts of different configurations and formats. That's part of where the secret sauce comes in. But irrespective of that, it actually does unlock the potential of enterprise systems to be far more useful in creating cost reduction or revenue growth than you would expect them to be. This is part of why SAP created Joule, so you can start looking at all the features and functions, not just the five you actually use and pay a million dollars a year for. You will see more of this.
32:09 Rob Hirschfeld
Hello, I'm Rob Hirschfeld, CEO and co-founder of RackN and your host for the Cloud 2030 podcast. In this episode we really question how people are going to consume KubeVirt, or Kubernetes virtualization: what's missing from that equation, and how you could think about these workloads as different from how we already think about Kubernetes workloads in general. Then we go a little further into more general ideas, where the "oh, I have a lot of clusters, I spin up new clusters" way the Kubernetes community currently works might not be the right way to approach some of these more captive, long-term workloads, like a virtualization workload. I hope I didn't give you too many spoilers on the discussion; I know you'll enjoy the back and forth on it. Thanks. Thank you for listening to the Cloud 2030 podcast. It is sponsored by RackN, where we are really working to build a community of people who are using and thinking about infrastructure differently, because that's what RackN does: we write software that helps put operators back in control of distributed infrastructure, really thinking about how things should be run, and building software that makes that possible. If this is interesting to you, please try out the software. We would love to get your opinion and hear how you think this could transform infrastructure more broadly, or just keep enjoying the podcast, coming to the discussions, and laying out your thoughts on how you see the future unfolding. It's all part of building a better infrastructure operations community. Thank you.