Rob, hello. I'm Rob Hirschfeld, CEO and co-founder of RackN and your host for the Cloud 2030 podcast. In this episode of the tech ops series, we plumb deep into the mystery of Kubernetes installation, specifically OpenShift installation, because that's what we're focused on. But we are going to help explain why Kubernetes installs look so weird compared to traditional operations install processes. Where are the playbooks? Where are the scripts? Where are the runbooks describing all the steps you need to take? All of it seems to be missing, and in this podcast, we explain why.
The topic of the day is going to be Red Hat OpenShift. But before that, I always like to start with news of the day as sort of an icebreaker, to see if there's anything in the news on people's minds that they want to bring up for discussion. The one I was going to talk about is the one that just jumped onto the RackN internals, which was the RHEL 10 immutable image pieces.
But I'm happy to jump straight to Kubernetes, which is somewhat related anyway, from that perspective.
All right, let's talk about Kubernetes,
specifically OpenShift, and what I was hoping to do. So I should lay out what my goal is for the meeting, and that'll hopefully influence the discussion, although I'm always happy to have a broader discussion. I am trying to create a small, short explainer video that explains how OpenShift gets laid down in an environment: what the challenges are, what the prep is, what the actual process is, so that we can sit down and say, these are the steps that you're going through and that we're automating. And the hairy details are helpful because, like the original Kubernetes the Hard Way by Kelsey Hightower, that in some ways changed Kubernetes' trajectory. I don't have that broad an ambition for this, but the Kubernetes the Hard Way writing and analysis really helped people understand the breadth of what it looked like to get Kubernetes running. And so there's an element of my goal today of: I want to understand the "OpenShift the hard way" conversation, how it works and all that.
Yeah, I was gonna say, from my perspective, it's like any other complex platform story, which is: once you have all of these things in this state, and this functional set of resources, now you can do the thing, right? And so the Kubernetes/OpenShift story is: you have all of your hardware, your hardware has the right firmware, it's booting correctly, it has the right BIOS configuration, it has the right storage, it has the right networks. Is the storage marshaled into disks that are usable if they're on RAID controllers? Do OSes deploy on them successfully? What OS do you deploy on them? How do you configure those OSes? And now you can start approaching the actual platform problem of the story, which has its own problems. We saw this in OpenStack, right? OpenStack was the same thing. And how do they go about solving that? Well, let's put OpenStack on OpenStack, let's put Kubernetes on Kubernetes, let's put Docker on Docker, and let's add more layers of complexity to minimize the surface area of what Kubernetes has to deal with for orchestrating things to make them useful and consumable.
Right. So there's the additional layer here to get to, which is the Kubernetes-installing-Kubernetes question. I want to parking-lot that a little bit. I mean, I'd like to get to it, but I want to lay out an assumption that I'd love to hear validated or challenged: even if you're doing Kubernetes on Kubernetes, installing Kubernetes on bare metal requires certain stuff to be done, and certain knowledge about what the systems are. And I want to be specific and explicit and name those things. But while there's a different system driving the install process, I'm assuming, and please challenge this, that fundamentally you're still installing Kubernetes. You're still following the same process. There's no magic that a different orchestrator is going to pull off. Go ahead.
It's a story, and, I mean, it's a pretty good story, but it's a story that starts at chapter three, or maybe even chapter four.
The Kubernetes installer, Kubernetes on Kubernetes, yeah. Okay, well, from that perspective, it's still starting at chapter four, and the first couple chapters are still pretty much the same. That's what I'm asking; it's not rhetorical, I just turned it into a question, right? The question is whether they should still be the same steps. To my knowledge, there's some difference: if you have Kubernetes on Kubernetes, you can stash your secrets in other places. You have a different place where you're storing the information that's driving the cluster build, but you're still gathering information, building certificates. That's what I want to walk through first, and then add in that layer, because I think the chapters aren't that different, as much as the Kubernetes community might want to convince you otherwise.
I think it depends on where you start. I mean, if we start even at just the fundamental OS layer, you look at some variants like Talos, or even what Red Hat's doing with CoreOS, versus a full RHEL distribution or full Ubuntu. So there's nuance there in terms of how I initially get that box on and boot it up from something that is usable at the OS level.
To my knowledge, OpenShift is now only supporting CoreOS for their distro, their running distro.
It's a mixed statement, Rob. Okay, please. So with regard to that specifically: CoreOS is the path that you will get the most manageability out of. You can bring your own RHEL, if you will, but it's viewed and treated more as externally provisioned systems, and so you, quote, lose the automated, controlled update procedures that they're building into the Kubernetes systems, right?
So there's a degree of simplicity from the immutable OSes that is being taken advantage of by the install orchestrators?
Yeah, that's a fair way to say it. Or I guess I would say it more as: they've built automation and tooling for deploying CoreOS specifically, at least for OpenShift, right?
the tighter integration,
Yeah, from a product perspective. And so from that perspective, if you want all the bells and whistles, then you're using that path. You can choose to have fewer bells and whistles and go other paths.
You can install OpenShift on RHEL, sure, you know, if you wanted.
yeah, and the point I'm getting at is, okay, then that system will show up as externally provisioned, and then ACM will not necessarily use it in building a new cluster or that kind of stuff, right? And you won't get the rolling upgrade capabilities and other stuff like that.
Can you expand on what externally provisioned means?
Well, so all of my comments will be from the OpenShift perspective, because I suspect there are analogous parts in Kubernetes, but I haven't been looking at or trying to learn that path. From the perspective of the Metal3 CAPI provider, when it's creating resources or has resources presented, the nodes that represent the state of the machine relative to Kubernetes or OpenShift get marked as externally provisioned. Which means they're provided: you've built them, they've got containers running on them and all that stuff, and they've joined as a worker or control plane or storage or whatever the usage is. But from the perspective of the OpenShift bare metal controllers, it's dead to them. And so with regard to ACM and some of the cluster autoscaling and things like that, you start to lose functionality, because the controllers that are responsible for that function don't have resources to work with. Does that make sense?
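As a concrete anchor for that, here's a rough sketch of what an externally provisioned machine looks like in the Metal3 BareMetalHost API that OpenShift's bare metal controllers consume; every name and address here is invented for illustration:

    apiVersion: metal3.io/v1alpha1
    kind: BareMetalHost
    metadata:
      name: worker-3                      # hypothetical node
      namespace: openshift-machine-api
    spec:
      externallyProvisioned: true         # "you built it" -- the controllers
                                          # track it but won't manage its lifecycle
      online: true
      bmc:
        address: redfish-virtualmedia://192.168.1.50/redfish/v1/Systems/1
        credentialsName: worker-3-bmc-secret
      bootMACAddress: "52:54:00:00:00:03"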
Yeah, you usually will hear it as UPI or IPI: user-provisioned infrastructure or installer-provisioned infrastructure. So it really comes down to the level of management that the given system can have over the infrastructure, particularly in cases where the installer or the tool didn't provision the underlying infrastructure. If it didn't provision the underlying infrastructure, it doesn't know all of the appropriate operations to perform, to upgrade and all the things, from a lifecycle perspective, because the user provisioned it. The inverse would be if the installer did it: you told the installer, build me a Kubernetes cluster, and now it knows how to upgrade nodes and expand the cluster and shrink the cluster, all those sorts of things.
I've heard the terms UPI and IPI more generally. Like, I was talking to somebody from Nutanix, and he said, oh, okay, we only have UPI for metal. So I've heard the lingo too, Martez.
yeah, it's a general term that a lot of people have gravitated towards.
Sounds like something you need antibiotics to treat personally.
Yeah. The other one I forgot about as I thought about it...
Sorry, that joke landed so flat. Oh well.
I forgot about Windows. If you happen to want to run Windows containers, that's its own set of fun. Okay?
So, I mean, I'm assuming for most people that user-provisioned is the stopgap here, versus the installer-provisioned or the fully managed?
It depends on how I want to manage it. So that goes to: do I want Kubernetes clusters to build my other Kubernetes clusters, and then I run into a chicken-and-egg problem? Or do I potentially want an external system to be the initial bootstrapper, and then I just want to run a Kubernetes distribution like OpenShift or RKE or insert your other flavor, and not necessarily be tied to their installer? The challenge, from an industry standpoint, is that there's starting to be a much tighter coupling with installer-provisioned infrastructure, which makes it very difficult, from what I've seen, to install on your existing infrastructure, or infrastructure provisioned outside of the installer.
So you're saying that all of these integrations are creating a challenge for people who have to manage machine lifecycle somehow? I'm clarifying something that I'm seeing; I want to make sure I understand what you're saying.
Yeah. Mine would be: it's the classic IT challenge of, if I get a product from a vendor, how much do they try to pull me fully into their ecosystem and the way in which they do things, versus allowing it to be flexible for me to align it with my processes and my workflows?
It's basically flexibility that takes more effort to configure, versus the lock-in.
Well, right. I mean, I guess this is the thing: if you're going to say, do all that for me, and you're provisioning against a virtual or a cloud system, those are pretty well-known targets. If you're turning around and saying, I want the built-in orchestration to manage my own infrastructure, it usually means bare metal, because they have interfaces to VMware and Nutanix and, you know, OpenShift Virtualization. Then that's really what we're talking about, the very specific use case. And so what I want to discuss, so it's good, is: as they take more and more built-in orchestration and rules into those self-installers, that means the self-managed part of the story has to do those extra things. Is that a fair statement? We're getting smarter with the installers?
You will. But like all things, if we take just a very practical example: you said the virtual and the cloud are well-defined patterns. They are, but let's say my organization tags things a certain way, and the installer wants to tag things a different way. Well, then how do I address that conflict, or translate across it? Or let's say the installer deploys workloads onto Azure but doesn't expose certain settings for how I would like my instances or virtual machines configured in Azure in my environment. Then I have to hack some post-deployment process to come back and apply what I would have wanted to apply as part of the initial deployment of the workloads. Those sorts of things go on and on and on.
Is that, like when we did our first CAPI work, one of the things we found was that there was a ton of need to pass very environment-specific information through that system. Is that a similar issue, where, as all of these things chain together, you can lose or flatten data, and that data might be necessary downstream? Am I saying that right?
I would say it's even more fundamental than that: you wrote the CAPI provider. Were you aiming for speed and just overall tight integration with the way in which you think it should be done? Or were you thinking to be as flexible as possible, given the possibility that somebody might want to use option or variable number 15 as opposed to variable number two?
I can answer that, if you'd like.
I would love to hear your answer.
So with Digital Rebar, kind of our whole thing is composable automation and being able to tie things into your own tooling. So we were focusing more on your latter comment, about being able to interface with everything. Our cluster API provider for Digital Rebar is legitimately just a hook into Digital Rebar automation. You can't use it without a bunch of parameters: you have to specify what operating system pipelines you're going to be using, what params are needed. You can specify what configurations are going to be run, like extra tasks for your load balancer or whatever. But I found that with a lot of the other providers, you're really just getting the same kind of configuration that you get with something like Ansible or Terraform, where you just say: I need a VPC, I need whatever, these IPs. And I feel like there's a trade-off between developer usability, where you just want to provision some VMs, and, I would say, platform engineering usability, where you're actually managing the resources that the infrastructure provider actually needs. So there's definitely a trade-off. Yeah.
Yeah.
We're pulled to the shiny light of the Kubernetes orchestrators. Do you mind if we step back? Because, Isaac, you're describing a whole bunch of need for data and configuration settings that I think is typical of the environments that are harder to configure or have more variability. I still want to step all the way back to chapter one and talk through what a bootstrapping cluster looks like. What are people going through just to get the first cluster running? I think that's an important part of the conversation before we talk about the installers orchestrating. Yeah.
Uh, yeah, I feel like a lot of these tools are made for a general purpose, and people fit them to their need, rather than having the tools be special-purpose, which is frustrating in some regards, where you can't just have a defined use case for them. Like, if you have tools for a car, you're like, oh, this is obviously a car tool. It's not just a box of metal that hooks together, where it could be used for a car or it could be used for a fridge. I think CAPI kind of allows you to do both. And I think OpenShift has resources that put it closer to the former, but it has somewhat of the flexibility of the latter if you need it.
Yeah, wait, hold on. I want to get back to that, but I want to first go through, you know, Kubernetes or OpenShift the hard way, right? I show up with a piece of metal and boot it. So, yeah.
So this is where OpenShift has done a lot of work of late, and per the IPI and UPI story, Red Hat gives you both now. So you can take a bare metal system and follow a set of procedures and get an IPI-based installed system that will eventually bootstrap itself into existence. And today that looks like: all right, go find a Linux system, get Podman, run the bootstrap container and VMs on that system, and feed it the IPMI and other credentials for the machines you want to create a cluster from, and voila, you'll have a cluster that's IPI and self-managing once it's done. Because the bootstrap system will either eat itself or, depending on which method you use, you just kill the VM and now you're off and running. Or, in some cases, the bootstrap machine gets reworked into a worker.
So in that model, you're feeding it a list of IPMI addresses for machines, right? That means you've configured machines with IPMI, you've given them addresses, right? You've done work on those systems before you feed them into the inventory, to make them at least bootable on the network you've put them on. There's no BIOS or RAID configuration. So you prep the machines so that the bootstrap installer you described can media-attach ISOs to those machines and power them on. I'm assuming using Redfish?
That, or the abstraction allows for Redfish, iDRAC direct, all sorts of ways.
But yeah, they have a variety of out-of-band options, so you specify that in the inventory. So the inventory has collected that information, you've put it on some type of management or data plane network, and that's going to allow you to bootstrap through that process, because the OpenShift installer creates ISOs.
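For a sense of what that inventory looks like in practice, the IPI path carries it in install-config.yaml under platform.baremetal; a minimal sketch, with all addresses and credentials invented:

    platform:
      baremetal:
        apiVIPs:
          - 192.168.1.5
        ingressVIPs:
          - 192.168.1.6
        hosts:
          - name: master-0
            role: master
            bootMACAddress: "52:54:00:00:00:10"
            bmc:
              address: ipmi://192.168.1.30    # or redfish-virtualmedia://...
              username: admin
              password: secret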
Right. I mean, it becomes a story of how constrained do you want to make yourself, right? So today I could say: hey, if you have internet access and seven machines, or six machines, or however many they've decided is the default minimum, and one of those is a Linux system that can run containers or KVM/libvirt VMs, you can have a built OpenShift cluster in a couple hours, depending on your network speed. But like you said, to do it that way, you would have to have IPMI configured enough to have addresses and usernames and passwords. All right, so that's a path. Is it scalable? Is it automatable? All of those questions then become how you decide whether that's effective. The other aspect is that you have to decide, as operators, what you are concerned with, what is important, right? I think Martez mentioned: what are your workflows, and how are you integrating those into your paths? So, as we talk with customers, one of the biggest challenges is Red Hat telling them OpenShift can manage bare metal. Well, okay, it can, in the sense of: if given BMC credentials and addresses, it can virtual-media mount and reinstall itself. Okay, that is a form of machine management, but you then have to decide, does that meet your organizational needs, and is that sufficient? And right now, one of the biggest challenges, if you were thinking about OpenShift the hard way, or the easy way, or whichever way, is that if BIOS settings or those kinds of things are important to you, that is going to be a challenge for you,
or potentially multiple networks or other pieces?
Correct. And so, all right, now you have to make other decisions about how you're driving that. So in some regards, some people go: well, I'm going to go UPI, so I'm going to draw the line there. I'm going to condition my system, I'm going to deploy Red Hat, and that gets my previous process workflow mostly working, because I was able to deploy Red Hat before, or whatever, and now the application knows enough to go push it. But that may or may not meet the long-term needs, because now I have to manage the OpenShift lifecycle on its own, separate from the OpenShift tooling. Okay, so I made a trade. And right now, if you're looking for what RackN considers full hardware lifecycle joined with automated OpenShift lifecycle, there's a gap. I mean, we're trying to fill it, obviously, but
there's a gap. But when we automate an OpenShift install, right, we already have a complete bare metal lifecycle system. So we're going to bootstrap, discover, and inventory all of those systems, and we have a way to install target operating systems, Red Hat or CoreOS. How do we...
So, go ahead. From RackN's perspective, our initial pass at building OpenShift Virtualization or OpenShift ACM clusters is to follow our normal processes for creating and conditioning bare metal, then automating the OpenShift UPI process to generate a system or cluster, right? And so then we use our orchestration capabilities to synchronize bringing up machines and all the other things that need to happen for the UPI process. In some cases it's actually using the OpenShift installer, which is common between the two. It's just that we leave off the section that is machine management, because we've already managed the metal. No reason to do it again, right?
But, I mean, I was under the impression that to do this stuff, we needed to generate certificates and DNS entries, and normally Red Hat needs that stuff. So anyway...
A requirement of the installer, the OpenShift installer specifically, is to have DNS entries for the control plane and workers, and the VIPs for the API and ingress, and it needs to have DNS entries and a load balancer. Now, those can be externalized. So a lot of times, if you read the UPI process for OpenShift, and even Kubernetes, it's: go add these 28 entries to your DNS server based upon what you know the IP addresses are going to be, and set up your load balancer to point this one VIP at these three addresses for these names.
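Roughly, the core externalized entries look like this (cluster name and addresses invented; the wildcard record covers the per-route names):

    # api.mycluster.example.com      ->  192.168.1.5   (API VIP / load balancer)
    # api-int.mycluster.example.com  ->  192.168.1.5   (internal API)
    # *.apps.mycluster.example.com   ->  192.168.1.6   (ingress VIP / load balancer)
    # load balancer: API VIP forwards 6443/22623 to the three control plane nodes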
Okay, so...
We say that's it, right? Because, well, that's it. But automating that, driving that into a consistent, repeatable process, doing it at scale, trying to do it more than one time, right? All of those things
become, for enterprises, doing that for new systems, or lab systems, or, you know, machines before they have an OS installed on them, is often a very big challenge, correct?
Correct. And so there's that, right? And so we choose to take advantage of RackN's capabilities and install and create a load balancer that we automatically populate, and fill in DRP's DNS capabilities so that it's available. All right, that's fine, but as we start looking at enterprise stories, that's not necessarily acceptable, right? So part of our workflows and pipelines are getting altered so that those can be externalizable: call-outs to configure external systems, right? If I'm an enterprise and I have an F5 load balancer, why am I going to set up another service to do that? I should just use the load balancer I paid good money for.
Assuming the teams will collaborate, yes. But that is true: you're going to hand off to your enterprise load balancer at some point, right?
And the same thing with DNS, right? Whether for right or wrong reasons, I've purchased Infoblox, so I'm going to use it. All right, fine. So instead of necessarily configuring DRP's DNS, there may need to be a call-out to say, make sure these exist, or a call-out to automatically add them, right? That becomes part of the control process to get toward your low-touch, zero-touch automation.
So from there, if we're doing it, are we still going to use the OpenShift-generated ISOs? Because just booting CoreOS does not make a cluster unless each one of those ISOs has a predefined role and scripts on it to do that work. Something somewhere has to initiate the process.
Yeah. For OpenShift, especially when you're talking about the IPI path. UPI is a little trickier, though if you do UPI with CoreOS, it's much easier. They've got their install process down to: take a base ISO of CoreOS or Red Hat CoreOS, and we're going to generate two ignition files, well, three really, for you to run. So take said ISO, boot it, point it at the ignition file for whether you're a bootstrap system, a master or control plane, or a worker, and voila, you now have a building system.
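Mechanically, that "point it at the ignition file" step can be done by embedding the config into the live ISO; a sketch with coreos-installer, with file names invented:

    # coreos-installer iso ignition embed -i worker.ign -o worker.iso rhcos-live.iso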
Do you want to see it?
Not at the moment, I'm interested in the process.
And so with regard to that, again, focusing on OpenShift, right? OpenShift says: we're easy to install. Okay, we can laugh or chuckle or whatever, but it doesn't really matter from their perspective. They've got on their website a document that says: go get this ISO, go get this binary and this other binary. So you have to get oc, you have to get openshift-install, and an ISO. Now you need some way to serve that ISO, right? But this is where they say: hey, if you have a VM, start the VM with the ISO pointed this way, right? There's a whole set of things that you then go through, one manual step at a time. Well, the next usual step is you take your openshift-install and you build a config YAML. Well, you're Kubernetes, so everything's YAML, so this is no surprise, right? So you build a YAML that defines what you want your cluster to be. That can be as simple as a few lines, just sufficient to get going. It won't do any customizations, and your hardware better be spot on, but you specify enough networking and stuff to be able to spin up new containers and all that other stuff, and you populate enough of the host names that you've set up in your DNS that it can function. Now, that install-config can go crazy. You could specify the IPMI usernames and passwords, you could specify specialized network configurations for your systems, you could specify storage configurations. All those things can be specified in that install-config, right? So then one of the commands you run is create manifests, which creates a bunch of Kubernetes CRD files. And then you run a command that creates the ignition configs, which builds a control plane ignition file, a bootstrap ignition file, and a worker ignition file.
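Putting that step in concrete terms, a minimal install-config.yaml sketch plus the two generation commands; every value below is a placeholder:

    apiVersion: v1
    baseDomain: example.com
    metadata:
      name: mycluster
    controlPlane:
      name: master
      replicas: 3
    compute:
      - name: worker
        replicas: 3
    networking:
      clusterNetwork:
        - cidr: 10.128.0.0/14
          hostPrefix: 23
      serviceNetwork:
        - 172.30.0.0/16
    platform:
      none: {}          # UPI; the baremetal IPI platform block goes here instead
    pullSecret: '...'
    sshKey: '...'
    # then:
    #   openshift-install create manifests --dir ./cluster
    #   openshift-install create ignition-configs --dir ./cluster
    #   -> bootstrap.ign, master.ign, worker.ign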
This is assuming you have to do this against a seed Kubernetes cluster in order for any of this to work?
No, it's all standalone. It's a standalone binary, a Go binary, in fact. Okay, it builds these files, and then it says: okay, go run a VM using CoreOS, have it boot the bootstrap ignition file, and get it running. Now go boot all the other machines you want into CoreOS with their appropriate ignition files.
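For the VM flavor of that step, a sketch on a libvirt host (paths, sizes, and names are all invented; the Ignition config rides in on QEMU's firmware config interface):

    # virt-install --name ocp-bootstrap --memory 16384 --vcpus 4 \
    #   --disk size=120,backing_store=/var/lib/libvirt/images/rhcos-qemu.qcow2 \
    #   --import --os-variant rhel8.6 --noautoconsole \
    #   --qemu-commandline="-fw_cfg name=opt/com.coreos/config,file=/opt/ocp/bootstrap.ign"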
And, well, this just sounds very much like a giant Ansible inventory story.
Kind of. It's not Ansible, but kind of.
No, but it's the same thing. You write a big file that specifies sort of a static layout of what your cluster looks like, and you pre-build everything in your environment to match that inventory file. And then when you execute against that inventory file, all the pieces sort of fall into place, because you've arranged everything before you started.
Yeah, but in true Kubernetes style, it's all the reconcile-and-wait-for-it-to-show-up pattern, right? Sure. Well, that's different than Ansible, right? Ansible is more prescriptive, go-forward, versus... okay, base Ansible, but whatever, right, right?
You've assigned a whole bunch of target desired state, and it's going to
continue the process, right? They start the bootstrap thing, and you set it off running, and then you start all the others booting, and then they synchronize themselves. And you basically run the openshift-install program against the bootstrap system, asking: are you done yet? Are you done yet? And it sits there and watches and waits for everything to bootstrap. Now, you've specified a registry, and if you're air-gapped, that's horrific, because you have to figure out how to mirror the registry across an air-gap boundary and all sorts of other stuff. And depending on how you're choosing to boot these things, you're letting the bootstrap system be a provisioner and have network controls and DHCP, right? All that stuff. Well, I guess if you're using the virtual media mount, you don't have to worry about that; you just have to have BMC access. Okay, so now you're stuck using virtual media mount, and some organizations believe that's a security exposure you shouldn't allow, right? So they want to do network booting with more directed control over what images they boot. Okay, fine, those are your choices. But the point is that out of the box it doesn't necessarily do that; it's all the virtual media mount stuff. So then you process through these openshift-install commands. There are two primary ones... well, you have to run two to set things up, and then two to actually complete the cluster. And then you wait for that to finish, and if it fails, it poops out something like 60 log files that you get to go scrape through to find out what broke. But if it works, then you have an OpenShift cluster running, with those components functional, that's ready to have other services deployed into it, like OpenShift Virtualization, which, once you have an OpenShift cluster, is pretty straightforward: you start the OpenShift Virtualization controllers, and now you have that. Now, if you want that to be performant or scalable, or have shared storage and all those other things, then you have to have conceived of, or at least set up, your appropriate storage platforms and all that other stuff.
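For reference, the "are you done yet" loop and the log-file dump map to real installer subcommands; sketched here with an invented directory name:

    # openshift-install wait-for bootstrap-complete --dir ./cluster --log-level=info
    # openshift-install wait-for install-complete --dir ./cluster
    # and when bootstrap fails, collect the log bundle to scrape through:
    # openshift-install gather bootstrap --dir ./cluster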
You're back to pre-mapping all of those, and then basically rinse and repeat the process. So go back, generate a new ISO, feed it a new file.
The point is, yes, if you needed to do that, but the process isn't necessarily designed for that, right? Their point is: get it right up front, build it, get it done right. Well, we laugh, but okay, that is a way. I mean, it's what a lot of them do. And so in the case of, hey, I actually want a reference-architecture-supported, performant virtualization cluster, right? Well, okay, so now I'm going to figure out how to inject into the ignition file how to do bonding,
you know,
creating three pairs of bonds: one for my control plane, one for my user access, and one for my storage, you know, okay, or whatever. It's like: API and control is on one pair, container networking or virtualization networking is on another pair, and then storage is on a third pair, right? Okay, that's the reference architecture. Well, that's not out of the box. And so to make that work on your chosen set of systems, you need to know what the NICs are going to be named, so that you can build the ignition files that will then build the bonds that will then make all those things present. Okay? Somebody's got to do it.
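A hedged sketch of that injection using Butane, the YAML front end that compiles into Ignition/MachineConfig; the NIC name (eno1) and the bond settings are exactly the environment-specific guesses being described here:

    variant: openshift
    version: 4.14.0
    metadata:
      name: 99-worker-bond0
      labels:
        machineconfiguration.openshift.io/role: worker
    storage:
      files:
        - path: /etc/NetworkManager/system-connections/bond0.nmconnection
          mode: 0600
          contents:
            inline: |
              [connection]
              id=bond0
              type=bond
              interface-name=bond0
              [bond]
              mode=802.3ad
              [ipv4]
              method=auto
        - path: /etc/NetworkManager/system-connections/eno1.nmconnection
          mode: 0600
          contents:
            inline: |
              [connection]
              id=eno1
              type=ethernet
              interface-name=eno1
              master=bond0
              slave-type=bond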
You can't just take that and then try and translate it. But I mean, even if we're doing this, because I want to talk through the process that we're following: would we still have to inject all that stuff into ignition files, or do we reduce the need to rewire the whole landscape?
Well, you still have to inject them into ignition files, because CoreOS still needs to know how to build the pieces. So there you go, somebody's got to do it. Now, can we automate it? Sure. We can templatize some of it and drive it that path, or at least generate it as parameterized things that are more trackable and automatable. But even our stuff, right? We've actually taken the tack of allowing you to have composable ignition files. So an enterprise that needs to have a, quote, hardened CoreOS can address that if they need to, and still inject the OpenShift Kubernetes pieces, or OpenShift containers, sorry, right?
So that would mean I can take my base CoreOS, take the Kubernetes pieces, add that into the CoreOS ignition file, and then an enterprise, without breaking the Kubernetes pieces, could continue to overlay additional restrictions they need into that boot?
correct. Okay,
Right. And then using more traditional netboot, right? You know, that would be all CoreOS: netboot, run, install, do its thing. It's the same, ultimately. You're using CoreOS in this model, so for us, control over ignition files becomes one of the basic tenets of building this Kubernetes cluster, right? Those ignition files determine if it's a controller or a worker, how it checks in, its behavior, all sorts of stuff, even where it pulls the registry from, right? Yep, that's what you were talking about: the registry being internal, you know, publicly accessible or not. It's part
of what you have to specify. So,
So when we're looking at building a cluster, assuming we're not taking the landscape, and landscape is entirely the wrong word here, we're not pre-wiring the whole thing. If you're building up a cluster for us, and we're identifying a whole bunch of machines going through an inventory, how do you start the cluster process for us?
For us, we take advantage of our broker system and the ability to pull things from pools. So you can tag machines that you want to be in your cluster and then say, build this cluster, and we'll go pull them. We're also updating it to let you have pre-tagged machines that then get built into the cluster based upon their own tags. But that's us choosing a tagging scheme, right? As Martez pointed out earlier, we're imposing a tagging scheme, and then we're going to act as the bridge between the two, right? We'll let the customer choose whatever tagging scheme they want, and then we'll act as the bridge between those in some cases.
Are we able to leverage the same bootstrap ignition file, where you can say, you know, inject this into the ignition file and it becomes Kubernetes control plane, whatever?
That's how we build cluster zero, if you will. We take RackN's information and we allocate the machines, either by matching; in our initial case we just pull them from the pool, in other cases we'll do it by tag. But we then generate a set that represents the initial cluster, and then we build them and go through pushing the OpenShift install process along to get the files out that we need to then reference and install
them. Does that mean we're running a task orchestrator, like, you know, command on the machine, command on the machine? When I wrote a Kubernetes installer from the Kubernetes the Hard Way stuff, right, I had to first install all of the prerequisite environment pieces, and then I had to run kubeadm with a series of commands to build the first controller and store a whole bunch of certificates, and then go to the second one and run kubeadm again, downloading all that information. I had to basically prep all the environments and then run kubeadm with a whole bunch of machine-specific configuration data, over and over and over again.
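For contrast, that kubeadm-era loop looked roughly like this, run node by node (endpoint, tokens, and hashes are placeholders):

    # first control plane node:
    #   kubeadm init --control-plane-endpoint "api.example.com:6443" --upload-certs
    # each additional control plane node:
    #   kubeadm join api.example.com:6443 --token <token> \
    #     --discovery-token-ca-cert-hash sha256:<hash> \
    #     --control-plane --certificate-key <key>
    # each worker:
    #   kubeadm join api.example.com:6443 --token <token> \
    #     --discovery-token-ca-cert-hash sha256:<hash>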
So the process requires running some things, but it's not as far-reaching and as random as the original, I'd say.
It's a different level of complexity, because with most OpenShift installs you're aiming at a higher target, shall we say, in terms of a more enterprise-ready deployment.
And the thing I've noticed about the OpenShift stuff is that the four commands and the customizations to your manifest and ignition files get you started, and from OpenShift's perspective, the goal is to get you an OpenShift that's just enough to then get you into the Kubernetes CRD-creation methodology. In other words, we've started to play with using, like, Ceph on workers. You don't actually have to install that up front. You can run it after the fact, and it pushes to the systems, and it has operators and all of the stuff already, where you can go into the UX and say: hey, add Ceph. It's not called that, but you say go, and it goes out and tries to figure out everything from the machines and starts all the appropriate Ceph mons and things as DaemonSets, all of those pieces in the right places at the right times with the right disks. And voila, you now have shared storage for block, object, and file on your system.
And in which form?
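In the Rook flavor of that pattern (named here as an illustration, not necessarily the exact operator being described), "saying go" is a single CRD, and the operator goes off to find nodes and empty disks:

    apiVersion: ceph.rook.io/v1
    kind: CephCluster
    metadata:
      name: rook-ceph
      namespace: rook-ceph
    spec:
      cephVersion:
        image: quay.io/ceph/ceph:v18
      dataDirHostPath: /var/lib/rook
      mon:
        count: 3
      storage:
        useAllNodes: true     # let the operator pick eligible nodes...
        useAllDevices: true   # ...and discover eligible empty disks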
Maybe right, maybe not. But this
is back to tagging, right? So what you're describing to me is: if the systems are tagged, then Ceph will come up based on the tags. If not, then you're saying the systems will try to interpret, you know, the roles; tags end up being roles for the Ceph cluster install process, right? So what you're describing to me is sort of like: if you start with nothing, you're going to get an undifferentiated system, and then additional process steps are going to infer, or guess at, roles for those systems, which show up as tags in the Kubernetes clusters, which then influence downstream behavior to continue the setup process. So it's sort of like this evolving tag factory that then influences waves of operations. Am I oversimplifying? Okay. Because this is part of the mystery that I wanted to talk through. Part of the mystery here is: all right, I have machines, I have DNS entries, I am going to boot a machine. What you're describing is that the first phase of the configuration is injected into the ignition file, because that's going to bootstrap the system. And I have a choice: if I leave it dumb, then it's going to have to start using tags to further infer what the system should do, and how those tags get on the system makes a difference. So at every point we have a choice... sorry, my cat is trying to bring me a present.
Can't keep it from doing that.
At every process step in the system, we can inject more clarifying data and have a more complex install, or we can let the system defer that and let it happen further downstream.
Yeah, and this is some of the interesting stuff Red Hat's chosen with their Red Hat CoreOS image, right? They started by saying it's got all the pieces in it, so we're not going to worry about it. So for doing OpenShift Virtualization, CoreOS already has libvirt in it, right? It's already there. Okay, so technically all of your OpenShift nodes could run virtual machines. Now, it's off; they're not enabled, they're not set up. But all it means then to have an OpenShift Virtualization node is to start the KubeVirt controller. Okay? That's totally within the realm of managing: just creating a CRD, all of a sudden, and having those right in the registry.
That's convenient, right? That's really nice. Sorry, compared to what I used to go through to get an installable node up, that's huge, yes, right? But it doesn't mean that machine is going to perform, or attach to the right network, or know what network to attach to, or have adequate storage, or that this would be on the right nodes.
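Concretely, on OpenShift that "start the KubeVirt controller" step reduces to creating one resource for the OpenShift Virtualization operator to reconcile; a minimal sketch:

    apiVersion: hco.kubevirt.io/v1beta1
    kind: HyperConverged
    metadata:
      name: kubevirt-hyperconverged
      namespace: openshift-cnv
    spec: {}                  # defaults; tuning knobs go here as needed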
Interesting too, though, right? Because of the advancements in Kubernetes and the ability to now have network spaces and other things like that, creating arbitrary networks to attach to is just a CRD create now, because there are enough controllers that exist, built in around the kubelet; I'm not sure where they live completely. But right now, if you want to create a separate VLAN for those systems, and you want to attach to a VLAN that's on the system, you create two CRDs, and now you have a network space and an attachment bridge that got built on the system, and you're off and running, right? It's not like, hey, let me go try and configure this under the system so it's available to the system and all that stuff. In OpenShift and Kubernetes, it's built into the process.
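A sketch of those two CRDs, assuming the nmstate operator and Multus are present; the VLAN ID and NIC name are invented:

    apiVersion: nmstate.io/v1
    kind: NodeNetworkConfigurationPolicy
    metadata:
      name: br-vlan100
    spec:
      desiredState:
        interfaces:
          - name: eno1.100            # hypothetical VLAN sub-interface
            type: vlan
            state: up
            vlan:
              base-iface: eno1
              id: 100
          - name: br-vlan100          # bridge the VMs/pods will attach to
            type: linux-bridge
            state: up
            bridge:
              port:
                - name: eno1.100
    ---
    apiVersion: k8s.cni.cncf.io/v1
    kind: NetworkAttachmentDefinition
    metadata:
      name: vlan100
    spec:
      config: '{"cniVersion": "0.3.1", "name": "vlan100", "type": "cnv-bridge", "bridge": "br-vlan100"}'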
This is really cool. It's a shift in thinking. So historically, we would talk through, step by step by step: I need to lay in this configuration piece, lay in that configuration piece, build a conf, you know, a systemd file, and then start a service. What you're describing is much more fluid, in that I'm effectively marking my system up more and more to figure out how I want to tune it. And ideally, all that markup becomes part of the knowledge of your cluster. It has to be, right? It lives in Kubernetes, but you probably need to externalize it somewhere, and if you're seeding a new cluster, the quality of that markup is going to influence how all of the rest of the cluster stuff gets built. It's no longer like building a conf; it's the equivalent of building a config file, but it's really pre-seeded markup data. It's still configuration,
yeah, but
It's more a question of how fleshed out the system is, right? In summary, okay, so for some of this, the Kubernetes environment is getting much farther along on the GitOps-ification of this, right? Think about the configuration of an environment. In the case of, say, I'm going to have 16 VLANs that are going to be native on the switch, and I want them available on the system, right? Well, in a Kubernetes environment, you basically create 16 CRDs in git and just make sure oc apply ran on them at some point. And it'll make sure that those are available on all the systems, configured correctly and available for use by VMs or containers.
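In other words, the GitOps version of those 16 VLANs is just a directory of manifests; something like:

    # one NodeNetworkConfigurationPolicy + NetworkAttachmentDefinition per VLAN,
    # kept in git and reconciled by the cluster:
    #   oc apply -f network-definitions/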
My nightmare is that all of that falls apart very quickly for a lot of these organizations when they have to actually start troubleshooting things.
Oh yeah. Martez, you get to a whole other level of: well, I just GitOps everything, I'm a developer, it's awesome. And the ops team goes, what the heck did you just do to my environment?
And this is what I'm trying to explain to the people building the environment, because what you're describing sounds like you just throw all the pieces onto the floor and they, you know, Frankenstein's-monster themselves into a robot, into a Kubernetes robot. But that's not what I'm trying to do. And what I think we've done a good job of here is we've actually decomposed the process into stages of clarification that have to occur to make the system work. And if you want that process to be repeatable, or, you know, troubleshootable from an ops perspective, you actually have to understand how these systems are getting built. You have to control how these systems are getting built. That's the new playbook here, because it doesn't sound like we have a pipeline with 1,000 tasks in it, like I'm used to from KRIB, our old Kubernetes thing, where it was this long sequence of task, task, task, a very linear sequence of stuff. We would discover information, stuff it in a file, then pass it to the next layer: the traditional ops sequence. What you're describing here feels more to me like a picture coming into focus slowly, where you can sort of see it as fuzzy, and then you keep clicking in, and it gets crisper and crisper and crisper, and you have to know how to recreate the steps that add all those clarifying details.
Yeah, I think of it as, maybe, um,
taking advantage of the black box. That has goodness and badness, right? In that it's a black box, you don't necessarily know what's going on in the OpenShift installer or their bootstrap process. You know, give it these 12 things and this thing may poop out successfully,
right? Okay,
If you think about your Kubespray experience, right? That was very white-box-ish: you knew exactly what 4,000 steps needed to happen to make that occur, right? So for OpenShift, at least in the part for this story, they created a black box, openshift-install, and you run three or four commands, and you have to give it enough input to be meaningful or to match what you do. And we can argue about whether that's documented better or worse, or not at all, but nonetheless, you do that, and it's
hidden
by their choice of pre-packaging an OS, right? Such that it has all the things already installed in it to do containers and virtualization, at a security posture that they're happy with, right? Okay, well, that section of the Kubespray steps disappears. Okay, fine. The Kubespray steps you needed to do to set up the networks and fill in the names and put IPs in place, and stuff like that, they've normalized into a configuration file. Okay, but the fact that that used to be an Ansible thing in Kubespray and is now in some set of ignition YAML files that were auto-generated by the installer... you end up having to do the same work. It's how much effort the provider has put in to make that happen. And Red Hat's put a lot of effort into their OpenShift install stuff. Yeah.
So this is really helpful, and different than I expected the answer to be. Now I don't feel like I'm missing something obvious when I'm looking at our install scripts, because they're definitely missing something that I was looking for. But now I know where it is; I know that there's a box somewhere that's going to interpret a config file, and that's about all I can expect. Now I know. Thank you, this was really helpful. I really felt like I was totally missing something major when I couldn't find significant parts of the install experience that I'm used to seeing. They're not here, and they're not expected to be.
Well, if you want to go gaze upon insanity, go look at one of those job log things. See the thousands of steps being run.
Because they're still doing the configuration work. They're just doing it in a captive way. It's
not kubeadm, right? And it's not Ansible, right? It's other things. But you'll see all the pods synchronizing, you'll see the pods starting and waiting for this, you'll see DNS services starting, right? All those things that are happening are the things you had to do, log, and keep track of in Kubespray,
but it's just hidden.
It takes very, very long to build the cluster. Yes, even a single node takes like 45 minutes.
Yep, yeah. It takes, what, John, 45 minutes for our seven-node lab VM cluster thing?
Oh, no, not quite that long, maybe 20 minutes. It largely depends on download speed for containers, since we're pulling from their repository. So it takes significantly longer to do it at home, where I've got less compute and less bandwidth, versus the data center, where we've got giant pipes and massive CPUs.
All right, everybody, we are over time, and I'm getting tapped for my next meeting. But this was very, very helpful. Thank you.
I love how much in this series we take something and dig and dig and dig until we get to the bottom of why something works and how it's different. As much as we do it, it still surprises me, and I hope that you are also enjoying the podcast. Thank you for listening to the Cloud 2030 podcast. It is sponsored by RackN, where we are really working to build a community of people who are using and thinking about infrastructure differently, because that's what RackN does. We write software that helps put operators back in control of distributed infrastructure, really thinking about how things should be run, and building software that makes that possible. If this is interesting to you, please try out the software. We would love to get your opinion and hear how you think this could transform infrastructure more broadly. Or just keep enjoying the podcast, coming to the discussions, and laying out your thoughts on how you see the future unfolding. It's all part of building a better infrastructure operations community. Thank you.