20240416 Out-of-Band Management

    10:39PM May 25, 2024

    Speakers:

    Rob Hirschfeld

    Greg Althaus

    Keywords:

    bmc, redfish, machine, system, band, vendor, access, password, configured, control, process, network, management, drive, set, dhcp, talk, server, manage, information

    Hello, I'm Rob Hirschfeld, CEO and co-founder of RackN and your host for the Cloud 2030 podcast. In this episode, we continue our TechOps series, in this case diving deep into out-of-band management. And one of the things about out-of-band management is that it quickly turns into an alphabet soup of protocol names, vendor-specific pieces, even the way we talk about out-of-band management: we have different acronyms for the same action. In this conversation, Greg Althaus, RackN's CTO and my co-founder, really dives deep into lessons learned, things that you need to understand, technical details, really core understanding of how to build BMC integrations. And frankly, one of the things that we cover in here is why it's so hard to do this well. So there's a lot of great information in this. Even if you have no plans of ever touching an out-of-band interface, the architectural lessons will help you; they are very useful. I know you're going to enjoy the conversation.

    So out-of-band management, in general, is this idea of having some type of server infrastructure, it could be a switch, it could be anything, in which you have the primary function of the device and the primary utility, usually an operating system, something running on it, and you have some additional way to manage the device. So I'm intentionally keeping this very general. That allows you to control that system remotely, outside of having to actually access the primary operating system for the machine. So there's always an assumption built into "out of band": in band would mean you are logging into the operating system, the operating environment of that system, and taking actions on the machine itself, or on that system itself. That would be in-band control. Out-of-band control would be if there is a secondary control system that allows you to access components of the machine, ideally not through the operating environment itself. And so modern servers use out-of-band management to do things like turn the machines on and off, or to provide a console backup, serial access to the operating system. So there's almost always some very simple text-based interface to a server or a switch that you can connect into. A lot of times, these interfaces also include a way to patch and update the RAID and BIOS control systems. So there are a lot of different control mechanisms underneath to make this happen. And one of the challenges with this is that different vendors have provided it in different ways, different APIs and controls to do this. So sometimes we've had the emergence of some standards we'll talk about, and sometimes it's been very vendor-specific. And I'm sure we'll throw in these terms without even thinking about it: for Dell, it's called iDRAC; for HP, it's iLO. Each vendor is compelled, I think by international IT treaties, to create their own name for this type of system. And then there have been some standards for this. An early one was IPMI, the Intelligent Platform Management Interface, which is how we did this when we started doing it over the network instead of by serial cables. And then, more recently, there was a standards effort called Redfish, and I'm certain we're going to get into those different protocols. But the goal is always to provide an alternate path to managing the systems, assuming that you can't always do it, or don't want to do it, through the primary network access for the machines. Is that a good summary? Anybody want to elaborate or expand on my attempt to define out-of-band management?
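    To make that alternate path concrete, here is a minimal sketch, in Python, of what talking to a BMC over the Redfish HTTP API can look like: query a system's power state and request a restart without ever logging into the host operating system. The BMC address and credentials are placeholders, and the exact resource paths and supported reset types vary by vendor, firmware, and Redfish version.

        import requests

        # Placeholder BMC address and credentials; point these at a real BMC to try it.
        BMC_HOST = "https://bmc.example.net"
        AUTH = ("admin", "changeme")

        def first_system_uri(session: requests.Session) -> str:
            """Return the URI of the first ComputerSystem this BMC exposes."""
            systems = session.get(f"{BMC_HOST}/redfish/v1/Systems", auth=AUTH,
                                  verify=False, timeout=30).json()
            return systems["Members"][0]["@odata.id"]

        with requests.Session() as s:
            sys_uri = first_system_uri(s)
            system = s.get(f"{BMC_HOST}{sys_uri}", auth=AUTH, verify=False, timeout=30).json()
            print("PowerState:", system.get("PowerState"))

            # Out-of-band restart: no login to the host operating system is involved.
            reset = s.post(f"{BMC_HOST}{sys_uri}/Actions/ComputerSystem.Reset",
                           json={"ResetType": "ForceRestart"},
                           auth=AUTH, verify=False, timeout=30)
            reset.raise_for_status()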

    But my goal here is to talk through... I mean, I'd love to hear from the RackN team specifically. I want to know how people are doing this, like, how are our enterprise customers making this work at scale? Is this definitely a machine-by-machine type of thing? What are the challenges to actually using this type of interface for machines, even down to creating login credentials and knowing which IP addresses to use? And can somebody take me through the lifecycle?

    Sorry, thinking about it.

    So, for a lot of what we've encountered, through at least my history with Dell, as well as, you know, at RackN here,

    there are a couple of challenges in dealing with things at scale.

    The first is knowing what you need to operate on. And that's oftentimes a process and procedure thing. You need to have a method to deal with that as the machines arrive from the factory to your data centers or your areas, so that you can record the information. This is where things like governments have gotten too helpful. California, for example, in its drive for making the universe more secure, requires all hardware shipped from within California to have random passwords. Well, that sounds like a great idea, except every password is written on the box, and you then have to read it off so you can figure out what it is and how to set it. Well, okay, more secure, but harder to scale. So you have those kinds of issues of discovery. Then, once you have it racked, you have the challenge of being able to drive it. And a lot of that is how you chose to implement tools, what tools you choose to use, and what you're trying to do. And so a lot of our recommendations are using some of our features, like our scanning features, or using your DHCP and networking environments to get that consistently set. Some companies choose a, quote, pre-step where they take care of the "oh my gosh, I got a random password, let me set it to the corporate normal," so that we can then integrate it into our automation system and record those things. All of those have degrees of failure rates, right? The more humans you put in the loop, the more failure you get. So the idea of scanning a password off of a box just so you can access it reeks of failure rate.

    And passwords. Yeah.

    Yeah. So once you're into this kind of process of discovery and control, then there's the whole lifecycle of: is it at the right level, do you want to apply things? And that's where additional tooling is needed, either the vendor tooling for a specific platform, or using our tools to help normalize that, to drive to a consistent, vetted state that you then can use in production.

    There are other ways to go from a lifecycle perspective, depending on your tooling and what you have. One of the things that we do with RackN is we give our customers the ability to not even worry about the BMC configuration. BMC is the general term for iLO and iDRAC: baseboard management controller. Anyway, the idea is that through our tools, we can discover the machine and inventory the machine from within the machine's normal processor space. And then from that space, we can drive the actual configuration of the out-of-band controller, to set it to a known password, to set it to a known state, which helps take care of the other side, where it's either misconfigured, redundantly configured, or randomly configured.

    Does the machine inventory show enough information about the out-of-band control, or, like, the access ports or things like that, if you were looking at them? Or are those completely isolated?

    You mean with per

    vendor? Yeah. I mean, so if we did an inventory on a system, where we're going to find all the ports on the network and all sorts of information, would we actually identify the out-of-band controller's network interface? Or have we always been leaving those things as separate?

    They are, though they're usually presented in a way that is discoverable from the base operating system.

    That's cool.

    So our inventory process allows us, from our discovery image, to inventory the BMC, to get its addressing methodology and whether it's configured, and that kind of stuff.

    Makes sense. Don't they sometimes ship the BMC, like, with all identical MAC addresses? Or not MAC addresses, although I know that's a problem too, but identical, pre-configured IP addresses? And if we're working, we rely on DHCP to identify... oh, my God, everything I'm describing is problematic in one direction or another. If you use DHCP, then the machines' out-of-band management IP addresses could potentially drift, which I know most operators wouldn't want; you wouldn't want the IP address of an out-of-band management system to drift, because you wouldn't know which machine you're talking to. Seems like there are so many degrees of freedom here.

    There are many. For example, Dell for a while was shipping every system with root/calvin as the username and password, set with DHCP. Okay, well, depending on your DHCP server, you could then find out what machines were there. And if you found the iDRAC addresses, you could then log into them to configure them, to convert them to static IPs or make them permanent reservations, all those kinds of things. So for our process, one way to run our system is to have the BMCs DHCP; the back-end process will discover them, and we can convert those addresses into reservations, especially if DRP is the DHCP server, so that they become, quote, permanently assigned to that machine. And then we can configure a new password and a new set of users to access the system, so that we can remove the manufacturer's defaults, or the random one that was set, or unset it all. And then it's configured, and you have remote access to drive the system if you want to. So that's a take-control kind of methodology. Another methodology that we've enabled, because a lot of times customers in these environments have brownfield setups, or they're not sure what's there but they think they know what the access passwords might be, is that we have a scanning system that will let you scan for out-of-band management controllers. So we'll look for them, we'll try to hit them with IPMI configuration messages to say, hey, are you there? What have you got? And then, using a set of credentials, like the defaults for the manufacturers plus what users might have set them to in the past, we attempt to log in and inventory the whole environment, so that we can record machines that we think are BMCs, and then the ones we can actually get into, and provide that to the user as a set of information to say, look, we scanned this environment, we think these are the BMCs that could be managed, and these are the ones we actually successfully got into. If we can get into them, we'll actually go through the Redfish and IPMI layers to gather enough information to try to let them identify the system: this is a Dell, this is a Dell with these MAC addresses, it's got these kinds of setups, so the user can then try to match that to what's actually in their environment. Like, hey, we had a spreadsheet that had all our machines... wait, what's that one? To address those kinds of issues, right. And then once that's in place, we can drive processes to do things like normalize the systems. A lot of times, if you're dealing with brownfield environments, the systems aren't in known good states. So you need to run tasks and sequences to say, all right, we're going to make sure the disks are wiped, we're going to make sure enough of the system is configured so that we can actually network boot, we're going to drive a whole set of sequences to normalize the system to what it would look like coming from a factory or a baseline state, so that it can then go through the additional provisioning or consumption processes that the customer might have. Right? That way you can kind of take over. It also lets you provide inventory and state, right? That's another aspect of knowing where the systems are.
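    A rough sketch of that scan-and-try-credentials idea, using the Redfish service root (which is served without authentication) as the probe. This is not RackN's actual scanner; the address range and candidate credentials are placeholders, and a real tool would also probe IPMI and record far more detail.

        from typing import Optional

        import requests
        from requests.exceptions import RequestException

        # Vendor defaults plus previously used site passwords; all placeholders.
        CANDIDATE_CREDS = [("root", "calvin"), ("admin", "admin")]

        def probe_bmc(ip: str) -> Optional[dict]:
            """Return the Redfish service root if this address answers, else None."""
            try:
                r = requests.get(f"https://{ip}/redfish/v1/", verify=False, timeout=5)
                return r.json() if r.ok else None
            except (RequestException, ValueError):
                return None

        def try_credentials(ip: str) -> Optional[tuple]:
            """Try each candidate credential against an authenticated collection."""
            for user, password in CANDIDATE_CREDS:
                try:
                    r = requests.get(f"https://{ip}/redfish/v1/Systems",
                                     auth=(user, password), verify=False, timeout=5)
                    if r.status_code == 200:
                        return user, password
                except RequestException:
                    pass
            return None

        # Placeholder address range for the BMC network being scanned.
        for ip in (f"10.0.0.{host}" for host in range(1, 255)):
            if probe_bmc(ip):
                creds = try_credentials(ip)
                print(ip, "answers Redfish;",
                      "known credentials work" if creds else "no known credentials")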

    systems are. Yeah, I mean, how much? Do we with a lot of these, there aren't that many, especially if you start drifting away from merge, the vendor controls, pretty easy to have these systems have some type of drift, right, if they were manually set up. And different people have done different things, or I guess that people just end up making all the systems identical, because they can't manage them easily. And what's the problem? If they did that? Well,

    Well, the similar things are often important, from how you operate, to drive consistency. So it's not necessarily bad to have identical systems. The challenge is when you assume they're identical and they're not, right? Basically, if you don't handle the variance within your process. And a lot of times, that's what the brownfield normalization step is about, to say, well, we can handle a Dell system and an HP system and these kinds of setups. But if the disk arrays or the drives are not functional, or if the networks aren't configured correctly, or this firmware is not at the right level because it can't do a certain thing, then that normalization process might be needed. An example that we saw in our own lab is that we had a system with an old enough firmware that it couldn't do HTTP boot. But we were switching to say we're only going to do HTTP boot in our test environment. Well, okay, that's great, and we want to do HTTP boot for all sorts of reasons, right? It's faster, lots faster, and arguably it can be a little more secure. But the challenge was that that firmware didn't support it. So we needed to update the firmware on the system, through the automated management controller kind of stuff, or a different boot process for that specific machine, and then drive it through the rest of our normal process, because we'd set that as the normalized process. Right? Okay. So that's why you might have those kinds of things, where variance can be handled in your process. But there are realms, right, of so much variance that all of a sudden you're trying to protect yourself from so many things that it's just no longer cost effective. The normalization step helps you avoid that.

    That's a big deal. I mean, one of the things I think you're making me realize, and it's important for people to think about, is, you know, especially with the more regulated companies that we deal with, when somebody asks, what version of firmware are you running in your fleet, the "I don't know" answer is not an acceptable answer. So there's an element where consistency is important, but actually knowing is the first critical thing. And not having a decent inventory, and then not having a way to update and tweak, are really significant problems. That's why they set up the out-of-band management.

    That's right. Now, there are some interesting trade-offs as you go through some of these processes. A lot of times we talk about, hey, I have an out-of-band management controller, I want to have it do all this stuff. And that's good, because it's the one that's actually going to do the work. The question is, how do you drive it? And one of the challenges that we often run into, or see, is system designs that become bottlenecked by the driver of the BMC. So one of the things that we try to do is have the machines do the work, not the server.

    Machines versus server?

    Well, so for example, say I want to update the firmware, or flash a BIOS, on 2,000 of my machines. Okay, well, I can use a vendor tool that's generally written to use Redfish, and it will run from the server. And your scale is somewhat bounded by the server doing that work. And sure, you can run some of those in parallel. But we worked on an HP system where they're like, yeah, our management platform can only do 16 at a time, because that's its loading capacity. And the thing it was managing had 768 machines in it, right? Well, okay, but if you had each of the machines choosing to update themselves, through the operating system space into the BMC, loading the BMC and driving that from within the operating space, you've now scaled that so you do 768 at a time instead of 16. And a lot of times this isn't an issue. But there are times when you need to roll a patch out, or you need everything to happen, all of those kinds of things, or you just have 50,000 machines to deal with, like some of our customers at scale do.

    That's the in-band versus out-of-band challenge, right?

    So a lot of our processes are designed to run from within the operating system, to then load the BMC or configure the out-of-band management controller to say, all right, you're going to do this work, we're going to set it up and manage it from all these locations. So that's another aspect of it: as you look at trying to manage machines at scale, are you running it from a server to a machine, or is the machine doing the action itself that then drives it? That's a trade-off, because in some cases you're required to reboot and in other cases you're not.

    So when you're looking at this... because one of the things that's easy to forget, right, is that these BMC interfaces are little computers, so they're their own operating environment. And they're often slow. There's just no two ways about it, right? I mean, some of these APIs and these processes that you're invoking are not fast systems. They've got dwell time, they have to settle, and there's a whole conversation, I think, about when reboots are required or not. But you know, at the end of the day, part of the synchronization you're describing, a server driving an out-of-band interface, it's going to be pokey. Unless things have improved since I was playing with it, it typically takes a while for the commands to save and be processed and acknowledged via those APIs. Is that still a factor, a design factor here? So if I was going to make a sequence of changes against a Redfish interface, I'm going to have to allow time for them, do them in the right order, and then just queue up the commands and have them operate one after another?

    Yeah, so those are definitely real concerns. The processors have gotten better, that is true. But there are still issues. And in some regards, they're related to how some of these systems are designed themselves. So for example, we've noticed on Lenovo systems, if you needed to set a bunch of values inside of the BIOS, you could do them one at a time. And that one-at-a-time process could take 30 minutes, because the round-trip time to send the request, get the request vetted, have the BMC validate it, apply it, save it, sequence it, and then return the result took enough time that it was a minute per setting in some cases. Whereas if you were actually able to generate a batch action of all those things you wanted to do, then it would do it in one minute, and you got all the things done. So there's a whole set of optimizations around what can be applied together, what can be applied separately, and managing those things becomes a challenge as well.
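    In Redfish terms, the batching Greg describes usually means staging all of the desired BIOS attributes in a single PATCH to the pending-settings resource and letting one reboot apply them together, rather than one round trip per setting. A minimal sketch follows; the system ID, attribute names, and exact Settings path are placeholders that differ by vendor and firmware.

        import requests

        BMC = "https://bmc.example.net"          # placeholder BMC address
        AUTH = ("admin", "changeme")             # placeholder credentials

        # Hypothetical attribute names; real names depend on the platform.
        DESIRED = {
            "BootMode": "Uefi",
            "LogicalProc": "Enabled",
            "SriovGlobalEnable": "Enabled",
        }

        # One request stages the whole batch; a single reboot then applies it.
        # "System.Embedded.1" is a Dell-style system ID used here as a placeholder.
        resp = requests.patch(
            f"{BMC}/redfish/v1/Systems/System.Embedded.1/Bios/Settings",
            json={"Attributes": DESIRED},
            auth=AUTH, verify=False, timeout=60)
        resp.raise_for_status()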

    Okay. And so where do you track that logic? I mean, that's vendor-specific, you know.

    It is. And that's the really challenging part. So for some of the operations that we have to do, we'll try to batch them as much as possible, and we'll have them iteratively applied. Sometimes you'll have issues where certain parameters aren't settable until other parameters are set. You'll also have conditions where some parameters don't exist on a system until a different set of configurations or settings is applied. So many of our processes are designed to be iterative and state-discovering. So our BIOS-setting process will go: we want to apply this. All right, let's apply it. Okay, what took? What didn't? Okay, so now we can apply these pieces. Let's apply this again and see if it happens. And based upon that iterative kind of process, we can figure out whether we're making progress, or whether there's a failure, or whether we'll eventually get to success. But it still may require multiple reboot sets and multiple apply sets to actually get done.
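    The iterative, state-discovering loop described here can be summarized as: read what the BMC currently reports, stage only what still differs, reboot, and repeat until the readback matches or you run out of attempts. A sketch of that control loop, with the vendor- or Redfish-specific calls passed in as hypothetical helper functions:

        def converge_bios(desired, read_attributes, stage_attributes, reboot_and_wait,
                          max_passes=3):
            """Iteratively apply BIOS attributes until the readback matches `desired`.

            The three callables stand in for vendor- or Redfish-specific operations.
            """
            for _ in range(max_passes):
                current = read_attributes()        # what the BMC reports right now
                pending = {k: v for k, v in desired.items() if current.get(k) != v}
                if not pending:
                    return True                    # converged: nothing left to set
                stage_attributes(pending)          # stage only what still differs
                reboot_and_wait()                  # most settings only apply on reboot
            return False                           # still diverged: flag for a human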

    And that actually varies by server model sometimes, right? Or even by firmware, it's not even the model. We found this with the Redfish work: there were times when, if you patched the BIOS, it changed Redfish's behavior, based on the version of Redfish that was now on the machine. So you'd have the same machine, but the behavior of the API would change?

    Correct. Well, this also applies to things like BIOS settings presented through the BMC, right? So say you have a process that you're content with for your Intel Ice Lake system. Okay, well, then you go to Cascade Lake or whatever the next one is, and Intel has added a new performance feature. Okay, well, Dell then chooses, for example, to reflect that through its BMC as a certain set of parameters named one thing, where HP may expose it through a different name. The parameters and BIOS configurations you had for your 15th-generation Dell server may or may not apply to your 16th-gen server. And like you were saying, if you have an iDRAC 8 versus an iDRAC 9, what they chose to put in there may or may not have made that parameter visible or named the same, and those kinds of things. So a lot of your management systems have to be able to handle the variance of not just Dell versus HP, but across generations within BMC systems, so that you can handle and toggle the appropriate parameters. Performance parameters are particularly annoying, because they tend to be platform-specific and custom to the vendor.
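    One common way to absorb that naming variance is to keep a mapping from a logical setting to the attribute name each vendor and generation actually exposes, and resolve it at apply time. Every attribute name below is made up for illustration; real names differ by platform and firmware.

        # Illustrative only: none of these attribute names are real.
        ATTRIBUTE_MAP = {
            "hyperthreading": {
                ("dell", "15g"): "LogicalProcessorMode",
                ("dell", "16g"): "LogicalProcessor",
                ("hpe", "gen10"): "ProcHyperthreadingState",
            },
        }

        def vendor_attribute(setting: str, vendor: str, generation: str) -> str:
            """Resolve a logical setting to the platform-specific BIOS attribute name."""
            try:
                return ATTRIBUTE_MAP[setting][(vendor, generation)]
            except KeyError:
                raise KeyError(f"no mapping for {setting!r} on {vendor} {generation}")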

    One of the things that we... I know there are actually podcasts, if you go back a couple of years, where we talked to the Redfish people. Right, one of the promises of Redfish was unifying all this noise, because IPMI, the original, had so many weird exceptions and changes and stuff like that, although fundamentally the basics always seemed to work for IPMI. So Redfish was trying to unify all those things. But you're describing these performance variables and tunings as so vendor-specific that even with Redfish, even if the API is consistent, which is a positive thing, you still have to know exactly what you're talking to, even down to the Redfish version, believe it or not. Is that fair?

    Yeah, Redfish is a little tricky, for a couple of reasons. One, they keep changing, right, and expanding and trying to fix the specification. It's now mostly JSON, HTTP-request based, which is great, and it's got some measure of self-description to it. The problem is that, depending on the version of your generation of your platform, the BIOS, and the flash level of your out-of-band controller, all of that changes the semantics. So one of the challenges that we have is somebody will say, oh, can't you just use Redfish to update the BIOS? And you're like, maybe.

    They're gonna write their own Redfish tool, and that's going to turn into a normalized thing. Yeah.

    Because the problem, for example, is some of these things move from specification to specification. And then, like, if you have an old enough BMC, it might not have that proper API level. And sure, you can inspect it to try to see if that API is available, but if it's not, what are you going to do, much less with this API, right? Or you can inspect it and say, hey, that's great, you have this new API, but that only applies to 20% of my systems. So I still have to have something that can make the decisions. And a lot of times, the simplest thing is just to revert back to the vendor tools, because the vendor already had to handle the problem, right? That's why Redfish is a bit of a joke: the vendor knows its variance across all of its platforms, and so they've already built into their tools the ability to handle it. Yeah. And so your tier-one vendors, your HPs, your Dells, your Lenovos, have started to normalize on Redfish even in their own tooling, because they've realized, hey, if we make this consistent, we can handle it. But they still have all these scripts they wrap around the Redfish to handle the variances, even within their own platforms. Right?
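    That "inspect before you trust the API" point can be illustrated with a capability check: ask the BMC whether its UpdateService actually advertises the SimpleUpdate action before relying on it, and fall back to the vendor tool otherwise. The paths follow the Redfish schema, but whether a given BMC implements them depends entirely on its firmware level; the host and credentials are placeholders.

        import requests

        BMC = "https://bmc.example.net"          # placeholder
        AUTH = ("admin", "changeme")             # placeholder

        def supports_simple_update() -> bool:
            """True if this BMC's UpdateService advertises the SimpleUpdate action."""
            svc = requests.get(f"{BMC}/redfish/v1/UpdateService", auth=AUTH,
                               verify=False, timeout=30)
            if not svc.ok:
                return False                     # older firmware may lack UpdateService entirely
            return "#UpdateService.SimpleUpdate" in svc.json().get("Actions", {})

        if supports_simple_update():
            print("use Redfish SimpleUpdate for this platform")
        else:
            print("fall back to the vendor tool for this platform")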

    There's no downside to supporting Redfish; it's mostly used as a protocol. And then they're still dealing with their own stuff. And, you know, it didn't address the vendor-specific knowledge problem, and I don't think you can address the vendor-specific knowledge problem. These machines are new or different. Yeah.

    I mean, that's why it hasn't. There's the issue, right? HP has its iLOrest tool, and you're like, what is it? Well, it's mostly a wrapper around Redfish; we just do Redfish calls. And you're like, yeah, but you knew to go get this HP OEM section of the Redfish space to then present a status call, right? You're like, well, how is that just Redfish? Right?

    I think that's one of the things, when we were talking to the Redfish people ages ago: they kept saying, well, but we've normalized it, it's all the same, it's all using Redfish, it's all the same. And you're like, there's so much detail in what information you need to know to make that call, or what's valid, that it's not. You're confusing "I can talk to it": it's English, but I don't know the technical jargon, so I can't understand you anyway. Yeah, so it's a really significant challenge to do out-of-band management well at all. And one of the things that you're pointing out is the amount of knowledge you have to have about the system, and then how it's going to take those updates, in order to actually get the results. Wasn't there an example, and it doesn't matter what vendor it was, where we were going through a sequence of settings, and the last setting in the chain was overriding something previous? So you get to the end of it and you're like, I did all these settings, they all worked, we did this last setting, and then a reboot, and things didn't stick the way we thought they were supposed to stick. But it didn't show up until the reboot, something like that. I could make up a story that probably happened at this point.

    But there's so much in this. This isn't vendor-specific, I mean, these stories are not like one vendor is the bad vendor. To me, it's about, you know, building systems over time: they're going to have variance that you have to track and account for.

    Transitioning to a security perspective: do we have a recommendation on securing out-of-band management? Like using, you know, certificates or password protocols? Or do people even have to worry about this? Technically, these systems are supposed to be deep behind the firewall and protected. You know, setting everything to, you know, one root username and password, how dangerous is that?

    Well, so, general best practice is to put your BMCs on a separate network, isolated from your production environment, right? Use appropriate firewall and access rules to prevent access from just random use, right, so you have your limited space. Many of our customers drive the BMCs to use the corporate security systems. So many of the BMCs have the ability to attach to your LDAP or Active Directory kind of systems for authentication, and they define users that have access based upon roles and stuff. So that's a policy that a lot of our customers use to restrict access, and that makes sense within a full, normal usage model, once that gets configured. A lot of times, a good procedure is to remove all the local accounts except for one for backup, and then store that one password, which is fairly complex and not publicly available, usually in your recovery store, something like that. Ideally on a per-machine basis, so one compromise doesn't hurt you. But that's up to your security posture and what you have in place for dealing with those kinds of assets, right?
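    A sketch of that local-account hygiene via the standard Redfish AccountService: enumerate the BMC's local accounts, set a per-machine password on the one break-glass account you keep, and disable the rest. Account names, slot layout, and which fields are writable vary by vendor, and in practice the password would come from a secret store rather than a constant.

        import requests

        BMC = "https://bmc.example.net"                  # placeholder
        AUTH = ("breakglass", "old-password")            # authenticate as the account you keep
        KEEP = AUTH[0]                                   # the one local account that survives
        NEW_PASSWORD = "per-machine-secret"              # fetched from a secret store in practice

        accounts = requests.get(f"{BMC}/redfish/v1/AccountService/Accounts",
                                auth=AUTH, verify=False, timeout=30).json()
        for member in accounts["Members"]:
            uri = member["@odata.id"]
            acct = requests.get(f"{BMC}{uri}", auth=AUTH, verify=False, timeout=30).json()
            if acct.get("UserName") == KEEP:
                # Rotate the break-glass account to a per-machine password.
                requests.patch(f"{BMC}{uri}", json={"Password": NEW_PASSWORD},
                               auth=AUTH, verify=False, timeout=30)
            elif acct.get("UserName"):                   # skip empty account slots
                requests.patch(f"{BMC}{uri}", json={"Enabled": False},
                               auth=AUTH, verify=False, timeout=30)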

    Does that make it harder to do automation? I mean, you still need an account for automation?

    Sure. And so in some cases, since all of this is user and password based, you can use the authentication system. So for example, one of our financial institutions attaches Active Directory to the BMC, and then they configure our system to use a custom service account to manage those, with the passwords associated and configured inside RackN, so that people don't necessarily know the password.

    Is it like... are you describing a named account on the BMC, or a service account?

    It's a service account in Active Directory that has restricted use, so that its only purpose is to log into a BMC, and they control that access. And then they have a secure store that they can go to to get that password when they need it. Now, in some cases, for some of their validation and test environments, they'll also maintain a local account for emergency access and stuff like that, with a separate password base. So think of it as an automation service account whose job that is, and its access is tracked for service actions, and then they have an emergency local account.

    Gotcha. Okay, that makes a ton of sense. Is there any way that, like, you'd get an event or something from the BMC when somebody logs into it like this, so that you'd actually know if somebody was doing some type of out-of-band actions on these systems?

    So, you know, BMCs, especially the high-end, enterprise-class ones, Dell and HP, for example, also have the ability to generate alerts and notifications, both through traditional SNMP-based systems, as well as through built-in syslog targets and all sorts of other eventing targets, to handle both some of your access audit controls as well as failure and alarm notifications, like failures in memory and disk and other kinds of hardware-level failures that the BMC detects, so that the information can be pushed outward, right?
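    The SNMP and syslog targets are usually configured through vendor-specific resources, but most recent BMCs also expose the standard Redfish EventService, which is one way to get those notifications pushed to your own collector. A minimal sketch of creating a subscription; the destination URL and context tag are placeholders.

        import requests

        BMC = "https://bmc.example.net"                  # placeholder
        AUTH = ("admin", "changeme")                     # placeholder

        subscription = {
            "Destination": "https://events.example.net/bmc-hook",  # your collector (placeholder)
            "Protocol": "Redfish",
            "Context": "rack42-node07",                  # free-form tag echoed back in each event
        }
        resp = requests.post(f"{BMC}/redfish/v1/EventService/Subscriptions",
                             json=subscription, auth=AUTH, verify=False, timeout=30)
        resp.raise_for_status()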

    Makes sense. How many people are actually setting that up?

    I don't know for sure. A lot of times those become customizations that are unique to our customers. We enable it; it's part of the fields that they can pass into these configurations. I know two or three of them do it. It's one of those things that requires a maturity level in your operations to actually take it to that level, because you need a bunch of infrastructure to receive the information and all sorts of other stuff like that.

    Yeah, I hadn't even thought about the SNMP aspect of these BMC systems. But the enterprise systems do support SNMP, and you could use it as a nice monitoring and infrastructure system, in addition to direct monitoring on the machine. Interesting. I like the... I'm not sure how practical it is, but, you know, the intrusion angle: having an event that says, hey, wait, there's access on these systems, and it's not from the service account. Conceivably, I guess you could even include any service access, with a "don't alarm for five minutes" type of thing. I mean, I don't think we've really ever discussed this, so I'm just thinking outside the box, but you could conceivably lock a system down that tightly. And when people talk about air-gapped systems, they're not air-gapping the BMC, right? They're still managing the BMC inside there; it has to be inside the air gap, inside the controls. Actually, it'd be worth... I mean, we only have a couple more minutes, but do you see a best practice for locking down or securing a BMC network? Like how much isolation there should be, or what type of segmentation?

    Um, I don't know if I have recommendations necessarily for that. In some regards, your network team's security policy is going to drive what you've invested in that space in general. From a usability and cost perspective, a lot of times what we've encountered in some spaces is customers trying to save money by either not providing a BMC network space at all, or, since some BMCs have the ability to run in parallel on some of the production network ports, using shared ports. And that technology, while functional, represents some security challenges, because you're now running your BMCs in the same space as your production space. And so that's generally not a good policy. My recommendation is spend the money, have a separate BMC network. I mean, especially nowadays, compared to the cost of some of these systems, an extra one-gig switch with enough ports to manage your servers in a rack is usually recovered monetarily many times over. Right? It's so cost effective that if you have any issue, it will save way more money than it ever cost you.

    And we're not talking about fiber interface cards or anything like that, we're just talking about copper one-gig. That's what's funny about it: these aren't high-speed, complex networking. You can buy the cheapest gear and use the simplest wiring for it.

    And some of that will give you more functional access and security than any of the other things you try to do to save yourself money. I mean, we often argue along similar lines for having a provisioning network in your environment too. While not directly applicable to this discussion, a lot of times people are like, well, I've got this system and I've got these 100-gig links, and I'm going to set up my AI system, and I need, you know, these 100-gig links for data ingress, and then I need these eight InfiniBand cards for this thing. And I'm like, yeah, how are you doing your provisioning? Oh, well, we're gonna have to go manually touch every machine so that we can enable PXE on these 100-gig adapters, which don't do it by default, and then we'll provision over that. And meanwhile, the boxes have built-in one-gig ports. It's like, yeah, we ignore those. Like, go have a one-gig port, right? We're gonna transfer two gigs of operating system over it, and you'll manage it, and your life will just be so much simpler, right? But people don't think about the, hey, I'm going to spend an extra $2,000 for this, you know, $2 million rack, and now I have a whole bunch of operational headaches that have disappeared. And that applies to both the BMC kind of access, out-of-band network, as well as a provisioning control network. Some of this stuff just kind of boggles my mind at times, especially as these machines, especially with the current trend of these AI clusters and data and analytics systems and high-end virtual machine environments and Kubernetes clusters and all these things, have these fairly large, big machines that are fairly expensive. And it's like, you choose not to have the one-gig copper port that makes your life so much simpler, and has very little cost, and you can restrict security domains, all sorts of things. It just boggles my mind a little bit that it's like, yeah, we're now going to worry about, you know, $2,000 on this rack.

    I think this is one of the themes, in a way, of the whole BMC conversation: there are some very simple steps that people can take that will dramatically reduce complexity, and this is a good example of one. And there are also some things that, no matter how much you try to simplify, like buying only from one vendor, you can't simplify; there's certain complexity you don't get to remove from the system. And you definitely want to pick your battles when it comes to this type of control. But not having it, right, not having out-of-band control, is completely unacceptable from an infrastructure perspective. That is one thing we didn't even bother to talk through: if you're buying and racking servers, you need an out-of-band control path. Cool. Greg, this was amazing. Thank you. There are so many details about this that I think we covered, and, you know, some questions I've had in the background that I haven't chewed over with you, so it's nice to be able to walk through it. And I really do appreciate the customer and experiential stories, because a lot of this stuff comes back to, you know, hearing what other people do. Thank you. I appreciate the time with it.

    Alright, everybody, thank you, guys.

    Wow, I love these TechOps discussions, because they take information, detailed, concrete information, and we have the time to go deep enough into the topic that we can really examine the complexities and subtleties of the automation challenge here. And BMC, out-of-band management, is one of those things that is architecturally a very, very complex thing to do, and isn't really well explained. I haven't ever seen a conversation as detailed as this one about these processes. So I hope you've enjoyed it. If this is valuable to you, please tune in, check out the Cloud 2030 podcast series, especially the TechOps components, which we are going to be putting back together as individual posts, probably like a TechOps book. We want to hear from you about what is interesting to you and what else you want to hear us cover. So drop us a line, check us out. Thank you for listening. Thank you for listening to the Cloud 2030 podcast. It is sponsored by RackN, where we are really working to build a community of people who are using and thinking about infrastructure differently, because that's what RackN does. We write software that helps put operators back in control of distributed infrastructure, really thinking about how things should be run, and building software that makes that possible. If this is interesting to you, please try out the software. We would love to get your opinion and hear how you think this could transform infrastructure more broadly, or just keep enjoying the podcast and coming to the discussions and laying out your thoughts on how you see the future unfolding. It's all part of building a better infrastructure operations community. Thank you.