Hello, I'm Rob Hirschfeld, CEO and co-founder of RackN, and this is the Cloud 2030 podcast. In today's episode, we dive into the challenge of process improvement, which sounds maybe dry, but it's really fascinating, because we bring in all of these specific examples: from data center operations and the challenges of patching and maintaining the BIOS, RAID, and OS configurations, to automotive examples, factories, how AI influences how humans and process control can get mixed into it, FedRAMP, all the way down into CrowdStrike and airlines. All of this is a great knot of how process is the lifeblood of successful IT organizations and why it's so hard. I know you'll enjoy the discussion.
Like Rob was asking: what would it take for secure boot to move faster? Because I had made the statement that, compared to, say, TLS, it's like 15 years behind. It really feels like the web in 2005, 2010, kind of thing. And then I was trying a parallel to the mobile industry, where the boot process, or the verification process, is more streamlined. But the flip side is that those are single-vendor ecosystems. It's Apple, and it's Google, and BlackBerry back when they were still around and doing phones; they also had their own single-vendor ecosystem. It was a BlackBerry phone with BlackBerry OS, and it was locked down as well. That is, unfortunately, not as viable in the computer ecosystem. And I don't know what it would take to accelerate the evolution, or revolution, of the secure boot process.
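For context on what secure boot actually does: at its core it is a signature chain, where each stage of the boot path is verified before it is allowed to run. A toy sketch of the idea follows (illustrative only; it uses an HMAC with a shared key where real secure boot uses asymmetric signatures and the UEFI key hierarchy, and the image names are invented):

```python
import hashlib
import hmac

# Toy stand-in for the OEM-provisioned platform key. Real secure boot uses
# asymmetric signatures, so firmware only ever holds the public half.
PLATFORM_KEY = b"oem-provisioned-key"

def sign(image: bytes) -> bytes:
    """Vendor side: sign a boot component."""
    return hmac.new(PLATFORM_KEY, image, hashlib.sha256).digest()

def verify_chain(stages) -> bool:
    """Firmware side: refuse to continue if any stage fails verification."""
    for name, image, signature in stages:
        expected = hmac.new(PLATFORM_KEY, image, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, signature):
            print(f"halt: {name} failed verification")
            return False
    return True

bootloader, kernel = b"bootloader-v2", b"kernel-6.9"
chain = [("bootloader", bootloader, sign(bootloader)),
         ("kernel", kernel, sign(kernel))]
print(verify_chain(chain))                                    # True
print(verify_chain([("kernel", b"tampered", sign(kernel))]))  # False
```

The discussion that follows is about who controls the keys in that chain, and how hard it is for anyone other than the OEM to replace them.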
Yeah, and in the intro you made an interesting point that I would rewind to, which is that the cloud providers have the time to invest in doing this right. The challenge is that cloud providers are such a big and opaque part of this, and they have very little investment in translating what they do. It's probably not even possible to translate what they do into the broader market. It
also doesn't help that they are, by and large, just white box servers; they're not off the shelf. And that's part
of what I mean by it probably isn't even possible for them, right? And they might even be in a position to, you know, have enough control over their own BIOS if they're injecting their own keys, so it's not a concern for them. And they have very little motivation, from their perspective, to make the jobs of the do-it-yourselfers here any easier, right? That's what we see in the industry: instead of the industry rowing together to make standards and do-it-yourself things and anything like that easier, that's all been disrupted, because most of the pressure here is actually for the cloud providers to do it for you,
although there's also very little motivation, from a business perspective, to open up the ecosystem. So as a result, the only people actively doing research and making this generic are either academics or open source enthusiasts, which happens to be a Venn diagram with a very big overlap as well.
But there's a third group also emerging, and that is in the vehicle-to-anything space. That ecosystem is being broadened because you have AWS and Microsoft and even GCP, to a certain extent, all trying to get into the vehicle-to-anything space, like cloud to the vehicle or vehicle to the user, and the security around that is very much akin to the kind of ecosystems they created around the phones.
But just to your point, isn't that more like the mobile phone situation, where the customer is in charge of an individual phone or an individual automobile, and, as close to the edge as they get that control, it is talking to basically a gateway, not unlike what happens with phones? So I would say that's kind of closer to: how do I standardize access to my ecosystem and therefore go out and poach somebody else's phone customer, and so forth.

Don't forget about regressions as well. Like, for example, GM dropping Android Auto and Apple CarPlay; they're going back to their own in-house OS and ecosystem. And again, this is because the manufacturers are trying to treat vehicles as a subscription service, which means that they have the incentive to lock it down and do their own private thing, as opposed to a standardized platform.
Except for one thing, and the wrench in the works is, according to the automotive industry folks, and that could be, you know, whether it's GM or Stellantis or anybody else: if they don't put some form of standardization in place that allows the consumer to move from one vehicle to another, they're screwed. You can't walled-garden a vehicle manufacturer around a single-source ecosystem. They have to be across ecosystems. And for that, there is a huge push for standardization, not only on the cyber side, but even on the provider side, that they all have to interoperate. So there are Thread Group-like associations and consortiums out there that are trying to standardize across them, because otherwise
Is the scenario there one of an individual, identifiable user moving from one automobile to another, or what's the driving scenario?
Okay, so if I'm GM, and I go with, let's say Apple, what about all of my Android users? If I'm a fleet manager for a rental car company, am I going to choose which vehicles I'm then going to, you know, buy or lease?
So it's like: how do I establish a standard for identity, and then all of the goodies that come along with that identity that allow whoever to deliver service?
Unfortunately, the direction in which they're taking their solution is, in my opinion, off. Because instead of opening up their standards and making essentially replaceable parts, as far as the infotainment or connectivity part is concerned, they have their own private system with a mobile app that is going to be intrusive.
Well, here's my question, and there are two schools of thought here. I don't disagree with you, Klaus; it depends on how they go about implementing the technology for the consumer at large. In other words, will they be proprietary in the chip that they use, and will that restrict? Will they build it on board? Will it go through gateways? There are a lot of different ways to skin that cat, and there's nothing preventing them from saying, "I want A over B," or "I have to have all of them." And that's what's pushing standardization. Because you're going on the basis that they're going to stream, right? But there are some manufacturers that are putting in specialized chips that will only pick up certain frequencies. Think about OnStar as an example, right? It's only available in certain vehicles, and it's got its own frequencies. It allows you to do everything. But if you try and go off of a frequency or add something to it, you can't, because it's EPROM. It's built into the hardware. Well,
actually, OnStar is a good example of something that, to me at least, is an example of the problem. Because the manufacturer has a contract with OnStar; the vehicle comes with OnStar, and you have no choice but to take that, including all of the downsides, like telemetry and so on, which you cannot turn off. Well, yes. So again, I think you're talking a bit about it from the integrated system perspective; I'm talking about it from the consumer perspective. So our discussions are not incompatible, but the perspective is different.
Yeah. Because, I mean, my view is, if you're going to put it in firmware and not make it upgradable in some way, then you may have more of a problem than you realize. But by the same token, from a business perspective, you've locked in your audience for the duration of that vehicle.
This is such a sad state, in a way, for where we are. Because back in the day, we used to be able to swap radios out in cars. You could buy a car with a cheap radio and then put in your fancy radio. And to our point earlier, you might buy a really nice radio and then keep it and use it car to car. And I could see the whole infotainment system actually being a modular piece, where you choose to buy a high-end infotainment system. And to what Klaus is asking, there's a whole market there. And what you're describing from the vendor side is like, well, I'm not going to make an infotainment system; I'm going to create a standard space, or a standard plug, where it fits in, and then I'll sell you one, but that's a dealer upgrade.
That would mean, however, that their whole communication bus has to be
standardized and adaptable and then also secured
The security part is a big problem in current vehicles. But even then, just looking at it from the business perspective, manufacturers don't have the incentive to do that. Look at how long it took for the farmers: they had to go to court against John Deere to be able to service their own vehicles. The courts are finally starting to force them, but otherwise it would have stayed a closed, locked-down ecosystem. And I think that is the problem with this platform: again, you don't have the user serviceability yet. And going back all the way to what started this conversation, secure boot, you still don't have the user serviceability there. It's not meant to be managed by the user. It's meant to be managed by an OEM,
absolutely
right? And that's why this is one of the things that comes up: how difficult it is to actually change out those certificates, or do your own signing of them. Because when I look at what our customers would probably get more excited about, it's if they could sign their own OSes, because a lot of them have a pipeline for producing a build, the golden image. If they could actually control the certificates on the machines, and then also sign their OS, then they're in very tight control over what's going out into their data centers or edges. The challenge is that that then requires the BIOS to have a very secure process for updating it, and it's all manageable. Most customers don't have the sophistication to even dream of doing that, let alone try to do it, and
the vendors are not going to help them out by, you know, delivering to them packages that have different dependencies and so forth. So yeah,
and the problem is that the system currently requires that sophistication. And again, compare it to getting an SSL certificate 10 or 15 years ago, when you literally had to fax them a copy of your passport in order to do it, compared to now, where all you need is a domain name that you own, and, with ACME on your web server, you can get the certificate in minutes. It was
all out of South Africa for some bizarre reason, yeah. It was brilliant, and it was expensive. From a compute intensity perspective, it was nice to see that go. In some ways, I would go even further back, to the early days of automobiles, where we didn't have standards or anything; it was all custom made. It required a lot of knowledge; it required people to really understand how cars worked. And since we were talking about cars earlier: in some ways, cars have become more and more integrated units that you can't decompose. And now the security stuff we're talking about people being able to manage is a locked box, just like our phones are. Great, I guess, as a protected instance, but bad as a locked subscription that you can't do anything with. I'm not sure that's ultimately right. We keep putting more and more bubble wrap around things because users, air quotes, "aren't smart enough," instead of saying there's a degree of responsibility or knowledge required for that. This
is the policy problem. You know, when something goes wrong... and this is kind of an interesting management school use case. The policy problem is one where, when you have a problem, what does an enterprise typically do? What does a corporate entity typically do? They put a policy in place to educate and prevent folks from doing that again. And so what happens is, the next time you have an issue, another policy gets put in place, and then eventually another, and then another. And there are two problems with this. One is that you can never put enough policies in place to cover all the scenarios. And number two, it's reactive: you put so many policies in place that they start to conflict with one another, because you've forgotten them, and, from a human standpoint, you can't manage all of the policies. And so one could argue, well, this is where automation comes in. But you would quickly find out that, well, some of your policies actually come
out wrong. Automation can't solve the problem if the policies are conflicting, and you need
policies to manage the automation. And the third problem, which is kind of tangential to what you said, is that in many cases the policies don't get reviewed after they've been created. That's right. So unless you make it a standard that every policy gets reviewed on a certain cadence, like once a year or once every couple of months, they end up being forgotten; they rot, they metastasize, and they turn into a literal cancer for your progress.
That's right. But it's also a problem of situational awareness and contextuality. And the companies that get over that policy-upon-policy-upon-policy problem are the ones that actually take the trouble to contextualize the circumstance around which that policy was created, and put guardrails around it, so they can obviate, if you will, some of that repetitiveness of a policy on a policy on a policy. And that's where I think, going forward, we're going to see a lot more contextualization be the analytic of choice; the key to better analysis is the degree to which you contextualize the scenario, right? Because, to your point, yes, in a B-school case it's definitely policy. But if you go Ford, Stellantis, Toyota, Boeing: quality is no longer job one, it's job none. In order to fix that, you've got to go all the way back to the drawing board of how things were made. You can't necessarily put a new quality policy in place, because the circumstances are radically different today than they were six months ago, a year ago, two years ago, etc. So no matter how you policy around that, you need that semantic, you need that contextualization, and you need to be able to shift for your agility and your resilience.
Context is what determines what policies are applicable and should be considered, and, in the case of a conflict, what takes precedence. You're absolutely right, it is about context. So I keep trying to think of an industry or an enterprise that has, in my mind, done a good job with policy, and I'm having a hard time coming up with anybody. No,
I actually... this gets back to the core issue, which is: in order to do everything that you've said, which makes a lot of sense, you have to first know what all the policies are. And the reality is that we don't have good documentation of what all the policies are, and we don't have a good way of communicating them within an organization. And so putting the guardrails up, for example: great in theory, but in practice it becomes really hard to do unless you're talking about a very specific area. When you're talking about broad policies, or policies that go across functions or across organizations, pretty quickly you get to a point where you realize that you can't manage that. Now, that's assuming that someone is actually stepping back and thinking about this, which is another problem, because most don't, right? They don't go back and prune the portfolio of policies (say that ten times fast). It just doesn't happen, or
not often enough. Actually,
there is one field where I would argue that they've done a really good job at updating and curating and maintaining the policies, and that is FedRAMP. Right now, revision five came out a little bit less than a year ago, and it is a major overhaul and rethinking of their policies in the context of new technologies and new issues that came out. They've done, all things considered, a stellar job at modernizing their policies. For example, it used to be that the policies were written entirely with physical hardware, or at best VMs, in mind. They now have a whole section for containerization, for policy-based rules, and this is security policies, not user policies. They've absorbed and streamlined the process for things like SBOMs, software bills of materials. I mean, yes, it's causing a lot of churn for vendors and users, but it is a good kind of churn; it is getting rid of a lot of cruft that was holding
things back. And to your point, Klaus, when they talk about policies in FedRAMP, they've done an excellent job of working on the means by which you specify or declare or define a policy. Two individuals, or two automatons, can take a look at a policy declaration and come up with the same interpretation. And that is the secret there. And it goes to Tim's point: if the policies are unclear to begin with, if they're more, you know, suggestions than they are policies, then you have so much interpretability in them that, you know, forget it. And this is where I would kind of disagree with Tim: you can't and you don't know all the policies in place to begin with. You have to have access to all of them that are currently in place. But the idea that somehow you have to kind of lock down policies, that's going to get you into trouble as well. So I think it's a lot about how you define policy, and how you validate and test policies. And I think that, to Klaus's point, FedRAMP has done a very good job there.
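The property being described, that two automatons reading the same policy declaration reach the same interpretation, is essentially policy-as-code: a policy written as a machine-checkable predicate leaves no room for divergent readings. A minimal sketch of the idea (the rule names and system fields here are invented for illustration; they are not FedRAMP's actual control language):

```python
# Each policy is a named predicate over a system description, so any two
# evaluators, human or automated, agree on what compliance means.
POLICIES = {
    "images-pinned-by-digest": lambda s: all("@sha256:" in i for i in s["images"]),
    "sbom-present": lambda s: s.get("sbom") is not None,
}

def evaluate(system: dict) -> dict:
    """Return an unambiguous pass/fail verdict per policy."""
    return {name: bool(check(system)) for name, check in POLICIES.items()}

deployment = {
    "images": ["registry.example/app@sha256:ab12cd"],  # hypothetical image ref
    "sbom": None,                                      # no SBOM attached
}
print(evaluate(deployment))
# {'images-pinned-by-digest': True, 'sbom-present': False}
```

The point is not the rules themselves but the shape: a declaration that can be evaluated, rather than a prose suggestion that has to be interpreted.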
The other part, also, that builds into what you're saying is that they have a very reasonable exception framework. The policies are not written in stone. If you have a scenario where the policy is not reasonable, you can open a request and get an exception for that scenario. Because, surprise, surprise, the people who write the policies and who enforce the policies are reasonable as well. And again, I have to give it to them: they've done a stellar job at modernizing.
Right. But there's an investment here. I mean, going all the way back to the question about UEFI (and then I do actually want to pivot to talk about Broadcom, because in some ways these are related topics), the issue is, there could be a policy, but a lot of the companies don't have the time. It's too hard. They don't have the operational capacity. Policies, in some ways, are really expecting human beings to do work. And my experience has been that most people know what they should do; they just don't have time. We're not in a position where we can do the things that we know we should be doing, or even follow the policies, because they require a level of effort or discipline that's beyond what, not what people want to do, but what people,
you know, mere mortals are available to do. This
is what's so funny about the worry about AI taking people's jobs and things like that: most of the companies that we deal with know what they want to do, but it's so hard for them to actually implement any type of change. And if you can free up some time for them, they do perform better. Since
you bring up AI: I would say that policy, or at least users finding and interpreting policies, is likely a reasonable use of LLMs. Because, I mean, it's not like they provide or bring any groundbreaking technology here; you can always do a search on your documents if your documents are indexed. But the interpretation part, that has always been the problem, and LLMs can lower the barrier for that. Yeah,
that's the type of great lift that can make a big difference for people. When you can be taking an action and something's looking over your shoulder and saying, you know, that's supported by policy, or that's out of policy, that can make a big difference. Even down to the point where, ideally, the conversation would include: I don't think that policy applies here; can you raise an exception?
yeah, yeah. I mean,
it doesn't even need to be authoritative. It can still be saying, like, hey, I want to do this; does it fall within the reasonable framework, or are there any cases where I need to be concerned about
this? Yeah, and if you think about policies in general, they have kind of certain swim lanes or guidelines that they generally follow. They're very specific as to what you shall do or not do within that guideline. But here's the thing that's kind of interesting: before generative AI, a human could actually compute and work through what to do and what not to do if you gave them those guidelines, and that takes far greater capacity than trying to memorize all of the policies that are typically applied to any one given job. Now, one of the use cases that is coming up pretty regularly with generative AI is to feed all of this into an LLM that you can then automate in such a way as to help guide folks on whether what they're doing is within or outside of policy. And so now we've found an automated way to provide guidance on it. The problem is, as we all know, generative AI is not perfect, and so if it goes haywire on a particular policy that is otherwise germane to the core of the business, you could end up in trouble. Now, there are ways to correlate that, to protect and lower the statistical probability that it goes haywire on those particular policies. However, it's something you definitely have to hedge against.
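The guidance loop being described can be sketched without any model at all: index the policies, retrieve the ones relevant to a proposed action, and leave the actual interpretation to the LLM step. The policy texts and the crude keyword scoring below are illustrative placeholders, not a real retrieval system:

```python
def relevant_policies(action: str, policies: dict) -> list:
    """Rank policies by crude keyword overlap with the proposed action.
    A real assistant would hand the top hits to an LLM for interpretation."""
    action_words = set(action.lower().split())
    scored = []
    for name, text in policies.items():
        overlap = len(action_words & set(text.lower().split()))
        if overlap:
            scored.append((overlap, name))
    return [name for overlap, name in sorted(scored, reverse=True)]

# Hypothetical policy corpus.
policies = {
    "change-management": "all production change requests need approval and a rollback plan",
    "data-retention": "customer data must be deleted after 90 days",
}
print(relevant_policies("deploy a production change without approval", policies))
# ['change-management']
```

Even this trivial retrieval step narrows the haystack; the hard part, interpreting whether the retrieved policy actually permits the action, is where the LLM (and the risk of it going haywire) comes in.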
That was an AI analytics problem even before the LLMs, though, right? With calculating mortgage rates, or showing bias in making decisions.
And that's why there is such attention to validating the use of machine learning, and now even more so gen AI, in exactly those kinds of decisions. You know, model risk management is a big deal in the banks right now. And to Tim's other point: throwing a bunch of policy documents into an LLM, into a RAG with an LLM, using just pure vector embeddings and similarity, is subject to some doubt. And that's why the hybrid solutions, where you add either a knowledge graph, basically predicate graphs, or a number of other forms of structure, could be some sort of ontology or taxonomy, to the interpretation, get you up to a pretty high standard of interpretability. There are a couple of companies I know who are working on exactly that. That's where
I think correlation becomes really important. You know, the output you get from one, you correlate against the second. And
basically it's evaluation of the system. You really do have to run these things through an observation and evaluation process. Without that, you're playing with a loaded gun. A
supervisor, right? Supervisors for
policy.
But I think, in some ways, this talks to a larger issue, which is that we put a lot of trust in technology in terms of how technology works, without necessarily knowing fully how it works, right? We see this with generative AI, and, in a more recent example, we saw this with the CrowdStrike issue and Windows. But the point is, we trust technology almost too much, culturally, and that gets us into trouble when things go haywire, because typically it goes haywire in a big way.
We have to trust it nowadays because of the overload. You cannot supervise every single component. You have to delegate.
Yeah, I'm not disagreeing with you on that. I'm just saying that I think our confidence level is too high, and it doesn't correlate with the degree of potential risk. Meaning, if there's a higher risk probability, there should be a higher degree of, you know, verification, as opposed to, okay, I'm running a simple little process, go for it.
Yeah, that I agree with. Again, it doesn't help that we are mixing legacy processes with newer technology or newer scales. Like, I'm not saying that the CrowdStrike issue would have been impossible on Linux, but eBPF makes it damn difficult to have this kind of system-wide outage, just because it sandboxes the processes properly, or at least it tries really hard. Sure, there might be bugs that haven't been found yet, but it is a sandbox-first approach, as opposed to an access-to-everything-first approach, like ring zero on Windows.
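The sandbox-first point can be shown in miniature: the host hands third-party logic a copy of its data through a narrow interface and accepts only a constrained verdict back, so a crashing plugin degrades service instead of taking the host down. This is only an analogy to eBPF, whose verifier statically proves a program safe before the kernel will load it, not a model of it; the plugin names below are invented:

```python
def run_plugin(plugin, event: dict) -> str:
    """Host side: isolate the plugin behind a narrow interface.
    A plugin crash degrades to a default verdict instead of a system-wide
    outage. (Failing open vs. failing closed is itself a policy choice.)"""
    try:
        verdict = plugin(dict(event))   # pass a copy: no shared mutable state
    except Exception:
        return "allow"                  # host survives the bad update
    return verdict if verdict in ("allow", "deny") else "allow"

def buggy_plugin(event):
    # Stand-in for a malformed content update that crashes at load time.
    raise KeyError("malformed rule file")

def strict_plugin(event):
    return "deny" if event.get("path", "").endswith(".sys") else "allow"

print(run_plugin(buggy_plugin, {"path": "C:\\drivers\\x.sys"}))   # allow
print(run_plugin(strict_plugin, {"path": "C:\\drivers\\x.sys"}))  # deny
```

A ring-zero driver gets no such boundary: its crash is the kernel's crash, which is exactly the system-wide failure mode being discussed.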
But to your point, you know, I wrote a piece a while ago called "The 98% Rule," and it's: trust technology, maybe, 98%, but that 2% is always going to get you. You can't lose your own subject matter expertise, your own confidence level, or your own intuition. And this plays back into the AI space, where companies have to capture the institutional knowledge of their workforce, because that's the 2% that makes the difference between success and failure. "I did it myself. I know it works." That's the 2%, and that will always be a factor. And I think more and more people are starting to realize that; not the masses, but people in the space of technology are starting to recognize that if you don't pay attention to that 2%, you're going to have grief from now until forever.
I honestly want to be supportive and agree with that, but at the same time, I don't think people are there. I think you give them more credit than they deserve with regards to the 2%. I wish it weren't the case. Maybe
it's my colleagues and cohorts who I believe think along those lines, because there's always a grain of salt, you know. It's never 100%, because if you just trusted technology, there would be a lot more people being charged with manslaughter for fatalities.
We're in such a weird position for what human supervision is actually going to mean, and how much we're responsible for.
Well, accountability and responsibility. What we're
sort of saying is: how much stuff slips through the cracks because people missed it, right? We're starting to get into the doctors reading X-rays behind AIs, the drivers who were supervising the AI cars, let alone the AI cars without drivers.
Well, there's a much more simplistic version of that that we're going to be able to see in real time here, going back to the CrowdStrike incident: Delta. I don't know if you saw this; Delta hired David Boies to represent them, to go after CrowdStrike and Microsoft, in terms of trying to recover some of the monies that they're out. Half a billion dollars is what Delta is claiming. So it'll be interesting to see where that goes, if it goes anywhere. One of the questions that I start to wonder about is, at some point, does this become potentially class-action suitable, or something along those lines? Because when you talk about the impact that some of these companies can have, their liability goes way, way up. And I don't think they're necessarily thinking about the potential ramifications. They think they can hide behind the contract and absolve themselves of responsibility, or rather liability, but I think that's probably only going to go so far, and I'll be curious to see how this plays out. Even if it doesn't play out where Microsoft and CrowdStrike get held accountable for this, I think it starts to send a shot across the bow of, you know, maybe we should be talking about making some changes to who's liable for some of this stuff,
and at the same time, since you're speaking of liability, cyber insurance is probably going to be affected by this as well. Big time, yeah.
I'm actually hearing of a number of organizations that are starting to let their cybersecurity insurance lapse, specifically because they either can't qualify for it any longer, or the expense of it is just far too great.
and it feels very similar to pet insurance,
yeah, yeah.
But here's my question, because... oh, sorry,
no, go ahead, ask your question. I'll tell you why I'm laughing. All right, okay,
my question is not around pet insurance. My question is: does cyber insurance actually cover you when it's not a hack, but an error in coding from a third-party provider that you are paying for the privilege of having the license for? I don't know that it does, actually.
I think they do
exist,
but they do exist. Okay,
They do exist, but they're very rare, they're very expensive, and it is not usually covered in what we think of as cyber insurance. That's right, you're absolutely right. And
what I was going to get to as well is that, again, in the light of this, people are going to try to use their cyber insurance to recoup some of the cost, and if they can't, they're going to reevaluate whether the cyber insurance is actually worth it.
Here's what I was wondering, and I kind of had this conversation with someone a couple of days ago: will the next set of policies be around SDLC guarantees, and not necessarily cyber?
Yes. Sorry, there, like rev five: major changes are out around SDLC guarantees. So yes, it's already happening.
I think the change that I would expect to see, and this is just based on the conversations I'm having with executives that are rethinking how they approach not just cyber but the broader risk profile, is: what are all of the different vectors for risk? So, case in point, CrowdStrike: not a cyber incident. Now, mind you, when there's a major incident, all the bad actors get on board; we saw a huge uptick in DDoS attacks, we saw a huge uptick in phishing attempts, right? So the bad actors are right behind it, saying, hey, click here and you can fix it right away, and send us a million dollars while you're at it. So there is this kind of tag-along. But the CrowdStrike incident, that's not a cyber incident. Yes, it was a cybersecurity company, but it's not a cyber incident. It was a problem in service, right? Whether you want to talk about a change in code or a change in content, I think that's semantics. The point is, they made a change to their system, which caused a service change, which then caused a failure in the Windows systems. And so that, as well as technical debt, as well as fragile architectures, and, kind of to the earlier point that was made about people bringing old legacy thinking into the mix (my words), bringing those old legacy architectures and that way of approaching technology into the mix, starts to add additional risk to the portfolio. And so one of the things that I'm seeing come up again and again, and starting to shift, is: don't just look at cyber or ransomware, but look at all of the different ways that we can consider risk, of which cyber is one of the vectors. Because there might be other places that actually create greater risk. I can think of a couple of assessments I've done with large enterprises where, frankly, their technical debt created much greater risk than their risk from cyber, far greater. So
What that means, then, is that anything you do to try and address risk needs a form of validation or verification, and if you don't have that, then you are flying blind.
The amazing thing to me, and this is one of the challenges I have anytime we start talking about, say, the Eufy hack or any security risk, is that a lot of those improvements come along for free if you improve your process controls in the first place. When it comes to the CrowdStrike thing, I think Delta's in trouble, because I think their suit is going to expose their operational flaws much more than CrowdStrike's. CrowdStrike fixed this quickly, right? I mean, it was a serious bug, but they addressed it quickly. I think in a courtroom it's going to come back to Delta's operational control processes around it.
I disagree. I flatly disagree. And here's why I'm steadfast against this, Rob. Okay, it's not to get in your face.

No, I'm interested in your point of view.
I think, you know, okay, CrowdStrike had an issue, and I think this is a great use case to talk about third-party risk. So CrowdStrike makes a change to their system. Something happens, and supposedly they did go through testing, and supposedly it didn't show up as an issue against the Windows systems in testing. Supposedly. Okay, they'll go back and revisit that, I'm sure. But let's say that is all true. How is it that a Windows system allowed a third party to completely BSOD the system, as opposed to just a reduction of service? How, at this stage in 2024, does a Windows system put so much trust in a third party that the third party can essentially kill the system? So I think the failure here is not necessarily CrowdStrike. I actually think more of the failure is on the side of Microsoft. Now, Delta. A little bit of background: I have a history of working with major airlines as clients, and the capstone for my MBA was on the global airline industry. Everything from long haul, short haul, cargo, crew systems, operations, route networks, et cetera. My point is, the systems that they run, and Delta's crew management system in particular, are fairly antiquated. And there's a whole talk track I could give you about why that is the case. It's easy to say, hey, why didn't they update it? The reality is, there are only a few airlines that could actually use a system of that caliber, and there aren't that many producers of the products to begin with. So there's a whole economic problem there.
You're falling back to my argument. Klaus, what do you want to add?
Well, a couple of things. One is, while I agree that Microsoft allowing this kind of failure mode is definitely a problem, let's not downplay CrowdStrike's part in this, because CrowdStrike has a history of failures along these lines and, very similar to LastPass, they don't learn from their mistakes. And going to Rob's comment here, or Rob's question: I agree with him that this is not a problem with Delta's operation, because Delta tried to be compliant and have a security agent on all of their systems. And if that was 100% of their systems, then 100% of the systems went down within an hour, essentially forcing Delta into a full recovery from cold DR of their platform. Given that, the timeline for Delta to recover was reasonable; the fact that it cost them having to recover at all is not reasonable. And that is, I think, the gist of their lawsuit.
I still come back to this: if you have systems in inaccessible places, and you don't have a mechanism to take out-of-band control and reimage them, then that's a very significant issue. That's a risk that you should not be taking.
I mean, ultimately, I think there's an operational component that drives this from being a two-hour outage into what it became. Tim, I know Joanne wants to jump in, and I want to hear Joanne's take.

Yeah, I know Joanne wants to jump in, but again, I just go back to, and maybe this is a conversation for another time, why these enterprises don't have all of those nice-to-haves: having a back door, having PXE boot access, or something like that. There are a ton of reasons why that isn't the case. And I think that's one of the pieces that's missing in this.
Which is partially where we are right now; we keep circling back into this. Why is it so hard for companies to have the operational processes and controls that they clearly know they should have? And yet we end up caught out over and over. Joanne, do you want to wrap up your thoughts? I have one last thing from logistics after you finish.
My take on that is, there are so many holes in Microsoft software it's not funny. If Patch Tuesday weren't a cliche, I would go for it, because of the day of the week all this happened. But to that point, I don't think Delta will be successful against CrowdStrike. I do think that they might be successful in suing Microsoft, because if there weren't a back door or a way for content to have permeated to the kernel of the Windows operating system, you would not have had this problem. So yes, it's an SDLC issue for CrowdStrike, but it's more an ecosystem problem, and maybe even a policy one, for Microsoft, because what Microsoft has never excelled at is retesting and revalidating third-party providers' software on a regular basis. And I'll make it very quick, because I know we only have one minute: it happens in manufacturing all the time. Think back 20 years ago, to the bad batteries in all the laptops that ended up making the airlines not allow you to bring a laptop with a battery inside on board, and some of those regulations are still in place, like in Europe. I was part of that, because I worked for Celestica at the time, and we knew every single manufacturer that we were doing product for was going to have the same issue. It all tracked back to one maker of the batteries, who tested, but whose OEM customers never retested. And it's the same thing here.
The industry has a lot more weak links than we know.

Gee, I guess I annoyed Tim.

No, God, no. I know, but I just pushed the Broadcom piece back. I think this was a better conversation.
definitely, yeah, it absolutely is.
It's fun to me when the Tuesday conversations, which are much more technical, dovetail into these conversations, which are much more strategic.
It's great we get out of the weeds every once in a while.

Next week's topic I have scheduled is VC models; that'll be extra fun. VC, investor capital: a check-in on how the VC industry is doing with funding, and whether it's still working the way we think it should.
GPs and limited partners, yeah.
That's the way I've seen it; there's a shift there. We'll have a good conversation on it. Oh, yeah, absolutely. See you then next week. Bye bye,
bye. Yeah,
Wow, what a great conversation. It really is fascinating just how interconnected the challenge of having regular, good, automated process and process controls is, and how we continually get caught flat-footed by misestimating the risk. This has been a theme for the last several episodes, because we've been confronted with all these challenges where operations is disrupted because the risk of changing, improving, and being more dynamic in operations is miscalculated against the threat of things changing out of your control and breaking your operational infrastructure. We have some great topics coming up talking about the industry as a whole and what's going on with it, and I hope you will choose to join us at the2030.cloud and be part of our conversations. It's always great to have new voices join in and be part of the discussion. Thanks.

Thank you for listening to the Cloud 2030 podcast. It is sponsored by RackN, where we are really working to build a community of people who are using and thinking about infrastructure differently, because that's what RackN does. We write software that helps put operators back in control of distributed infrastructure, really thinking about how things should be run, and building software that makes that possible. If this is interesting to you, please try out the software. We would love to get your opinion and hear how you think this could transform infrastructure more broadly. Or just keep enjoying the podcast, coming to the discussions, and laying out your thoughts on how you see the future unfolding. It's all part of building a better infrastructure operations community. Thank you.