Hello, I'm Rob Hirschfeld, CEO and co-founder of RackN and your host for the Cloud 2030 podcast. In this series, we continue our tech ops discussion, looking at governance and specifically at a software bill of materials: this idea that we can actually define and document exactly what goes into a system as we put it together, both from a software and an operations side. In this conversation, we really look at the software pieces around a software bill of materials, and then pull back to look at what it would look like if we did the same thing from an operations perspective. It truly is a big challenge, and this conversation is a little more theoretical than some of the tech ops discussions have been, because we're only at the beginning phases of this. Even so, I know that you will get a lot out of thinking through what it looks like to implement a software bill of materials in your IT systems. Enjoy the discussion.
With that... sorry, the topic for today is actually related, in the sense that we had gotten to a point previously, when we were talking about container lifecycle management, where we started getting into a question about knowing what's in the systems. We talked about how to manage containers, but that wasn't the same as actually understanding what's in the containers and how we govern them. So today I was hoping we would spend a little bit of time talking about what governance looks like, and then circle around this idea of a software bill of materials in that perspective. Does that sound reasonable? Is that what people remember coming next?
My memory is short, but I'll go along with it.
I hate to do it two weeks
Oh my goodness. So we're talking about container management, but with anything really, from an ops perspective we're sort of handed artifacts. What can we do to make sure the right things are running in the right place? What's the starting point here? Maybe we should back up: how do we get the bits onto the boxes?
over a second
All right.
So the first step is: what are we doing to get applications deployed? We have Kubernetes, which is basically a container management system. Do we need to include the deployment systems like Chef, Puppet, Ansible, or does that not matter for this? To talk about governance, do we start with knowing how the other stuff got on the box?
Well, it depends on the jurisdiction under which you're trying to implement governance. For example, in FedRAMP systems, you are required to maintain much tighter control than what Ansible, Chef, or Puppet themselves can provide. Essentially, you're expected to document what goes into the boundary and then do continuous auditing, monitoring, and vulnerability scanning on that. In other environments, it might be looser than that. And then yet again, in other environments, like certain European jurisdictions, you cannot use cloud resources at all. So it all has to be
a service. Gotcha.
And I believe for certain Japanese government systems that also applies. And then we have the complete opposite, perhaps, in China, where you have a maximum level of encryption allowed for traffic leaving their jurisdiction.
Right, because they need to be able to decrypt it. That makes sense.
So, yeah, once you identify the jurisdiction, then at the very least you can lay down some baselines and say, okay, if I need to match up the artifacts, then you're likely going to want a local artifact store, whether that's your own Docker registry, or a TFTP server, or an apt mirror, or whatever it is you're trying to bring in. That is typically step one, in that you start controlling what you put in there or not. And that's where you end up putting most of your gates as to which resources are permissible or not.
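As a rough sketch of that gating idea: the check in front of a local artifact store can be as simple as an allowlist of approved sources. The source names below are hypothetical placeholders, not real policy.

```python
# Sketch of the "gate" in front of a local artifact store: before an
# artifact is mirrored in, check that it came from an approved source.
# The source names are hypothetical placeholders.
ALLOWED_SOURCES = {
    "internal-registry.example.com",
    "approved-apt-mirror.example.com",
}

def is_permitted(artifact_source: str) -> bool:
    """Return True only if the artifact came from an approved source."""
    return artifact_source in ALLOWED_SOURCES

# Usage: reject anything pulled straight from an unvetted upstream.
assert is_permitted("internal-registry.example.com")
assert not is_permitted("random-upstream.example.org")
```

In practice this policy usually lives in the registry or mirror configuration itself rather than in application code, but the principle is the same.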
I guess part of the way I'm thinking about this: we throw around Docker containers as if they're the endpoint. I'm going to have to take a quick break in a second, but when we're looking at governance here, there's the stuff you put in the container, and then there's the containers themselves. Or maybe we shouldn't worry about the containers at all.
No, I wasn't talking about containers specifically; they're just a convenient example to keep coming back to, so I reference them a lot. We can even step back one step further and say, okay, these are the software packages, how do you want to handle them? Or you can even go as far down as the hardware itself.
No, I guess what I'm trying to drive towards is the software bill of materials. If we look at this as a software bill of materials challenge, where are we building? I mean, a software bill of materials is going to be true whether we're putting the bits on the disk ourselves, or whether we're building a container and then distributing the container. Can you define the software bill of materials piece enough so that we know where we're going to build from it? That's what I'm thinking. Does that make sense? Yeah.
So the bill of materials, roughly speaking, will consist of two things. One is a list of components that you directly include. And two, attestations, where you basically certify that you have made a reasonable effort to verify the authenticity of the components that you build on. Of course, this can be recursive: if you pull in, let's say, a package from a public registry, it might include its own bill of materials to say, okay, it used these specific libraries. So if you remember Web of Trust from about a decade or two ago, the problems with bills of materials are pretty much the same as with Web of Trust, in that it's hard to determine at what point you stop verifying.
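To make the "list of components" part concrete, here is a minimal sketch of what a CycloneDX-style SBOM document carries. The field names reflect my reading of the spec, and the component entries are hypothetical.

```python
import json

# Minimal CycloneDX-style SBOM: a format header plus a flat list of
# components, each identified by name, version, and a package URL (purl).
sbom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "version": 1,
    "components": [
        {
            "type": "library",
            "name": "requests",
            "version": "2.31.0",
            "purl": "pkg:pypi/requests@2.31.0",  # cross-ecosystem identifier
        },
        {
            "type": "library",
            "name": "urllib3",
            "version": "2.0.7",
            "purl": "pkg:pypi/urllib3@2.0.7",
        },
    ],
}

with open("sbom.cdx.json", "w") as fh:
    json.dump(sbom, fh, indent=2)
```

The recursion described above shows up when a listed component ships its own SBOM, which you can then chase the same way.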
I guess part of what I'm thinking about with this is that you're collecting that data for the system, you're building the file. When you're deploying, does your system then scan that file and do a cross-check against it? How are we ensuring... I mean, it's nice to have the file; maybe that's the first step. Do we then also have to ensure that the file is being met?
Yes, absolutely. A lot of cloud security posture management tools now have the ability to not only look at the list of packages in a VM or in a container, which really is one type of bill of materials, but also to look at the dependencies of those packages themselves and give you a report on known vulnerabilities for those. That is how software bills of materials are really used in a practical manner. The other part is that this ends up being your declarative audit trail. So, for example, if the system that you're controlling is getting audited, you can produce the bill of materials, you can produce the historical versions of that bill of materials, and use that to prove that you've done your due diligence.
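A sketch of that cross-check, assuming the CycloneDX-style file from earlier: load the SBOM and flag any component that appears in a known-vulnerability feed. The feed entries here are made up for illustration; a real tool would pull from a vulnerability database.

```python
import json

# Map of (name, version) -> advisories. Purely illustrative data.
KNOWN_VULNS = {
    ("urllib3", "2.0.7"): ["EXAMPLE-ADVISORY-0001 (hypothetical)"],
}

def report_vulnerable_components(sbom_path: str) -> list[str]:
    """Cross-check SBOM components against the known-vulnerability map."""
    with open(sbom_path) as fh:
        sbom = json.load(fh)
    findings = []
    for comp in sbom.get("components", []):
        key = (comp.get("name"), comp.get("version"))
        for advisory in KNOWN_VULNS.get(key, []):
            findings.append(f"{key[0]}@{key[1]}: {advisory}")
    return findings

if __name__ == "__main__":
    for finding in report_vulnerable_components("sbom.cdx.json"):
        print(finding)
```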
There are so many places to go with this, and I'm trying to figure out the right one. Can we just stick to building the bill of materials? What are people actually doing to create that?
There are some off-the-shelf capabilities built into some CI/CD systems. In many cases, what you end up using will depend on the level of sophistication of your bill of materials tooling. There is, for example, what's called in-toto, which gives you attestations where your bill of materials is being signed. Now, as far as the format itself, there is a common format called CycloneDX, which is the de facto standard for modern bill of materials setups. Again, there are a lot of CI/CD tools and platforms that are able to directly produce compatible outputs.
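For a sense of what the attestation side looks like, here is a rough sketch of an in-toto-style statement that binds an artifact (identified by digest) to its SBOM. The layout follows the in-toto statement format as I understand it; the predicateType URI is illustrative, and a real pipeline would wrap this in a signed envelope (e.g. DSSE) rather than emit bare JSON.

```python
import hashlib
import json

def make_sbom_attestation(artifact_path: str, sbom_path: str) -> dict:
    """Build an in-toto-style statement tying an artifact digest to an SBOM."""
    with open(artifact_path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    with open(sbom_path) as fh:
        sbom = json.load(fh)
    return {
        "_type": "https://in-toto.io/Statement/v0.1",
        "subject": [{"name": artifact_path, "digest": {"sha256": digest}}],
        "predicateType": "https://cyclonedx.org/bom",  # illustrative
        "predicate": sbom,
    }

# Hypothetical artifact and SBOM paths, just to show the call.
print(json.dumps(make_sbom_attestation("./my-service", "sbom.cdx.json"), indent=2))
```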
So using a tool like this is basically giving you a standard description of what's been produced. Okay. Yeah.
I mean, it's really just an evolution of something like Maven. Maven was, again, very language-specific, and this is a more generic approach, but if you start comparing them, there are a lot of similarities.
It looks like this is based on something called OWASP. Is that the...
OWASP is a framework for software security procedures. There might even be an organization behind it; I'll have to double-check that.
I think it looks like there is.
If you pay attention to the news, most likely what you will have heard of is the OWASP Top 10, which is a yearly publication of the most common, let's say, security misconfigurations or vulnerabilities being exploited.
Oh, that's interesting. So wait, I thought OWASP was a definition... now I just got confused. So OWASP is... yeah, this is similar to CVEs, though we're talking about things like broken access control and cryptographic failures. So this looks like issues that people are aware of, almost like a CVE list.
I think an example might be: for every OWASP entry, there would be instances of it that became CVEs. So the OWASP entries would be things like broken access control or cross-site scripting, and a CVE might be "this spreadsheet parser has a cross-site scripting attack." So you create the CVE because it has a vulnerability, and that vulnerability is also defined in an OWASP document.
OWASP is more about configurations or system behavior. CVEs are about the exploitability of a system. So yes, in some cases there might be an overlap, but in most cases OWASP is things like, as I said, if you allow cross-site scripting, well, that's a configuration issue. It's not a CVE in Nginx.
So when you're Wait, okay, so,
Oh, I have a definitive answer for you. Okay, go ahead. There are CVEs, and there are CWEs, common weakness enumerations. The common weakness enumeration is the type of weakness, and the CVE is the specific vulnerability.
And the OWASP Top 10, or OWASP, is a list of the weakness enumerations, which
are then layered under or over the CVEs of known libraries. Is that right?
Not necessarily. You can have a system with zero CVEs that can still have a misconfiguration. Oh, okay. The CVEs are essentially: you produce a piece of software that is intended to do something, it has controls, and those controls can be bypassed, even if it is perfectly configured. On the other hand, with OWASP, you have a system that you've exposed, and it does not do its due diligence in performing proper access controls, for example.
Okay, wait. So that could be regardless of whether or not you're using the code. You said you could have perfectly good code, and OWASP is going to identify a configuration issue or a common combination of components. I have an
example of why this might be confusing. One of the OWASP entries is insufficient logging and monitoring, because a common pattern that makes your system weak is not having proper auditability. However, something like Log4j has a vulnerability that makes the logging itself an exploit target. So you're trying to adhere to OWASP by having sufficient logging; however, you have a CVE in your logging that is caused by another OWASP weakness, an injection attack on the logging itself. So while adhering to OWASP, you're also not adhering, because you have a vulnerability.
And does that then translate into needing a bill of materials for these pieces? Or is this just a general "here are the configuration issues I have"? Because it felt like we just added another layer onto all these pieces I've got, right? A bill of materials would say: I've got a whole bunch of components, I need to know which ones might have vulnerabilities in them, and are you using the ones that are vulnerable or not? And now it feels like I could still have a vulnerable configuration pattern, which is an additional layer, and I don't know how that fits into a software bill of materials. Or maybe it does.
It could be a little bit of both. In a scenario where there's a library component that goes into the larger build of the software, and that component has an exploitable piece of code, that's going to be the CVE part: I need to know what's running in your program, which might include Log4j as an example. That's going to be something you have to fix, that you have to deliver a patch for, specific to a security defect as it was seen. Additionally, there could be a component where it is notorious that, let's say, this library is improperly configured, or, going back to the auditability example, there's not a sufficient amount of information being provided by that library. Ultimately, as an end consumer, in theory, if I have the SBOM, I could say: hey, I see that you're using this library, and there's an issue with the payload that you're passing. That's a little more indirect, but in theory it could provide value, in which case I could come back and say: hey, I see that this is commonly misconfigured, are you configuring it appropriately?
So perhaps, to step back a little bit: Rob, I think your confusion stems from the discussion having gone from CycloneDX and bills of materials to the OWASP Top 10. Now, CycloneDX is a published bill of materials standard, and it's published by OWASP. But the bill of materials is not directly related to the OWASP Top 10; it just happens to be produced by the same foundation. Aside from that, what Martin said is still completely correct. As far as governance goes, yes, you are expected to produce a bill of materials, and that is then used for auditing, for verification, for saying: okay, I know that these particular versions of packages or libraries have a vulnerability, I need to write a report saying which of my systems have that vulnerability, if any. That's what bills of materials are used for, whether automatically by vulnerability scanning tools or manually by just running your reports. The complementary part on the OWASP Top 10 is that it addresses a lot of the configuration requirements that you end up having to meet in a governed environment: again, making sure that you have your proper access controls, making sure that you have reasonable source control and change management as part of that. In particular, in more recent publications like FedRAMP Revision 5, one of those requirements is essentially supply chain risk management, which is where the bill of materials comes into play.
It makes sense to me how all these pieces fit together; I guess I'm just not as comfortable with the mechanics. The simple idea of building all the pieces together is still missing for me a little bit. I don't feel like I can just get the software bill of materials out of the build pipeline. When I look at deploying a system, there's so much more that goes into it, right? We mentioned Nginx; Nginx has its own software bill of materials, and I'm not delivering that from my development pipeline, I'm putting that around the system. Then I might be wrapping that in a container, which is going to have a bill of materials, and the container management system has a bill of materials. Do I have to worry about compositing all of this together as an artifact? Do I do that?
It depends on the jurisdiction. In some cases, you may only need to provide a bill of materials for your direct dependencies. In other cases, I can definitely see the requirement for going at least as far down as the included library versions being a reasonable request.
And to be fair, a lot of this is in flux at the moment. Bills of materials are not yet a fully standardized space. There's certainly some tooling, again like CycloneDX and in-toto attestations, that is becoming a standard. But there is no unified vendor solution to say: okay, I'm building this from the beginning, and I'm able to give you a full report at the end of everything running in my systems. The closest we're getting is CI/CD vendors like GitHub and GitLab being able to automate a lot of the steps up to the point where you deliver your artifacts. But there's still a piece missing: once you have your artifacts delivered, there's the part between deployment, full lifecycle management, and production that this would need to be tied back to.
And then even going a step further into the configuration, like we were talking about, where you could actually certify the configuration process, or include the configuration process... maybe that seems like a lot. Yeah.
But I can also completely see something like this bleeding over into more human-interactive environments, in particular with the proliferation of deepfakes and LLM-based generation systems. Having a bill of materials for what you're publishing, whether that is software or whether that is media, sounds much more likely to become a requirement in the near future.
Well, you stepped into media there, beyond just software. I mean, fundamentally, what we're doing with the bill of materials is making an attestation statement of what something is composed of. And at the moment, people who are building their own software are at least able to say: this is what I built. We haven't gotten as far down into what would be the ops cycle, to start doing attestation on a bigger system, because a lot of that ends up being not just the system but the configurations too. So that's missing. It's interesting to think... were you saying with media that you would have a similar thought process, and you would show all of the components that contributed to that media source? Is that sort of what you were thinking? It's a chain, an authenticity chain.
In particular for official media, or media that needs to be verified, we're likely going to see cases where camera manufacturers start providing the capability to sign images, to provide evidence that they are analog-authentic and not digital creations. And this can go even further, to media companies and media distribution companies providing bills of materials to indicate that what they're publishing is authentic material.
The challenge I see with SBOMs is I don't know if they'll ever get pervasive enough, because I don't know that there's enough value for most organizations in the SBOM part, particularly when we're talking about software that I consume. Ultimately, I'm inherently trusting whatever the payload or artifact is that you're giving me; I don't necessarily need to trust all of the underlying components. And if there's an issue with one of the underlying components from a security standpoint, typically I'm already picking that up now with a scanner. So even from an after-the-fact standpoint, I'm still coming back to you as the vendor saying: you need to upgrade Log4j, you need to upgrade this package, you need to upgrade that package. I can totally get it from an internal software development standpoint, to a degree, in order to do things like dependency maps and the libraries. But I just don't know if that's going to be pervasive enough to warrant enough companies jumping on board.
Again, it will depend on the type of jurisdiction that you're working in. The one part that is missing from just doing a vulnerability scan in your CI/CD is that without the bill of materials, you are giving your customer very little guarantee that what you're deploying into production is what you actually produced, what you intended to produce.
I think then we'll get into something like the SLSA framework from Google, because then you're going to have to have provenance throughout your entire CI/CD pipeline to validate both what was intended to be generated as well as what was generated.
Absolutely, yes. It is a very complicated system. But unfortunately, it is becoming a necessity.
Sorry, where I'm going is... I'm putting on the ops hat for a minute. We definitely are seeing customers in the field, enterprise customers, who would love to be able to take an environment that they're moving code into, that they might have a software bill of materials for, and have that framed out. They would love to be able to say, without having to scan (scanning would still be important): this is all the stuff that we put into that system, we know what it is, and every time we add something in, we are updating the bill of materials. We're seeing customers who are basically saying: we do not want any systems in our environment that are older than a month, where the bill of materials that went into that system hasn't been updated. They're on a constant stream of new images getting produced, because the bill of materials going into those images is constantly being updated. They would all love to be able to say: here is a document that I can review that tells me exactly what we've put into this system. And we're not even talking about configuration; we're just talking about which dependencies, which libraries, all the stuff that goes into building a system. They would love to have a document that describes that, and then to be able to come back and scan against it. Because any variance between the scan and what you think is supposed to be on that system is going to be a cause for alarm. You can come back and have a permitted list, and if things are not matching that permitted list, it's super easy to say the whole system is compromised.
What you run into with that becomes: some of those underlying libraries, can I really get access to those? Because ultimately you're still back to that inherent trust of: I've given you an artifact, you have to really trust me. Honestly, that's what it comes down to, because I can give you a JSON payload that includes anything in it. Yeah. And ultimately, what's your mechanism to actually validate what's running against that list that I gave you?
The bill of materials is complementary to your system scan; it does not replace it, right?
Right. But, and by the way, we're going to run out of time, right now the scans just take a known-bad-things list, like a virus scan or a vulnerability scan, and they're scanning for what you don't want to have on that system. I think the whole point of what we're talking about today is flipping the script and saying: these are the things that I know were there. And then once you have a list like that, you can start saying, well, do any of those have known vulnerabilities?
It's not just that it's also taking the result of your vulnerability scan and comparing it to the bill of materials, making sure that they match up, right, because if they don't match up, then you have a problem. Right?
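A minimal sketch of that match-up, assuming the SBOM is the known-good list: compare what the scanner actually observes on the system against what the SBOM declares, and flag differences in both directions. The observed set here is stubbed out with made-up data; in practice it would come from your scanner or package manager.

```python
import json

def diff_against_sbom(sbom_path: str, observed: set[tuple[str, str]]):
    """Compare observed (name, version) pairs against the SBOM's components."""
    with open(sbom_path) as fh:
        sbom = json.load(fh)
    declared = {(c["name"], c["version"]) for c in sbom.get("components", [])}
    unexpected = observed - declared   # on the box, but not in the SBOM
    missing = declared - observed      # in the SBOM, but not found on the box
    return unexpected, missing

unexpected, missing = diff_against_sbom(
    "sbom.cdx.json",
    observed={("requests", "2.31.0"), ("leftpad", "0.0.1")},  # made-up data
)
print("not declared in SBOM:", unexpected)
print("declared but absent:", missing)
```

Anything in either bucket is exactly the "cause for alarm" described earlier.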
But are they going to match up with something like a Golang binary that's been compiled? Or is the customer going to need some of those underlying components spelled out?
Actually, Golang binaries are easy ones, because they actually do include the list of libraries that they were packaged with. Perhaps something like WebAssembly would be a more dubious question to answer, if it doesn't contain its own internal bill of materials.
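As an aside on the Go point: compiled Go binaries embed their module list, and it can be read back out with the Go toolchain. A quick sketch, assuming a Go toolchain on the path; the binary name is a placeholder.

```python
import subprocess

def go_binary_modules(path: str) -> str:
    """Read the module/dependency list embedded in a compiled Go binary."""
    result = subprocess.run(
        ["go", "version", "-m", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout  # main module plus each dependency and its version

print(go_binary_modules("./my-service"))
```

That embedded list is effectively a per-binary bill of materials you get for free, which is what makes the WebAssembly case look harder by comparison.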
Yeah, that might be it: the moment you get something that doesn't readily expose that list, then you run into the hiccup.
Of course, if you're in a controlled environment where you do need to have attestations down to the individual library, you're likely going to end up compiling from source.
Sure. But unless you're building your whole OS stack yourself, which I guess you could do... even then you're still going to need to know all the things that go into it, because nothing's ever truly all built from source.
Yeah, that's where my mind is. For me, it's trying to decipher between a bit of security theater and the actual practical value that a lot of organizations are going to get from SBOMs.
Right now, the cost of implementing SBOMs, at least from a commercial perspective, is incredibly high. Having said that, again, the scene is evolving, and I expect within the next five years or so for things to have standardized pretty well.
I would love to see it. I know that in the engagements we have, people would really benefit from just having an accurate description of what they're deploying. This has been one of the things that I've learned... all right, sorry about that. One of the things that I think would be incredibly valuable: right now we build stuff, and this is what everybody does, so this isn't a unique problem at all. We build things for people, just like any pipeline does. And at the end of that, they can go back and read the code, or they can look at the process and say, yes, it was built by this. But there's no real output that says: here's the recipe, in a consumable way, for me to know what was included in it. And that's the thing: not knowing what got baked into the cookies is a problem.
And it just, it's a shame that you
haven't done anything for that. Along those lines. Oh,
all right, we're wrapping up. Because people just enjoy the cookies, they don't care. Well,
That's it. But when we're talking to these enterprises, they're starting to say: okay, I've got all this amazing automation that is building all this stuff for me, and it's impossible for me to know what got stuffed into it. It's a big missing piece.
I mean, I sympathize with that point. Right now the effort to implement SBOMs is a lot, and most people, if they don't have to, won't want to do it. But the way I'm seeing the landscape, things are going to get worse. Within a couple of years, bills of materials are going to become a necessity, not even because of regulations, just because the number of threat actors taking advantage of impersonating upstream sources is increasing. So without a bill of materials, you're multiplying your risk.
I think one of my takeaways, and I just wish it was easier to execute, is that every component, every process that is touching a system needs to be able to generate some output about what it's done, and then hand that down the pipeline so it can get aggregated together. We do that, like you were saying, for Golang, where it's "here are all the things I've mixed into this Golang binary," and you can generate that pretty easily. But as we walk through the pipeline and start adding things into these systems, we don't have a good process at the moment, that I've seen, to collect that information. It's an interesting idea. I know, just from my customer experience here, that if we were producing an output that showed what the pipelines had built and added, it would be a huge win. And I need to think about that more. Thinking back on the conversation, we keep coming back to "we need this, we need this, we need this, and it provides additional guidance; we want to scan for compliance." But we're still struggling with the thing I keep hoping to get to, which is: this is what the file looks like, this is how we build the file, and this is how we accumulate all that data as we build the system.
Some questions just don't have an answer currently. For example, where do you store the bills of materials for your artifacts? That is currently not a solved thing.
Right, right. Yeah, but not a standardized thing.
Yeah. But there are some ideas. For example, similar to how Cosign stores attestations as OCI artifacts together with the container images, could you publish your bill of materials in a similar way? The thing is, not every artifact lives in OCI registries, so you still need another way of storing and collating your bills of materials.
I would just be happy if we were producing the data right now, and then the formats hopefully would flow from that. But thinking about the way we build pipelines, it would take an intentional effort to aggregate all those pieces together.
Trust
Cool, all right. This was a little bit more... we weren't as concise, but I don't think this topic can be covered that concisely, so it was helpful in that perspective. Next week I was hoping to dive a little more into the systemd and management pieces, so I'll bring in the RackN team members who know those pieces and weren't on today. Excellent. All right, everybody, thank you. Appreciate the conversation. I'm determined to make SBOMs a reality for infrastructure, so I'll make it a career objective. And then I'll get it in
five years.
All right, everybody. Thanks, bye.
Getting systems to a point where we are actually confident about what went into them, the recipe if you will, is a really important milestone. I think it's a critical missing component for how we build IT systems, and one that, for me professionally, as I said at the end of the discussion, is really important. The benefits we would have if we could actually say "this is what went into our systems" and really build a bill of materials: that in itself is a really significant IT milestone for organizations, and one that I hope you are thinking about after this conversation, how you can start to achieve it in your own organization. If you want to hear more from our tech ops series, we have a long list of exciting topics ahead. One thing that didn't make the recording: in the warm-up we actually spent a lot of time talking about Wasm, and we put that on our tech ops agenda. So tune in and check out the topics that are coming; you can see the whole list at the2030.cloud. Come be part of the conversation on the topics that you're interested in. I hope the series itself is interesting and informative to you. We are going to keep at it and keep covering these advanced technical operations and automation topics. I'll see you there. Thank you for listening to the Cloud 2030 podcast. It is sponsored by RackN, where we are really working to build a community of people who are using and thinking about infrastructure differently, because that's what RackN does: we write software that helps put operators back in control of distributed infrastructure, really thinking about how things should be run, and building software that makes that possible. If this is interesting to you, please try out the software. We would love to get your opinion and hear how you think this could transform infrastructure more broadly. Or just keep enjoying the podcast, coming to the discussions, and laying out your thoughts on how you see the future unfolding, all part of building a better infrastructure operations community. Thank you.