Hello, I'm Rob Hirschfeld, CEO and co-founder of RackN and your host for the Cloud 2030 podcast. In this episode, part of the TechOps series, we dive into how to automate against APIs. We discuss the ways in which you can use APIs effectively, and the ways in which you can run into traps and trouble, which is really what most of the discussion ends up being about. We also think through how you should be looking at consuming APIs, both as consumers of APIs, as we all are, but also at times as producers of APIs, learning how people consume our APIs and what we can do to help make them better and safer. I know that you'll get a lot out of this, and it is just part of the broader TechOps series, where we dive deep into tips and techniques that improve your journey as an automator.
Well, it's actually a nice week when we don't have some major tech controversy going on in the background. Oh, you know, we didn't talk about the thing I thought we would — we can do this briefly — the UEFI issue. Plus, you said you wanted to hear about that, and then we just didn't have time? Or—
I don't think it was me, but remind me, is this the recent vulnerability? Yes. Yeah. I think I just mentioned it in passing and someone else said, yeah, this is an interesting thing to—
—talk about. Right. Okay. Yeah. Because we had posted — I think I gave a link — we'd posted our analysis of it. The naming creativity was way down on this one; I think they just called it PixieFAIL. It's funny. I mean, we saw it and we thought it was pretty significant, and like a lot of BIOS hacks, nobody seems to pick it up or care. And this one's specifically about net provisioning, which is not a mainstream news item. So it's a little sad. A little—
And it was IPv6 related, so—
Good point — two things that nobody seems to get worked up about. I think we have like ten people on the planet who actually care about that Venn diagram intersection.
I personally would love for IPv6 to reach a critical mass of adoption. That would solve so many of my nagging problems. Until there's enough vendor support and infrastructure support in general, it's hard to do. Unfortunately, having a public service only on IPv6 still means that you're cutting off a significant portion of the planet from being able to use it.
Is it? Is it really that?
Oh, yeah. For example, a large portion of India is IPv4 only, and there are several other networks that suffer from the same problem. Yeah.
Because I had assumed, with the cell networks especially — and also, you know, the US has been forwarding IPv4 — so I assumed people didn't have a lot of choice. I guess you run private networks inside, you go through a firewall, and you can have private sites on the non-routable IPs. How do they make that work?
Like magic, as far as I'm concerned.
Yeah, it's funny, because even in our earlier discussions we had a couple of IPv6 conversations, and then I haven't heard about it much at all from a customer perspective. Greg, maybe you have more insight on active interest.
Probably not. I mean, there are a few people who ask about it, but in general it's in environments that aren't necessarily doing provisioning or PXE-ing for us, so it's just not an issue right now. We'll see as we move forward with our HTTP boot sequences — that becomes a different story. But right now, as long as everything's still PXE, IPv6 kind of isn't a topic. For the machines you're managing, do you think—
—because that would be a place where this UEFI bug is actually more threatening, right? If somebody's looking to migrate to UEFI from a provisioning perspective, and eliminate the TFTP/DHCP part of that whole sequence, does this set that back?
Well, you still need a UEFI bootloader; it's just how you get it. When you move to HTTP or HTTPS booting, you can in theory use v6 more easily to pull a bootloader, in which case you're now open to the problem. Though in some regards — I mean, while it was poorly written code to some degree — it's also the acknowledgement that the EFI stacks on these systems are minimal stacks. They weren't, and aren't, intended to necessarily be hardened. They're not like your traditional Windows or Linux networking stack, which has lots of resources and lots of components that go into making it secure and handling all the edge cases — of which IPv6 added a lot of new ones.
Unfortunately, the story that consumers were being sold about UEFI was that it was more secure, particularly with things like Secure Boot. So there are some assumptions being made there. I'm not arguing that the assumptions are correct, just that they exist.
The thing I've seen is that enabling Secure Boot requires some operational capabilities that, you know, even our more advanced customers are slow to adopt.
Absolutely. For a while, the only ones who could really benefit from Secure Boot were Microsoft, being able to lock down the bootloader on their consumer hardware. Right.
I would also add the financial institutions and others that care to spend, and have the resources and environments, are also able to take advantage of it, because of the chaining that some of the distributions — specifically Red Hat and VMware — put into it to make it a viable path. But for general Linux consumption, cloud consumption, they don't have a use for it, or it's harder to use, and—
by design
And, speaking of design, I'll transition us to the topic of the day. My hope with this — and as a reminder, my goal for the series — is to dive into 200-level content. So I'd encourage everybody, as you're listening, if you have stories or anecdotes, they're helpful from a sharing perspective. We talked about APIs two or three weeks ago, sort of generally, in a lot of pieces. What I'm hoping to do today is focus in on this idea of automating against an API — so, consuming an API. We'll have a different session later on designing APIs, rather than consuming them. But today it's talking through this idea of: if I'm writing code against an API from an automation and ops perspective, what do we need to keep in mind, where are the gotchas, and what are some tricks of the trade we can use to help make the systems more resilient and secure? I have a ton of seed questions from that perspective. Last week we talked about the vendor CLI versus using the vendor's APIs, and that might be an interesting place to start. As we talk through this: how are the CLIs helpful, and what do we need to think about when we're actually using a vendor's API directly?
And I'm pausing for
I can definitely give a situation where using a CLI tool ends up being more problematic than using the API directly. (That sounds great.) And that is when you're interacting with a service from inside a container. A typical use case, for example, is Terraform—
Wait — which direction do you mean? Where you're going from the container out, not from the container in? Right? Yeah.
So, for example, we use Atlantis to manage our infrastructure, but really this is applicable to any kind of Infrastructure as Code environment. Not every service has a proper Terraform module available, or they might have only a third-party one that isn't viable. In which case you need to interact with the service directly, either via a script or binary that you write yourself that interacts with the API, or via the CLI tool. The problem with the CLI tool is that as the service evolves, it frequently requires the CLI tool to be kept up to date, which, if you package those along with your Terraform resources, gives you a version problem. So actually having the CLI tool to interact with ends up being counterproductive, because you have no way of declaratively saying, I'm interacting with this specific version of the service.
It almost ends up being the opposite, right? Last week we talked about CLIs having some version agnosticism as a benefit, but here what you're describing is it potentially interfering with integrating with other tools?
Yeah. In that context, CLIs are the equivalent of rolling-release distros, where you're more on the bleeding edge, and as long as you stay current, most of the time you're fine. But if you need a deterministic version, you're SOL.
It's interesting, because part of what I'm thinking through is that we need to talk through some of the challenges with authentication and semantics. When we've tried to do this, you end up packing a container full of a whole bunch of CLIs and a whole bunch of Terraform providers, and updates of that container become a mess. If you just need to do one thing, or you just need to pull something from an API, a lot of times, if the calls are straightforward to make, that can be a really effective way to just pull a piece of data.
Yeah, right.
You know, the challenge I get into — because I looked at this from an Amazon perspective — is that just handling the authentication process to use the API turned out to be a pretty big burden. It's not a single call: to get a token to interact with the AWS API — and this isn't uncommon, they're just an example — you need at least a round trip to authenticate, pull back a token, use the token, and then make that part of your calls going forward. So there's an overhead.
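To make that round trip concrete, here's a minimal sketch of the generic pattern being described: authenticate once, pull back a token, and carry it forward on every later call. The base URL, endpoint path, and field names are hypothetical, and this is not AWS's actual signing scheme (AWS signs requests rather than using a plain bearer token); it's just meant to show the extra overhead of that first call.

```python
import os
import requests

BASE = "https://api.example.com"  # hypothetical service

# Round trip 1: authenticate and pull back a token.
auth = requests.post(
    f"{BASE}/auth/token",  # assumed endpoint, not a real vendor's
    json={
        "client_id": "my-automation",
        "client_secret": os.environ["API_CLIENT_SECRET"],  # never hard-code this
    },
    timeout=10,
)
auth.raise_for_status()
token = auth.json()["access_token"]

# Every call after that carries the token forward.
headers = {"Authorization": f"Bearer {token}"}
machines = requests.get(f"{BASE}/machines", headers=headers, timeout=10)
machines.raise_for_status()
print(machines.json())
```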
Yeah, it's a sharp curve to enter. The thing, though, is once you are using the API — once you have the authentication out of the way — any additional part of the API that you use takes lower effort. It's only a fraction, because you've already authenticated, right? So the more you use the API, the—
—easier it is to work with, the less the overhead, because you can almost always run a curl and pull it — like, oh, I need something, go pull it out using curl. But that's not trivial, right? APIs behave in weird ways. Sometimes it's not as simple as "go get this data"; you might end up getting back all sorts of redirects, or a lot more data than you want.
I'm trying to say, yeah, go ahead. Yeah,
In my experience, I would say I rarely see myself using something like curl directly. Maybe take a step back and ask: what is the environment that you're running in? It sounds like you're anticipating that users are primarily running this from a shell script or something like that, and is that always the right assumption? In my background, I'm coming from inside of a service, which has facilities for handling all the API calls and callbacks and timeouts and handling all the error conditions. So the burden of hooking into the API is even lower than something like curl, where you have to figure out how to carry forward credentials. So I ask, how would I decide that? I think there's just a lot of nuance in what the calling side looks like, what the receiving side looks like, and how deeply we are interacting with the API. Because if I'm interacting deeply, I might want to hook into the API using code.
Well, you're hitting exactly the type of question and dilemma that I want us to be exploring. I'm putting on an automation hat with this, because one of the challenges is: if you write something in code — you throw it into Python, potentially even something more like a Go module — once you've done that, then changing or updating the API, or even seeing what it did, can become a problem, right? So you might say, oh, I want the tools that you're describing wrapped up in a Python script. And now, if that API changes or behaves in a weird way, it might actually be harder to diagnose. There's a trade-off, isn't there?
Yeah, yeah, there are trade-offs there.
Do you have a rule of thumb that you try to follow? Like, when do you throw in the towel and write the code?
I think it boils down to how complex the interaction is. If I'm making a single API call that returns a single piece of data, and I'm working with that from then on — a simple curl call — I could see that. On the flip side, if I need a complex interaction, where I need to maintain the state of complex objects and update them over time, then I really want to hook in with code.
That makes sense. You're making me think, for some reason, about an Ansible script where you start with a curl command because you just need a piece of data, and then, a couple of days of hacking Ansible later, you're like, oh, I should have just written something to wrap it, but—
That could be the case, yeah. And at some point you decide, oh shoot, let's scrap this and switch to hooking into the API. Yeah.
This is the thing that's so strange, right? A lot of times we're using one tool, like Ansible or Terraform, and we hit an edge where we can't easily get the data that we want, so we're stuck. I mean, we see this with people using the product — we actually wrote a plugin for this, the "curl hammer", or what did we call it, curl hammer? Can you talk about all of that? Because that's, I think, a great example of us trying to solve this problem.
I mean, the callback stuff. Yeah.
I mean, at one point we had this idea that we called "footgun", but the callback stuff evolved to be not that—
Well, really, all of that usage was to prevent leakage of credentials and to control access. For our provisioning environments, we would have actions that we'd want to do on behalf of the machine. And since our tasks run in the machine's environment, we'd be pushing credentials and requiring access to integrated services that might not allow access remotely. So think of it like: hey, we want to update the DNS system with a new hostname based on information that we got at the machine — its MAC address and other stuff like that. So we're going to try to update the hostname on the DNS server. Well, the policy, in a proper setup, is that machines can update their own address, right? So our whole set of footgun behavior was: okay, we'll allow an action that will be proxied by the DRP endpoint, which then takes that API action. That way it can be some arbitrary script. Well, that's also scary, and requires somebody else to write that script. So what we then provided was our callback system, which allows you to put in the credentials and authentication for an external API, define the payload you want based on the machine, and then our plugin in Go will marshal all of that into an API call and provide the result. So the credentialing and all of that can be maintained and controlled outside of the machine's access path. Am I talking? Can you guys hear me?
Oh, cool. Yes.
I was worried I'd muted myself, sorry. So — outside of the access and control of the machine, the DRP endpoint could act as a secure credential store, eventually even using Vault and other integrations to hold the actual credentials, versus having them distributed down to the machines. That way the machine only got the data it needed, or could trigger actions based on what the administrator allowed, with the outbound credentials held there. So it wasn't necessarily "hey, let's give people a way to wrap APIs." It was more: how do we provide distributed API proxying in a secure way? And then that evolved into making it easier for our operators, who were asking things like, "hey, can you just do a POST to this location?" Okay — this allowed them to proxy just that POST, or the various other actions.
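A rough sketch of the general shape Greg is describing — not Digital Rebar's actual plugin or API, just the idea: credentials stay on the control endpoint, and the machine can only trigger a pre-approved, proxied action. The Flask routes, the action name, and the DNS service URL are all hypothetical.

```python
import os
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

# Credentials live only on the endpoint (e.g. injected from a secret store),
# never shipped down to the machines being provisioned.
DNS_API_TOKEN = os.environ.get("DNS_API_TOKEN", "")

def update_dns(payload):
    """Pre-approved outbound call; the machine supplies only a small payload."""
    return requests.post(
        "https://dns.example.internal/api/records",  # assumed internal service
        headers={"Authorization": f"Bearer {DNS_API_TOKEN}"},
        json={"hostname": payload["hostname"], "ip": payload["ip"]},
        timeout=10,
    )

# Allow-list of actions the administrator has defined.
ACTIONS = {"update-dns": update_dns}

@app.route("/callback/<name>", methods=["POST"])
def callback(name):
    action = ACTIONS.get(name)
    if action is None:  # only pre-approved actions are ever run
        return jsonify(error="unknown action"), 404
    resp = action(request.get_json(force=True))
    # Hand back only the result; the machine never sees the credentials.
    return jsonify(status=resp.status_code, body=resp.json()), 200
```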
Either way, interesting name — footgun. Well, there isn't an actual gun.
Yeah, because we were worried customers were going to shoot their toes off.
Sometimes we name things to imply the appropriate level of danger.
The initial implementation basically allowed you to pass a script — admittedly controlled by the administrator — and then run that in the default environment of the endpoint. And that's got all sorts of security problems, which we pretty quickly discouraged and moved away from.
I think you're bringing up a really important point: part of the challenge with any of these processes is that you can leak credentials. As you're working with the APIs — with any of these systems — one of the challenges is to not leak that type of information. The other thing you said that I thought was really interesting is that a lot of times it's a POST. I started off thinking this was about getting information, but a lot of it, from an automation perspective, is actually posting back to places even more than getting. And — yeah, forget that. Go ahead.
We allowed both. And the data came back, and we made it the responsibility of the caller to parse it. In general we focused on encouraging them to pass back JSON objects and things like that, but it wasn't required. And this is where, in some regards, most of the API discussions to me are really around: what contracts are you expecting, and how are you enforcing and managing those? Because in some regards the callback plugin, as we used it, just became a conduit. It was still dependent on — in our case — the client task writer understanding what service they're talking to, the API they're calling, the contract of what each action means, and so on. That's why I have a little bit of a problem when people talk about versioning APIs. I think it should be done so you can figure things out, but most of the interactions are all dependent on understanding what the actual contract is between those calls. The versions are a way to represent that contract, but that's not necessarily enforced nor guaranteed.
There was some design work that we saw in OpenStack — not meaning to give anybody a fit — that had this idea of embedding metadata into the calls to allow the caller to specify the version that they wanted supported. That always struck me as—
I've seen that as well in — gosh, I'm going to blank on where — really enterprise-y software for keeping secrets, and I'm drawing a blank on the name. But you would be able to request the version of the API that you wanted in your initial request. So I guess if they supported multiple versions of the API, you could get—
—a response back. Seems incredibly dangerous to me.
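For reference, the usual shape of "request the version you want" is a header or a media type sent on each call. Here's a hedged sketch of both styles; the header names and values are illustrative conventions, not any particular vendor's contract.

```python
import requests

# Two common shapes for asking the server for a specific API version.
resp = requests.get(
    "https://api.example.com/widgets",  # hypothetical endpoint
    headers={
        "Accept": "application/vnd.example.v2+json",  # media-type versioning
        "X-Api-Version": "2023-06-01",                # date-pinned versioning
    },
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```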
Well, I mean, there's a whole push, right, for the Swagger-based discoverable API. And that's all well and good, and I find it interesting as a conceptual thing. But in actual practice, I find that you're now making the developer of the client make that map. They're building that into what they're providing — be that a CLI or a front end or whatever — which then either understands the nuance of versions one, two, and three and knows how to make the appropriate calls, or deals with the fact that version four might not return the same metadata to drive it. So yes, I guess you can then have smarter clients that will come back and say, wait, that's something I don't understand how to do — there's value in that. But it's just as frustrating to the user in some regards, because they're not achieving what they want to do. Or, what I see a lot of in our uses of Redfish: Redfish is somewhat discoverable, but you then have to have a five-person team dedicated to writing the discovery logic, to figure out how to map things in the discovery path, just to figure out how to reboot a server — because five vendors put the reboot actions in different places. So yeah, I guess it's kind of discoverable.
What you're describing is that you get back something, you have to interpret it, and then you're back to the client having to make a decision about how to use it — instead of having a strict standard that people adhere to. If there's a degree of flexibility there, then you're back to—
—so now you'd have to make a smarter client. Okay, well, we do that through our CLI, right? We handle a lot of the version checking — our 4.13 client will still work with 4.7, and so on. All of those kinds of things. Now, you may not get all the feature functions going forward, but from an API design perspective we've kept those drivable. And if we hadn't, we could still make decisions in the CLI to handle that. But the point is, we, as the providers of that CLI, took care of it. If we were saying "hey, just do curl" and providing only the base API, then we'd be expecting all of our consumers to make that decision. Now, they can make the decision based on a specific version, and that's all well and good, but they've locked themselves in, in some regard. And so—
You don't know. And this is, I think, part of the whole question: if you're automating against an API, it's really hard for the API provider to actually know who's calling you, what versions they're expecting, and what they've wired in to expect from the output. You're really operating blind from that perspective. Is there something the consumers can use to protect themselves, or to help the conversation?
A different aspect, though, is what I guess was being alluded to initially: if I'm writing to an API, I get to make a decision about what I'm going to pay attention to or not, and then, based on how richly the API defines its versioning scheme, you can at least make some choices. But now you're choosing to build — whether you're doing it yourself in your own curl or your own programs — a set of guardrails that will keep you from falling off some edge. And so now you're talking about your client-side design rigor. How much are you testing? How much are you worrying about it? Is this a throwaway? Is it supposed to last for years, or for ten minutes? All of those decisions now have to fall into your thought process.
Which actually puts us back to where a CLI is nice, because you're hoping the CLI designers have done some of that work.
Well, so now we're talking — that's where, to me, most of the API automation discussions all kind of devolve into making decisions around contracts. Right? That's really what you're defining: some contract. I'm going to give you A, you're going to give me B. Whether that's an HTTP POST or whatever. And so you can say, I'm going to follow the API because I want to be at that level. Okay, that's fine, but now that's your contract with the vendor, right? In theory, I could try to write to AWS's ever-changing SDK, from their Go v1 to their Go v2. There may be reasons to do that. Or I could rely on the fact that the AWS CLI is going to handle that for me. That'd be a stronger contract. And the contract there is: I'm going to call "aws s3 cp", file to file. Now I've got a contract that does that. Sure, I could do it through an API set of libraries, and that might make sense. But if I'm looking for a consistent way to just upload a file, the CLI may be a better way to do it.
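As a sketch of the two contract levels being compared — shelling out to the AWS CLI versus calling the SDK (boto3) directly. The bucket and file names are placeholders; both assume credentials are already configured in the environment.

```python
import subprocess
import boto3

# Contract 1: "aws s3 cp" — the CLI owns the API details and absorbs version churn.
subprocess.run(
    ["aws", "s3", "cp", "report.csv", "s3://my-example-bucket/report.csv"],
    check=True,  # fail loudly if the upload doesn't succeed
)

# Contract 2: the vendor SDK — more control, but you now track SDK/API changes yourself.
s3 = boto3.client("s3")
s3.upload_file("report.csv", "my-example-bucket", "report.csv")
```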
Right. There was a time when AWS changed — and I'm just using them as an example. They had a contract: when you created a VM, they defaulted a network, a virtual network, behind the scenes. Then they deprecated that functionality, so that when you created a machine it would no longer give you a default network. It ultimately got to a point where it broke the API calls, and the CLI was breaking too. That's a back-end behavior that's effectively an implied contract in the API. Is there a defense for that? How do we cope with it? It was a necessary change, I'm assuming, on their part, but you just have to be aware that at some point back-end APIs are going to break, even though the calls themselves are still valid. Does that make sense? Because a lot of times these calls have tons of stuff going on behind the scenes. It's not as simple as uploading a file or getting some information; sometimes we're calling APIs that have much more complex back-end behaviors. I know for our stuff, there are times when what could be a simple call — just setting a workflow on something, what could go wrong? — suddenly has a pretty big consequence.
That's why you end up using the intermediate representation, which is the programming-language-specific library published by the API owner, as opposed to the API directly. Then at least — I mean, you don't have a guarantee that it won't break — but you know that a specific version of the client library is going to give you predictable results, and that the API owner is going to, or at least is supposed to, publish a new version of the client library when they update their API.
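One cheap way to act on that: pin the client library and fail fast if the runtime version drifts from what the automation was written against. A small sketch, using boto3 as the example library and a made-up expected version.

```python
import boto3

EXPECTED = "1.34"  # the major.minor this automation was tested against (example value)

if not boto3.__version__.startswith(EXPECTED):
    raise RuntimeError(
        f"boto3 {boto3.__version__} installed, expected {EXPECTED}.x; "
        "pin the dependency (e.g. boto3~=1.34.0) before running this automation."
    )
```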
Makes sense. And the challenge is that you could wire your calls to fail if the API changes, but it's not always clear whether the API has changed in some breaking way — your call might keep working without maintenance, which would be ideal from an API design perspective. So there's this balance: if you assume that when a version changes the call is going to fail, and you proactively fail, that's as much a problem as anything else. But—
That's, in my opinion, a failure of the API owner to version the API properly. If you make a backwards-incompatible change, you're supposed to increase your version, so that you don't create these kinds of transition problems.
But what about the issue of side effects from this very simple POST or simple GET? There are definitely times when you're making an API call — provisioning a VM is a great example, right? You're making a request, and you have to wait until that request is fulfilled. That's the sort of thing that makes Terraform work. To me, there are little things a tool does that are a much bigger deal, much harder, than people realize — like Terraform waiting until it gets the ID back for the machine. It's made six different calls for you, it's chained a whole bunch of stuff together, and it gets back the actual ID from the cloud provider that wasn't in the first or second or third call. Wrapping all those API calls is a big deal; it's one of the reasons that product was so revolutionary when it came out. So what defenses can we use when we're calling APIs where the goal is to create a secondary action or something like that? Is there a good strategy?
It depends largely on the design of the API. You can either interact with it synchronously, in which case the API would likely have an endpoint of the "wait for this resource to be ready" type, or it's an asynchronous call, in which case the first call gives you an ID for the resource that will be created, and you use a second API call to watch for that resource to be ready. And let's be honest, the same happens with the command line interface as well — some command line tools are asynchronous by nature, so you have to repeat the same pattern, and others are synchronous. So it's not like this is an issue unique to APIs or CLIs, right?
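A minimal sketch of the asynchronous shape being described: the first call returns only an ID, and a second call is polled until the resource reports ready, with an overall deadline so the automation fails loudly instead of hanging. The endpoints, payloads, and status values are hypothetical.

```python
import time
import requests

BASE = "https://cloud.example.com/api"        # hypothetical provider
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder credential

# Call 1: request the VM; all we get back immediately is an ID.
create = requests.post(f"{BASE}/vms", json={"size": "small"}, headers=HEADERS, timeout=10)
create.raise_for_status()
vm_id = create.json()["id"]

# Calls 2..n: poll until the resource reports ready, within a time budget.
deadline = time.time() + 600  # ten-minute budget
while True:
    status = requests.get(f"{BASE}/vms/{vm_id}", headers=HEADERS, timeout=10)
    status.raise_for_status()
    state = status.json()["state"]
    if state == "ready":
        break
    if state == "error" or time.time() > deadline:
        raise RuntimeError(f"VM {vm_id} did not become ready (state={state})")
    time.sleep(10)  # don't hammer the API while waiting
```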
No, it's not, I know that. One of the things I have on the list here that's probably a dedicated topic to come back to — probably just make it next week's topic — is this whole WebSocket event component. Because, right, when we build a CLI, it will subscribe to an event, waiting for actions to occur. Actually, I don't want to go down that path, because we should save it as a continuation of the API discussions. I guess the thing I'm wondering about is: how do we do a good job using APIs and knowing when we've switched from synchronous to asynchronous operation, right?
Well, that's the API. Typically, in a well-developed API, you will see the ability to do both synchronous and asynchronous calls based on your need.
It literally should be a flag for block or don't block, because — what are you thinking?
Not necessarily. It might even be: one call is "do this", and the other is "do this asynchronously". Similarly, for example, with APIs that interact with lists, you typically have a raw API that may have limited capabilities — say it gets cut off if the list is above a certain size — but then you have the paginated API equivalent, which lets you get more data; you just need to do more calls.
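A sketch of consuming the paginated equivalent rather than the raw list. The limit/offset parameters here are a generic convention; real APIs may use cursors or continuation tokens instead, so treat the parameter names as assumptions.

```python
import requests

BASE = "https://api.example.com"               # hypothetical service
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder credential

def list_all(path, page_size=100):
    """Walk a limit/offset-paginated list endpoint instead of asking for everything at once."""
    offset = 0
    while True:
        resp = requests.get(
            f"{BASE}{path}",
            params={"limit": page_size, "offset": offset},
            headers=HEADERS,
            timeout=10,
        )
        resp.raise_for_status()
        items = resp.json()
        if not items:          # empty page means we've walked the whole list
            return
        yield from items
        offset += len(items)

for job in list_all("/jobs"):
    print(job["id"])
```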
I'm thinking about — and Greg, I'd be interested in your thoughts on this, because we've seen this several times — there are two evolutions here. One is that web servers have gotten better about keeping sockets open while you're waiting for something to occur. But I know we've had a long evolution of having to add pagination, because somebody would show up and ask for all records out of your database, right, when what they really want is one instance out of a 1,000-item list. So they make the request and then filter client-side. And we've had to come back and add in some pretty sophisticated filtering. There's an API lesson in that: making sure people know they shouldn't just bulk-request every record and then jq-filter it down. Greg, how do we communicate that?
Well, specifically to the topic you jumped to, on lists and data: some of that, as part of the API design, is dealing with how you provide the ability to keep yourself from being overrun. That's how we've thought about it. And in all honesty, we've kind of evolved it ourselves, because we've had to deal with the "hey, we want to query all one million jobs" — like, don't do that. Now we let you stream that if you want to, but in general we don't encourage it. So we've started putting it into the API definitionally. And because of the way we think about our APIs — we have the "add, but don't modify, and don't delete" kind of model — we can incrementally add things to the API that facilitate that behavior and start taking advantage of them. So we've added pagination, counting, and all sorts of other things to our list APIs, so we can drive that kind of pattern: okay, you really want three objects — don't ask us for 1,000. Tell us what you want to sort on and we'll give you the three, because it'll save everybody time. That's what we've enabled now. There are elements of our API definition where we've added that to the whole process in, I think, a somewhat reasonable way: it doesn't break usage, and it follows a generalized pattern — offsets and pages and filters all match a reasonable pattern. So that's fine. One of the challenges, though, on some of it, is the—
—usage of other technologies that have popped up. This is where I think we keep periodically exploring things like the GraphQL kinds of concepts that you could replace, extend, or add in parallel: I want an object from you, I want the object to look like this, it needs to come from this kind of set, filtered by this, looking this way. And then your API dynamically responds with: well, here's the object you requested, pre-filtered the way you wanted to look at it. That's interesting, but it puts a lot of operational work on your back end to handle. So it becomes a trade-off: are you the marshaller and presenter of the data, or are you just the source of information trying to facilitate some content? It's one we've kind of struggled with, because some APIs have gone all in on that — built around it and driven by it — but then they suffer some performance issues at times, depending on how they're dealing with the data. And your client gets much more complex, because it has to be smart enough to generate those queries and marshal the responses coming back. So it's kind of okay for a developer who wants to write their own client, but it's not necessarily what the end user wants, or the end user may want a more directed response from the API. Those become trade-offs. And right now we've chosen not to do the effort, because in our system adding something like GraphQL is a little challenging — our data storage and data models are harder to adjust to that. But—
So you're saying that if you were doing GraphQL, then when I made a query against the system, I'd find the interconnected data more easily, rather than—
No, no — that's a potential option as well, saying I want this object, and it's actually a join of these three things, and I want these five fields. But that's the part I was getting at: in our system, we may have a machine object that represents, you know, a megabyte of data as a raw JSON blob. Well, I don't want to ask for 100 of those, or 1,000 of those. What I want is every machine's IP address, right? So we've now built into our system the ability to say: okay, send me that object, but give me just the fields — don't give me parameters, or give me just the specific parameters along with the fields. We've created our own data-limiting scheme, where there are technologies we could look into leveraging that would let us say: look, I'm asking for a machine object, but really the object I want back is a list of objects that have name, address, and access password — just send me that data back, nothing else. Okay, that's cool. But now you need a richer client that knows how to build those queries, and then you need a richer back end that has to deal with that.
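From the consumer side, that "data-limiting" request looks roughly like the sketch below: ask the server to filter and trim instead of pulling every full object and filtering locally. The endpoint and parameter names (filter, fields, limit) are illustrative, not Digital Rebar's actual query syntax.

```python
import requests

BASE = "https://drp.example.internal/api"      # hypothetical endpoint
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder credential

# Instead of fetching 1,000 megabyte-sized machine objects and filtering client-side,
# let the server filter and return only the fields you actually need.
resp = requests.get(
    f"{BASE}/machines",
    params={
        "filter": "pool=prod",      # server-side filter (illustrative syntax)
        "fields": "name,address",   # trim each object down to two fields
        "limit": 3,                 # and only return the first three
    },
    headers=HEADERS,
    timeout=10,
)
resp.raise_for_status()
for machine in resp.json():
    print(machine["name"], machine["address"])
```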
And you need a more savvy — this is, to me, the whole idea I'd like to get to. There are two things in this. One is that you need the person who's trying to get information out to think through the fact that they might be able to make a more advanced query. Which is, to me, part of a trade-off, right? You could build all these things into an API, but you might not have somebody who understands how to build the queries in a sophisticated enough way.
then now we're on to, you know, why is your why is the person using the API using the API? Right? Assuming assuming they have something they want to accomplish, right. And
This, to me, is what I'm hoping to drive toward in the TechOps series: I'm using the API because I have a machine and I'm automating against it, and there are traps. What you're describing is the classic case — as a human, I'm okay to pull down a whole bunch of stuff and sort through it. So I might code that into my query and let the client do the filtering. Then you throw that into an automation, and now it's happening once a minute. And since it's API calls, the server doesn't really know where the calls are coming from, and you're effectively being attacked, because somebody threw what looked like a really simple Python script, or a cut-and-pasted curl command with a jq after it, into a CI pipeline. Now you're getting hammered, because nobody thought: okay, wait, before I retrieve a whole bunch of stuff, maybe I can filter it on the server side down to the one thing I want, or a smaller set. That's where I'm trying to go with these questions: I'm consuming an API — how do I make that durable? How do I make it smart and performant? And I think we're getting to the other side of it, right? For some APIs we've built, we've put in rate limits, which can be dangerous. Do you have a thought on how to do that gracefully, or how to do it safely, both on the consumption side and on the server side?
And just to give a concrete example — like when GitHub decided the API limit on pulling images should be much lower, and now everybody's lab environments can't pull container images.
That example was Docker Hub.
Docker Hub. Yes, yeah. Docker Hub.
I think there's a hand up. Sorry—
I was really just saying that this was an example of — go ahead.
I was going to say, on the other hand, it has also promoted a much-needed revolution in the adoption of container proxies, which had previously been largely unimplemented because the effort didn't have enough payoff. So yeah.
One of the things I think about — and we talked about this a couple of years ago — Chris Short had posted an ISO in S3, several gigs of transfer, and somebody pulling it as part of a CI pipeline generated a thousand-dollar-plus bill overnight. It's a funny thing: when you're consuming an API, be conscious of it. And also be aware — we've seen this happen to us — that your response times can degrade because you're being rate limited, and you might not have coded awareness of that into the platform as you do API consumption.
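A sketch of building that awareness into the consumer: treat 429 responses as a signal, honor Retry-After when the server sends it, and back off instead of hammering. The endpoint is a placeholder, and exact header behavior varies by service.

```python
import time
import requests

def get_with_backoff(url, headers=None, max_attempts=5):
    """GET that respects rate limiting: honors Retry-After on 429, otherwise backs off exponentially."""
    delay = 1.0
    for _ in range(max_attempts):
        resp = requests.get(url, headers=headers, timeout=10)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp
        retry_after = resp.headers.get("Retry-After")
        wait = float(retry_after) if retry_after else delay
        time.sleep(wait)
        delay *= 2  # exponential backoff when the server gives no hint
    raise RuntimeError(f"still rate limited after {max_attempts} attempts: {url}")

data = get_with_backoff("https://api.example.com/machines").json()
```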
On the flip side, you also have much more fine-grained control over which API endpoints you consume when using the API directly, versus a CLI, which is opaque.
That is very true — you can get into serious trouble there, whereas the CLI just says, okay, sure, you've got it. And one of the things we didn't talk through, which is probably not even a 200-level concept, is looking at the return headers. There's usually a wealth of data there. When I'm writing automation, I almost never spend time looking at return headers, but there's actually a lot of information that comes back with the data that I don't usually think about. That's probably, to me, even more advanced than what I'm thinking for this.
Content type, encoding.
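For the curious, those headers are easy to inspect once you have the response object. A small sketch; the rate-limit header names shown are common conventions, not guaranteed on any given API.

```python
import requests

resp = requests.get("https://api.example.com/machines", timeout=10)

# The body is what automation usually consumes, but the headers carry a lot too.
print(resp.headers.get("Content-Type"))           # how the payload is encoded
print(resp.headers.get("X-RateLimit-Remaining"))  # common (not universal) quota hint
print(resp.headers.get("ETag"))                   # handle for caching / conditional requests
```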
Cool. All right, that's the hour, so I'm going to wrap up. I'll make a note in the calendar — we'll pick this back up with event processing on APIs next week, because I think this idea of how we turn an API into something more scalable, and reduce some of the load on the back end, is a good one for people to code against, and it can dramatically improve how your automation works and how resilient it is. So we'll keep that topic for next week. Thank you, everybody — this is the type of detailed conversation I was hoping we would be having. So thank you. Appreciate your time.
Thanks.
I hope you enjoyed the conversation here. I'm really excited to see the TechOps series being concrete and specific, really addressing the knowledge gaps that I see in instructional material, to teach and share techniques that improve how we consume APIs. Even if you're an experienced developer or DevOps engineer, the things we're talking about are helpful reminders, provide additional design context, and help you avoid pitfalls and traps in building systems. And it's always important to us to have people join — bring your experience, your questions, and your observations into the sessions. You can find our schedule and more at the2030.cloud. I will see you there. Thank you for listening to the Cloud 2030 podcast. It is sponsored by RackN, where we're really working to build a community of people who are using and thinking about infrastructure differently, because that's what RackN does. We write software that helps put operators back in control of distributed infrastructure, really thinking about how things should be run, and building software that makes that possible. If this is interesting to you, please try out the software — we would love to get your opinion and hear how you think this could transform infrastructure more broadly. Or just keep enjoying the podcast, coming to the discussions, and laying out your thoughts on how you see the future unfolding — all part of building a better infrastructure operations community. Thank you.