Hello, I'm Rob Hirschfeld, CEO and co-founder of RackN and your host for the Cloud 2030 podcast. In this episode, we dive into whether large language models can effectively supplant developers and DevOps engineers — a question we've asked before. But this time we go deeper into how the models can be trained, whether they can be trusted, and what the upside is: the positive use case in which we really turn LLMs into the type of wise-person experts they have the potential to be, versus simply something that turns up the volume on how fast you generate code. We talk about the downsides of that type of model, and the potential positive upsides of how powerful these tools could become as assistants. A key aspect here is thinking about the potential to really transform and improve the outcomes of work, rather than just being a faster coder. I know we dive into all of these questions, and I know you will enjoy it.
I want to ask you, Rob, about going back to your article — the one that you've incorporated into the schedule, the one I remember you writing and then kind of reporting on —
that I've been working towards. Yep.
Yeah, the GlueCon stuff. So I went back and reread the SK Ventures post, and Adrian's post, and I didn't get the sense from Adrian that he was calling out technical debt as a threat. So I want to make sure I understood what you were referring to when you pointed to him.
Keep going — sorry, I shouldn't interrupt you.
And then the other side, the SK Ventures stuff. I think we all had — well, I had some reservations about that conversation. I remember talking to you just before you went, and I don't know if there was any clarification there. But I really would like to understand one point in particular, and that is the assumption that incorporating AI and LLMs into DevOps is about working faster, as opposed to the quality of the operations.
That's the key. This is, I think, what's unclear and what I'm trying to figure out. I think there's a lot of enthusiasm — the SK Ventures post has a lot of enthusiasm, I'll even add the adjective "breathless" enthusiasm. To their credit, they're trying to do the right thing, and they just rebranded GlueCon to SW2 Con, which I think is an interesting conceit: there's an element of being very enthusiastic that something has changed in the market and in the way we do the work. Like the printing press going from handwritten copying to wide replication — they're trying to look at that type of transformative technology. So there's an interesting question there. I was just on a call yesterday where we were talking about: if you strip all this away, what are the picks and shovels, the core tech, the core things you need to make this stuff go faster? And we had a conversation a couple of weeks ago where we recategorized technical debt as maintenance — future maintenance. One of the things I think was not well defined in that article, and in my conversations in the past, is that I've been using "technical debt" to imply future maintenance of code. Y'all corrected me on this last time, and I really liked that; it's had me thinking about it. I've always said that anytime you build code you're creating technical debt, but what you're really creating is a maintenance obligation for that code. Those are often conflated in industry parlance, but they're worth distinguishing, because technical debt really should mean: I did something expeditiously, knowing it wasn't done to standards, so that I could accomplish something.
It was expedient.
And so the challenge I see is this: I do think there is a huge push in the market for people to be expedient in their work. And I feel like there's a broader challenge — we're not asking people to do work that has long-term durability. We don't have a good way to measure it, we don't have a good way to test it, and we certainly don't incent people to do it. And one of the interesting things about the AI craze at the moment is the way people are using it: because of all that pressure, they work faster and not together. I think these AI tools, by design, are helping people work faster; they're not helping people work together. And that's my concern. Weirdly, they could, but —
The collaborative piece — so you're pointing out collaborative work. So if there were a coordinated DevOps, let's call it —
There was a really good example in a session at VMware, actually, that Diana and I were both in. They didn't actually demonstrate this, they just talked about it, but VMware, as a test, used its own code base to train a model to provide AI chatbot support for VMware engineers working in VMware's code base. Does that make sense? So say I'm a VMware programmer: it's C, it's incredibly complex, it's got 20 years of institutional knowledge, with very strict controls and governance and things like that. ChatGPT could help you write C; it cannot help you write VMware C.
And the reason is —
Because at VMware, they have style, they have patterns, they have institutional knowledge, they have variable naming conventions — they have all this stuff that goes into how they do it. Now, the danger here is that ChatGPT could get you very close, or give the appearance of close, because it will help you write C code, there's no doubt. But it won't help you write VMware C code, because it doesn't know anything about VMware C code.
Because it hasn't been trained on, or doesn't have access to, that repository — and it's specific.
Correct. And, as we all know with these LLMs, it's not actually following any rules. It's not as if it's ingesting VMware's rules for coding; it's just learning from the body of code that VMware has established. So what VMware did is they trained a model to include all their code base, and then they let their programmers use that prompting tool. I'm trying to remember the name — it's something like Code Guardian, or a copilot, a code pilot. Yeah, Copilot was the first of these.
And with what result?
This is what's weird: their test was whether people liked it or not and felt that it was helpful. It wasn't particularly quantitative; it was sort of qualitative. And yeah —
Sounds like it's brand new, so there probably isn't any way to quantify it yet.
What I was sad they didn't have is more concrete examples. They did show one: the response you would have gotten if you were just using ChatGPT against their library. If you asked the prompt to write a routine without using the VMware-trained data, you got a reasonable answer, but it wasn't particularly VMware-flavored. If you made the same prompt after the model had been trained with VMware's data, it actually used VMware's naming conventions and parameters — for that engineering team, it did a much better job. I don't know that it was smart enough to say, "oh, this routine looks like these other routines, maybe you should consider this library or refactoring for this." That's where I think we want to go with it. But the ability to train on your own code base and your own rule set is actually compelling.
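To make that contrast concrete, here is a minimal, purely illustrative sketch of the kind of comparison being described: the same request answered by a generic model versus a codebase-trained one, scored against an in-house naming convention. The convention (a vmk_ prefix), both outputs, and the check itself are invented placeholders — not actual VMware code or tooling.

```python
# Illustrative only: the same prompt answered by a generic model vs. a codebase-trained
# model, scored against a (hypothetical) in-house naming convention. Both outputs and the
# "vmk_" prefix rule are invented for this sketch.

import re

NAMING_RULE = re.compile(r"\bvmk_[A-Z]\w*\(")  # hypothetical in-house call-prefix convention

generic_output = "int open_storage(char *path) { return storage_open(path); }"
tuned_output = "int vmk_StorageOpen(char *path) { return vmk_DeviceAttach(path); }"

def follows_conventions(code: str) -> bool:
    """Very rough proxy: does the generated code use the team's required call prefix?"""
    return bool(NAMING_RULE.search(code))

for label, code in [("generic model", generic_output), ("codebase-trained model", tuned_output)]:
    print(f"{label}: follows conventions = {follows_conventions(code)}")
```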
And this is what a lot of folks are working on right now. If I had to do just a rough take on what I've been hearing and seeing from the people with whom I've been speaking: there's a lot of attention to incorporating LLM generative AI that is specific to the enterprise — to the company that's developing it — that embodies their own policies, their own standards, their own approach to programming. One of the things they're clearly doing is taking extensive looks at their API documentation, both the formal documentation and the informal write-ups, like their Slack channels and
Stuff like that. Exactly.
And they're throwing that into what they're calling RAGs — I can never remember what the acronym stands for — but what it basically is, is a separate, isolated language-model-and-database combination that is trained specifically on the techniques and approaches being used by the enterprise. The general-purpose foundation models are being used as an interlocutor — they're kind of the front end, the UI, for people — and the source of knowledge, the source of real value in these things, comes from these sequestered, private, smaller language models and smaller databases, quite focused. And that is, first of all, faster and less expensive than actually going and trying to retrain one of those foundational models to incorporate it — which is out of the question for most people. So I've had two conversations in the last week with big, notable companies — one's a software company, one's a SaaS company — and they're really going at it, especially in this
Area — as in trying to help their teams do a better job of working within the code base they've got. That's where I think this is very exciting, because the ROI on those improvements is bigger. You're not expanding your code base, you're —
Hopefully you're improving it. This is not working faster; this is improving the quality of what's being developed. And so —
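Since the retrieval-augmented pattern Rich describes above is the technical core of this exchange, here is a minimal, self-contained sketch of it under stated assumptions: the embed() function is a toy placeholder for a real embedding model, the foundation-model call is left as a comment, and all snippet contents and source names are invented for illustration — this is not any vendor's actual API.

```python
# Minimal sketch of the retrieval-augmented generation (RAG) pattern described above.
# embed() is a stand-in for a real embedding model; the snippets are invented examples.

from dataclasses import dataclass
import math

@dataclass
class Snippet:
    source: str      # e.g. "api-docs/storage.md" or "slack/#platform-eng" (illustrative)
    text: str
    vector: list[float]

def embed(text: str) -> list[float]:
    # Placeholder: in practice this would call a real embedding model.
    # A toy character-frequency vector keeps the sketch self-contained and runnable.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

class PrivateIndex:
    """The sequestered, enterprise-only knowledge store: code, API docs, Slack write-ups."""
    def __init__(self) -> None:
        self.snippets: list[Snippet] = []

    def add(self, source: str, text: str) -> None:
        self.snippets.append(Snippet(source, text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[Snippet]:
        qv = embed(query)
        return sorted(self.snippets, key=lambda s: cosine(qv, s.vector), reverse=True)[:k]

def answer(index: PrivateIndex, question: str) -> str:
    # The foundation model acts only as the interlocutor; the retrieved snippets
    # carry the company-specific knowledge.
    context = "\n---\n".join(f"[{s.source}]\n{s.text}" for s in index.search(question))
    prompt = (
        "Answer using only the internal context below, citing sources.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return prompt  # in practice: pass this prompt to the general-purpose model

if __name__ == "__main__":
    idx = PrivateIndex()
    idx.add("api-docs/storage.md", "All storage calls must go through the internal storage API.")
    idx.add("slack/#platform-eng", "Never retry a failed mount more than twice.")
    print(answer(idx, "How do I open a storage handle?"))
```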
I'm so sorry to interrupt you, Rich, but there are many topics being discussed here. One is the longevity of software being questioned: I write code, I'm trying to be expedient about doing it, I'm not necessarily looking at the quality — that was the original point. And the question of longevity — how long that piece of code is supposed to endure — is not being measured in any way, because it's part of a product. That's number one. Number two, on these small language model iterations you're referring to now: if you get a piece of code from a software vendor, it's like a cake mix — how fast you beat it, exactly how much liquid you add to the milliliter, the calibration of your oven — none of it is standardized. So what you're talking about with VMware, using the small language model to improve the quality of the code, really comes down to which engineer or programmer wrote what line of code, in collaboration or in a vacuum, and how it was actually implemented on the customer side — which could be chocolate, vanilla, strawberry, you name the flavor, because all you got was a piece of software that's like a cake mix. In particular, companies like SAP are going to have huge issues doing this, because suddenly their rewritten code gets pushed back onto a customer who has, for 40 years, bastardized that code at every implementation point — because no two companies are the same, and no two visions of the right solution are the same. That absolutely supports the argument that you are creating technical debt every which way. That's two points. The third point is: in the process of doing this, how are they not breaking their own moat?
Breaking their own moat? Yeah, interesting. Who's the "they" in this?
Yeah. Okay.
Who's the "they"? The ISV?
Whether it's VMware or SAP — anybody. They're opening the door. Because if you think about it: six months from now, you've just rewritten code using a small language model to, quote unquote, improve its quality. You push that out as a patch or fix to a customer whose software system suddenly breaks as a result, because that particular piece of code no longer functions in their environment — the way they've either (a) bastardized it or (b) originally got it and never updated it.
And how is that different from when an engineer at VMware writes a piece of code without that benefit, needs to ship a patch, and delivers it out to a customer? Why is the quality any worse when —
The idea is that the VMware engineer — right now they're doing it mostly blind, and adding a whole bunch of time and tests and things like that on top — might actually have a code assistant that's going to review the change against the standards they have across their code base,
And generate tests in general, right. I guess my point here is — I mean, it seems great, but I'm not following some parts of your argument there, Jaren.
Well, my argument is: yes, I agree that you could be producing a certain amount of technical debt. But my bigger concern is the proprietary aspect — certain code can't really be considered proprietary if you're using a copilot to rewrite that code and change it in a way that then makes it non-functional for your customer base. You're familiar with SAP; you know how particular it is, how each install is different.
Every one of them is customized to that customer, instead of just being used as shipped.
But that's not necessarily creating a business advantage here, right?
I guess I'm trying to understand: if what an SAP engineer writes and designs then gets incorporated into the next distribution, and it —
meets
— the appropriate internal standards, it's maintainable, it's tested — hopefully it has actually undergone more rigorous, or at least more relevant, testing before getting pushed out. Because the premise here is: this patch is critical, this customer or these customers are very important, get this puppy out there and fix the problem. If that solution is in fact better, in the sense that it doesn't break something else, it means it creates —
— a piece of software underpinning that actually remains in place, unaltered, for a longer period of time than a quick patch. Why is this breaking the moat? I didn't follow, because —
Okay, so when I say breaking the moat, what I'm referring to is: you've now gone and created something that you're pushing out that is no longer usable for certain customers. It may be perfectly designed for, say, the top one percent of your customers, but for the next tier it no longer functions. So they come to you and say, you know, this is unusable, this renders our systems unusable. But —
Why are you assuming that when they use the AI, it's going to be so specific, or so targeted toward one customer or a small group of customers? If in fact the principles being adopted here are: don't break systems for the great majority of our customers; fix it for whoever is the loudest, squeakiest wheel, but do so in such a way that you don't create those kinds of repercussions. There was an assumption in there that I didn't quite follow.
Okay, perhaps I misspoke. But really what I'm trying to get to is this: there's patch-and-fix, or new feature or function, based on customer demand. And then there's using an LLM or a small language model to go back through your code to, quote unquote, improve its quality based on some set of values — you're doing it for a reason, right? Is it because you were originally working for expediency, in a vacuum and not in a collaborative way? Was it the level of skill of the programmer or engineer who actually did the coding — whatever those terms and conditions were? You're now going back to make the code better, or give it a longer shelf life. But in the process of doing that — when you push it out — you realize this is not a fix or an update; this is more like a re-architecture of the code. It's like when they went from on-premise to cloud and nobody could do it. Still, a majority of SAP customers can't move their stuff to the cloud. Period, full stop.
I still don't quite see it. I mean, my concern would be that you've got developers who are using the general models and showing up thinking they're solving problems, but it's square-peg-in-round-hole kind of stuff. What Rich is talking about — what I think is necessary — is that we're actually doing the opposite: we're bringing in expert guidance to evaluate the current code. It's like having a skilled senior developer looking at your code and helping you do that work; their ability to improve their codebase over time should accelerate dramatically, on the assumption — and I think it's a safe assumption — that their working code is better to use as a screening baseline than what's in the generic models. What I would love to see, and this isn't the case yet, is new foundational models for development trained on curated, best-practices code. Maybe there's an evolution in here where we're looking at specialized foundational models trained on more vetted material — and that's a smaller model.
And in the last two weeks, Meta has announced the release of Code Llama, built on Llama 2 and trained specifically for code. It has been trained on quite a large number of heavily used languages, and they then announced a specialized version of Code Llama for Python. These are being developed and released in sizes of roughly 7 billion, 13 billion, and 34 billion parameters, such that in the 7-billion category an individual literally running this small language model on a desktop is feasible. And they are coding assistants in the way Rob has just described: an experienced architect and developer assisting a programmer in the construction of whatever their objective is. That strikes me as the right way to think about it and the right way to go about it. And yes, it will probably take some of these companies much longer than a month or two to gin these things up. But in point of fact, there are a number of efforts, both from the hyperscalers and from the more open-source folks, to literally address how you build these smaller language models and — what in my mind is most important — how they work with other kinds of databases besides just these autoregressive models. (Generative AI uses autoregression — sorry for the geek-speak.) The point here is that a number of them have gone to ChatGPT and have been looking at how to specialize it — basically, how —
— to incorporate their own specialized versions and their own rules and regs for putting software in place.
And they're finding it still not sufficiently valuable, which is why they're also investigating — and there's some interesting work going on here — how you make one of these small language models work with graph databases and other kinds of databases, with some really interesting and very encouraging results. So, yeah.
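As a concrete illustration of the "run a 7B coding model on a desktop" point Rich makes above, here is a minimal sketch assuming the Hugging Face transformers library and the published CodeLlama-7b-Python checkpoint identifier; in practice many people would use a quantized build (for example via llama.cpp) to fit consumer hardware, but the shape of the workflow is the same. The example prompt is an invented placeholder.

```python
# A minimal sketch of running a 7B-parameter Code Llama variant locally, assuming the
# Hugging Face `transformers` library and the published CodeLlama-7b-Python checkpoint.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

MODEL_ID = "codellama/CodeLlama-7b-Python-hf"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,   # half precision to fit consumer GPU memory
    device_map="auto",           # falls back to CPU if no GPU is available
)

# Illustrative completion prompt: the model continues the function body.
prompt = '''def parse_inventory(path: str) -> dict:
    """Read a YAML inventory file and return hosts grouped by role."""
'''

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```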
So those are very valid points. I guess I'm looking at it from the point of view of something I read — I'm trying to put my finger on it. Oh yes, I do remember what it was: it was in a Substack, an article written by a coder, talking about the idea that if a copilot went back and looked at his code with five years of experience versus ten years of experience, how many quality issues would arise, and how his own coding had to evolve to be less messy, more articulate, smoother — all of those kinds of words wrapped around it. So when we're talking about this, I hear what you're saying, and I say to myself: it's going to end up being multiple flavors of the same thing, based on the evolution, if you will, of the code base. Like, what's the advantage of doing this, other than you have a quality issue and you haven't measured the quality of your code, or its longevity? How are you ever going to use a small language model to do that? To write code faster — that I completely agree with, as long as the models are not hallucinating. But on the other side of it, is it raising more issues than it's resolving, I guess, is the right way to put it? Which you're telling me is not unreasonable, though?
Quite frankly, I don't think anybody has been doing this long enough to know what issues it's raising — what long-term issues. But what they are incorporating is review, and what they are incorporating is separate, independently developed APIs that are literally doing QA and longitudinal analysis of what's getting generated — looking at trouble tickets generated when this is being done with SaaS, in particular ITSM-type services that are cloud-based. And that is the whole purpose: to actually look at the various means by which you establish what is "better." And it's not just the speed; it's, for some value of quality: does it fix the problem? Does the problem recur elsewhere? Does it cause a network effect — could other products have problems? All of those things are going into the analysis, and I think that's a rational and reasonable way to take any new technology and apply it to something like DevOps. So I guess my question is: is the issue that people are ascribing far too much to the tools, to the LLMs, rushing willy-nilly into throwing them at the development and DevOps processes, and being kind of silly or naive in their belief in these things? Or are we also now looking at people who are taking what I would consider a much more rational, long-term view and closing in on its use the same way you would with any new technique — a new individual, a new team, a new approach to programming, a new programming language: you put it into practice, you sequester it, you watch it carefully, and you look at the impact. I grant you that the notion here — and this will get into our "depth of expertise" issue — is that suddenly everybody's an expert VMware programmer.
That's a very good way to put it. Yes, that's right — is everybody now an expert?
Huh? No, they are definitely not. But what if I can improve, across the board, anybody who aspires to be a VMware developer, so that they've gone through at least some basis on which to understand how to build this code, how to build these ops? Yeah — that's —
That's where it gets very, very interesting from that perspective. And it's funny — in some ways, the infrastructure side is even worse, because infrastructure is not just the code in front of you; you actually have to understand the target. That's what people forget with all the automation they're building: the automation controls something else. It's not, in itself, anything. Exactly.
So you're leading my brain down a road that says: in short order, once they go through all this analysis, will it be AI that writes the next iteration of a code base for a software vendor? VMware trains the model, the model creates the copilot as a scribe —
I don't know at what point you turn it over completely to an AI, to an automaton.
I would actually say that it's your elders writing it, not the AI, at that point. But it's a funny question from that perspective, because what you're really saying is: the fact that we hired good experts in the past, and believe their work is time-tested, is the way we're getting good, time-tested code in the future. Just like a lot of the other challenges we're going to have with AI from that perspective.
I'm old enough to remember software engineers — or developers, coders, programmers, as they were known — complaining about high-level language compilers: they're inefficient, they don't do the right thing all the time. Let me get at those tight loops, let me get rid of the if statements, let me tweak this puppy. I understand the concern. But I think, like almost anything else, if you find something that is that good at improving the quality of a large number of professionals doing their job, then I'm all for it. There is a rational way of taking any of these tools and adopting them and adapting them. And we have had — what was Greenspan's phrase — irrational exuberance around AI. I've gone and played with, and seen, some of the stuff that shows up on YouTube, for example, and they're toys: they don't work, or they barely work, and they break. I mean, no, I'm sorry, I'm not entrusting my systems to these things — not the way these are being ginned up.
Diana, you wanted to say something, so I'm not going to take more than two seconds to finish off with Rich: think about the fact that you're not only training the AI to make the code better, you're teaching it all of the code base that existed to begin with. Therefore, that institutional knowledge and expertise is now contained within a tool that may do things differently than the original elder who wrote it — who may have had an expedience goal or agenda, or a collaborative goal or agenda, whatever verbiage you want to put around that. So why wouldn't you trust it?
I would expect — and this is what my hope would be — that the AI would come back and say, hey, there's code in these other places in the code base that could be adapted to do what you want. You could actually go and improve things, pay down technical debt. There are amazing things we can do if we're using these systems to analyze the code that we have — but it has to have the knowledge of what you're doing, the context of what you're doing. And that, to me, is where — and I'm still doing talks about this, trying to help people see how careful they have to be — if you're using these generic models that don't have the expertise and knowledge of your system, they're going to pull you toward practices that are not appropriate for your environment. Let me see if I can coin something: it's an AI seagull, from this perspective. If you're asking these tools to work without knowledge of what your environment is, it's an AI seagull. Diana, you know the analogy: there are bosses, or experts, who are described as seagulls — they fly in, they look around, they poop on everything, and then they leave. You're running the risk of it being an AI seagull.
As we come close to the end here: given the amount of time you've spent thinking about it, speaking about it, and so forth, where do you come down on the use of AI in DevOps today?
I think it's incredibly risky today, because the infrastructure behind it has no context to make judgments about what's really going on. In a couple of weeks, I'm actually going to be doing training to help people think about this better — people have to be taught to prompt, to give the model enough context about what you're building for it to do a good job with that building. It's funny: what we're really talking about is using the same models, but having them be prompted and trained on a more tightly defined thing — and assessed on the output, too. So I am incredibly optimistic that these tools can be used to really highlight ways in which people can collaborate better, extend and use the code they have, and work better within the practices that they have. I'm incredibly optimistic because I think our biggest problem, when we look at the use of our system, which is focused on integration, is that people don't see the tools that are already available. What happens in DevOps is that we have a tendency not to look for the right tool for the job, but to keep fashioning mounds of bubble gum and duct tape. So our biggest challenge is not whether the system can do the things our customers need it to do; it's whether we can help them find the way it already does it, and teach them to use the current capability rather than build a new Ansible script or a bash script or a new Terraform plan that does something we already do — or that we already got 85% of the way there, where, if they invested their time, they would have a reusable tool for the next time. I love the analogies to the trades: it would be like training somebody to be an electrician who says, "I don't want to use the standard wiring, I'm going to braid my own wire, and who cares about the color-coding conventions you have" — because they can't be bothered to learn how things are done.
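To make Rob's "teach people to prompt with context" point concrete, here is a rough sketch of a context-loaded prompt builder: instead of a bare request, the prompt carries the team's own conventions and the existing automation the model should prefer to reuse. The catalog entries, conventions, and function names below are invented placeholders, not RackN tooling or any particular product's API.

```python
# A rough sketch of the "give the model enough context" idea: the prompt leads with the
# team's existing automation and conventions so the model is steered toward reuse.
# Everything below (catalog, conventions) is an invented example.

EXISTING_AUTOMATION = {
    "provision-baremetal": "Installs OS and base packages on a new node.",
    "rotate-certs": "Rotates TLS certificates across the cluster.",
}

TEAM_CONVENTIONS = [
    "Prefer extending an existing workflow over writing a new script.",
    "All new tasks must be idempotent and include a dry-run flag.",
]

def build_prompt(request: str) -> str:
    catalog = "\n".join(f"- {name}: {desc}" for name, desc in EXISTING_AUTOMATION.items())
    rules = "\n".join(f"- {r}" for r in TEAM_CONVENTIONS)
    return (
        "You are assisting an operations team. First check whether the request is already\n"
        "covered by the existing automation catalog; only propose new code if it is not.\n\n"
        f"Existing automation:\n{catalog}\n\n"
        f"Team conventions:\n{rules}\n\n"
        f"Request: {request}\n"
    )

if __name__ == "__main__":
    print(build_prompt("Set up TLS on the three new edge nodes."))
```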
And even if they follow the color coding and so forth: the idea of inventing or creating a new piece of code when, in fact, there probably is something already in existence — functionality in the base product that can be utilized to solve the problem. Expediency says it's just like going and searching for stuff in a data lake: it takes me too long to find the stuff I need, so I'll just generate it.
Well — and the problem with the history and the training libraries we have is that we have so few standards on the DevOps automation side. On the coding side it's actually much more repetitive; in the DevOps training data, people have solved the same problem in so many different ways that it's very hard to know what you should do. Part of our benefit is just coming in with a way to be more opinionated — not even having the opinions ourselves, just: hey, we're going to help you enforce opinions. It helps to have some starting point. And we are at time — I need to jump, and you do have to run. We'll see y'all next week. Thank you.
Bye, bye.
Wow, what a thoughtful conversation. This is a place where I really like that we are looking at how the technology could be shaped and how the positive outcomes of these technologies are being examined. We're thinking through how we want these systems to behave, and what could be a positive impact on the industry — really transformative for so many people's lives and daily jobs. If this is interesting to you — and if you're listening now, it probably was — then come join us and be part of these conversations. The more insights, questions, and thoughts we have, the better. We would love to have you be part of the2030.cloud conversation. You can find our full schedule and be part of our book club — we're talking about depth of expertise next. Find out that and more at the2030.cloud; look for our agenda link. I'll see you there. Thank you for listening to the Cloud 2030 podcast. It is sponsored by RackN, where we are really working to build a community of people who are using and thinking about infrastructure differently — because that's what RackN does: we write software that helps put operators back in control of distributed infrastructure, really thinking about how things should be run and building software that makes that possible. If this is interesting to you, please try out the software. We would love to get your opinion and hear how you think this could transform infrastructure more broadly — or just keep enjoying the podcast, coming to the discussions, and laying out your thoughts on how you see the future unfolding, all part of building a better infrastructure operations community. Thank you.