Jack's Meeting Notes

    4:25PM May 28, 2025

    Speakers:

    Kian Kyars

    Neel Kant

    Jack Hopkins

    Morten Nielsen

    Keywords:

    time-constrained tasks

    timing metrics

    LLM call

    Google Cloud Credits

    SQL database

    GitHub workflow

    pre-commit hook

    continuous deployment

    action representations

    observation representations

    Colab

    development environment

    testing run

    foundation models

    Makefile

    Cool, welcome. Let me just try and set this up for all the meetings; I think you just invite it to the call. Nice. Other than this kind of housekeeping, what do you think is a good priority in terms of direction? What would be a cool thing to work on?

    You're asking me? Okay, well, I can respond. Over the last week I was mostly running evals to get more familiar, so I haven't done much development work. But the one thing I was looking at was time-constrained tasks, because I think the exploration/exploitation trade-off would be quite compelling to test. Neel, have you pushed anything regarding that? Are you still developing it?

    I haven't actually worked on the time-constrained tasks, but pull request #207, the timing metrics and tracker, was meant to be a prerequisite for them. The reason I made it is that now we have an object that keeps track of time, so you can see which LLM call actually produced the program. Sometimes APIs are overloaded or whatever, so you can't just naively ask "how long did it take me to get a program?" You have to be able to say: that call actually got me the program, and it took 15 seconds or whatever. So you can build off of that branch. Jack, I don't think we necessarily need to be tracking the time on the tools, because for time-constrained tasks it's more about timing the LLM.

    Yeah, it dominates, like, 99% of the latency.
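    The timing logic Neel describes, counting only the LLM call that actually produced a program and ignoring failed or overloaded attempts, could be sketched roughly like this. The class and method names here are hypothetical illustrations, not the actual code from PR #207:

```python
import time


class LLMTimingTracker:
    """Records per-attempt latency, but only counts the attempt that
    actually produced a result (a sketch of the idea, not the real
    tracker from PR #207)."""

    def __init__(self):
        self.attempts = []  # list of (elapsed_seconds, succeeded)

    def timed_call(self, fn, *args, **kwargs):
        """Run fn, recording how long it took and whether it succeeded."""
        start = time.monotonic()
        try:
            result = fn(*args, **kwargs)
        except Exception:
            # Overloaded API, timeout, etc.: record it, but it won't
            # count toward the task's time budget.
            self.attempts.append((time.monotonic() - start, False))
            raise
        self.attempts.append((time.monotonic() - start, True))
        return result

    @property
    def productive_seconds(self):
        # Only the call that actually got us a program counts.
        return sum(t for t, ok in self.attempts if ok)
```

    A time-constrained task would then charge `productive_seconds` against its budget, rather than wall-clock time spent retrying an overloaded endpoint.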

    Yeah. So maybe if you can take a final look at that PR, and if you request any changes I'll make them; otherwise you can merge it in. And from there, Kian, if you want to try time-constrained tasks, I think that'd be a great first thing to work on. If you have any questions on it, let me know; I can help with that, for sure. I think I'll work on the gym interface, and it'd be cool if you want to work on time-constrained tasks. Yeah, that sounds great for me. All right. Nice.

    Do we want to consider, at some point, maybe the next meeting, an actual testing run on some of these new foundation models? Claude 4, for example. I think it'd be really nice to lay out all the different settings we want to test, so when we start doing it, we can each take some part and run it, and get a full set of results. With, for example, Kian taking responsibility for time-bound tasks. Okay, now one question: since we have a bunch of Google Cloud credits, is it possible to use those credits for calling LLMs, similar to how you can on AWS with Bedrock? I know you can use AWS credits with Bedrock to access arbitrary LLMs for evals, but we don't have any AWS credits. I can always look into this, but maybe you know it off the top of your head.

    I don't have it off the top of my head, but I remember when I spun up an instance to try some Docker stuff, the Gemini assistant in GCP was pretty sharp. It probably knows the answer.

    That's fair. Yes, yes, let me Google that for you. Okay, I'll look into that. What else? Morten, you were saying something before. What were you saying?

    Sorry, I just had the database on my mind, and factoring out the database. What if we could make it default to using a local SQLite database and do all the provisioning there? I feel that would give us the best of both: you don't actually have to configure anything, you get it for free, but if you want a proper database for real scale, shared between nodes, you provision one. Would that be a fair option?

    Yeah, I think that would be nice. It's mostly nice when people just want to spin something up to see what the project is, so they don't need to set anything up. But then again, I'm pretty sure the Colab will just solve all of those problems, so maybe it's not really worth the time to look into.

    I think it probably is a good idea. And I say this because Colab as a development environment is pretty limited: you might only get a half-hour connection or something, which means anything that requires a database you probably want to run locally, off Colab.

    Oh yeah, my only point was that if I just want to run it and see what it's all about, without having to set up a database, then the 30 minutes is fine. I can explore the project, see what it's all about, figure out if I actually want to spend some time on it. And then if I do, I set up an SQLite database or whatever else to actually get something really going. Okay, that's fine, I think. Yeah.

    So I know our README right now asks people to provision their own database with a schema, but that's not really ideal. We should include an SQL file that provisions the database schema automatically, and just try to cut down on the number of steps. Yeah.
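    The SQLite-by-default idea discussed above could look something like this. The file name, table, and columns are made up for illustration; the point is only that first contact with the project requires zero database setup, while a "real" shared database stays a config option:

```python
import sqlite3
from pathlib import Path

# Hypothetical default: a local SQLite file, so newcomers need no setup.
DEFAULT_DB_PATH = Path("runs.sqlite")

# The bundled schema (in the real project this would live in an .sql
# file shipped with the repo; the table here is purely illustrative).
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS programs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    code TEXT NOT NULL,
    reward REAL
);
"""


def connect(db_path: Path = DEFAULT_DB_PATH) -> sqlite3.Connection:
    """Open the local database, provisioning the schema on first use."""
    conn = sqlite3.connect(db_path)
    # Idempotent thanks to IF NOT EXISTS, so it's safe on every startup.
    conn.executescript(SCHEMA_SQL)
    return conn
```

    A Postgres or other server-backed connection for multi-node runs could then be swapped in behind the same `connect()` call when someone actually needs scale.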

    Yeah. It might make sense to just create a Makefile, with the ability to just run a `make all`, so that way you get the SQL database and everything else. Getting it as close to a single command as possible would be cool.
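    A minimal sketch of what that Makefile could look like; the target names, file names, and entry point are all assumptions for illustration, not the project's actual layout:

```make
# Hypothetical Makefile: one command from clone to running.
.PHONY: all db run

all: db run

# Provision the local database from a bundled schema file.
db:
	sqlite3 runs.sqlite < schema.sql

# Start a run against the local database.
run:
	python run.py
```

    The value is mostly the familiarity Morten mentions below: `make all` is a gesture most people already trust.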

    Um, yeah, we'd use the Makefile.

    Yeah, that's fair. Okay, yeah.

    I think just making things feel familiar is generally a good idea. People go, "Oh, a Makefile, make all, I've done that so many times, this is probably going to work," and then it works.

    Surprise motherfucker, something else is working.

    Yeah, oh yeah. Today, as I was reading through the trajectory runner and everything else, I was thinking: you know what I miss from my old company? Just having a linter in a pre-commit hook, so the code always looks beautiful. That's something on my list too, at least for when other people start actually looking at the code; I want it to look clean. When I get the chance, I was going to use Ruff. I don't know if anybody else has strong opinions, but Ruff is pretty great.
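    The Ruff pre-commit setup being proposed is a small config file. This uses Ruff's official pre-commit hooks; the version pin is illustrative and should be updated to the current release when adopting:

```yaml
# .pre-commit-config.yaml (sketch)
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4        # illustrative pin; use the current release
    hooks:
      - id: ruff       # lint (add args: [--fix] for autofix)
      - id: ruff-format
```

    After `pip install pre-commit && pre-commit install`, every commit gets linted and formatted automatically.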

    Yeah, Ruff sounds good. What would also be cool, when you do that, is to set it up as a GitHub workflow to automatically lint on PRs, as well as the pre-commit hook. Because as we build this out, getting closer and closer to continuous deployment and continuous release should be the goal. So we want to start putting all these steps into our automatic build pipeline, along with verifying the tests pass, etc., so that whenever we release something, we know it's of a certain standard.
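    The GitHub workflow Jack suggests, linting every pull request, could be as small as this sketch (file path and job name are illustrative):

```yaml
# .github/workflows/lint.yml (sketch)
name: lint
on:
  pull_request:
jobs:
  ruff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install ruff
      - run: ruff check .
```

    Marking this check as required in branch protection is what enforces the "can't merge into main unless it passes" behaviour discussed next.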

    So when you say make it part of that, you mean it should be an automated check? It should just check it automatically? Okay.

    Yeah, yeah. So basically you can't, let's say, merge into main, or you can't tag a release, unless these checks have all passed. Just for good housekeeping, because the more we can put into the PR phase, the easier we make it for people to commit and share code, and the less we have to worry about code quality. Yeah. Is there anything else we need to talk about? Or should we call it there and reconvene in a week? And yes, once I've got you the Colab and you've managed to get it to work, we can then look again at trying to fix your local environment. Yes? Sorry?

    Oh, I just said thank you. Oh, okay, cool. Yeah. No, no worries.

    Yeah. And the same thing, I guess, with you, Morten: we need to figure out what's going wrong. If it's okay with you, after this call could you share your screen and we could actually walk through it? Because I have a feeling it's going to be a small issue, and hopefully we can get it fixed quickly.

    Yeah, I have a good feeling that Neel was onto something: it's probably Windows and the backslash versus forward slash fucking something up. I will try to look inside and see if that fixes things.

    That's just the Windows tax, yeah. Intentional.
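    The Windows failure mode suspected here is a classic one: paths glued together from hard-coded `/` or `\` separators. A small illustration of how `pathlib` sidesteps it; the file names are made up:

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


def to_portable(parts):
    """Build a path from its components instead of concatenating
    separator characters by hand; behaves correctly on every OS."""
    return Path(*parts)


def normalise_windows_string(raw: str) -> str:
    """Interpret a backslash-separated string explicitly as a Windows
    path, then re-emit it with forward slashes."""
    return PurePosixPath(*PureWindowsPath(raw).parts).as_posix()
```

    Anywhere the code does `some_dir + "/" + name` or splits on a literal separator is a candidate for exactly the bug Morten is chasing.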

    I will see if that fixes the problem.

    Okay, cool. Well, let me know, and we can hop on a call and try to work it out. And I guess, Morten, did you have any more thoughts about the action representations and the observation representations, and making them align with the speedrun data?

    I've looked into it, but haven't gotten to anything useful yet. I will let you know. It is going to be a bit challenging.

    It's hard to make meaningful progress on this when you can't actually get anything running, so I do appreciate that. Cool. Well, I haven't got much more to add. Unless anyone else has anything they want to talk about?

    Yeah. And Kian, please just let me know if you have any questions or if you want to chat; just grab some time on my calendar. I'd love to talk to you more about this time-constrained task thing. I'm happy that you find it interesting too. But I don't really have anything else to talk about right now.

    Okay, cool, fantastic. Well, lovely to see you all. Expect something later on today with the Colab for you to try out, and let's catch up next week, I guess.

    Yep. Next week. All right.