uses a different language model that's more open, and somehow they're able to feed it their training data. I believe they were feeding it their documentation from the website, and even things like the content on their forum, maybe the content on their Discord channel, and even just things Gregory has said on Twitter or in transcripts from more public calls and stuff. Any time you're talking about Regen Network, they wanted it to know about it. And then, yeah, you could ask it a question and it would give you back a mostly correct answer. But they had to spend a lot of time training it with all their very specific stuff. I'm trying to find if there's anything public about it, other than on the Discord, but I'm not seeing anything.
Yeah. I think that kind of speaks to the long term. It really becomes about training data: where it's coming from, who controls it, who's generating it, what we trust, all that kind of stuff. That's something we as open source communities need to start thinking about, the importance of training data. I don't know.
Well, how are people using it now? And feel free, if you just want to keep rolling over here, answering questions and looking at stuff. I put in an example from my own work: we have a lot of long discussions about software specifications that are pretty complicated, grammatically, you know, but it is really good at summarization. So one of the things I was wondering is, you record using Otter, so it knows the different parties, right? Then you put that into ChatGPT and see if you can get draft software specifications and to-do lists and stuff out of it. But I'm curious if anybody else has other processes they would want to use it for.
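As a rough sketch of what that Otter-to-ChatGPT step could look like: the snippet below builds a request for OpenAI's chat completions endpoint. The model name, prompt wording, and function names here are illustrative assumptions, not a workflow anyone on the call had actually set up.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_summary_request(transcript: str) -> dict:
    """Assemble the JSON payload for a summarization call.

    The system prompt and model choice are placeholder assumptions;
    adjust both to taste.
    """
    return {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system",
             "content": "Summarize this meeting transcript into draft "
                        "software specifications and a to-do list."},
            {"role": "user", "content": transcript},
        ],
    }

def summarize(transcript: str) -> str:
    """Send the request. Requires OPENAI_API_KEY in the environment."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_summary_request(transcript)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The payload builder is separated from the network call so the prompt can be inspected, or reused against a different endpoint, without sending anything.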
I tested it out on some questions that normally I would Google, or did Google, and then poke around StackOverflow for, you know, like how you're supposed to do a certain thing with a certain library. And it very confidently gave me code that looked reasonable but was total nonsense, or just wasn't functional, or referenced stuff that didn't exist, which is a theme. So if it's something that's probably already on StackOverflow, it can handle it, and maybe you don't have to search as deeply. But then you have to spend more time validating it, because you don't have any of the signifiers you would have on StackOverflow, where somebody would comment like, oh no, this doesn't work, or the answer would only have a few upvotes, that sort of thing. So I don't know, it's kind of a mixed bag there.
So writing code, like maybe not, maybe not great.
I mean, yeah, if you need it to do a very basic task that a bunch of people have already done, and maybe tweak it somehow, it can do that. But you just need to be able to validate it. Because if it gives you nonsense and you don't realize it's nonsense, or it takes you a while to figure out that it's nonsense, that's a problem. But this sort of transformation thing, we talked about this previously, kind of like your example: I don't know how people are thinking about using it, but I could imagine using it for surveys. I like this idea of having a schema, so that the thing you're asking a question against is something you can validate its response against. I think that's a good model for interacting with it.
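The "validate the model's response against a schema" idea could be sketched like this. The schema, field names, and function below are made up for illustration; a real project would more likely reach for a proper validator such as the jsonschema library, but the pattern is the same: parse the model's answer, check it against the structure you expect, and reject anything that doesn't conform.

```python
import json

# Toy schema: field name -> expected Python type. Purely illustrative;
# none of these fields come from an actual survey definition.
SURVEY_SCHEMA = {"crop": str, "acres": float, "irrigated": bool}

def validate_response(raw: str, schema: dict) -> dict:
    """Parse a model's answer and check it against the schema.

    Raises ValueError if the answer is not JSON, is missing a field,
    or has a field of the wrong type. Returns the parsed dict on success.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"model did not return JSON: {e}")
    for field, expected in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"{field} should be {expected.__name__}")
    return data
```

Anything that fails validation can be retried or routed to a human, so nonsense never silently enters the dataset.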
I guess my perspective, which I keep bringing up and which Greg and I talked about for like six hours at pozza, is that if we had that schema already, I wouldn't need ChatGPT, or Greg, to give it to me. Again, my issue is that it produces a good enough result to get through my basic sniff test, but not a good enough result to trust or learn anything from, so I would just be spending all my time trying to figure out what it got wrong, when I could do a better job having the information myself and figuring out what I'm getting wrong. And in terms of serving our end goal of visibility and trust for users: if we had the schemas, that's the thing. That's the sticking point, not transforming the information we get from people into a more compressed, and maybe not correct, condensed version of whatever that is. Not to be a hater, but I'm not saying it would make any part of my job less complicated. I think it would only make it more complicated.
I had a similar thought about trying to learn from the machine learning just what direction to go in, perhaps as a phase of discovery. You know, if you're able to find out that, oh, it's actually using this slightly modified version of the schema, or it's using some slightly different validation tool to verify the schemas or to generate them, then it can point you to a tool that doesn't require any AI, an actual implementation you can use, and the AI doesn't have to show up in the actual results of whatever you're doing.
Sorry, the thing, perhaps, is that in a way it's wrongly labeled, because it's not an intelligence. It's a language model. And I believe a good metaphor is that it's like a toddler, a four-year-old toddler: it knows the language, but it mostly does not know anything about what it's speaking about. That's why it produces those kinds of aberrations. These models do not think; they just relate graphs of information together and go from similar to similar, and that can bring you to all kinds of errors, and when it fails it can have critical failures without realizing it. The same way that, for example, fake information gets distributed through Twitter by real human beings. It's not an intelligence; that could be a century away from us. Making a machine think is way more complex than making it speak.
Yeah, but I think what it does do well, and this is where I find nuance with what Vic is saying, is this. I do think we should always look at technology and ask: how is this making us better? How is this making our intuition more effective? How is it replacing our intuition? That for me is really important. So I think having experts in the process is critical, but I'll give you an example. We're working with the conservation districts on this project through the USDA, and there are 3,000 conservation districts who are supposed to report feedback up to state and national levels. It's very difficult to take 3,000 different groups, each containing many individuals, and pull that feedback up in a way that doesn't become highly technical or burdensome to those people, and doesn't become an annoying form that nobody wants to fill out, and then summarize it. So I think in that case summarization capacity from a discussion is helpful, in conjunction with an expert summarization, so that you have both at the same time. I don't know, that's my take.
I also kind of think, and this is maybe more philosophical, that whether or not it's an intelligence is kind of irrelevant to its use as a tool. It doesn't have to think the way a human thinks to still be very effective and good as a tool. It doesn't have to be alive. And I'm shocked that a model that was trained just on language processing is as good at coding as it is. Yeah, if you want it to do something really complicated, for sure there will be errors, but it can still do quite a lot. Like, I'm a newer coder, I've only been coding since grad school, and it's definitely better than me for sure. I have to check it sometimes and fix things, so you need to be able to debug, but because it's been trained on all of StackOverflow it's pretty good at existing questions. But obviously, when you connect multiple things, I don't think it has very good long-term abilities on complicated, long problems. And anything new, I don't think it can extrapolate to new ideas at all.
Well, in a way that's because coding is pure syntax; ontologically it's mathematics. It's not a proper language, because it has no real semantics. But again, as we said, it can have drastic failures. And I would say it's good at the level of scripting, which is what you'll find most often on StackOverflow: a snippet of code that will solve a problem, but typically not the ability to think systemically. So you typically will find an imperative procedure to get an answer, versus, let's say, a class or an object or something with an interface. That said, I believe it's already a great service, as you said: it can summarize stuff well, send you to the state of the art, and give you a hint on where to search. That sounds like a lot.
I'm just going to respond to some of these. Can it be used to summarize data into graphs? Yeah, I think it can. Could it take hundreds of scripts and generate use cases? The problem is you're limited in the data you can put in. Maybe that's an area where somebody needs to evaluate the paid version or something, in terms of inputting more information, because that is a serious limitation: 4,000 tokens or whatever is not actually that much.
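A common workaround for that context limit is to split long input into chunks and summarize each one separately. The sketch below uses word count as a rough proxy for tokens (English runs very roughly around 1.3 tokens per word, so the default stays well under a 4,000-token window); the function name and threshold are illustrative choices, not part of any API.

```python
def chunk_text(text: str, max_words: int = 2500):
    """Yield word-bounded chunks that should each fit under a
    4,000-token context window.

    Word count only approximates token count, so max_words is set
    conservatively; a real pipeline might count tokens exactly with
    the model's own tokenizer instead.
    """
    words = text.split()
    for i in range(0, len(words), max_words):
        yield " ".join(words[i:i + max_words])
```

Each chunk can then be summarized on its own, and the per-chunk summaries concatenated and summarized once more to get a single result.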
Yeah, I think OpenAI might have a training feature.
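For what it's worth, OpenAI's fine-tuning feature at the time took training data as JSONL, one prompt/completion pair per line. The sketch below converts question-and-answer pairs into that shape; the separator and leading-space conventions follow what their fine-tuning guide suggested, but the exact format should be checked against current docs before relying on it, and the example pair is invented.

```python
import json

def to_finetune_jsonl(pairs):
    """Convert (question, answer) pairs into JSONL lines in the legacy
    prompt/completion fine-tuning format.

    The "\n\n###\n\n" separator and the leading space on completions
    were conventions from OpenAI's fine-tuning guide; verify against
    the current documentation before uploading.
    """
    lines = []
    for question, answer in pairs:
        lines.append(json.dumps({
            "prompt": question.strip() + "\n\n###\n\n",
            "completion": " " + answer.strip(),
        }))
    return "\n".join(lines)
```

The resulting string can be written to a `.jsonl` file and uploaded as a fine-tuning dataset.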
Yeah. It's closed source, right? I never checked, but I assumed it is.
Yeah, which is funny, because it's called OpenAI.
Yeah, I don't have other specific things. If people have any interesting ideas they want to bring up, or concerns, or maybe links to alternatives, it's kind of an open discussion for the last couple of minutes. This has been great, though; I think a lot of good questions. So yeah, if you have a question, something you want to get out of the group, now's the time to ask.
Or maybe something we could do together with it. I mean, we are all in the same field. An example would be, if it does have a training component, compiling sort of core information about some of the platforms that we have, so that it has that to reference: an implementation that maybe we would use, or ontologies, for example, or other things that could help our community.
Any other thoughts? We don't have to stay if people feel like they got what they needed out of it. But if there are any other sort of thoughts or discussions or ideas...
I feel like AI note-taking tools can be nice. So if you have meetings that you want transcribed, that can be useful. Just a thought coming out of this.
I'm recording this with Otter right now. I'm going to send everybody a summary of this meeting using ChatGPT, based on the Otter recording, and see what happens.
Yep. But yeah, just in general, for whatever open stuff, trying to make engaging with the community ecosystem more accessible seems like a useful aspect.
I would note that one of our partners, who has written maybe, I don't know, a third of what's out on the internet about integrated landscape management, when they asked ChatGPT questions about integrated landscape management, to create information or summarize it, they felt like they got back sentences that they had written. I thought that was kind of an interesting result: maybe it depends on what's already out there that might have been part of the training set, in terms of what you get back.
My other question, sort of a challenge question: we collaborate a lot with OpenTEAM and GOAT, and a lot of my time, at least, is spent just trying to figure out what everybody else is doing. Imagine a world in which we all put our meetings into some common training set, so that if Steve wanted to know what RSI is up to recently, ChatGPT could just give him a three-sentence summary. Is that a useless world? Is that a useful world? There's just a lot we could do with this if we wanted to. It's just determining what.
Yeah, it does feel like if you could control training sets around your organization's output, and then access your organizational knowledge, that seems pretty powerful.
It does. Yeah.
It being closed source doesn't make it illegal, but it does make it more dangerous. I feel like somebody's giving you too much while you're giving too little, in a way. It's a step further from what happens with, let's say, smartphones, for example: they have access to everything, and you can't tweak the code, or know what else is in there, what's training it, what's giving it biases, etcetera.
I was wondering, does anyone here have any contacts with people who are more involved with the AI science side, or especially interested in some of the ethics? Because I think a lot of this stuff comes back to that: whose data is being used to train, who's creating the models, how are the models accessible to different people?
You mean people in the industry?
Yeah, but I guess with kind of a bent towards the values of, I guess, GOAT in particular, though I think most people here can make that assumption.
I have a similar question, because I've been connecting with the academic side on ethics and AI, and I'm interested. I've been asking at some of the places where folks are, like at Google and Facebook, so I'm interested if other folks have contacts. And I don't know if there is interest in starting a separate discussion through GOAT, because there's a lot I have been pulling on around ethics. I'm interested in what you all have already seen and are thinking, because there are multiple angles. But yeah, very interested if folks have interest or connections.
Yeah, I would post it. I'm sure we could have discussions on the forum about it. I think there was also a Hylo discussion about it, with some good thoughts back and forth.
At a Linux Foundation conference a while ago, I met the founder of an AI ethics company that I haven't really kept up with, but this is making me think of their work. It's called Ethical Intelligence, I think; I'd have to check.
I think the big thing that's coming up for me is kind of what you said, Steve. At least on our end, for SurveyStack, this potentially helps lessen the primary burden that users experience, which is data input. That is the single biggest problem; everything else is secondary to that. Potentially, I should say. But I feel like it's all about the quality of the data set. It would be really cool, as a community, NGO, or OpenTEAM or whatever, individual organizations and then summed up, to have our own training sets that we have some level of sharing around. In the same way that we're trying to connect and collaborate and share around software and data standards... whoops, I don't know where I cut out, but that seems like the next thing we should expect: sharing around training data, so we have common languages through AI tools like this.
Yeah, that would be great. And I can imagine a future world where organizational cooperation involves sharing training data around shared topics of interest, as sort of a knowledge-exchange cooperation thing.
You could even imagine an extreme case where sharing training data is effectively a solution to API integrations, depending on how smart things get. It's really not that far away from that.
Cool. Well, any other thoughts, conversations, questions people want to ask about this?
I have ongoing thoughts specifically about our application in agriculture communities, as a set of individuals who aren't as well versed in, or perhaps as excited about, tech tools, and how that impacts gathering feedback. But yeah, if anyone else wants to keep talking about those particular applications, that's the area that interests me.
My challenge is often eliciting feedback, or rather building trust in order to elicit feedback, before we even get into all the things where this would be applicable, and I think that's kind of core to the day-to-day work that I'm doing.
That's feedback on, like, the efficacy of the tools?
On how the tools work, on what the tools are for, on whether it's worth it to start using any sort of digital tool rather than paper spreadsheets. And if, in our data requirements, we say, you know, "your feedback may be used to train our ChatGPT model," everyone's going to be like, okay, I'm not saying anything.
Makes me wonder if you could use some kind of machine learning, instead of evaluating actual verbal feedback, to evaluate user interactions. Like, if you were to plug it into some kind of telemetry tool and assess something from that, rather than having to harass people over email.
I was thinking that too, Jamie: there's a lot recorded. I was just looking at my own stuff. I've recorded all the speeches I gave on the same topic over the last several years, and I have some of the transcripts; can you just plug that in? And I was looking at some other things: we had a lot of resources under the corn growers, for all the things we kept repeating to farmers, or farmers to us. I don't know if that answers your question, but that's kind of where I was going: can we leverage that? For one, I think it's all public content, right? So is there a way to pull that in? Because it's mutually beneficial: it gets more leverage out of the work we've already done, but it also hopefully helps others down the road when they're asking these questions. So I don't know if anyone knows how to do that, but I'd be interested in trying.
I think that's part of what Paul was sharing: you can create your training dataset in that platform with certain setups, and that would be a place to look.
Yeah, it's been a really long time since I've coded, so I'll look to you all. So maybe, if there's someone here who's willing, if you're doing it and you don't mind having someone watch or test it out, I'd love to see how to do that. And some of it might be spoken, not all in transcript form.
Yeah, I have to say I second that. I would love for someone to pick this up and run with it, so I don't have to spend my nights on it, because I probably won't get very far. I think there's a lot we could learn together. I also think maybe what's going to have to happen is we need a whole new language, a whole new sort of public understanding, around these training datasets and their privacy. Right now, if you say all your data is private, people understand kind of what that means. But if you were on a call and you said, "well, we're pushing this into our private training data set," if people understood what that means, they might have a similar perspective, like, okay, I suppose that's fine. There's just language that's going to have to get built, and understanding in a broad way, for this not to be scary. And not just language, but actual features, right? Those are features; I don't even know if that's possible. Like, is the training data fed back into the main ChatGPT model? I have no idea at this point.
Yeah, and they're probably not going to let you know what they're doing, either. I would say good advice is: if anybody is going to do any work with them and send datasets, keep those datasets for yourself, very well organized and stored. Because eventually there's going to be an open source model, and eventually, if their business model goes well, they will become more expensive, more closed, more restrictive, as happens with these things. And it would be a shame to lose all the data you've sent them just because you sent it there first. So it's very good practice: if you are going to send them anything, store it and keep it for yourself, and eventually you will be able to send it to something that really belongs to you. And currently, you can see how it works, and it has a lot of aberrations where it just repeats some literal piece of data that was fed into it. People have found a lot of private data in there. There have been issues also with other similar models, like the convolutional models that produce AI art, for example; it's fairly typical for them to just repeat somebody's drawing, typically a drawing that the trainers had no rights to use, so it's basically stealing. The practices around privacy have been very shady with all these models. I don't know if this one, which is bigger, has been more careful; probably they have. But still, you can see when it speaks literal chunks of stuff that many times they don't have the right to use. Sorry, just to circle back: if you send it anything to learn from, also store it for yourself, because eventually there's going to be an occasion to feed it into another model, one that's more open and that you have more control over.
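That "store it for yourself before you send it" habit can be as simple as appending every record to a local JSONL file. The sketch below is one possible convention; the field names, timestamp field, and file layout are all invented for illustration, not any established format.

```python
import json
import time
from pathlib import Path

def archive_record(record: dict, archive: Path) -> None:
    """Append one training record to a local JSONL archive before it
    is ever sent to a hosted service, so the dataset stays portable
    to a future, more open model.

    Adds an archived_at timestamp; the field name is an arbitrary
    convention for this sketch.
    """
    record = dict(record, archived_at=time.strftime("%Y-%m-%dT%H:%M:%S"))
    with archive.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def load_archive(archive: Path) -> list:
    """Read the archive back as a list of records."""
    with archive.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

JSONL keeps the archive append-only and line-oriented, so it can be re-exported to whatever format the next model's training pipeline expects.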
And that's a really good ending point for the conversation: we all obviously want an open source version of this, we should support them as they appear and all that jazz, and keep your training data. Great. Well, thanks, everybody. I don't have any specific outputs, but I just wanted to have a discussion and a sharing opportunity, and hopefully this was useful. Follow up with individuals or through the forum, the GOAT forum, if you want. I definitely would love to have someone I can hand things to that I want to try out in ChatGPT, so if there's anybody on this call who's kind of capable and willing, and it sounds like maybe Shefali is in a similar boat, please just let me know. Otherwise, thanks.
For that one question.
Yeah, anybody needs to know, I guess. Yeah.
Would anyone else be interested in reaching out to, like, cold-calling, people who might know about this stuff better? Because I feel like we're in a middle zone where we want to try to convey this stuff to people who don't have any technical background, but I also feel like I'm way out of my depth on this stuff, and hardly the expert to try to inform that process. So, like, inviting somebody in who knows this stuff better.
I'm a data scientist; I know how these models work. I could prepare something for a chat, especially about language models, which are not my main focus of attention.