RUNEUAI 07 Intro to Data Structures

Bernard Goldbach, 2h 18min

Filipe

00:00

The Sound of Music? A lot, probably, don't you?

Stephanie

00:02

Oh no, I've never seen it, never. But I come from the city where they made the movie. Yes, I grew up there, and I've never watched the movie. But a lot of tourists come only to see the place.

Filipe

00:22

And Almudena, how about you? Have you seen the Verano Azul TV series? Do you live there? Oh, your mic is off. Your mic is off. Almudena, we cannot hear you. Verano Azul was a Spanish TV series that was very, very popular in Portugal. Almudena, I know that you're speaking, your lips are moving, but your mic... let me ask. I'll send you a message.

Almudena

01:00

In Spain, they have shown Verano Azul on TV a lot of times. Everybody older than me has seen this series, this TV show.

Filipe

01:15

The beach is in the south of Spain, isn't it?

Almudena

01:19

Yes. Malaga. Malaga,

Filipe

01:24

Okay, so, something interesting about you, Almudena?

Almudena

01:28

I like hiking too. I go to the University of Burgos, and it has a hiking group, and we sometimes go to the mountains. My city, Burgos, has a lot of mountains around it, and it is so varied, different

Filipe

01:59

landscape.

Almudena

02:00

The landscape is so different. And personally, I also go hiking with other groups, and I invite people from work,

Filipe

02:20

Bernard, what makes you interesting? What did you do when you were young? Something that your son doesn't know? Oh, yeah.

Bernie Goldbach

02:31

Well, the thing that makes me interesting also makes people worried, because I used to work in America's largest office building, with special cards to get in and out of the door. That was the Pentagon. And there are many, many stories from that time that actually factored into a series called The Americans. It's a spy series. So that is one facet of my life, working in America's largest office building.

Filipe

03:02

And I've heard a lot of stories from you, like cargo planes in Egypt and being arrested, things like that. But that's another story. Okay. Vanessa, what makes you interesting?

Vanessa Okoli

03:22

I like psychological thrillers. Yeah, I like reading. I'm trying to get into reading more. I always have phases where I read, like, a lot of books, and then I stop. So, yeah, I think that's something interesting about me.

Filipe

03:37

Okay, who's your favorite writer?

Vanessa Okoli

03:41

Um, I don't know if I have one. I just go online and look for recommended books, and then I just read them. I'm like, okay, this is nice. And then, yeah, I don't have a specific author that I really like,

Filipe

03:53

Yeah, my wife is the same. Okay, Vanessa, thank you. Sara, I think you're next in my row on my screen. Sara, have you introduced yourself already?

Sara

04:03

No, not yet.

Filipe

04:04

Okay,

Sara

04:06

Um, I have too many places on my bucket list that I want to visit, but I think everyone has

Filipe

04:16

What's the strangest one?

Sara

04:20

Okay, so I'm also very much into hiking. I know, a bit boring at this point, but I just saved one today: there's a hike you can do in Madeira where you cross the island in seven days, just hiking and using a tent along the way. And, yeah, that's my newest one.

Filipe

04:41

Portugal, right?

Sara

04:42

Yeah, exactly,

Filipe

04:44

nice. So hopefully

Sara

04:46

soon, nice.

Filipe

04:49

Okay, Maria O'Brien,

Almudena

04:55

I'm probably the opposite of them. I've done probably too much traveling, and I need to settle down. The most interesting thing I've done was anaconda hunting in the Amazon and looking for caiman in the middle of the night; I saw all these little beady crocodile eyes, and we were in a little dugout canoe. And I camped up in the Andes at minus 20 degrees and nearly got hypothermia. So I've had my adventures.

Filipe

05:25

Oh, nice.

Almudena

05:26

Yeah, yeah. Far too many. Far too many of them now to list, but um, yeah, so that's me

Filipe

05:33

Cool stories to tell your grandkids. Did I miss someone? David, Stephanie, Ava, Sara. Okay, so I'll start my session. I will be presenting a PowerPoint, which I will make available at the end of the session in the decks folder. I'm not giving it to you now because it has lots of solutions to some problems that I'm going to give you. There will also be some data sets that we will be working with: you will be feeding ChatGPT some data sets and crunching data. When, during the lesson, I say "go to the Data Sets folder and get file number one or file number two", this is where you get it. Okay, I'll try to expand my screen, just one second. Can you confirm that you are seeing the cursor? Yes? Okay. This is a very, very informal lesson; I don't think you need to submit any homework. I will be speaking about structured and unstructured problems, as the title suggests. And I will be arguing that AI is very good for one kind of problem, but maybe not so good for the other.

So we'll try to define the limits, the boundaries of where you should use AI or not. Summarizing what you've done with Bernard: you've used AI successfully for several tasks that computers usually could not do before AI came along, for example, the art of summarizing a text or the art of translating a text. Those are two examples, and we'll see that they belong to a class of unstructured problems. You also created content in different information modalities, images and videos, and we saw that ChatGPT, or your favorite AI engine, is also good at those. So, in conclusion, AI increased the set of tasks that computers are able to do. But now I want you to think about this: should we give AI the tasks that were already being done the classical way? We have several students here from economics, and they will probably work a lot with data sets and tables; "data set" is, in a sense, a fancy word for table. For example, when we use software such as Excel to calculate the company's profits this month based on product purchases, we can ask AI to do this. But should we? Or take questions such as which products sell the least, and in what regions: querying the data to get valuable insight. Should we let AI do this? I would like you to spend five minutes discussing this in groups.

Okay: do you think there might be some tasks that you should not give to AI? Or, put differently, what tasks should be done by computers the old way?

Sara

09:43

Well, I think that it depends on the company's privacy policies. If there are documents that should stay private, then it is not advisable to feed the company's confidential information to an AI tool that is not protected.

Filipe

10:06

Well, I agree, right, although that was not the kind of answer I was looking for. Obviously there is the question of public versus private data, and whether it sits on European servers or not; that's a good point. But what I'm asking here is this: before AI, computers used to do lots of things right with software such as Excel. Would you trust AI to do something that Excel used to do? For example, if I use Excel to calculate the sum of all my sold items, I know that I get the precise answer, a deterministic answer. But how well do I trust the answer given by AI? You see what I mean? Would you say that there is a set of problems that you would not trust AI to do, where you would rather use the classic software instead?

Sara

11:27

Well, from my own experience, though this was a few months ago: I tried to do that with ChatGPT, but if the data set was too big, it wasn't accurate. So I would still use Excel if Excel works, because I'm of the idea that I should use the fastest and most accurate system I can. In my experience, it didn't work that time. I don't know, maybe now, with GPT-5, it is more accurate, but personally, I would use Excel if Excel works.

Filipe

12:07

Yeah. I've never tested that boundary, whether the size of the data set influences the accuracy of ChatGPT's answer, but that's something worth thinking about, and we'll see if there are some examples that corroborate your argument or not. But you, as students, probably don't use Excel very much. Or do you think it's a tool that you often use?

Sara

12:42

It depends on what kind of study you're doing.

Vanessa Okoli

12:50

For business, we do use Excel, for data analytics; that's something that we used back in third year.

Filipe

12:58

Okay, so in

Stephanie

13:00

statistics, we also use Excel, at university,

Filipe

13:05

Yeah, so you use it for case studies in your courses, but not in your personal life. In a sense, you don't have a need in your life to gather all your information. For example, when I was young, and music was distributed on CDs and not as MP3 files, I might have had a list on my computer, in Excel, of all my CDs, the ones that I lent. But I really didn't need a computer system to organize all my information, because my life was simple, right? I was a student. So, in your personal life, you probably don't need to manage a quantity of information that requires a table, a data set, and software like Excel to compute with that data set.

Vanessa Okoli

14:03

I actually do use Excel, I just remembered: for my monthly budget I have a template, so I input the data, and from there it calculates how much I need to put in my savings or emergency funds, things like that. That would be an example.

Filipe

14:20

And do you use ChatGPT to help you organize those categories that you

Vanessa Okoli

14:25

No, I found a template online, and from there I just kept using it every month. I don't use ChatGPT for

Filipe

14:32

that. Okay. Any more thoughts about it?

Almudena

14:40

I think that software like Stata or Access is very powerful and, for me, solves all the problems. Maybe I need to know how to write a formula, for example how to calculate the mean. But in my youth I used Stata with a lot of data, and I didn't need AI for anything,

Filipe

15:21

Okay? Because that's an interesting point. You mentioned a formula, right? You have a data set with data in, you want to produce data out, and you have a formula that transforms data in into data out. How would you trust AI when you verbalize the formula, "I want to sum those two columns and then divide by two", and expect it to always get it right? Isn't it better to use Excel, or just a pocket calculator, where you compute the answer yourself?
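The verbal formula in this exchange ("sum those two columns and then divide by two") is the kind of rule a spreadsheet, or a few lines of code, applies deterministically. A minimal Python sketch, assuming two hypothetical numeric columns, showing that the same data in always gives the same data out:

```python
def row_average(col_a: list[float], col_b: list[float]) -> list[float]:
    """Apply "sum the two columns, then divide by two" to every row."""
    if len(col_a) != len(col_b):
        raise ValueError("both columns must have the same number of rows")
    return [(a + b) / 2 for a, b in zip(col_a, col_b)]


# Hypothetical input columns (data in).
column_a = [10.0, 4.0, 7.0]
column_b = [20.0, 6.0, 8.0]

# Data out: identical on every run, unlike a free-text answer.
print(row_average(column_a, column_b))  # [15.0, 5.0, 7.5]
```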

That's the argument I'm about to make. Okay, so I will be speaking mainly about data sets, which is, in a sense, a fancy word for tables. We have economics students here, and they can tell you whether I'm lying or not: what do you think is the biggest asset of a company?

Filipe

16:42

David, Stephanie, what do you think is the biggest asset of a company?

Filipe

17:00

Okay, I would argue that it's data, data sets: all the information about its customers, its employees, its service providers, its products, its raw materials, the fleet of cars, all the trips the cars made, all the product sales, all the raw-material purchases. If a company loses its data, its database, it's pretty much the end. I know this story, and I don't think it's a false one: when the September 11 attacks hit the Twin Towers, most of the companies that were in those towers went bankrupt, not because the building was destroyed, because you can always build new offices, but because of the data on the servers there. There were backup copies, but they were probably in the same building, and they were lost. And I think the FBI even mentioned a new definition of a backup copy: if a copy is not 10 kilometers away from the original, it shouldn't be considered a backup copy. This is just to say that all of you are probably going to work in companies, NGOs, any kind of institution, and you'll have to work with tables, tables, tables, and compute things in tables. You probably know a little bit of Excel, and probably a little bit of Power BI to draw some cool graphics. And we'll try to use ChatGPT, or AI, to see if it offers some value in automating these tasks. For example, in this data table I have the number of extra large avocados sold, but also the large ones and the small ones. And then I can compute the total as the sum of these last three columns.

So my question would be this. As Almudena was saying, if I use Excel, I just have to create a formula where I say: sum the columns on the right. If I use ChatGPT, or something that is based on a text prompt, you know that the quality of your prompt may influence what it understands the problem to be. So, in theory, it might give a wrong answer. Just out of curiosity, for my own survey, look at these three kinds of prompts that do more or less the same thing. Where do you usually sit, if you have to ask ChatGPT to do this calculation? Are you a person who usually gives this kind of very simple, very direct prompt? Or do you get a little more abstract? In this one, you speak about avocados. In this one, you just say "go to columns D, E, F", so it is more objective, more abstract. And then there's this one, which is probably in the middle between the two. So what I'm asking is: how do you create the prompt? Can you tell me what expression you would be using?
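The avocado example is a row-wise sum. A spreadsheet does it with one formula filled down the total column; assuming the three sizes sit in columns D, E and F with headers in row 1, as one of the prompts suggests, that would be `=SUM(D2:F2)`. The same rule as a short Python sketch, with hypothetical column names and invented counts:

```python
# Hypothetical weekly rows; column names and counts are illustrative only.
rows = [
    {"week": 1, "extra_large": 120, "large": 340, "small": 510},
    {"week": 2, "extra_large": 95, "large": 300, "small": 480},
]

SIZE_COLUMNS = ("extra_large", "large", "small")


def add_total_column(rows):
    """Add a 'total' column: the sum of the three avocado-size columns."""
    for row in rows:
        row["total"] = sum(row[col] for col in SIZE_COLUMNS)
    return rows


for row in add_total_column(rows):
    print(row["week"], row["total"])  # prints "1 970", then "2 875"
```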

Almudena

21:08

I don't ask this. I ask: how can I write a formula to add the three columns?

Filipe

21:22

Okay, that's an interesting approach, because I will be trying to argue that in a perfect world, for the old problems that computers already did well, we would use the classic solutions, Excel, etc., and we would use AI only for new solutions and unstructured problems, which I'm about to define. But if I don't know how to use the software... For example, Almudena probably doesn't know how to create the formula in Excel, so she asks ChatGPT not to compute the answer but to create the formula, and then Excel computes the answer. Cool. And does it usually work? Yes?

Almudena

22:24

Yes, I find the solution.

Filipe

22:28

Okay, we will try that in a little bit with my data sets, and you will be able to test it. But, just out of curiosity, what would be your prompt? Would you say "go to column C and create a formula"? How would you explain it to ChatGPT?

Almudena

22:51

If I want to add the three columns, I ask: what is the formula to add three columns?

Filipe

23:05

Okay? And in your experience, with that expression it understands what you mean. Cool. With

Almudena

23:16

things more complicated this, I know what we

Filipe

23:20

will.

We can test it. We can try the two approaches. In the first one, it computes the answer, and ChatGPT is able to compute the answer. But the disadvantage, as you know, Almudena, is that if you change one of the input values and there is no formula in the cell, the result will not update. So your approach is better, because you say: don't do the calculation, just give me the formula. Okay, nice. Now I am going to challenge you. Bernard, do we have a prize for the quickest person to answer this problem?
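Filipe's point about results that do not update can be made concrete: a number pasted in once does not change when the data changes, while a formula is re-evaluated. A minimal Python sketch with invented values; `total_formula` is a hypothetical stand-in for a spreadsheet cell that holds a formula:

```python
# Hypothetical sales for one row of the avocado table.
sales = {"extra_large": 120, "large": 340, "small": 510}

# Approach 1, "compute the answer": the result is stored once as a number.
pasted_total = sum(sales.values())  # 970, frozen at this moment


# Approach 2, "give me the formula": the rule is kept and re-evaluated,
# much like a spreadsheet cell that holds a SUM formula.
def total_formula(row):
    return sum(row.values())


sales["small"] = 600  # one input value changes

print(pasted_total)          # 970 (stale)
print(total_formula(sales))  # 1060 (recomputed)
```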

Bernie Goldbach

24:20

I told people that we're going to have a prize fund that starts to appear. So I don't have a prize right now, but we record this as one quality point. And every time we meet, I will tell people who's at the top of the leaderboard. So we will simply record that tonight, and then it'll be a running tally on the leaderboard, okay?

Filipe

24:42

Because this is a made-up problem, but one that I thought of. I was in a shopping mall, preparing to get on the escalator, and the escalator was not working. What I thought was: the shopping mall is trying to save energy, right? Maybe one day a week they switch off the escalators so they can save a lot of energy. So which day of the week would they choose? My guess was that if they have a table with the estimated number of people per day, they would pick the weekday when the fewest people are in the mall, because they don't want to annoy the customers. So I would like to know which weekday usually has the fewest people over a year. Is it Monday? Is it Tuesday? Is it Thursday? That's what I'm going to expect from you. I will show you a data set that has the number of people who were in the mall on the first of January, on the second of January, and so on until the end, and the first one who shouts the day of the week that usually has the fewest people wins this contest.

Let me explain better what I want. If you go to the Data Sets folder, you have this problem in two different flavors. There is a very small data set, which I think is the one you have to use if you don't have a paid version of ChatGPT, because if you use a very large data set, it will basically tell you it won't analyze any more data: you've used up your quota. But look at what the data set is: it has the dates and the number of visitors. To answer which weekday usually has the fewest people, in Excel I would have to know the function, the formula, that computes the weekday from a date, which I don't know, and then aggregate all the results for the same weekday to find the weakest day. Since I don't know that, I will ask ChatGPT to do it. So get a data set, either the first one or the second one, and shout: it's Monday, it's Thursday. Okay, so

Filipe

28:29

Also, as a hint: this course is called flows in AI, right? So there might not be one prompt that solves the whole problem. You might want to divide it into steps: let's calculate the weekday; now let's do the aggregation. Then you have a flow that works like an algorithm, which you can reuse in the future for this kind of problem. Okay,
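The two-step flow suggested in the hint (derive the weekday from each date, then aggregate per weekday) can be sketched in a few lines of Python. This is an illustrative sketch, not the class solution: the visitor counts are invented, and the hypothetical `records` list stands in for the date and visitor-count columns of the data set. Averaging rather than summing is a deliberate choice, since some weekdays can occur more often than others in a sample.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_averages(records):
    """Step 1: weekday of each date. Step 2: average visitors per weekday."""
    by_day = defaultdict(list)
    for iso_date, visitors in records:
        day_name = WEEKDAYS[date.fromisoformat(iso_date).weekday()]
        by_day[day_name].append(visitors)
    return {day: mean(counts) for day, counts in by_day.items()}


def quietest_weekday(records):
    averages = weekday_averages(records)
    return min(averages, key=averages.get)


# Invented data: two weeks starting on Monday 1 January 2024.
records = [
    ("2024-01-01", 900), ("2024-01-02", 700), ("2024-01-03", 950),
    ("2024-01-04", 980), ("2024-01-05", 1200), ("2024-01-06", 1800),
    ("2024-01-07", 1500), ("2024-01-08", 880), ("2024-01-09", 720),
    ("2024-01-10", 940), ("2024-01-11", 1000), ("2024-01-12", 1150),
    ("2024-01-13", 1750), ("2024-01-14", 1450),
]

print(quietest_weekday(records))  # Tuesday
```

In Excel, the first step corresponds to a helper column using the `WEEKDAY` function, and the second to a pivot table that averages visitors by that column.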

Vanessa Okoli

29:02

Wednesday is the answer.

Filipe

29:05

You know the answer? Wednesday, I think. What's the name of the person who shouted? Vanessa. Vanessa, did you use the small data set or the big data set? Because

Vanessa Okoli

29:17

okay, all right, large,

Filipe

29:20

large one. Do you have a paid version or a free version?

A00332810: Ava Brady

29:25

Paid?

Filipe

29:27

Okay,

I'm asking because I want to see the limitations of what you can do with the free version. So Vanessa thinks that it's Wednesday. I'll try to check it. I also have Wednesday. David also has Wednesday, with the large set and the paid version. Okay, thank you for the feedback.

Filipe

30:04

How many prompts did you use, Vanessa or David? How many prompts did you have to use to get the answer? One, just one. Could you write it in the chat window so I can probe it? I used two,

Vanessa Okoli

30:25

But I was doing it as you were talking: I was typing it, and then once you said to add the data, I told it to use that data.

Filipe

30:32

So, okay, yeah, we have some answers that say Sunday. I'm assuming that's the smaller data set, right? So Stephanie, Vanessa, Ava and the Zoom user, which I think is Bernard. Oh, this is a nice hack: I thought that the Zoom user was some kind of AI that Bernie used. I'm going to rename myself as AI, Zoom AI, and people will think that I am a bot when I'm not. Okay. So,

Filipe

31:17

Oh, David Lucas. David, what day of the week did you get? It didn't give me one weekday; it gave me a list of which days would be best, yeah. So, for example, the fourth of April seems to be the weakest day, something like that. No, it shows me that Wednesday is the best. Then next will be Monday, Friday, first day, Sunday, Saturday and Wednesday, Thursday. Okay. So I see that Vanessa and David felt the need to explain: "oh, I want to save energy". You feel you need to justify why you are asking for the computation, right? Is this really necessary? I'm not sure, right? I'm testing this with you. Stephanie's prompt is very, very objective, very, very short. It doesn't bother ChatGPT by saying that it's for energy-saving purposes, but she uses an expression which is important: average.

Filipe

32:55

Okay, because my concern was that some of you, instead of saying something like "calculate the average per weekday", would say something like "calculate the sum per weekday". And the sum has a problem, statistically speaking: for example, in the small data set, since I have more Sundays than Wednesdays, it will tell you that Sunday is probably the worst day, the one with the most people, because you're just asking it to sum. So the average would be a better estimate, right? But obviously, this is not an error of ChatGPT; this is an error on your side. For example, Sara, I have Saturday for the largest data set as the sum of visitors, and Thursday as the average. Sara, did you use the small data set?
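The sum-versus-average warning can be checked with a toy example. In this hypothetical sample (numbers invented for illustration), Sunday appears three times and Wednesday twice, so totals and averages disagree about which day is quietest:

```python
from statistics import mean

# Hypothetical counts: a small sample that happens to contain
# three Sundays but only two Wednesdays.
visitors = {
    "Sunday": [400, 410, 390],
    "Wednesday": [500, 480],
}

totals = {day: sum(v) for day, v in visitors.items()}
averages = {day: mean(v) for day, v in visitors.items()}

print(totals)    # {'Sunday': 1200, 'Wednesday': 980}
print(averages)  # {'Sunday': 400, 'Wednesday': 490}
print("quietest by sum:", min(totals, key=totals.get))          # Wednesday
print("quietest by average:", min(averages, key=averages.get))  # Sunday
```

The sum rewards a day simply for appearing less often in the sample; the average compares days on equal terms.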

Sara

34:04

No, the large one, yeah. But it's because I used up my free trial of GPT-5, so I just asked it to create the formulas and tell me how to do it in Excel. That's why it took me longer.

Filipe

34:21

Oh, could you...? So you did what Almudena did, which is to create a column with the weekday of each date. Yeah.

Sara

34:31

Now I don't have ChatGPT premium, so it doesn't let me work on the document itself anymore. But I asked it, like, what's the formula to identify which day of

Filipe

34:44

the week? Okay, okay,

Sara

34:45

I created a pivot table from the third column that I had just created with that new formula, and then saw the sum and the average.

Filipe

34:55

Okay,

Okay, so I'll try to answer that question myself. I have the paid version, so I can drag and drop. I have the large set. Let me try to ask something like... let me see, what is the name of the columns? Number of visitors, okay: "Which is the weekday where, on average, there are fewer visitors?" If you have customized it with something like "I want you to describe your reasoning in lots of detail", you will see how it computed the answer, right? Oh, which was not the case. So, in my case, it tells me that, on average, Wednesday has the fewest visitors. Okay, so I'm going to ask it to explain its reasoning. How have you customized your ChatGPT: for a very quick answer, or with an explanation before the answer? Okay, let me see.

It got the data and, you see, it parsed the data. It did some corrections, added a weekday column, grouped by weekday and took the mean, the average, of the number of visitors, then compared those averages. The smallest mean is Wednesday. It also did some kind of statistical test to see whether Wednesday is really a day with fewer people, or whether the difference is just noise in the data. There are always some fluctuations, but here there is some statistical significance, and it concludes that the result holds. So I'm not getting your answer, am I? Can you confirm? Could you ask your ChatGPT to explain its reasoning and see how it did it?
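The transcript does not say which statistical test ChatGPT ran. One simple option for asking whether Wednesday's lower average is more than noise is a permutation test; the sketch below uses that swapped-in technique on invented numbers and is not a reconstruction of what ChatGPT did. `permutation_p_value` is a hypothetical helper name.

```python
import random


def average(values):
    return sum(values) / len(values)


def permutation_p_value(group, rest, n_iter=10_000, seed=0):
    """One-sided permutation test for "group has a lower mean than rest".

    Shuffles the pooled values, re-splits them into groups of the original
    sizes, and counts how often the difference in means is at least as
    low as the observed one.
    """
    rng = random.Random(seed)
    observed = average(group) - average(rest)
    pooled = list(group) + list(rest)
    k = len(group)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        if average(pooled[:k]) - average(pooled[k:]) <= observed:
            hits += 1
    # The +1 terms count the observed split itself and keep p above zero.
    return (hits + 1) / (n_iter + 1)


# Hypothetical daily visitor counts, not the class data set.
wednesdays = [700, 720, 690, 710, 705]
other_days = [900, 950, 980, 1200, 1800, 1500, 880, 940, 1000, 1150]

p = permutation_p_value(wednesdays, other_days)
print(f"p = {p:.4f}")  # a small p suggests the gap is not just noise
```

Real daily counts also carry seasonal and holiday effects, which this simple test ignores.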

Filipe

38:24

Oh, no, so Sara got the same answer as I have, right? Wednesday. But Vanessa... Vanessa, David, did you say Wednesday?

Vanessa Okoli

38:41

Initially? Yes, Wednesday.

Filipe

38:47

Okay,

So I would argue that in a perfect world, if you know how to compute this in Excel, you would compute it in Excel. If you don't know how to use Excel, then ChatGPT is a great help, because it helps you do that analysis, right? Or you can take a blended approach and, like Almudena said, ask ChatGPT to create the formula; then you probe, you inspect the formula, and if everything is correct, fine. So let me try Almudena's approach. And I think Sara said something about pivot tables, right? Who talked about pivot tables?

Sara

39:39

Yes, it was me. In this case, ChatGPT would be faster if you have the premium version or, like, if you still have access to GPT-5 with the free account.

Filipe

39:50

Yeah.

So I'll try to use Excel slang, Excel technical words like pivot table, which I think is a good approach. For example: "Create a new column where the weekday is computed from the date", sorry about my English, something like that. "Then aggregate in a pivot table." Statistics is the art of creating summaries, right, summaries of the data. So we are creating a smaller table where the results are aggregated by weekday: the left column of the pivot table would have the weekday and the right column would have the average number of people. You see that the English is bad, the grammar is bad, but I think ChatGPT will come up with the solution and return the Excel file. Let's see. So, Almudena, I would assume that this is generally how you work: you give it the file, and you ask it to give you a file back?

Almudena

41:40

No, really, I ask: how must I do it?

Filipe

41:44

And okay, and

Almudena

41:46

I do it step by step,

Filipe

41:49

Okay, let's see if it got it right. Oh, it didn't. It just gave me, well, the summary that I asked for, right? So I can see that it's correct for the number of visitors. Now I would add some interactions here to say: "Oh no, I want the original file, the original file with the formulas", something like that. Okay, so it asks me whether the formulas should be in Portuguese or English. I'll put Portuguese, because I think that's relevant: I think it would automatically convert the formulas. Let's see how this goes.

Filipe

42:57

Whilst it's thinking, let me skip ahead a little bit. My son told me that if I want to get you motivated, I must use the cultural language of your generation. So, do you know this meme? Have I used it correctly? Can someone give me feedback?

Vanessa Okoli

43:24

This is my first time seeing the meme.

Filipe

43:27

Okay, thank you. So, ChatGPT has done the job. If there were an error, it would be because I asked it to sum the number of people instead of averaging it, although that would be a kind of human error, right? And this is the problem: the AI that has become popular is driven by our speech, and our speech may be imprecise. So if I said something like "sum the number of people", I would be leading ChatGPT into a wrong assumption. Okay, so

let's make a small distinction between structured data and unstructured data. Let's imagine that I want to do a survey to ask people how they commute to work: how far is your workplace from home, and what means of transportation do you use to get to work? And then, obviously, some demographic questions about name, age and marital status, something like that. Let's assume that I don't structure this in a table. The table is probably the most structured way that humankind has used to organize data, right? It's the oldest and probably the most efficient, but I could obviously use unstructured information instead. For example, in this one, I begin with the name and the age, but in this one I begin with the name and the distance. Here I begin with the marital status, something like that. So this is completely unstructured, right? And this is what AI was made for. AI is very, very good at working with things that are unstructured, which Excel and the old ways of computing, which are very deterministic, are not: go to this column, sum this column, compute the average of this column. That is what Excel excelled at.

Okay, so I want you to do something: feed ChatGPT this text. This is the second data set. Tell ChatGPT to create a table from this data. Don't be too precise; just say "create a table". You're telling it to find the patterns in each of the sentences, and it will sort out whether something is a name, an age, etc. Then tell me whether it was able to create the table from the text.
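What "create a table" asks the AI to do is turn each free-text answer into one row with fixed columns. For contrast, here is a deterministic, rule-based sketch of that step in Python. The answers and patterns are invented; the rules only work for phrasings they anticipate (an answer such as "I'm thirty-four" would leave the age empty), which is exactly the brittleness that makes unstructured text a good fit for AI.

```python
import re

# Hypothetical free-text survey answers; each person orders the facts differently.
answers = [
    "My name is Ana and I am 34. I live 12 km from work and go by bus. I am married.",
    "Single, age 51. Name: John. Work is 800 m away, so I walk.",
]

TRANSPORT = {"bus": "bus", "train": "train", "bike": "bicycle",
             "car": "car", "drive": "car", "walk": "on foot"}
STATUSES = ("married", "single", "divorced", "widowed")


def to_row(text):
    """Find each field wherever it appears and return one table row."""
    lower = text.lower()
    name = re.search(r"(?:[Mm]y name is|[Nn]ame:)\s*([A-Z][a-z]+)", text)
    age = re.search(r"(?:I am|[Aa]ge)\s+(\d{1,3})\b", text)
    dist = re.search(r"(\d+(?:\.\d+)?)\s*(km|m)\b", text)
    transport = next((label for word, label in TRANSPORT.items()
                      if re.search(rf"\b{word}", lower)), None)
    status = next((s for s in STATUSES if s in lower), None)
    return {
        "name": name.group(1) if name else None,
        "age": int(age.group(1)) if age else None,
        "distance": (float(dist.group(1)), dist.group(2)) if dist else None,
        "transport": transport,
        "status": status,
    }


for row in map(to_row, answers):
    print(row)
```

Note that the distance keeps whatever unit the person used, so the resulting table still needs cleaning before any averages are computed.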

Filipe

47:01

I'll do it along with you.

Filipe

47:15

Then, if you could share your prompt with me, so I can see what kind of reasoning you

Filipe

47:39

do. Stephanie just says, "Yes, it created one." Oh, that was not the prompt, was it?

Stephanie

47:51

No, I only said "create the table", and it created one. It was just those three words,

Filipe

47:58

a perfect one. Yes, okay, so you didn't have to say that these sentences contain names and ages, and it sorted it out correctly,

Stephanie

48:10

Yes, I'll show you. I'll take a screenshot. Okay?

Filipe

48:23

You will be sharing that in the chat window, right? Yes, it's loading. Okay, thank you, Stephanie, very nice. Yeah, completely. Oh, but you have the car: you got "drives". Okay, cool. Mine just put "car". So, in a sense, Vanessa's didn't want to summarize, not summarize but clean the data, or structure the data, that's the expression, but to see if there are correlations, associations between the variables.

Vanessa Okoli

49:10

So I just chose to copy and paste that one, because I know tables might look funny. Okay.

A00332810: Ava Brady

49:21

Cool. Cool,

Filipe

49:27

Nice. So my reflection here is this: it was able to sort out the information and get rid of all the text that is not important. It categorized it correctly, in the sense that this is a name, this is an age, this is a marital status. It also created structured information in the sense that, for example, "married" is always "married": it used the same word, right? If we want to count the number of people who are married, the word "married" must be written the same way for everyone.

Does this make sense? You saw that in the marital status there are categories, and "married" is one of them. It chose this word and didn't spell it one way in one row and another way in another row, which is useful. That's why I was telling Vanessa that when she got "car drive", that would be a little bit of noise, and then we would have to clean it up. We'll speak about that in the next session. Okay, so no problem here. But notice, for example, that I purposely put one of the distances in meters, while the other ones are in kilometers. This would create a problem if I now wanted to use Excel to compute an average, right? Because there are two different units. So, as you'll see, data that comes in tables usually has some noise: some information that is not correct or not uniform. This session is mostly about creating tables and using tables to summarize articles, web pages, etc. But that demands critical thinking, to see that the table created needs some cleaning and some standardization. Okay.
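The cleaning step described above can be sketched in a few lines of Python. The rows, the label mapping and the helper name `distance_km` are hypothetical; the two problems they reproduce (one distance in meters among kilometers, and a non-uniform label like "car drive") are the ones mentioned in the session.

```python
from statistics import mean

# Hypothetical rows as an AI tool might return them: one distance is in
# metres, the others in kilometres, and the transport labels are not uniform.
rows = [
    {"name": "Ana", "distance": "12 km", "transport": "car"},
    {"name": "Rui", "distance": "800 m", "transport": "car drive"},
    {"name": "Joana", "distance": "20 km", "transport": "Bus"},
]

# Assumed mapping from the labels we saw to one canonical category each.
CANONICAL = {"car": "car", "car drive": "car", "drive": "car", "bus": "bus"}

def distance_km(text):
    """Convert strings like '800 m' or '12 km' into kilometres (float)."""
    value, unit = text.split()
    value = float(value)
    if unit == "m":
        return value / 1000
    if unit == "km":
        return value
    raise ValueError(f"unknown unit: {unit!r}")

clean = [
    {
        "name": r["name"],
        "distance_km": distance_km(r["distance"]),
        "transport": CANONICAL[r["transport"].strip().lower()],
    }
    for r in rows
]

# Averaging the raw numbers (12, 800, 20) would mix units and be meaningless;
# after conversion every distance is in kilometres.
print(mean(r["distance_km"] for r in clean))
print(sorted({r["transport"] for r in clean}))
```

Raising an error on an unknown unit, rather than guessing, keeps a human in the loop for exactly the rows that need critical thinking.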

Filipe

51:56

Okay, so this is the most structured information there is. But be aware that sometimes tables contain unstructured data. Look at this table. If I ask people "Do you like your work?" and they answer in free text, that is unstructured, right? And this is what AI is good for, as you know. In Bernard's classes you saw that ChatGPT is very good at summarizing text. Sometimes the text that needs summarizing is inside tables. Maybe you're doing a client survey, or you have a blog with comments on your posts, and you don't have the human power to process a million comments. You can ask AI to go through every comment, put it in a category ("I like this post" or "I didn't like this post"), and then compute the sentiment of each person. Stephanie, do you do sentiment analysis in your data analysis classes?

Stephanie

53:38

No.

Filipe

53:40

Have you heard of the expression "sentiment analysis"?

Stephanie

53:43

No, that's why. No, I don't think so.

Filipe

53:46

So, for example, you get a tweet from someone and you try to assess whether the person is angry or happy. That's sentiment analysis. Be aware that I made some errors in this data on purpose, because we know ChatGPT is very robust to them. What I want you to do is this: I want to know if people are happy in their work or not, so ask ChatGPT to summarize this information. I'm not going to give you the prompt; try to solve this problem yourselves. I'm your boss, and you work in my human resources department. I want to know if the employees are happy or not, if they like their work. But we have a million employees. We are a McDonald's or a Coca-Cola, right? So now I have a million rows from a million employees to process. I don't have time for this, so I have to use ChatGPT. Please go to your data sets folder, use the "job commute survey/table" file, and ask ChatGPT. Well, I cannot tell you the answer, right? I'm the boss; I just want the answer: are people happy or not? Write your answers in the chat window, please.
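For contrast with what ChatGPT does, here is a minimal sketch of a classic, deliberately naive approach to sentiment analysis: count words from fixed positive and negative lists. The word lists and answers are hypothetical, and this is not how ChatGPT works internally. It shows why free-text answers count as unstructured: the third answer, which a human would probably read as positive, scores neutral, and the fourth, a negation, scores positive.

```python
# Hypothetical word lists for a naive lexicon-based sentiment scorer.
POSITIVE = {"love", "like", "enjoy", "happy", "great"}
NEGATIVE = {"hate", "boring", "tired", "stressful", "unhappy"}

def naive_sentiment(text):
    """Score = (# positive words) - (# negative words); ignores context and negation."""
    words = [w.strip(".,!?'\"").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

answers = [
    "I love my team and I enjoy the work.",
    "It is boring and I am always tired.",
    "I wouldn't see myself doing another thing.",
    "I don't like it at all.",
]
for a in answers:
    print(naive_sentiment(a), "|", a)
```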

Filipe

56:21

Vanessa has already put something here: "Work satisfaction levels vary. Three people express clear... Two show dissatisfaction." And now it correlates satisfaction with being married or not, okay.

Almudena

56:40

danger

Filipe

56:42

Okay, what prompt did you use, Vanessa? What specifically did you ask ChatGPT to do?

Vanessa Okoli

56:51

I said, look at this table, summarize this data, and what is the feedback given? Okay?

Filipe

56:59

Because, for example, look at how I approach this problem. I'm going to solve it my way. I want structure. I don't want quantitative data; I want to create categories. So I will say something like... let me get the name of the column. No, it's not this one.

Filipe

57:33

"Do you like your work?" So: create a new column where categories are defined based on the question "Do you like your work?" Because, Vanessa, what you got was so summarized that you cannot check, person by person, whether it got it right. But if I ask it to continue the table and put its reasoning on the side, I can check. For example, it's asking me: "Filipe, can I use three categories: positive, neutral and negative?" I will say yes. And now I can see whether its reasoning holds up. I'm going to assume that it reasons well, because AI was made to work with unstructured problems, unstructured data in a sense. Let me see how this goes.
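The "add a category and the reasoning next to each row" idea can be sketched as a prompt plus a check on the reply. Everything here is an assumption for illustration: the prompt wording, the CSV reply format, and the reply itself, which is a made-up string rather than real model output. No API call is shown, since how you send the prompt depends on the tool (ChatGPT, Gemini, etc.).

```python
import csv
import io
from collections import Counter

CATEGORIES = ("positive", "neutral", "negative")

def build_prompt(column, answers):
    """Ask for one label *and* one reason per row, so each row can be audited."""
    rows = "\n".join(f"{i}. {a}" for i, a in enumerate(answers, start=1))
    return (
        f"For each answer to the question '{column}', add a category "
        f"({', '.join(CATEGORIES)}) and a one-sentence reason. "
        "Reply as CSV with the header: row,category,reason\n\n" + rows
    )

def parse_reply(reply_csv):
    """Parse the CSV reply and reject any label outside CATEGORIES."""
    labelled = list(csv.DictReader(io.StringIO(reply_csv)))
    for r in labelled:
        if r["category"] not in CATEGORIES:
            raise ValueError(f"row {r['row']}: unexpected label {r['category']!r}")
    return labelled

# Hypothetical reply; in practice this text would come back from the AI tool.
reply = """row,category,reason
1,positive,Says they enjoy the work.
2,negative,Describes the job as exhausting.
3,neutral,"No clear opinion, just 'it pays the bills'."
"""

print(build_prompt("Do you like your work?", ["I enjoy it", "Exhausting", "It pays the bills"]))
labelled = parse_reply(reply)
print(Counter(r["category"] for r in labelled))
```

Keeping a reason column is what makes the per-person check possible; a one-paragraph summary gives you nothing to verify row by row.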

Filipe

58:58

So, for example, Vanessa, I got three neutral, two negative and one positive.

Filipe

59:17

Please pay attention: if a human did this by hand, they themselves would have doubts about some of these answers, right? For example, most of the time I would interpret this one as positive. Or this one: "I wouldn't see myself doing another thing." Oh, did ChatGPT get it wrong? See...

Vanessa Okoli

59:48

I think that's kind of neutral. I don't know if that's a good thing or not.

Filipe

59:51

Well, I think it's positive, right? When someone says "I wouldn't imagine myself doing a different job", right?

Vanessa Okoli

59:58

Maybe because they don't have options. I don't know

Filipe

01:00:02

My argument is that a human judge, assessing these sentences, would have difficulties with some of them and would make some errors. Obviously ChatGPT is going to make errors as well, but I'm going to argue that the number of errors ChatGPT, or AI, makes is lower than a human judge's. I'm going to show you some examples; we still have 30 minutes. But this is an approach, right? I would use it if I had to process a million rows, where there are no human resources to do the task manually. So I give this to AI and expect a 5% error, which is not important, right? Because if the data clearly says that 70% of the people are happy and 30% are not, and it errs by 5% or 10%, I still have a strong signal with little noise. It's reliable enough for a company to decide on a policy or something like that. So again, David or Stephanie, do you usually use examples where you try to assess customer feedback? You are probably taught: since we don't have the human power to process open answers, ask the customer to tick a box, one option out of these 10, right? And this is a bit poorer, because when you ask the client to fit themselves into a category, you're doing it to make your own work easier: you can then count the answers in Excel, and the work is not very tiring. But AI expands the possibilities: customers can give richer, open answers, because AI processes those answers. Stephanie, David, Sara, any advice on processing customer feedback with AI? Did you get any ideas from this exercise? No.
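The "strong signal with little noise" argument can be checked with simple arithmetic. If at most a fraction e of all N rows are mislabelled, each wrong row moves the observed share by at most 1/N, so the true share lies within plus or minus e of the observed one. The 70% and 5% to 10% figures come from the talk; the function name is hypothetical.

```python
def worst_case_range(observed_share, error_rate):
    """Bounds on the true share if at most `error_rate` of all rows are mislabelled."""
    low = max(0.0, observed_share - error_rate)
    high = min(1.0, observed_share + error_rate)
    return low, high

for err in (0.05, 0.10):
    low, high = worst_case_range(0.70, err)
    print(f"error {err:.0%}: true share of happy employees between {low:.0%} and {high:.0%}")
```

Even in the worst case (60% to 80% at a 10% error rate) the conclusion "most employees are happy" does not change, which is the point being made.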

Filipe

01:03:06

This one has quite a few errors, right?

Bernie Goldbach

01:03:15

Filipe, I put some things into the chat too. I'm just going into Google Drive with my mouse, right-clicking, and asking questions in Gemini, and the answers seem relatively authentic. But, as I think Sara has pointed out, I believe I'm using the Gemini 2.5 Flash model when I do it. I'm not sure everybody has access to that model.

Filipe

01:03:46

But if I understood correctly, you're saying that Gemini is more accurate than ChatGPT for this kind of task?

Bernie Goldbach

01:03:54

It's down to the prompt, and potentially yes, but it's model-dependent. My model is Gemini 2.5 Flash, which I'm not sure about, but I think it's a higher model than normal. Gemini 2.5 Flash seems to come in without me toggling for it, and I think Sara has pointed out that that's what I'm using. I'm not sure how I can spot the model number the way I do it: I went directly into Drive, right-clicked, and activated the AI.

Filipe

01:04:40

But maybe at the end of this class you could give your thoughts about it, because I have experience comparing models, etc., so you may want to add some of your own thoughts.

A00332810: Ava Brady

01:04:58

Okay,

Filipe

01:04:59

Model comparison. Okay. So we are going to skip this exercise because we don't have time, but there is the same data set as an image, right? Is ChatGPT able to pull text out of an image and structure it as a table? This might happen to you sometimes, when you scan a document with a printer or scanner and it comes out as a JPEG or a PNG rather than as text. If you give that image to ChatGPT and say "extract the structure", in other words the table, it will do it correctly. So in a sense, and I hope I'm using this meme correctly, I'm mostly a fan of AI for working with unstructured data. An image, a JPEG file, is pixels: black pixels, blue pixels. And ChatGPT is able to extract patterns from it. Free text, open text, is unstructured data, and it's very good at summarizing, translating, categorizing, etc. This is what AI brought to computers, which usually worked only with quantitative data or ordinal data, the qualitative ordinal variables, right? Like married, single, etc.

So let me give you this funny example, because I want to tell you about false positives and false negatives. Once you understand what a false positive and a false negative are, you can use critical thinking to assess whether the answer ChatGPT gives you has a lot of false positives or false negatives, and which is worse. Let's think about two problems concerning health: am I fat or not? Or: do I have cancer or not? As I'm going to argue, the one on the left is a very structured problem, because there is a formula that I can compute to determine whether I'm fat or not. Let's not worry about whether the formula is always accurate. What matters is that doctors can address this problem with a computer in the classical way: data in, data out. If I have 100 doctors, they will use the same formula and reach the same conclusion: I'm fat or I'm not fat, right?

But the same doesn't happen with detecting cancer. There is no formula in the sense of "if a spot on an x-ray is larger than three millimeters and the pixels are shaded with this amount of gray...". There is no formula. What happens is that a doctor in training sees 100 images, and the professor says: these are people who have cancer, these are not. So the human brain learns by training on lots of examples with cancer and lots of examples without cancer. And when I see a new image, I'm able to say that it resembles the cases I studied that had cancer more than the ones that didn't, right? AI works the same way: as you probably know, AI trains with data. The difference is that instead of giving AI 100 x-rays, I give it a million. So it learns the pattern, the structure of the unstructured data, much better than a doctor, and errs a little less. Was the difference between an unstructured problem and a structured problem clear? From a learning perspective, it is much more difficult to study to be a doctor than to be a rocket scientist, because for a rocket scientist everything is formulas. Everything is deterministic, everything is structured, right? Much of the work of a doctor isn't structured, so a doctor has to see 100 images to find the pattern. Any thoughts on this? You'll see shortly why this is important for you in assessing the quality of...

So AI and human doctors sometimes err, because this is not a structured problem, and there are the so-called false positives and false negatives. A false positive is when an AI or a human doctor thinks that I have cancer, but I don't. The opposite may also occur: the doctor, or the AI, says "I don't think you have cancer", but you do. That's a false negative. So there are two kinds of errors, and one is worse than the other; we will get to that. But in theory, AI errs less than a human. Were you familiar with the concepts of false positives and false negatives in your courses, in your line of study?
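The "am I fat or not?" side of the comparison is a structured problem because it reduces to a formula. As an assumed stand-in for "the formula" mentioned in the talk, here is a minimal sketch using body mass index (weight in kilograms divided by height in meters squared) and the WHO adult cut-offs. Every doctor who runs it on the same inputs gets the same answer; there is no comparable function one could write for reading an x-ray.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

def who_category(value):
    """WHO adult BMI cut-offs; the same input always gives the same answer."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"

for weight in (80, 100):
    value = bmi(weight, 1.80)
    print(weight, "kg at 1.80 m ->", round(value, 1), who_category(value))
```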

Vanessa Okoli

01:12:19

we actually learned about this.

Filipe

01:12:22

Okay, in what context can you

Vanessa Okoli

01:12:25

In data analytics, in the data analytics module we did. Yeah, my lecturer said that it's better to have a false positive than a false negative.

Filipe

01:12:36

Yeah, you've already answered the question I was about to ask: if AI has to err, which error is the best, or the least problematic? So, for example, imagine that I go to Google and search for texts about the Battle of Midway. Google has to read all the web pages in the world, which are unstructured text, and try to categorize every page by topic, and there are false positives and false negatives. A false positive is when Google gives me an entry on the list that it thinks is about that topic, but it isn't. That can happen. The false negatives are the worst: the page is about the Battle of Midway, but Google thinks it's not. So it never lists it, and if it isn't listed, we never know that the page exists. Now, I know a little bit about Google's algorithm. What made it popular was not AI; it was an algorithm, something very deterministic. But I'm assuming that today Google also uses AI to detect this kind of thing. Has it ever happened to you that you searched for an expression on Google and what it returned was not about what you asked for? If so, do you think it had to do with the quality of your prompt, the expression you typed into the search engine, or not? Any thoughts about this?
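The search example can be made concrete by counting the two kinds of error. The standard summary measures are precision (how much of the returned list is on topic, which false positives lower) and recall (how many of the on-topic pages were found, which false negatives lower). The page IDs below are made up, and this is not a description of how Google ranks pages.

```python
def confusion_counts(predicted, actual):
    """Compare the set of pages returned with the set of pages truly on topic."""
    tp = len(predicted & actual)   # returned and on topic
    fp = len(predicted - actual)   # returned but off topic: false positives
    fn = len(actual - predicted)   # on topic but never returned: false negatives
    return tp, fp, fn

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of the list that is relevant
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of relevant pages that were found
    return precision, recall

# Hypothetical page IDs for the query "Battle of Midway".
returned = {"p1", "p2", "p3", "p4"}
on_topic = {"p1", "p2", "p3", "p5", "p6"}

tp, fp, fn = confusion_counts(returned, on_topic)
print("TP, FP, FN:", tp, fp, fn)
print("precision, recall:", precision_recall(tp, fp, fn))
```

The false positive (`p4`) is visible on the list and easy to discard; the false negatives (`p5`, `p6`) never appear at all, which is why they are the harder error to notice.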

Filipe

01:14:58

Okay, so let's try, just for the fun of it, giving a very big text to ChatGPT. For example, go to Google and search for "reproduction of butterflies", open one of those pages, and copy all the text. Let's see if ChatGPT can guess the topic of that page, right? Because look at this: Google guesses the topic of every page in the history of the internet. It summarizes each web page into a single expression, in this case "Battle of Midway". So try that: copy and paste, ask ChatGPT "What is the topic of this text?", and see if it matches what Google thinks it is. Okay.
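As a point of comparison for ChatGPT's answer, here is a crude, frequency-based guess at "what is this page about?": count the words that are not on a stop-word list and take the most common ones. It is a classic baseline, not Google's method and not what ChatGPT does; the stop-word list and sample text are hypothetical.

```python
import re
from collections import Counter

# A small, hypothetical stop-word list; real systems use much longer ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "on",
              "for", "it", "their", "they", "then"}

def guess_topic_words(text, k=3):
    """Return the k most frequent non-stop-words as a crude topic guess."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [w for w, _ in counts.most_common(k)]

page = (
    "Butterflies reproduce in four stages. Butterflies mate, then the female lays eggs "
    "on a host plant. The eggs hatch into caterpillars, caterpillars form a pupa, "
    "and the adult butterflies emerge to mate again."
)
print(guess_topic_words(page))
```

Word counting finds "butterflies" but cannot tell that the page is about reproduction rather than, say, identification; that distinction is the kind of judgment the exercise asks ChatGPT to make.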

Filipe

01:16:11

If you want to use this text I have here, I'll put it in the chat window, but I...

Filipe

01:16:28

Okay, I think Zoom is not letting me insert the text because it's too long. But tell me in the chat window what you got. What was your expression?

Filipe

01:17:15

Bernard, you've underlined some of your text. Any reason?

Bernie Goldbach

01:17:22

No, that's how it came in with Snagit, for whatever reason, when I screen-capped it. I don't know why that happened.

Filipe

01:17:29

Yeah, this also gives me an example, Bernard, of what we talked about one day when creating this course for teachers, right? Let's see what happens if I just ask ChatGPT to underline the most important sentences. For example: "Put the relevant concepts of this text in bold." For all your students: when a teacher gives a text to a student, the teacher should also give the learning objective, so the student knows, for each sentence, what is signal and what is noise. So I should not ask ChatGPT just to put in bold or underline the relevant concepts if I don't tell it what the learning objectives are. For example, my learning objective is to know, I don't know, the phases of butterfly reproduction, something like that. Now I paste the text and let's see what it underlines. "Love is in the air." Look, this is a false positive. You agree? Yeah. And maybe there are some false negatives, something that it didn't underline and should have. So now I'm going to assess that.
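The point about giving the learning objective can be captured in a reusable prompt template. The wording and the function name below are hypothetical, not the exact prompt used in the session; the idea is simply that the objective comes first, so "relevant" has a definition before the model starts highlighting.

```python
def build_highlight_prompt(learning_objective, text):
    """Tell the model what counts as signal before asking it to highlight."""
    return (
        f"Learning objective: {learning_objective}\n"
        "Put in bold only the sentences that help a student reach this objective. "
        "Leave everything else unchanged.\n\n"
        f"Text:\n{text}"
    )

prompt = build_highlight_prompt(
    "Know the phases of butterfly reproduction.",
    "Love is in the air. Butterflies mate, and the female lays eggs on a leaf.",
)
print(prompt)
```

Against this objective, a bolded "Love is in the air" is easy to mark as a false positive, because the prompt itself states what the model should have been looking for.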

I would say that, according to the learning goal, the learning objective, it did a pretty good job: mostly true positives, right? But there is some noise. And as I think Sara was saying, we will see that false positives are sometimes less dangerous than false negatives. We will try to reflect on that. For example, "Large White": I would say that this is noise, a false positive. I believe it's a name, the name of a butterfly. So now I see that my problem here is that there are lots of false positives. How could I change my prompt to tell it to be more precise about the phases of butterfly reproduction, because it's not getting the topic as it should? Silvia is showing something, and Bernard wrote "reproduction, life cycle of butterflies, mate and lay eggs". Okay, Bernard is saying that ChatGPT may take editorial license and associate "love is in the air" with the reproduction cycle. Okay.

So now let me give you a skill that I think you'll use as students, because your teachers ask you to read academic articles and you ask AI to summarize them. Can you tell me what a false positive would be? Let's assume that you did a summary on butterfly reproduction. What would be a false positive, and what would be a false negative? Could you give me examples of a false positive?

Filipe

01:21:50

Let's use an example that everyone knows: the water cycle we learned at school. Let's say that I have a text two pages long, and I just want a summary of the stages, right? Evaporation, then rain, etc. And it writes an almost perfect text, but with some false positives. What would be a false positive? What would a false positive look like in a text?

Filipe

01:22:42

Does anyone want to guess? Okay, I'll show the answer. For example, "love is in the air", right? "Love is in the air" is something the AI included in the summary of the article that should not be in the summary; it thought it was important. So it's an irrelevant detail. It might be a hallucination, as you know, right? Or, in an academic article, it may over-emphasize parts where I don't want that level of detail. This is not problematic, right? Because it's extra text: I read it, I say "let's discard this, it's noise", and it's not a problem. The real problem is a false negative: there is a sentence that is very important to have in the summary, and it misses it, right? Just as Google doesn't think the page is about the Battle of Midway when it is about the Battle of Midway. So it omits that part, and this is more serious, right? In short: if it has a high false positive rate, you get a lot of noise, but no problem. But if it has a lot of false negatives, so it misses important details, you might reach a different conclusion from the article, right? So false negatives are usually worse. It's like what Sara said about this part of the problem. If AI says that you have cancer but you don't, it's a little bit scary for the person, but then the person says "Oh, thank God, I don't have cancer", right? The real problem is when it says "please don't do any treatment, because you don't have it", and then you do, right? That has more consequences.

So the advantage is this: if you have to read 100 articles for your thesis, you don't have the human power to do it. If you give ChatGPT the 100 articles and say "summarize the main details for me", there is a small consequence that isn't troublesome: it will include some irrelevant details. Okay, just read them; you lose one minute of your life. The problem is the false negatives, the relevant parts that go missing. But the advantage is that you quickly summarize 100 articles, and a 5% or 10% misinterpretation rate isn't worrisome. I would say that over 100 articles, you compensate for the small errors. In the next lesson we will look at some examples we did here in our school with marketing, where I asked ChatGPT to go to 100 web pages and try to find the name of each school's director, because we wanted to send them targeted advertising. The text on schools' web pages is very unstructured; each website is different from every other website. So you might expect some noise. But no problem: if I get a list of 1,000 directors and it misses one or two, I still get roughly 999 potential clients for my task. Okay, Bernie, would you like to wrap it up?
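The water-cycle question can be turned into a quick audit of an AI summary. The simplifying assumption here is that a summary can be reduced to a set of key points and compared with the points it should contain; the expected stages and the "extracted" points below are hypothetical examples.

```python
# Key stages a water-cycle summary should contain (the assumed "ground truth").
EXPECTED = {"evaporation", "condensation", "precipitation", "collection"}

def audit_summary(points_in_summary, expected=EXPECTED):
    """Split a summary's points into correct ones, noise (FP) and omissions (FN)."""
    points = set(points_in_summary)
    return {
        "correct": sorted(points & expected),
        "false_positives": sorted(points - expected),  # extra reading, easy to discard
        "false_negatives": sorted(expected - points),  # missing key ideas, easy to overlook
    }

# Hypothetical points extracted from an AI-written summary.
summary_points = ["evaporation", "precipitation", "collection", "love is in the air"]
report = audit_summary(summary_points)
print(report)
```

The false positive costs a minute of reading; the missing "condensation" is the kind of omission that could change what you conclude from the text.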

Bernie Goldbach

01:27:16

Yeah. I am going to try to grab content from the chat zone, because this is one of the most active sessions when it comes to people adding commentary to the Zoom chat. It's deep, it's diverging in some areas and converging in others, and it's a very interesting set of comments. A lot of the comments were captured by the Zoom recording, but in my experience most of the comments in the chat don't synchronize with the activity, so I'm going to spend time grabbing these comments. They're really good, and I do appreciate the attention to detail and the considerable amount of repetition people did: asking again, asking in a different way, trying a different strategy. So fair play to the comments made in the chat zone. Really, really good. I would add that I took screen caps and some other things. I'll be really interested to find out how the AI behind the scenes, not just the recording AI but the AI running on a service called H5P, is going to summarize what happened here. People who took a quick look at the interactive book know that there are quizzes that the AI asked, and it'll be really interesting to find out: will the AI ask questions of the questions? Filipe proposed questions we were meant to work on. It's going to be interesting to see whether the AI iterates on those questions and asks questions of questions, because I have no idea what's going to happen. And I can say that I feel privileged to be here with all the rest of you as we experiment with all this stuff.
Looking at how my Gemini was working and how other people's ChatGPT was reaching the end of its useful life, with people running out of credits or testing the patience of GPT-5, I think that by banging back and forth I've learned something: because I'm paying for extra Google storage, I may have a tier in Google Workspace that I'm not aware of. That seems to have happened, because with a simple right-click it was able to improve the quality of the question I gave it. So I right-clicked and asked things about sentiment, and I asked about the day when the fewest people were on the escalator, and I didn't have to be very structured in the way I asked, and I got answers that were probably more than 60% high-confidence answers. I could have asked in a different way and asked it to present an Excel framework for the answer, but I'm surprised by that. So what I'm saying overall is: thanks for being so active in the chat zone, and thanks for giving me information about what works for the interactive book. Please, when you get the link to the interactive book for this session, look at it, because it's going to have some really interesting dimensions of deeper thought. And finally, when we're together in Portugal, you're going to be able to see side by side, face to face, how these different technologies, tools and licenses work. There's another level of learning in that: we're about to go into a space in a month's time that I think is actually the highest-end use of applied AI in Europe at a university level. It's really going to be impressive. That's all I have to say. Filipe, do you want to add anything to this?
