stories to tell your grandkids. Did I miss someone? David, Stephanie, Ava, Sara? Okay, so I'll start my session. I will be presenting a PowerPoint, which I will make available at the end of the session in the decks folder. I'm not giving it to you now because it has lots of solutions to some problems that I'm going to give you. There will also be some data sets that we will be working with, so you will be feeding ChatGPT some data sets and crunching data. When I say during the lesson, "go to the data sets folder and get file number one or file number two," this is where you get it. Okay, I'll try to expand my screen, just one second. Can you confirm that you are seeing the cursor? Yes? Okay. So today is a very, very informal lesson; I don't think we need to submit any homework. I will be speaking about structured and unstructured problems, as the title suggests. And I will be arguing that AI
is very good for one kind of problem, but maybe not so good for the other. So we'll try to define the limits, the boundaries of where you should use AI or not, in a sense. Summarizing the things that you've done with Bernard: you've seen that you've used AI successfully to do several tasks that computers were usually not able to do before AI came along. For example, the art of summarizing a text or the art of translating a text. Those are two, and we'll see that this is a class of unstructured problems. We also created with different information modalities, so images and videos, and we saw that ChatGPT, or your favorite AI engine, is also good at those. So in conclusion, AI increased the set of tasks that computers are able to do. But now I want you to think about this: should we give AI the tasks that were already being done the classical way? We have several students here from economics, and they will probably work a lot with data sets, or tables; "data set" is a fancy word for "table," in a sense. So for example, when we use software such as Excel to calculate the company's profits this month based on product purchases, we can ask AI to do this. But should we? Or: what are the products that sell the least, and in what regions? That is, querying the data to get valuable insight. Should we let AI do this? I would like you to spend five minutes discussing this in groups. Do you think that there might be some tasks that you should not give to AI? Or, put differently, what tasks should be done by computers the old way?
the number of people that were on the Mall on the first of January, on the second of January, until the end, and the first one that shouts the day of the week that usually has the fewest people wins this contest. Okay, so let me explain better what I want. If you go to the data sets folder, you have this problem in two different flavors. You have a very small data set, which I think is the one you have to use if you don't have a paid version of ChatGPT, because if you use a very large data set, it will tell you, "I don't want to analyze data anymore"; you've just used up your quota. But see what the data set is: it has the dates and the number of visitors. To answer in Excel which weekday usually has the fewest people, I would have to know the function, the formula that computes the weekday from a date, which I don't know, and then aggregate all the results for the same weekday to know which is the weakest day. Since I don't know that, I will ask ChatGPT to do it. So get a data set, either the first one or the second one, and shout: "It's a Monday!" "It's a Thursday!" Okay.
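The two steps described above (compute the weekday from each date, then aggregate the visitors per weekday) can be sketched in a few lines of Python. This is only an illustration of the classical, deterministic approach: the function name `quietest_weekday` and the visitor numbers are invented for the example and are not taken from the course data sets.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def quietest_weekday(rows):
    """Return (weekday name, average visitors) for the weekday with the lowest average.

    `rows` is an iterable of (date, visitor_count) pairs.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for day, visitors in rows:
        name = WEEKDAYS[day.weekday()]  # date.weekday(): Monday is 0, Sunday is 6
        totals[name] += visitors
        counts[name] += 1
    averages = {name: totals[name] / counts[name] for name in totals}
    quietest = min(averages, key=averages.get)
    return quietest, averages[quietest]


# Hypothetical two weeks of visitor counts, starting on Monday 1 January 2024.
start = date(2024, 1, 1)
visitors = [120, 95, 110, 130, 180, 260, 240,
            125, 90, 105, 135, 175, 255, 245]
sample = [(start + timedelta(days=i), v) for i, v in enumerate(visitors)]

print(quietest_weekday(sample))
```

Because every step is a fixed rule, running this twice on the same file always gives the same answer, which is the property the lesson attributes to the "old way" of computing.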
So let's make a small distinction between structured data and unstructured data. Let's imagine that I want to do a survey to ask people how they commute to work: how far is your workplace from home, and what means of transportation do you use to get to work? And then obviously some demographic questions about name, age, and marital status, something like that. Let's assume that I don't structure this in a table. The table is probably the most structured way that humankind has used to organize data, right? It's the oldest and probably the most efficient. But obviously I could use some unstructured information. For example, in this one I begin with the name and the age, but in this one I begin with the name and the distance. Here I begin with the marital status, something like that. So this is completely unstructured, right? And this is what AI was made for. AI is very, very good at working with things that are unstructured, something that Excel and the old ways of computing, which are very deterministic, were not made for, right? Go to this
column, sum this column, take the average of this column: this was what Excel excelled at. Okay, so I want you to do something. I want you to feed ChatGPT this text; this is the second data set. And tell ChatGPT to create a table from this data. Don't be too precise, just say, "create a table." So you're telling it to try to see the patterns in each of the sentences, and it will sort out whether something is a name, an age, et cetera. And tell me if it was able to create the table from the text,
will be sharing that in the chat window, right? Yes, it's loading. Okay, thank you, Stephanie, thank you, very nice. Yeah, completely. Oh, but yours has "the car"; you got "drives." Okay, cool. Mine hasn't; mine just put "car." Okay, so in a sense, my, my Vanessa, didn't want to summarize, not summarize, but clean the data, or structure the data, that's the expression, but to see if there are correlations, if there are associations between the variables.
So this is the most structured information there is. But now I would also argue: be aware, because sometimes tables have unstructured data. Look at this table. If I ask people, "Do you like your work?" and they answer in free text, this is unstructured, right? And this is what AI is good for, as you know. In Bernard's classes, you saw that ChatGPT is very good at summarizing text. Sometimes the text that needs summarization is inside tables. You're doing a client survey, or you have a blog and you're getting comments on your posts, and you don't have the human power to process 1 million comments. You can ask AI to please go to every comment, put it in a category, "I like this post" or "I didn't like this post," and then compute the totals to see the sentiment of the people. And Stephanie, do you do sentiment analysis in your data analysis classes?
For example, you get a tweet from someone, and you try to assess if the person is angry or happy. That's sentiment analysis. Be aware that I made some errors on purpose, because we will see that ChatGPT is very robust to this. What I want you to do is something like: I want to know if people are happy in their work or not, so please, ChatGPT, try to do a summary of this information. I'm not going to give you the prompt. Try to solve this problem. I'm your boss, and you work in my human resources department. I want to know if the employees are happy or not, if they like their work. But we have 1 million employees; we are a McDonald's or a Coca-Cola, right? So now I have 1 million rows for 1 million employees to process. I don't have time for this, so I would have to use ChatGPT. So please go to your data sets folder, use this job commute survey slash table, and ask
ChatGPT... well, I cannot say the answer, right? I'm the boss. I just want the answer: are people happy or not? Write your answers in the chat window, please.
For example, try to see how I approach this problem. I'm going to solve it my way. I want structure. I don't want quantitative data; I want to create categories. So I will say something like... let me say the name of the column. So it's not this one.



model comparison. Okay, so we are going to skip this exercise because I don't have time, but there is the same data set as an image. So is ChatGPT able to extract text from an image and structure it as a table? This might happen to you sometimes, when we scan a document with our printer or scanner and it scans as a JPEG or a PNG and not as text. We don't have time to do it, but if you give this image to ChatGPT and say, "extract the structure," in other words, the table, it will do it correctly. Okay, so in a sense, and I hope I'm using this meme correctly, I'm the biggest fan of AI for working with unstructured data. An image, a JPEG file, is pixels: black pixels, blue pixels. And ChatGPT is able to extract patterns from it. Free text, open text, is unstructured data too, and it's very good at summarizing, translating, categorizing, et cetera. So this is what AI brought to computers, which usually worked only with

quantitative data or categorical data, the qualitative variables, right? Like married, single, et cetera. So let me give you this funny example, because I want to tell you about false positives and false negatives, and how, by understanding what a false positive and a false negative are, you can assess with critical thinking whether the answer that ChatGPT gives you has a lot of false positives or false negatives, and which is worse. So let's think about two problems concerning health: am I fat or not, and do I have cancer or not? As I'm going to argue, the one on the left is a very structured problem, because there is a formula that I can compute to determine if I'm fat or not. And let's not worry about whether the formula is always accurate. What matters is that humankind, the doctors, are able to address this problem with the computer the classical way: data in, data out. If I have 100 doctors, they will use the same formula and reach the same conclusion, which is "I'm fat" or "I'm not fat," right? But the same doesn't happen with detecting cancer. There is no formula in the sense of "if a spot on an X-ray is larger than three millimeters and the pixels are shaded gray with this amount of grayness." There is no formula. What happens is that a doctor in training sees 100 images, and the professor says, "These ones are people that have cancer; these are not." So the human brain learns by training with lots of examples of cancer and lots of examples of non-cancer. And when I see a new image, I'm able to say, "This resembles a person I studied who had cancer more than one who did not," right? With AI it is the same: I feed it lots of examples. AI, as you probably know, trains with data.
But the difference is that instead of giving AI 100 X-rays, I give it 1 million, right? So it knows the pattern, the structure in the unstructured data, much better than a doctor, and errs a little bit less, in a sense. Was the difference between an unstructured problem and a structured problem clear? From a learning perspective, it is much more difficult to study to be a doctor than to be a rocket scientist, because for rocket scientists everything is formulas. Everything is deterministic. Everything is structured, right? And much of the work of a doctor is unstructured, so the doctor has to see 100 images to find the pattern. Any thoughts on this? You'll see shortly how this is important for you in assessing the quality of... okay. So AI and human doctors sometimes err, because this is not a structured problem, and there are the so-called false positives and false negatives. A false positive is when an AI or a human doctor thinks that I have cancer, but I don't. But the opposite may also occur: the AI says, "I don't think that you have cancer," but you do. That is a false negative. So there are two kinds of errors, and one is worse than the other; we will come to that. But in theory, AI errs less than a human. Were you familiar with these concepts of false positives and false negatives in your courses, in your line of study?
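To make the "structured problem" side of this contrast concrete, here is a minimal sketch of a formula-based check. The lecture does not name the formula, so this assumes it is the body mass index (weight in kilograms divided by height in metres squared) with the commonly cited WHO obesity threshold of 30; the function names `bmi` and `is_obese` are made up for the illustration.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def is_obese(weight_kg, height_m, threshold=30.0):
    """Apply a fixed cut-off to the BMI (30 is an assumed, commonly cited threshold)."""
    return bmi(weight_kg, height_m) >= threshold


# Same inputs, same rule, same answer, no matter who runs it.
print(round(bmi(81, 1.80), 1), is_obese(81, 1.80))
print(round(bmi(100, 1.80), 1), is_obese(100, 1.80))
```

Nothing comparable can be written for the X-ray case: there is no closed-form rule to put inside the function, which is why that problem is learned from examples instead.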
I think Zoom is not allowing me to insert the text because it's too long. Okay, but tell me in the chat window what you got. What was your impression?
This also gives me an example, Bernard, as we talked about one day, for creating this course for teachers, right? Let's see what happens if I ask ChatGPT just to underline the sentences that are most important; for example, "put in bold the relevant concepts of this text." And to all you students: when a teacher gives a text to a student, the teacher should also give the learning objective, so that for each sentence you know what you should take as signal and what as noise. So I should not ask ChatGPT just to put in bold or underline the relevant concepts if I don't tell it what the learning objectives are. For example, my learning objective is to know, I don't know, the phases of butterfly reproduction, something like that. Now I paste the text, and let's see what it underlines. "Love is in the air." Look, this is a false positive. You agree? Yeah. And maybe there are some false negatives, something that it didn't underline and should have. So now I'm going to assess that. Okay, so


I would say that, according to the learning objective, it did a pretty good job. So these are the true positives, right? But there is some noise. And as I think Sara was saying, we will see that sometimes false positives are less dangerous than false negatives. We will try to reflect on that. So, for example, "a large white": I would say that this is noise. This is a false positive. I believe this is the name of a butterfly. So now I see that my problem here is that there are lots of false positives. How could I change my prompt? I have to be more precise about the phases of butterfly reproduction, because it's not getting the topic as it should. So Silvia is suggesting something like "reproduction, life cycle of butterflies, mate and lay eggs." Okay, and Bernard is saying ChatGPT may take editorial license and associate "love is in the air" with the reproduction cycle. Okay, so let's send it here. Now let's give you a skill that I think you'll be using as students, because your teachers ask you to read academic articles and you ask AI to summarize academic articles. Let's assume that you did a summary on butterfly reproduction. What would be a false positive and a false negative? Could you give me examples of a false positive?
Anyone want to try to guess? Okay, I'll show the answer. So for example, "love is in the air," right? "Love is in the air" is something that AI included in the summary of the article that should not be in the summary; it thought it was something important. So it's an irrelevant detail. It might be a hallucination, as you know. Or, for example, in an academic article, it over-emphasizes parts where you say, "I don't want this level of detail." This is not problematic, because it's just extra text. I read it and I say, "Oh, let's discard this, because it's noise," but it's not a problem. The real problem is a false negative: there is a sentence that is very important to have in the summary, and it is missing. It doesn't think the page is about the Battle of Midway when it is about the Battle of Midway, so it omits that part, and this is more serious, right? So, to be short: if it has a high false positive rate, you get a lot of noise, but
no problem. But if it has a lot of false negatives, so it omits important details, you might draw a different conclusion from the article, right? So usually false negatives are worse. It's like Sara said about this part of the problem. If AI says that you have cancer but you don't, it's a little bit scary for the person, but then the person says, "Oh, thank God, I don't have cancer," right? The real problem, in this science as well, is if it says, "Please don't do any treatment, because you don't have it," and then you do. That has more consequences. So the advantage of this is the following. If for your thesis you have to read 100 articles, you don't have the human power to do it. If you give ChatGPT the 100 articles and say, "Summarize the main details for me," there is a small consequence that is not troublesome: it will put in some irrelevant details. Just read them; you lose one minute of your life. The problem is the false negatives, the relevant details that are missed. But the advantage is that you quickly summarize 100 articles, and if a 5% or 10% misinterpretation is not worrisome, I would say that over 100 articles you would compensate for the small errors. Next lesson we will do some examples that we did here in our school with marketing, where I asked ChatGPT to go to 100 web pages to try to find the name of the director of a school, because we wanted to send them targeted advertising. The text in the schools' web pages is very unstructured; each website is different from every other website. So you might expect some noise, but no problem, because if I get a list of 1000 directors and it misses one or two, I still get 999 potential clients for my task. Okay, Bernie, would you like to wrap it up?
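The false positive and false negative bookkeeping from the butterfly exercise can be sketched with plain sets. The two sets below are invented for the illustration (the text used in class is not reproduced here): `relevant` stands for what a human reader judges to match the learning objective, and `highlighted` for what the AI put in bold. The function name `compare_highlights` is hypothetical.

```python
def compare_highlights(relevant, highlighted):
    """Split the AI's highlights into true positives, false positives, and false negatives."""
    true_positives = relevant & highlighted   # relevant and highlighted
    false_positives = highlighted - relevant  # highlighted, but only noise
    false_negatives = relevant - highlighted  # relevant, but missed
    return true_positives, false_positives, false_negatives


# Hypothetical judgement for the objective "phases of butterfly reproduction".
relevant = {"mating", "egg laying", "caterpillar", "chrysalis"}
highlighted = {"mating", "egg laying", "caterpillar",
               "love is in the air", "a large white"}

tp, fp, fn = compare_highlights(relevant, highlighted)
print("true positives: ", sorted(tp))
print("false positives:", sorted(fp))  # extra noise, costs a minute of reading
print("false negatives:", sorted(fn))  # missed content, can change the conclusion
```

Counting the two error types separately matters because, as argued above, they do not cost the same: a few false positives can be skimmed and discarded, while a single false negative can remove the detail the summary was needed for.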





Europe at a university level. It's really going to be impressive. That's all I have to say. Filipe, do you want to add anything to this?