Since it comes from charitable sources, where the warm glow is kind of enough for them, they're not too concerned about the quality of the science. It's part of their evaluation process, but my impression is that they're not too strict about it. I think when it comes to outright fraud or something like that, it sticks. But what I thought you were asking about is specifying the constraints on generality, like how would you... well, that's what I was considering.
What I was going to say is about what researchers conduct. There's also p-hacking, which is not strictly fraud. Yeah, like p-hacking, HARKing, conditional stopping of data collection: all of these are bad practices, along with over-claiming the generalizability of the results, but that's what will get you into Nature. And then the funder has the incentive to say, oh, we've been acknowledged in twenty Nature and Science papers this year for research that we funded.
I mean, this goes back to the whole Effective Altruism thing, right? If your goal is, if you think what's best for humanity is cumulative science, then you don't care about Nature or whatever. What you'll put on your website is: these are my standards, and all of the studies I funded met the standards I'm committing to. But nobody's doing that, I think.
They do, they are doing a little bit of that. I know that when we got some funding from the Sloan Foundation, we had to commit to things: all the data will be publicly available, all code will be put under an open-source license, the funder will be acknowledged, and they required us to take measures so that people who use our tool will find it easier to pre-register their studies. So they can say these things because they want to align themselves with these goals of science. It's just not necessarily effective, right? Because we can say, oh, we provide the YAML file of the parameters, we have different pages where you can do this and that; we can put the code on GitHub, but no one can use it, because we're not maintaining it and there isn't enough documentation on how to actually spin it up or do anything with it. So you can always satisfy the requirements without achieving the spirit of the requirement.
And honestly, it's clear that if you tell people this is what we need to do...
But I think people would say that it's not a normative value. Like, Ezra Zuckerman is against pre-registration, on the grounds that pre-registration hurts science. So it's not that there is consensus that pre-registration is a good practice, or that p-hacking is a bad practice, or that null hypothesis significance testing is not a good way to evaluate things; there isn't consensus on those reforms. And then it's like, oh, why are you forcing me to do bad science by forcing me to pre-register my studies?
Well, the funder also doesn't have to fund it. So I think if you're funding, you kind of have a duty to...
Supervise? Right. But if there are enough people who disagree, once you start a policy of 'we will only fund you if you pre-register,' there are all of these arguments on philosophical grounds that pre-registration is not a good thing. So we'd be enforcing something whose value isn't settled, that some argue is harmful, and then we would have to be the ones who force people to do bad science. And so we'd still be laissez-faire about it.
Legitimacy, like...
Nobody seems to be opinionated to the point where they will enforce things strictly.
There's some trust that has to be placed in the funding agencies, right? Well...
That's why I imagined the funder has the most leverage and the ability to be that.
As it sounds, I mean, journals probably have more, because if you can't publish the work that you do, no one is doing it just to keep it for themselves, to learn and to be intellectually stimulated; that's where it stops.
So one last point, another point of leverage that we don't typically discuss. There's also something about how the tenure review committee at Berkeley Haas handles this: a big part of the review that they take very seriously is pre-registration of your studies. For better or worse, right? They take it seriously, and it's like, you can't just write a BS pre-reg. They look at it: did you over-claim, is it too broad, was it appropriately written?
Yeah, but it has pros and cons. But that's because...
Also, you said that at Wharton, because Joe Simmons is the deputy dean of research, a big evaluation criterion is whether you pre-register your studies or something like that. But that's a one-man-driven thing. By the time whoever they hire is up for tenure, that guy might not be there anymore, and then pre-registration is gone. I think, yeah, but...
I'd really like having it at the department level, for example in tenure review, right? If it's systematic...
And I thought it was formal, that it's stated. Anyway, for me, the whole tenure process is ad hoc, behind closed doors, rumors about how things actually happen. Okay, probably.
So, I guess some do. The reason I chose this topic is not just personal; it's one of those topics that has been mentioned many times and that I wanted to go over in more depth, and since we're in the second half, I think this is a good time to bring it up: incentives, and how to make this enterprise of science work more the way it should, however different scientists might define that. The image I'm showing is the cover of Everything Is Obvious by Duncan Watts, which is about common sense. What I liked about this book, especially its opening, is its take on social scientists' belief system: how they have to explain that their practice of creating knowledge is different from common sense, whereas natural scientists don't have to do that; they're not obsessed with distinguishing their findings from common sense.

We're going to get into what I think this means, but I wanted to provide a brief overview of what we have so far. We laid out the state of science in the first two weeks, and we dwelled on the existential crisis maybe for too long; by the time the course got to me here, we knew that this was broken, and we had some discussions about it. And I liked the structure of the week Matt led, because that week was the transition to how we can think about solutions: the quicksand and bricks analogy, for example, was not only a description of the problem, but really provoked us to think more about how to build mortar to connect the bricks, how to think about the structure and where the quicksand is. Then we talked about some of the solutions. Two weeks ago was about predictions: why we should take them seriously, or maybe less about how to take them seriously and more about why it is hard to predict and why we may have to live with that unpredictability. And last week we talked about heterogeneous effects, how and why we should embrace small effect sizes, and also constraints on generality.

This week, what we'll do is less clear compared to the past two weeks, but I intended that. One thing we can discuss is the means we have to incentivize scientists to take these things seriously; I don't think it's enough to just say that we know we have to take this seriously. So, incentives on the individual side. I want to pause on this slide briefly: your reactions on what your incentives are, and what you think your colleagues' incentives are, as PhD students or junior faculty or even senior faculty, and whether you think that is good or bad. So yeah, I think there is a tension between the researcher trying to do good science and the researcher going for job security, because, and I don't need to tell you why, people tend to be really protective of tenure at most institutions. You want stability not only for your own sake, of having a stable job, but also as a researcher, so that you can do work that is more risky.
So yeah, I'm not saying the system should be like that; I'm not going that way. But there's also the practice of doing actual good science. If the tenure evaluation system and doing good science were very highly correlated in terms of objectives, that would rest on the assumption that the tenure process, and everything around it (publication, hiring, promotion, appointment processes), is actually reflective of what we mean by doing good science; that we have the ability to evaluate scientific publications in ways that are ideal for science. Your thoughts on this tension?
And perhaps one thing to add that I've learned: this picture doesn't really change much once one gets tenure, right? It's not that once you get tenure it becomes the case that, oh, now I have the stability and security, now I can align the work that I do with the idealistic goals of science. Because tenured professors have PhD students who have to find jobs, who need three papers as part of their theses and a job market paper. Part of your responsibility is for these students to find jobs, to get promoted, to get tenure themselves, and the same goes for your collaborators; there is some sort of responsibility towards them. But it's also how you continue to be evaluated. Whenever I'm meeting with prospective students who might want to work with me, they ask: where do your students go? Because if I'm going to go through this training, what are my career prospects? The first year I got hired, when I could give an offer to a student, she asked me, where do your students go? I don't know; you'd be my first student. And then she went to Duncan and asked him, where do your students go? I don't know, ask Abdullah, he's one of my students; that's where they go. So your incentive is still for everyone around you, everyone you collaborate with, to be successful, and the success criteria are the same tenure success criteria. So it's not that once you get tenure you're free from these incentives; you're still in the midst of them. Doing science is a social enterprise, and you do it with other people.
I think that is exactly the origin. If you start making excuses, oh, I'm doing this for now, you end up finding more and more reasons to continue this maybe dishonest practice of doing science, because, oh, I need it for my position. But yeah.
One frame is to consider, very simply, intrinsic and extrinsic motivators, and how much weight we place on each one and when in our careers. The other is that the definition of good science, at least in the social sciences, is unclear. So I would argue that within your discipline or subfield, everyone thinks they're doing good science and that the incentives are aligned. The problem is fragmentation and lack of agreement on what good science is.
We're going to get to that. Okay, sorry. No, no, it's always good. Yeah, so this was just one way of reframing the question, and there are many things associated with it, I guess. The first one is, of course, tenure: everything leading up to tenure and everything that happens even after it. But Jason also talks about intrinsic motivation, which could be something more stable; the drive to do science could be more internal. Do you want to feel good that you're a scientist, and not an engineer, or a doctor or a lawyer? So the second part is the self-satisfaction of doing good research. We're going to talk more about what that could consist of, and whether there's consensus at all within social science, but one common theme that appeared in the readings was this one. It goes back to Richard Feynman, I suppose, and his cargo cult science: the first principle is that you must not fool yourself, and you are the easiest person to fool. And this includes, of course, p-hacking, HARKing, or any practice that could make your scientific findings and results less credible and less generalizable, but possibly more likely to get published, or to give you the sense of having understood something.
So maybe one good thing to do as we go is to note whether each of these is something that currently exists in the problem space, and which node it belongs to; and if not, whether it's a new distinct node that should be added. So as we unravel these thoughts, we see how they connect to the problem space.
So when you say problem space, will we be talking about the symptoms as well?
These two points are two distinct things that contribute to the state of the social and behavioral sciences. The first one is related to the incentives for the individual scientist: publication, hiring, promotion, tenure. That is part of the problem, the publish-or-perish, so it's there. The second point, for me, is more about confirmation bias; I think it came up before that everyone exhibits confirmation bias. You want your theory to be correct, you want your explanation to be the right one, you want to be the one who knows how something works, and then it's very easy to convince yourself of that. So the second point is more of a human attribute: it's not only that we have limited cognition, we have a confirmation bias. We will always want to confirm our favorite explanation of how the world works, our favorite theory or whatever. While the first one is more at the institutional level, which is how we get evaluated.
So I don't want to derail this class, but in the other class we talked about machine learning for hypothesis generation. I think one memorable quote from him, talking about social science in general, was that the problem with social science is that no one really commits to a theory. People are really good at making post hoc explanations, and it stays there. If you look at all the theories and all the hypotheses people have generated, there's really nothing you cannot fit once you have the data. So when he said people are not committing to a theory, I heard it as: people are not really willing to evaluate the predictions the theory makes.
Or not willing to actually make a prediction, which is something we are rediscovering as we run this project where we ask experts to make predictions: people who have studied game theory and who have studied public-goods games, to tell us under what conditions punishment is going to be effective. We have one of the authors who wrote many of the punishment papers in Nature and Science on public-goods games as part of the project, and he is designing the instruments with us that we will use for the expert predictions, and it's very hard to get them to make predictions. Like, I don't want to do it.
So I think that is one thing: people don't want to make predictions. And they use elusive language and terminology so that, whatever happens, you can't come back and say their prediction was wrong. So you can't evaluate it. People don't make predictions, so they don't get evaluated, and people don't try to evaluate statements that are put in the form of a prediction.
So in this paper, what they did was generate hypotheses and show that these hypotheses are useful: making predictions about what a judge is actually going to decide, and whether constructs that have been discussed among psychologists are predictive based on these facial features. And I think the final step was to ask actual district attorneys, or judges, to predict how other judges are going to decide, based on the generated hypotheses. So I asked him: did you think of doing this with psychologists? Because I'm thinking, especially if they're specialists in this topic of facial features, they are going to be experts; they are going to know tons of hypotheses in ways that are much more clever than lay people. What would the machine-learning-generated hypotheses add beyond what psychologists find? And he said, that's a really interesting question; we might have thought of that, but people genuinely would not care whether psychologists get better at making predictions, or understand better why they're making those predictions. He was making more of a strategic comment about how to frame the paper depending on the audience: an audience reading more on the policy side, or the econ takeaway side, doesn't care whether psychologists are good at making predictions. So I think that is telling about how people view social science.
That's tangential, but this discussion reminded me of perhaps a single node in the problem space. I'm not sure exactly how to articulate it, but it relates to Duncan's paper on common sense and sociological explanations. The point is that many social scientists evaluate their theories based on whether they make sense, the subjective sense of having understood something, rather than on scientific predictions, which is the more scientific standard for evaluating theory. Turco and Zuckerman had a whole response to this paper, to which Duncan then responded. And the whole idea, as it relates to Verstehen, is that we are both the subject and the object of study. So we come up with an explanation, this is how society works; I am part of society, and it makes sense to me, so it must be right. That's how you evaluate. So I don't know if these are different things within the problem space, but: there is a lack of making predictions; we don't evaluate predictions; and we use common sense as an evaluation criterion for explanations. Those are three things that sit somewhere in there.
Common sense, I think, is something sociologists, or social scientists in general, might like to describe as what we all agree upon, because...
Yeah, I think it gets conflated with that. It provides, gives you, that subjective sense of having understood something. And there is a paper that I saw in response
to Turco and Zuckerman. I really like how Duncan ended his response. He says: my prediction, therefore, is that as sociology becomes increasingly data rich, and sociologists become increasingly familiar with the methods of causal inference and out-of-sample testing (all arguments against 'we use common sense, we don't use out-of-sample testing'), it will become increasingly clear that many interpretively satisfying explanations are really just stories. Conversely, it will become clear that the explanations that survive rigorous testing are not as satisfying as the stories to which we have become accustomed. And at that point, I predict, sociologists will have to choose between storytelling and science. Turco and Zuckerman, in contrast, predict that no such choice will be forced upon us; that, in fact, the most rigorous explanations will also be the most interpretively satisfying. I think they are wrong about that, but that is the nice thing about predictions: we shall see. Right? And that's how he committed to a prediction: I predict this is what's going to happen. And from their response, which is like ten pages long, with whatever convoluted logic was used in it, he derived a prediction too. I predict this; they predict the opposite; I think they are wrong. But that's the nice thing about predictions: we shall see.
In the other class, we had this discussion with this exact paper; we invited them to talk about the papers. And I really liked how explicitly he spelled out the prediction: you can call it whatever you want, it's kind of an argument, but here's the point.
What's missing in this prediction, I think, is that he should specify when we are going to evaluate it, or how to evaluate it.
Because if they say he's wrong, he can always reply: wait, wait, wait, we need like ten more years.
Yeah, but the prediction does take time into account, in the sense that it will increasingly become this; he's predicting a trend. So it should be that, at any point in time, if you are looking over time, there is a positive slope.
The existence of a positive slope makes Duncan right; its absence makes him wrong.
That's why it then becomes just statistics, right? Is there a trend or not? Is it not monotone, or does it only appear non-monotone because of noise? His prediction is a monotonic increase, and then, how do you test that given the data? There is the noise and the signal, and it becomes a statistics problem. But he made a prediction for which you could devise a test that would convince him that he's wrong.
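As a concrete illustration of that last point (my own sketch, not something from the discussion), here is how one might turn the 'monotonic increase' prediction into a statistical test against noisy yearly data. The yearly scores, the measure, and the significance threshold are all invented for illustration.

```python
# Minimal sketch: testing a "monotonic increase" prediction against noisy data.
# The yearly scores below are hypothetical, invented purely for illustration.
from scipy import stats

years = list(range(2010, 2024))
# Hypothetical measure, e.g., the share of papers using out-of-sample testing.
scores = [0.08, 0.07, 0.11, 0.10, 0.14, 0.13, 0.17, 0.16,
          0.21, 0.19, 0.24, 0.26, 0.25, 0.30]

# Kendall's tau tests for a monotone association without assuming linearity,
# which matches "increasing over time" rather than a specific slope.
tau, p_value = stats.kendalltau(years, scores)

# Pre-committing to a decision rule is what makes the prediction falsifiable:
# a non-positive tau, or a non-significant positive one, counts against it.
alpha = 0.05
print(f"tau = {tau:.2f}, p = {p_value:.4f}")
print("consistent with a positive trend" if tau > 0 and p_value < alpha
      else "no evidence of a positive trend")
```

The point of the sketch is the pre-committed decision rule, not the particular statistic; any monotone-trend test agreed on in advance would play the same role.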
But Duncan thinks that explanation and prediction shouldn't be separate, right?
So the prediction: that as sociology becomes increasingly data rich, and sociologists become increasingly familiar with the methods of causal inference and out-of-sample testing, it will become increasingly clear that many interpretively satisfying explanations are really just stories. Conversely, it will become clear that explanations that survive rigorous testing are not as satisfying as the stories to which we have been accustomed. And then the rest of the paper is kind of the logic of why he believes in this prediction, because for him explanation and correctness are not separate, and so on. But that's the prediction he makes, for which you could devise a test to convince him that he's wrong. And then he derives the opposite prediction from Zuckerman and Turco. The reason I'm bringing up this paper is that it highlights these three different things we noted. The lack of committing to making predictions is kind of the reference point: we don't make predictions, we don't evaluate predictions, and we use common sense to evaluate explanations, explanations that play the role of a theory in science.
I have one thought from reading this commentary and the response. One thing I've been wondering: their argument in the commentary was, in part, that we are living inside this system, so it makes sense for us to evaluate explanations about it this way; it is not like natural science. And I think one way to support that point is this: nature doesn't care whether Duncan understands gravity or not; gravity just exists. While there are many social constructs that exist because we think they exist and we institutionalize them. I think that would have made a better point in the commentary, since we get this explicit comparison between the social and natural sciences, and I just wonder how Duncan would have responded. Because he's not saying that explanation is useless; I know that's not the point. But in terms of making predictions, it could be really challenging to get the same value out of a prediction as we do in natural science.
Starting with this goal that we have in mind, commensurability: I was wondering how we evaluate that, and through what lens. Is it based on predictions? Are two scientific papers commensurable if, in the end, we are able to make a better prediction about something? But that would imply that we have an instrument for this. Or are two papers commensurable if, for example, each explains a mechanism properly?
So what does commensurability mean to you?
That's a question, because I was also thinking about this. Is it compatibility in the sense that we're building up towards something, right? So what is the target: minimizing prediction error, or having the perfect explanation? At the end those would probably be somewhat the same, as we get close to it. But yeah, how do we evaluate that two things add up on each other?
Things can be commensurate but contradictory or inconsistent, right? Commensurability just means they are comparable; they are of the same brand of Legos. And maybe two pieces can't fit together even though they are of the same brand: they are commensurate, compatible in kind, but they just don't fit together, because for the object that you are building these two pieces don't fit, and you have to get rid of one of them because one of them is wrong. It's a false positive; it doesn't contribute to the structure that we are building, though if we were building a different structure it would fit. So for me, commensurability is that they are of the same brand of Lego. Incommensurability is when two things don't fit together, but you don't know whether that has anything to do with the structure you are building; you can't even know whether they are of the same brand and should be fitting together, because you can't compare them. They are not using the same spacing in their holes or whatever, so you can't even reason about them together at the same time. So commensurability is necessary but not sufficient for cumulative science.
It requires some kind of evaluation criterion. Commensurable with respect to what?
with respect to the phenomenon.
Then you can ask: what is the phenomenon, and how do we agree that we're both seeing the same thing? Because observing a phenomenon is an act of theory; there's theory embedded in your observation, and you're choosing to pay attention to some things and not to others. The commensurability thing is interesting.
And incommensurability is elusive, in the sense that things can seem commensurate when they are not, but not the other way around; it's not that things that don't look commensurate actually are, so that we'd have a conflict over whether they are commensurate. Again, in physics it would be quantum mechanics and relativity. These two theories are incommensurate: they use different parameters, different language, different concepts and theoretical constructs to make their predictions, and when we try to unify them, we try to come up with a theory that will make these two things commensurate with one another. But as soon as you say that actually they are not meant to be compatible, that one works at this scale and the other works at that scale and we are not trying to fit them together, then you don't have a commensurability problem. Whereas two theories about the same thing, for example evolutionary game theory and classical game theory, would be commensurate with one another when it comes to explaining whether punishment is going to be good or bad in a public-goods game. But once you bring in a theory like reinforcement learning, which requires very different kinds of inputs and structure, it's not commensurate; they are not comparable to one another. And it's fine that they are not commensurate, because we make them make predictions and then say that the one that makes the correct prediction is the better representation of how reality actually works, and the fact that we don't know how to fit it together with rational choice is not a problem that needs to be resolved. The other way I think about commensurability is about things that are assumed to be the same but are not. For example, empirical results about punishment in public-goods games: two studies, both about punishment in a public-goods game, both empirical observations, done under different conditions, finding different results; it can even be the same authors. They are incommensurate in the sense that maybe these results are not contradictory, because the two studies are not comparable. This one was a lab public-goods game, and this one was a field study about sanctioning among fishermen somewhere. Both are about cooperation, and abstractly they seem to be related, but they are not directly comparable.
You don't know whether it is context, I think. If you can specify it, then it becomes, you know, not contradictory. But...
Yeah. So again, in the design-space metaphor, it's the space that makes the studies comparable. The studies are points, and the design space makes them commensurable: you can discuss and talk about similarities and differences. But once you don't have a design space, you don't even know whether things are similar or different.
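To make that metaphor concrete, here is my own toy sketch (not from the discussion; the dimension names and studies are hypothetical): a study becomes comparable to another only once both are expressed as points along the same explicit dimensions.

```python
# Sketch: studies as points in an explicit design space.
# Dimension names and study values are hypothetical illustrations.
from dataclasses import dataclass

DESIGN_SPACE = ("setting", "game", "punishment_cost", "group_size")

@dataclass
class Study:
    name: str
    coords: dict  # values along the design-space dimensions
    outcome: str  # observed result

def commensurable(a: Study, b: Study) -> bool:
    # Comparable only if both studies specify every shared dimension.
    return all(d in a.coords and d in b.coords for d in DESIGN_SPACE)

def same_point(a: Study, b: Study) -> bool:
    # "On top of each other": identical coordinates, so conflicting outcomes
    # are a genuine incompatibility rather than a hidden moderator.
    return commensurable(a, b) and all(
        a.coords[d] == b.coords[d] for d in DESIGN_SPACE)

lab = Study("lab public-goods game",
            {"setting": "lab", "game": "public_goods",
             "punishment_cost": 0.3, "group_size": 4},
            outcome="punishment increases cooperation")
fishermen = Study("fishermen field study",
                  {"setting": "field", "game": "sanctioning"},
                  outcome="punishment decreases cooperation")

# False: the field study leaves two dimensions unspecified, so its opposing
# outcome cannot even be called contradictory yet.
print(commensurable(lab, fishermen))
```

The design choice worth noticing: 'contradiction' is only defined once both `commensurable` and `same_point` hold; without the shared coordinate system, the two outcomes simply cannot be compared.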
Is it necessary but not sufficient for cumulativity?
That would be the argument I'd make, and that we make in the vsps. It's necessary to make things comparable so that we can accumulate, but once you make things comparable, you're not guaranteed to have cumulative science, because...
Like, making predictions is a way of making things commensurable: you're basically choosing a dimension on which you can compare all of them. You're choosing...
An outcome dimension, right? Not necessarily a description of the thing, right?
No, but you could evaluate; I mean, you have to choose the outcome dimensions, obviously, but even if you have two vastly different studies, if they are making a prediction about the same thing, you could still compare them.
It's insufficient, though, because you can be right for the wrong reasons. You could have really good predictive performance while the underlying reasoning is wrong.
But that's a problem coming from the evaluation design space, right? What you are saying is, given that we design the evaluation right: Lego brand one has two-unit spacing, Lego brand two has three-unit spacing, and now we're defining a six-unit spacing that can make them commensurable.
So maybe let me draw the picture of what I think you're saying, and see whether that's it. We have two different studies. Say this is the first study and this is the second study, and the shape of the study is the outcome that it observed. For example, say these are studies about public-goods games and the effect of punishment: this study found that punishment is good for promoting cooperation, which is the circle, and this study found that punishment is bad, which is the star. Maybe for both of these there is some theory that corresponds to the predictions they made and corroborated. Now, there might be a study three, on top of this one, that finds the opposite result. So there are two things here. First, these two results can't both be true at the same time, because they are on top of each other: they are about the same thing, they found different results, and they made different predictions. So either one of them is wrong, which is potentially a failure of replication, or we're missing a dimension along which these things are actually different, and we just don't know that they are different. Given the space of dimensions that we think make things commensurate, this is a failure, potentially a failure of replication; this is not a good situation to be in. However, not having this tower, having this other configuration instead, is fine. We have a space that tells us these two things are different, so the fact that they exhibit different outcomes, or that the models make different predictions, is fine, and there is then a question about potential boundary conditions or something like that. But if we don't have the space, we don't know whether we are in this situation or in that situation, because there are no dimensions along which we can compare things. The coordinate system is the design space. That's what was so convincing to me: incommensurability is when we don't have the coordinate system, so we don't know whether things are similar, or different, or on top of each other. And if we don't know that, we don't know how to put them together or how to move on. Once you have that coordinate system, then you can start having these conversations.
What I'm going to do for next week: can someone remind me how to strike things through? Command-Shift-something. What I'm going to do is, I added Duncan Watts's response. The whole exchange with Turco and Zuckerman is too long, and most of us have read it in different contexts. I think the final response of Duncan summarizes his main point better than the original paper, brings up what Turco and Zuckerman's point is, and highlights many of the issues we are talking about in this class. So I'm going to add it; it's a very short paper. I read it without reading the rest of the exchange, so maybe that's why it makes so much sense to me. If you read it and it doesn't make sense, maybe you'll have to go back and read the rest. But no one is assigned the whole
exchange, yeah. It's kind of complicated, but I've been thinking through commensurability using the analogy of a partially ordered set, where you're commensurate along some dimensions but not others, and what that could mean for the overall identification of truth.
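A toy sketch of that partial-order idea (my illustration, with invented dimensions and scores): under the usual product order, two studies can each be better on different dimensions and therefore be incomparable overall.

```python
# Sketch: a partial order over studies scored on several dimensions.
# Dimensions and scores are invented; "dominates" is the product (Pareto) order.
def dominates(a, b):
    """True if a is at least as good on every dimension and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# Hypothetical (rigor, realism, scale) scores for three studies.
study_a = (3, 1, 2)
study_b = (2, 3, 2)
study_c = (1, 1, 1)

print(dominates(study_a, study_c))  # True: comparable, and a is strictly better
print(dominates(study_a, study_b))  # False
print(dominates(study_b, study_a))  # False: a and b are incomparable, so the
# order gives no verdict between them without adding further criteria
```

In poset terms, `study_a` and `study_b` form an antichain: the framework itself tells you that more information, or an agreed weighting of the dimensions, is needed before the comparison is even meaningful.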
I'm a little bit confused between commensurability and compatibility. Why does everything start with C? Which comes first?
First, you have to be commensurate, so that we can then ask the question: is it compatible or not? For example, in this situation, because we have commensurability, we know that the circle is on top of the star; we know they are on top of each other. And that is how we know that these results are incompatible: they can't both be true.
So the order is: get commensurability, and then ask about compatibility.
And this one and that one are compatible results, because we know that they are different from one another, so they are compatible with one another. Compatibility is a question you ask when you compare two things, and you can't compare without having a commensurate dimension along which to do the comparison. But...
The design space also evolves. Can we use incompatibility to improve commensurability?
Yes, and this is the case where, if you have a situation like this, you have a couple of options. You can say these results are incompatible; how do we resolve them? We remove one of them: if we remove this one because it's actually fraud, or a false positive, or a bad study, we restore order. Or you can say that actually both of these are true, but they are not compatible in this coordinate system because we are missing a third dimension along which they are actually separate from one another. And then you have used that incompatibility to improve the commensurability of the design space that you have for this outcome of interest.
That's what I was trying to capture previously: changing the environment or changing the agent, like two axes. Is there any literature on that, outward versus inward? Something philosophical.
Another thing you could do is improve estimates, right? Which is to say that actually, in the existing coordinate system, this star is not on top of the circle; our measure is crude, the star is actually over here, and we have to improve our measurement system. But all of this is possible only once you have commensurability, so you can talk about similarities and differences and where they are coming from. Currently, the state we are in is that we have a bunch of papers about the same thing, and we don't know whether they are similar or different from one another, whether they are comparable. We don't have dimensions along which we could compare, so we can't tell whether those things are on top of each other, in the incompatible situation, or whether they are compatible with one another.
Thanks.
That is, I think, the one quote about commensurability from the readings that I really liked, which is from Newell. I'm going to read it here. He says: we never seem, in the experimental literature, to put the results of all the experiments together. Each point is an experiment, and innumerable aspects of the situations are permitted to be suppressed; all of the dimensions along which you conducted the study are suppressed, they are not written in the paper. Thus no way exists of knowing whether the earlier studies are in fact commensurate with whatever ones are under present scrutiny, or are in fact contradictory. So we don't know whether a study is compatible or contradictory with the previous studies, because we don't know where they are in the coordinate system.
So incompatible necessarily means contradictory, if they're about the same thing?
Not necessarily contradictory, as in one being the negation of the other. For example, one study might say that the relationship between X and Y is linear and positive, and the other says it's an inverted U. These are incompatible but not necessarily contradictory, because for a segment of it the inverted U looks like a monotone increase or something like that. So I don't know if I would use 'contradictory' and 'incompatible' as synonyms, but fine, say 'different': the whole quote works the same. We don't know whether two things are different or not, because all of these aspects are permitted to be suppressed, the dimensions along which things can vary, and then we don't know whether things are the same or different, so we can't put things together. So that quote from Newell is, for me, really the one about the need for commensurability: to be able to put the experimental literature together and to know whether things are contradictory or compatible with one another.
It seems like, in Duncan's mind, by getting this commensurability you're basically removing interpretive slack. Those previously unspecified dimensions, the suppressed things, are the opening that people use for their interpretations, to fit things to their narrative. And in his mental model, by very precisely specifying every dimension, you are removing that interpretive slack, and then you can access a more fundamental description of the situation.
So it's kind of reducing the space people can escape into. It's not only a commensurability problem; it's that social scientists have commitment issues, simply put. We can expand a theory to whatever we want; we have no restrictions on how we can invoke hidden dimensions and get away with ideas without saying so.
And I think Duncan might agree with that description of the model. I think another way to think about it is that what he's saying is: when you don't have commensurability and you have to explain why this is a star, you can come up with an explanation; why this is a circle, you can come up with an explanation; why that's a cross, you can come up with an explanation. But once you have commensurability, your explanation has to fit all of them simultaneously, not one of them at a time. Then you have this trade-off: either you have more complex, more correct explanations that fit all of these simultaneously but are less satisfying, or you stick to satisfying explanations, but then you have to ignore some of the points.
It does, but it's not obvious to me that a more correct explanation that can explain all three differing points is necessarily less satisfying. I would agree that it's necessarily harder to find, harder to access.
I think that would come from an implicit assumption, which he didn't state in the paper, of high causal density. If you have high causal density, meaning so many factors interact with each other, with higher-order interactions, then the more correct explanations are the ones that match that high causal density, and they are more accurate. And we assume that people like simple explanations, with a few variables and low-order interactions; that's where you get the trade-off. So I think the trade-off is not necessary in general: it is introduced as you increase causal density, and if causal density is very low, there is no trade-off.
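A rough numerical illustration of that causal-density point (entirely invented numbers; my sketch rather than anything from the paper): when the outcome is driven by an interaction, the simple one-variable summary looks tidy but misses what is actually going on.

```python
# Sketch: high causal density = outcome driven by higher-order interactions.
# All quantities are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x1, x2, x3 = (rng.integers(0, 2, n) for _ in range(3))

# True process: a three-way interaction, plus noise.
y = 2.0 * (x1 * x2 * x3) + rng.normal(0.0, 1.0, n)

# The "simple, satisfying" summary: the marginal effect of x1 alone.
marginal = y[x1 == 1].mean() - y[x1 == 0].mean()

# The "correct but complex" summary: x1's effect depends on x2 and x3.
hi = y[(x1 == 1) & (x2 == 1) & (x3 == 1)].mean() - \
     y[(x1 == 0) & (x2 == 1) & (x3 == 1)].mean()
lo = y[(x1 == 1) & (x2 == 0) & (x3 == 0)].mean() - \
     y[(x1 == 0) & (x2 == 0) & (x3 == 0)].mean()

print(f"marginal 'effect of x1': {marginal:.2f}")  # ~0.5, tidy but misleading
print(f"effect when x2 = x3 = 1: {hi:.2f}")        # ~2.0
print(f"effect when x2 = x3 = 0: {lo:.2f}")        # ~0.0
```

With low causal density (no interaction term), the marginal and conditional effects coincide and there is no trade-off, which matches the qualifier above.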
Right. So we talked about individual scientists, and we see this happening. But I think what could be more productive, not just for this class but for the community in general, is to think beyond the individual. Of course tenure is so attractive; how can you change that? So what are some objectives of social science as a community that we should be looking for? As an individual scientist doing good research, you can start by not fooling yourself; that's the first principle. But as a community, what are some things we want to go for? If we want to steer how science is done, how do we want social science to evolve? We talked about accumulating knowledge, whether it's accumulated knowledge that will enable us to be better predictors of interventions. Or, you know, some people might oppose prediction: just having new frameworks, new social constructs and concepts that people have not been paying attention to, could be useful. And the bottom line, from the instrumentalist point of view as I understand it: sure, you can invent new things, but they should contribute to better predictions of interventions. And if they...
Don't? It's fine; it's just not science. Again, I always go back to the Lord of the Rings, right, or the Hobbit. There are things that are good ways to think about the world; religions are good ways to organize and think about stuff, and they might have consequential impact on how you behave and how things happen in the world. We just don't call them science, and that's fine.
Would you say that this kind of framework without predictive utility would be more like ethics, or like political philosophy, where you lead people to think in these specific ways, but you're not trying to...
It's an ideology, yes. And that's fine; that's not to say that it's not useful.
Again, something in my context that I know something about is religion. There are people for whom religion is really serving them well: they are living their lives happily, they are facing death with dignity, they are rationalizing the bad things they are going through in a way that makes them endure. And they have access to a whole system and framework for how to think about the world that's extremely useful for them. I just don't call it science, because it isn't evaluated based on data; its explanations are not held to the standards of scientific explanations, which is the ability to make predictions about observations and yet-unseen situations.
And also, I think, for social scientists, I mentioned this in the first slide: there are two parts. One is saying something that is not common sense. What science is is debatable, but we know that this science is about going beyond common sense; the convention is that a social science paper can challenge common-sense assumptions using social science theories and findings. I think there was the paper about debiasing humans. And the second part is saying something that is based on correct representations of the world. You might believe that this is how the world works, but if it turns out empirically that it is not, we want to let people know. In some sense, and Matt also said this, we want to shift people's minds, not just one person but the collective; we want to shift our common sense, or at least what counts as common sense for, say, educated people, or scientists. But on this second point, I see a tension between two camps: how do we want to shift people? Does it have to be by being more predictive of situations? Or is it enough that we have this theory, this finding, for example that diversity is good, and we have reasons to promote diversity, for normative reasons and everything? And if companies start promoting diversity, possibly for the wrong reasons, it's still something that some people want. And I...
Have you seen the meta-analysis that just came out? It's on arXiv, on the effect of diversity on team outcomes. Have you seen it, Mohammed?
Anyway, there is increasing evidence that diversity is something we might desire for normative values, and that's fine. But the empirical evidence of its impact on performance is, at best, very weak. Again, it's something we want to do and uphold for normative reasons, and we try to justify it using science.
So it's kind of like an explanation that has dominated the market not because it is true, but because it is appealing to people and to what they think is right. And...
One thing from this slide that came to mind, related to the problem-mapping space (I think it's probably related to the institutional incentives), is the emphasis on publishing surprising, positive results. Because there's a tension here that's really hard to grasp. We like to evaluate things based on common sense, which, as we're using it here, means they give us a subjective sense of having understood something; but at the same time, we value and place emphasis on surprising results, which are usually counterintuitive. When you're writing your paper... go check any of the social science papers published in Nature; I dare you not to find the word 'counterintuitively' or 'surprisingly.' It's like: everyone thinks punishment is good, and they give you the argument for why you should think X; counterintuitively, surprisingly, we find Y. But at the same time, once they tell you what Y is: aha, I get it, now I can see that actually punishment is bad. So you start with 'everyone thinks X,' which makes the finding surprising, but once you see why, it makes sense; it gives you the subjective sense of having understood something. So maybe the right word is not 'common sense' so much as the subjective sense of having understood something, even against common sense.
Common sense is not very common, no. And it also doesn't make sense...
It makes this very sweeping assumption of a common citizen. Yeah.
So I think, yeah, there is an emphasis on publishing surprising, positive results, and that is desirable for the subjective sense of having understood. Yeah.
I think that connects to the 'much to our surprise' part. It also goes against the atmosphere of collaboration, and more toward serious competition: you kind of want your theory to stand out. Like in Nature: oh, something's been published about X, never mind that it was done twenty years ago. And on Twitter people promote this kind of new Nature paper, oh my god, and you get the sense that it isn't really cumulative, because what they did may be the exact same experiment as an old one; but if you manage to get the new spin, it gets published immediately, without really adding value on top of what came before.
Good, okay.
So what I was building up towards: I think we could think about two different levels of incentives. For the individual scientist, not to fool oneself: should we try to achieve that by forcing pre-registration, or by being explicit about whether a prediction was post hoc, to reduce the risk of hindsight and confirmation bias? And then, as a community of scientists, how not to fool the community itself. Duncan Watts makes a good point about how, for example, political scientists, sociologists, psychologists, and economists all kind of go for the same topics, and if you just look at the title and the topic they're covering... I think someone ran an exercise in an undergrad class where they give students the title of a paper and have them predict which journal it was published in, and it's very hard to do before you get to the methodology section. But what he was pointing to was that there are explanations in sociology and in equilibrium economics that cannot both be true at the same time, because they make different assumptions. And which assumption is tenable really just depends on the culture of the discipline. I think it connects to what Jesús said.
Yeah, everything is internally consistent; things are correct up to their own methodology.
So yeah, there's bias due to belonging to communities, not just the individual; it exists at the collective level too. For example, economists don't have incentives to respect psychologists, and sociologists don't really have incentives to talk to psychologists and economists, if their primary goal is to make sure that the people in their department get promoted and tenured and get more funding. And I think Duncan's critique could be that you're not solution-oriented: if we were working on a solution-oriented challenge, it wouldn't matter whether a contribution comes from political science or somewhere else. So it's not just the individual scientist's incentive to self-advocate their work; it's also each discipline's incentive, as a community, to protect its boundary and say: we're not doing, for example, sociological work, we are economists, this is what sets us apart, and we get to make these assumptions. And I want to make this concrete with a little exercise, continuing what Mohammed did last week. This is the map, and we can add things, maybe move around the ones we already have, and add the things we found in this week's readings, which I haven't added yet, but we can do together; I think we have enough time to try. My version of this changes the two axes. If we want to talk about incentives, we can talk about the levels. The individual scientist: how do we incentivize individual scientists so that they don't fool themselves about the specific result they have, making sure they're not p-hacking and not just making up some post hoc story? And the collective: as a collective, how do we make sure we move into a more solution-oriented space? And the y-axis is the likelihood of agreement, which relates to feasibility, but I want to focus on the likelihood of agreement: how likely is it that social scientists will agree and revise the incentive structures?
And is this what you were calling organizational feasibility last week? Operational...
Operational feasibility?
Or what was it, feasibility?
Technical feasibility.
So, for example, Duncan's suggestion of solution-oriented social science may focus more on the collective-structure side, which could have a big impact, but I can see many social scientists disagreeing: oh, that is not science, we're not policymakers, we're not engineers. So yeah, how do we want to do this? Is there anything else we want to talk about first?
I think this is a useful exercise even if we're slightly undecided about which axes to use. The immediate use of it is to list out the solutions that came up in this week's readings and start positioning them along some dimensions; the dimensions themselves we can continue to refine. So yeah, pick one and let's start.
Well, once we continue with what Mohammed set up last week, we can finally see the difference between the axes, because I think one is kind of a subset of the other. For example, operational feasibility is a subset of feasibility overall, and levels of incentives could be a minor subset, but correlated with it. Yeah.
So yeah, there are three things that we didn't find a spot for last time. Thinking about it, we can also decide...
One is crowdsourcing hypotheses, and there is a lot there; let me read what's already there. Yeah, oh, I see it.
There are LLMs there. People are doing research on that.
I think LLMs will have to be a category, and within it: LLMs as subjects, LLMs as researchers, LLMs as measurement. These will have different impacts and feasibilities. I think it's a category that Mohammed will have to work on. Yeah. I mean, solution-orientedness: what do people think about that solution and that paper by Duncan? Michael, you have something you want to say?
Yeah, I thought it was kind of narrow, like it overly confined the kinds of questions that can be asked. And also, different fields are actually answering different questions, and having them all be solution oriented, when each field's purpose is different... so I thought it was unfeasible. And if it wasn't meant to apply to everyone, I thought it would increase the crowding of some topics, or something like that. Like a Goldilocks set of problems.
Yeah, yep. This one has maybe average feasibility, because he says explicitly that he's not going for all social scientists doing solution-oriented social science. He didn't give a percentage, so presumably just some social scientists.
I presume it's framed in a way that everybody could benefit from the solution-orientedness, and that it would also feed into better theory, or at least create some kind of... it felt like he wanted to write, we should all do impact-oriented science, and it's good because that's the goal of science, and then he put a disclaimer in everybody's ear, like, but I'm actually not saying that you're doing stuff wrong.
But what he actually wrote is the weaker version. He was like, it's coming from one person, from one institute, with the analogy to biology: do the solution-oriented stuff and maybe people will follow in those footsteps, because you still have the other fields doing separate things. I think he was proposing something really big, and he knew people would push back, so in order not to offend people he had to hedge it: I'm going to say this milder version, but really what I mean is the stronger version.
I kind of agree. I think this belongs at high feasibility and lower impact, like the top left quadrant.
High feasibility, and, relative to the rest, less than average impact.
I don't know exactly; I think it depends on which interpretation, because there are two, right? One is that some people doing solution-oriented work would benefit everyone, and that's kind of where he ended the paper, with the biology labs: it's a low-risk thing, a few labs, maybe one lab, does it, and he considers his computational social science lab at UPenn to be doing exactly that. That's why he left Microsoft: solution oriented, we have the problem, we bring in whatever theory and measurements, and we just measure whether we're solving the problem. He sees his lab's work as what he's proposing in this paper, which is: once one lab does it, we'll see that it's feasible, that it can be rewarded, that it actually solves the problem, and then it will lead to better development of theory and measurement, and research is likely to become more problem oriented rather than method oriented or paradigm oriented. I think that interpretation would belong there. There's another one, which is what he really wanted to say: that everyone should be solution oriented. And I think that would be extremely difficult to achieve through structural incentives; it requires some sort of consensus.

It kind of reminded me of a solution that maybe should be added to the solution map. Newell, in "You can't play twenty questions with nature and win," suggested three solutions, right? His first is complete processing models, one theory that you carry through everything. His second one is kind of like this one: analyze a complex task. What he says is that the second experimental strategy to help overcome the difficulties is to accept a single complex task and do all of it, which is kind of the solution-oriented idea: let's pick something, self-driving cars or whatever, and just do the whole thing. Get a car to drive itself; bring in AI, vision, decision making, statistics or signal processing, whatever you need to bring in to solve that task fully. That's kind of Newell's second solution. The Goldilocks idea describes more what kind of task to pick; here he doesn't specify, he just says: accept, as a community, a single complex task and do all of it. The current experimental style, he says, is to design specific small experiments to attempt to settle specific small questions, and as often as not the empirical exploration wanders, and so on; the alternative is to focus a series of experimental and theoretical studies around a single complex task, "the aim being to demonstrate that one has a sufficient theory of a genuine slab of human behavior. All of the studies would be designed to fit together and add up to a total picture in detail. Such a paradigm is best described by illustration. Unfortunately, I know of no single example which successfully shows the scheme at work. I attribute this not to its difficulty but to its not really being tried." Which is kind of Duncan's argument: it's not that no one has done it because it's impossible; that's exactly why somebody has to do it, even if only one lab does it. It's not impossible, it's just that we don't do it, and once we do, it's an alternative way of doing things: we pick something and do the whole thing.
His argument was that we should have many more social scientists contributing to solution-oriented social science; that's the way I read it. It doesn't have to be everyone, but significantly more than right now. I can see why that would be much harder, not only because some people would disagree with the direction, but also because of what it takes to put it together: hiring the best machine learning scientists, people who are really skilled at quantitative work, and people who have good domain knowledge of the literature. First of all, like John told me, it's very hard to hire someone who's an expert in engineering into social science, because of how the academic job market works. And also, trying to get people onto one type of problem: some people might say, hey, we did it for self-driving cars, why can't we do it in social science? But I don't know how to respond to the argument that it's just going to be hard to fund, because I know he manages his lab, and a computational social science lab like that is huge. One counterargument could be that once you have all those resources and funding... if you're saying we should have more versions of that lab coming up at other institutes as well, I'm not sure who could do that.
Yeah. And I think his response would be: that's why I suggest that one lab does it, to show that it's feasible, that it can be done, and that it's good for science. There is some fixed cost to being an early adopter of a new perspective on how to do science, and he's saying, I'm willing to pay that fixed cost, so that other people won't have that cost to bear.
I didn't fully follow Newell's third solution, which is that a scholar should, over their lifetime, have some very consistent research agenda and inspect different aspects of it.
The third one, or the second one you highlighted? The series...
The series of experimental and theoretical studies.
That would be something like the self-driving car: for example, we pick a task.
Is that the one for which he says he knows of no example? Yeah, yeah. Is that...
I don't know of an example either, but now I think it would be like the self-driving car: we take a problem, a task, can we get a car to drive itself without a human driving it? That's the task, and then we do a bunch of experiments and studies that all contribute to the same thing.
Wasn't the life predictability one, the paper we read, like that? You mentioned it was a huge effort, with the task of predicting life outcomes based on very high quality data.
It might be. I see it as slightly different from life trajectory prediction. That was big in the sense that it was a single dataset, a single study, and we were all trying to predict the same thing; the common task framework kind of made it a version of this. But imagine if many researchers were able to get new data and were developing new theories, all agreeing that the task we're trying to do is to predict life trajectories, and that that's how we're going to evaluate, and then we do research for ten years. That's more like what I have in mind. It's not one single paper: one paper is about computer vision, one paper is about control, one paper is about signal processing, ten papers are about different methods or pipelines to connect these three things, and everyone is showing how much they're improving on that task, over now two decades of research, and we can see how much better a car can drive itself today than two decades ago.
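A minimal sketch of the common-task-framework idea being described: many teams submit predictions for the same held-out targets and are scored with the same agreed metric, so progress is measured on the shared task rather than paper by paper. The team names, data, and the choice of mean squared error are all illustrative assumptions, not details from the challenge discussed.

```python
# Sketch: one shared holdout, one shared metric, many teams, one leaderboard.
from typing import Callable, Dict, List, Tuple

def mean_squared_error(y_true: List[float], y_pred: List[float]) -> float:
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def evaluate_submissions(
    holdout: List[float],
    submissions: Dict[str, List[float]],  # team name -> predictions on the holdout
    metric: Callable[[List[float], List[float]], float] = mean_squared_error,
) -> List[Tuple[str, float]]:
    """Score every team on the same held-out data with the same metric."""
    scores = [(team, metric(holdout, preds)) for team, preds in submissions.items()]
    return sorted(scores, key=lambda pair: pair[1])  # lower error ranks higher

# Hypothetical usage: two labs, one shared prediction task.
holdout = [0.1, 0.7, 0.4]
submissions = {"lab_vision": [0.2, 0.6, 0.5], "lab_theory": [0.1, 0.9, 0.3]}
for team, score in evaluate_submissions(holdout, submissions):
    print(team, round(score, 4))
```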
But since he says "a genuine slab of human behavior," I think the self-driving car may not apply.
It might. It would be the way Duncan uses it to support his thesis: computer scientists worked on the self-driving car, and we want something like that in the social sciences.
So when I asked Charlie, my advisor, whether there's any scholar who has worked through a very consistent agenda, he introduced me to Eric von Hippel, who throughout his lifetime has studied user-centered innovation, and I thought of that as one slab of human behavior.
But the question is: did he do all of it? Is it one big task for which you can measure progress? You're motivated by a problem; that's the solution-oriented bit: there is a problem for which you can see whether you've solved it or not, or how far you are from solving it, and see over time that you're making progress. And then it's fair to bring in any theory you want, any methods, any algorithms, any field; I don't care, mix in anything; the point is whether it's solvable. And that's a slightly different mindset, right? Because if we want to publish in an econ journal, then we have to use specific methods and specific theories and specific framings. It's not that the problem and the solution are independent of one another: once you pick a problem, you actually commit to solving it using what people agree are the methods and theories you're allowed to use to solve that problem. It's a different perspective to say that, to be truly solution oriented, you pick a problem for which you can measure your progress and do whatever it takes to make progress on it. That's a very pure instrumentalist perspective, but the paper details how you would operationalize it: if you want to be an instrumentalist, how would you do it? The paper keeps making arguments for how to actually operationalize that philosophical view.
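A minimal sketch of that instrumentalist framing: a problem is defined by an agreed measure and a target, and progress is simply movement of the measure toward the target, regardless of which methods produced it. The metric name and numbers below are hypothetical.

```python
# Sketch: a "problem" as an agreed measure plus a target, tracked over time.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Problem:
    name: str
    target: float                                        # value at which we call it solved
    history: List[float] = field(default_factory=list)   # measurements over the years

    def record(self, value: float) -> None:
        self.history.append(value)

    def progress(self) -> float:
        """Fraction of the way from the first measurement to the target."""
        if len(self.history) < 2:
            return 0.0
        start, current = self.history[0], self.history[-1]
        return (current - start) / (self.target - start)

# Hypothetical usage: a decade of work on one measurable problem.
p = Problem("miles driven per disengagement", target=100_000.0)
for value in [200.0, 1_500.0, 11_000.0, 30_000.0]:
    p.record(value)
print(f"{p.progress():.1%} of the way to solved")
```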
So the number of published papers cannot be the indicator.
Right, the indicator is: are you solving the problem?
It was hard to identify a problem, though. Like, I can't think of a problem.
Cooling the climate by a degree?
I mean, one that I think we talked about a little bit last time, for example: reducing misinformation. How do you know... yeah, how do you know that a self-driving car is actually driving itself? Maybe the number of accidents, or passing some tests; even the evaluation criteria are part of the task, part of being solution oriented. And maybe finding out what we should actually care about: Dave has a paper that just got accepted in Science last week where they show that the harm from information is two things. One is how persuasive the information is, because maybe everyone saw something but no one believed it, and if it doesn't change anyone's behavior or beliefs about anything, then it has wide reach but no effect. So there is exposure to different messages, and then there is the impact of each message. And one thing they show is that misinformation actually has very little impact; misinformation about the vaccine in particular had very little impact. Whereas misleading information that is not misinformation, true information that's framed in a way that could nudge your behavior in a particular direction, has 47 times more impact on vaccine hesitancy than misinformation. So true but misleading information has 47 times the harm of misinformation. And then maybe the problem to be solved is not "reduce the amount of misinformation," because the thing that's causing more harm is not misinformation. They quantify both the exposure and the impact, because you have to be exposed before there can be any impact, and given exposure, the impact of misinformation was very, very limited. So maybe one problem that social media platforms would want to solve is to reduce harmful content.
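A minimal sketch of the exposure-times-impact decomposition being described: the expected harm of a class of content is how many people see it multiplied by its per-exposure effect on behavior. The numbers below are invented so the ratio comes out near the 47x figure mentioned; they are not estimates from the paper.

```python
# Sketch: harm = exposure x per-exposure impact, with made-up numbers.
def expected_harm(exposures: float, impact_per_exposure: float) -> float:
    """Harm only accrues through exposure: without reach there is no effect."""
    return exposures * impact_per_exposure

# Hypothetical inputs: flagged misinformation reaches fewer people and persuades
# less per view than unflagged, true-but-misleading content.
misinfo = expected_harm(exposures=1_000_000, impact_per_exposure=0.0002)
misleading = expected_harm(exposures=20_000_000, impact_per_exposure=0.00047)

print(f"misinformation:      {misinfo:,.0f} expected harm units")
print(f"misleading-but-true: {misleading:,.0f} expected harm units")
print(f"ratio: {misleading / misinfo:.0f}x")
```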