eurogenechange_subs

    11:24AM Sep 19, 2024

    Speakers:

    Razib Khan

    Keywords: selection, population, genome, people, genetic, adaptation, allele, polygenic, mutation, trait, neolithic, frequency, haplotype, europeans, allele frequency, variants, results, gene, means, variation

    This podcast is brought to you by the Albany public library main branch and the generosity of listeners like you. What is a podcast?

    God daddy, these people talk as much as you do.

    Razib Khan’s Unsupervised Learning.

    Embryos have a 30 to 60% difference in the predisposition to severe diseases. With Orchid parents can identify their embryo at lowest genetic risk rather than just a random outcome. Our genetic calculator can show the impact of embryo selection. Discuss embryo screening at IVF with a genetics expert.

    Welcome to another episode of Unsupervised Learning. This time, I am also doing a monologue, like I did last week. I was not expecting to do this this week, but there's a preprint that came out that people have been talking about, and that people have been waiting on for a long time. I think the main reason people are talking about it is one particular finding, actually. The preprint is huge, but the one particular finding is that the polygenic scores for educational attainment and household income, and I think there was one measure for intelligence, all increased over time in the ancient DNA sample in this preprint. And so it showed that, if you naively projected it, Western hunter-gatherers from 8000 BCE had an average IQ that was like 88, as opposed to 100, which is the standardized European norm. It showed that people got smarter during the Neolithic. I'm front-loading the sexy, spicy, controversial aspect, but I'm gonna talk about that much later in this podcast, partly just because, you know, really people should read the methods and the supplements. And honestly, I like to read the supplements pretty thoroughly, and I'm just going to tell you that on this reading of this paper I did not read them thoroughly, because there's a lot of them. It's pretty abstruse, and I wanted to get this podcast out because people have been asking me what I think about it. So I'm getting it out, but I'm probably going to read the supplements in depth at some point for me to be happy about it. So the preprint is "Pervasive findings of directional selection realize the promise of ancient DNA to elucidate human adaptation." And the authors are Ali Akbari, Alison Barton, Steven Gazal, Zheng Li, Mohammadreza Kariminejad, Annabel Perry, Yating Zeng, Alissa Mittnik, Nick Patterson, Matthew Mah, Xiang Zhou, Alkes Price, Eric Lander, Ron Pinhasi, Nadin Rohland, Swapan Mallick, and David Reich.
So Akbari is the first author and Reich is the last author, so it's out of David Reich's lab. I've met Ali Akbari. We've had discussions at the American Society of Human Genetics meeting. I know he's been working on this for a very, very long time, out of the Reich Lab, where mostly they study population structure, phylogenetics, population history, phylogenomics. They didn't really start doing stuff on selection until the last couple of years. You know, I think Vagheesh Narasimhan, who is now here at the University of Texas at Austin, did some of the work, and there were a couple of others that came out. And I think Iain Mathieson, who's at Penn now, is also out of the Reich Lab, like Vagheesh. Stuff has been coming out in dribs and drabs, but this is kind of the big one, and they attached a selection browser to it. It's a really good browser. You should check it out. You can look up the SNPs by their rsIDs and see what the results are. In terms of the author list, I don't know everybody here, but I will tell you, Ali Akbari, I don't know him well, but we talked probably for an hour at ASHG, and I know his work. I know David Reich moderately well. I have corresponded with Swapan Mallick. Nadin Rohland, I don't know well, but I know her name. Ron Pinhasi, he has a lot of DNA. You know, he's an archeologist or a genetic anthropologist with a lot of DNA. Eric Lander is a big deal. I don't know why he's on this paper, honestly. Eric Lander, as some of you know, kind of founded the Broad Institute, and he's probably the biggest mover and shaker in American genomics, aside from maybe Jim Watson, Francis Collins and Craig Venter. You know, he's up there with those, but he's not as known by the public. He's extremely smart.
My former boss, Spencer Wells, would tell me that when he was in Richard Lewontin's lab, sometimes Lander, who came out of mathematics, came to give a talk or give presentations, and Stephen Jay Gould would be there, and Lewontin would be there, and Lander was there. They were very interesting times. And then, of course, Alkes Price is on this paper. Alkes Price, I don't know him, obviously, and I don't want to say that he's like von Neumann, but from his students or people that work with him, and when you go to a presentation at a conference, it's like he's von Neumann. His statistical, mathematical brain is just not from this world. So the fact that he's on this paper, you know, I don't want to say it reassures me, but it does reassure me. I don't know most of the others by reputation. I mean, Nick Patterson has been on this podcast. Nick is a genius. He does a lot of the math, you know, the creation of new statistics. Alissa Mittnik, she's done some good work with the Reich lab before. So, you know, the other people I don't know, aside from Ali, but it's actually a relatively short list of authors. That's why I'm going over this. I thought there would be a lot more here. So, for example, if Alkes Price went through everything, I don't think that there are massive technical errors. I'm very skeptical that there would be. Nick Patterson, David Reich, Ali, none of them are slouches. So this is going to be in Nature or Science, probably Nature, if I had to bet.
But in any case, there's massive supplementary info, which is like a book in and of itself, and which, candidly, I have not read. I will at some point, but I'm going to go through this paper and talk about what they discovered, because everyone's curious and talking about it. You know, it's the IQ, the behavioral stuff, but really, there's a lot more here. So I'm going to read the abstract really quickly, and then I'll go through it, and I'm going to give some preface about selection, adaptation and these sorts of studies. "We present a method for detecting evidence of natural selection in ancient DNA time series data that leverages an opportunity not utilized in previous scans: testing for a consistent trend in allele frequency change over time. By applying this to 8433 West Eurasians who lived over the past 14,000 years and 6510 contemporary people, we find an order of magnitude more genome-wide significant signals than previous studies: 347 independent loci with >99% probability of selection. Previous work showed that the classic hard sweeps driving advantageous mutations to fixation have been rare over the broad span of human evolution, but in the last 10 millennia, many hundreds of alleles have been affected by strong directional selection. Discoveries include an increase from ~0% to ~20% in 4000 years for the major risk factor for celiac disease at HLA-DQB1; a rise from ~0% to ~8% in 6000 years of blood type B; and fluctuating selection at the TYK2 tuberculosis risk allele, rising from ~2% to ~9% from ~5500 to ~3000 years ago before dropping to 3%. We identify instances of coordinated selection on alleles affecting the same trait, with the polygenic score today predictive of body fat percentage decreasing by around a standard deviation over 10 millennia, consistent with the "Thrifty Gene" hypothesis that a genetic predisposition to store energy during food scarcity becomes disadvantageous after farming.
We also identify selection for a combination of alleles that are today associated with lighter skin, lower risk for schizophrenia and bipolar disease, slower health decline and increased measures related to cognitive performance (scores on intelligence tests, household income, years of schooling). These traits are measured in modern industrialized societies, so what phenotypes were adaptive in the past is unclear. We estimated selection coefficients at 9.9 million variants, enabling study of how Darwinian forces couple to allelic effects and shape the genetic architecture of complex traits." Okay, there were a lot of terms in there. I'm sure some of you are confused. Just going through really quickly, I'm seeing "hard sweep." So a hard sweep, just so you guys know, and I'm going to talk about this again, is selection at a specific position in the genome on a single mutation, and then it rises in frequency, and that's what's driving the adaptation. Strong directional selection, positive or negative, you probably understand that. Fluctuating selection is, you know, frequency dependence, going up and down, that sort of thing. And then there's the allusion to how these traits are measured in modern industrialized society, so what phenotypes were adaptive in the past is unclear. Well, an example of this would be sickle cell in African Americans. Obviously, they're not farmers in Africa exposed to the malaria pathogen, and so it's not adaptive today, but it's an echo of the adaptation of the past. You know, where my family's from, Bangladesh, there seems to be a cholera adaptation. I haven't checked if I have it, but if I do, it doesn't really matter in the United States, right? So a lot of these things are the ghosts of adaptation past. But if you look empirically at the results, and I'll go through some of the empirical results, a lot of the adaptations and selection signatures actually continue into the present pretty strongly. But first, let's talk more broadly about selection and adaptation. You know, there's an allusion to Darwinian forces and allelic effects, and I think this is an allusion to the fact that this is obviously in a Darwinian evolutionary biological framework. But the focus here is on genetic signatures, genetic dynamics, using genes as the tracer, the tracker, or the currency for evolutionary processes, right? So, as I'm assuming you know, selection is necessary for adaptation.
So when you go back to Charles Darwin and his theory of evolution through adaptation, via natural selection, you know, basically what you have is a situation where adaptation needs heritable variation, and then selection correlated with that heritable variation to affect a change in the population. So, if there is selection in the population, but there's selection is operating on a trait that with that's not heritable, then adaptation is not going to happen, because the offspring are not going to inherit the trait, you know. And if you have selection on a non heritable trait, it's irrelevant. And if you have the trait and it's highly heritable, but there's no selection, obviously adaptation is not going to happen. You have to have both of them, right? And so this is why biologists, evolutionary biologists, go into genetics and look at genetics, and have Drosophila and other other sorts of model organisms to study evolution, because you use genetics to explore the patterns of evolution. Because evolution occurs through inheritance, generation to generation. It is driven by allele frequency changes, and alleles is just an old generic word that means a genetic variant or genetic marker. It basically today, often means SNP, snip, single nucleotide polymorphism. But we didn't know the DNA was a substrate of inheritance for a long time. So all of the new things and the new terms actually apply to the old terms, but, you know, the mapping is a little weird, but an allele just means basically an inheritable it's like a variant that is part of - that is a variation on a gene, and a gene, in the old fashioned way, would be just a discrete unit of inheritance. 
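Editor's sketch: the point that adaptation needs both selection and heritable variation is often written in textbooks as the breeder's equation, R = h² × S. The episode does not name this formula, so treat it as an illustration; the function name and the numbers below are hypothetical.

```python
def response_to_selection(heritability: float, selection_differential: float) -> float:
    """Breeder's equation: expected change in the trait mean per generation.

    heritability           -- narrow-sense heritability h^2, between 0 and 1
    selection_differential -- S, the trait mean of the parents that reproduce
                              minus the trait mean of the whole population
    """
    if not 0.0 <= heritability <= 1.0:
        raise ValueError("heritability must be between 0 and 1")
    return heritability * selection_differential


if __name__ == "__main__":
    # Selection on a non-heritable trait: no response in the next generation.
    print(response_to_selection(0.0, 2.0))
    # A highly heritable trait with no selection: also no response.
    print(response_to_selection(0.8, 0.0))
    # Both present (made-up numbers): the trait mean shifts.
    print(response_to_selection(0.5, 2.0))
```

Either factor being zero gives zero response, which is the "you have to have both of them" argument in numeric form.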
Today, when we think of a gene, we think of a start codon and a stop codon, and it's a biochemical reality, but that's actually somewhat different from how genes were initially conceptualized 120-some years ago, when genetics was rediscovered after Mendel's period in the 1860s and 70s, right? So you have this situation where you're trying to bridge adaptation, Darwinian forces, to genetics, and genetics is your window. It's your tracer. It's basically a map of evolutionary process and adaptation. But the mapping is not always so easy. A lot of times you have signatures of selection, you know that selection is going on in parts of the genome, but mapping it back to what the pressures of the external world were is not always easy. It is easy if you have Drosophila, if you're breeding fruit flies, but we're interested in humans, we're interested in other organisms, we're interested in the ecosystem. That is much harder. So we have a lot of data now when it comes to molecular genomic methods, but fitting that data to Darwinian, adaptive stories is very, very difficult. And I think you'll notice that they're pretty cautious, and they didn't overemphasize that, partly because what they had to do in the methods is pretty intense. And I'll get to that. I also want to say, you know, selection can be non-genetic. So, you know, the cultural evolution people: Joe Henrich, Robert Boyd, Pete Richerson, Marcus Feldman, the late L.L. Cavalli-Sforza. I'm trying not to leave anybody out, so people are, you know, gonna get annoyed. And anyhow, Alex Mesoudi did a podcast on cultural evolution like three years ago now, I think three summers ago. You can listen to that. So evolution can occur through culture. This is the dual inheritance model. And obviously that has nothing to do with genetics.
So, you know, you have adaptation on a cultural level, and you can imagine that it's a little more difficult to test for selection there, because there are no genes there. There's memes, and no one's really discovered a way to track a meme, if that makes sense, in the same way that we can test a gene, because a gene is a physical entity. It's something that's amenable to technological inquiry, which is really what's transformed genomics over the last, you know, 25 years. So in terms of detecting selection on the molecular level, this started to become a thing in the 1960s to 1970s, with the neutral theory and also molecular evolution. And molecular evolution just basically means DNA is a molecule, and how is this molecule evolving? Also, there was some early work with other types of variation that's correlated with the DNA: allozymes and protein variation, amino acid variation. I don't want to get into all the different types, but really, people mostly focus on DNA now, because that's really the important one. That's the ultimate root of a lot of the variation. There are forms of detecting adaptation and selection that are between species, or between really diverged lineages that split very early. A lot of this comes out of the Drosophila world. So some of you might know dN/dS. dN/dS is a statistic that's looking for selection within the genome. So you have a gene, and this is going to be in Drosophila, usually. And you have two types of Drosophila, and maybe one can digest some sort of fruit very well. The other cannot. Well, I mean, that difference is heritable, right? So you know that the flies diverged however many generations ago, or thousands of years ago, or hundreds of years ago, it depends. Sometimes they're within the same species. But the point is, okay, they have a heritable difference.
The differences are usually fixed, which means that they're totally different between the two species. And then you do things like dN/dS, which is looking for changes across the two lineages at nonsynonymous positions. Okay, but I gotta go back here. Within the DNA, it's organized into codons. And basically there are some mutations that have an effect on the protein coding and some mutations that don't. The ones that don't are synonymous, the ones that do are nonsynonymous. And basically what you want to do is take the nonsynonymous ones, which mess up proteins or change them somehow, and look at how fast they're changing in relation to how fast the synonymous ones are changing. The synonymous ones are neutral, they have no good or bad effect, so they're controlled by drift. The nonsynonymous ones are usually very, very constrained, because you don't want them to mutate. You don't want to break the protein. But sometimes, in very rare cases, the mutation is good, and then you can have positive selection. So one population maybe is adapting to a new fruit, and there's positive selection. And so there's a nonsynonymous mutation that goes from zero to one, whereas in the other population it stays at zero, it's constrained. And so you compare the nonsynonymous positions, like, oh, they're different. Well, in that region of the genome, what is the variation on the synonymous positions? Oh, this is way bigger than the variation on the synonymous positions. I'm kind of using an obvious example, and it's a little bit more subtle than that. Anyway, that's one way you can look at selection across species through the molecular lens. Another way is called Tajima's D. And, you know, basically Tajima's D and these sorts of methods rely on comparing types of variation in the genome. So, for example, heterozygosity, which is whether, at any given position, the two copies of the gene that you have match or don't match, and then you compare it to the overall diversity of the genome. You look at the ratio, and that tells you whether the population is increasing, decreasing, subject to selection, not subject to selection. You know, it's a little hard to explain why. A visualization would help. But basically, these are some old forms of detecting selection. I mean old in that they go back decades. They're pretty straightforward. They're usually looking at a single locus or region of the genome, and they're looking at species-level, interspecific differences, looking for differences between species, right? I'm going to be talking about a totally different thing: within species, obviously within human beings, within our species, over the last 14,000 years. So some of these methods don't work very well because, you know, nonsynonymous, synonymous, all of these things, there hasn't been that much time for a lot of mutations to build up, et cetera. The scaling is off. There's very little time since the separation between the lineages. So you need different types of molecular tests, and I'm going to get to that, but first I want to talk about the different types of selection, just so that you have this in your head. Positive selection is really straightforward. You go zero to one, it's positively selected. It's good. So zero would be, there's a new mutation, and then it goes to 100%, right? Negative selection is the opposite, obviously. It starts at a high frequency and it goes down.
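Editor's sketch: since Tajima's D is hard to describe verbally, here is a minimal version of the standard textbook statistic. In its usual form it compares two estimates of diversity from the same sample: the average number of pairwise differences and the number of segregating (variable) sites. The toy sequences and the function name are hypothetical, and this is an illustration rather than anything used in the preprint.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences: list[str]) -> float:
    """Tajima's D for equal-length aligned sequences (needs at least 4)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences for a defined variance")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of sites where the sample is not all identical.
    seg_sites = sum(1 for i in range(length) if len({s[i] for s in sequences}) > 1)
    if seg_sites == 0:
        return float("nan")  # no variation, the statistic is undefined

    # pi: average number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Standard normalizing constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


if __name__ == "__main__":
    # Every variant is a singleton: an excess of rare variants, D is negative
    # (the pattern expected after a sweep or a population expansion).
    print(tajimas_d(["AAAAAA", "TAAAAA", "ATAAAA", "AATAAA", "AAATAA"]))
    # Every variant sits at 50%: an excess of common variants, D is positive
    # (the pattern expected under balancing selection or a contraction).
    print(tajimas_d(["AAAA", "AAAA", "TTTT", "TTTT"]))
```

The sign is the readable part: rare variants inflate the segregating-site count more than the pairwise differences, and common variants do the reverse.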
You know, usually it's like you're moving in a new environment, or something has changed. So there's something called background selection, which is often it's related to negative selection. And this is the kind of thing that happens when you get a new mutation that causes a slight bad effect, and it's kind of slowly purged out of the genome. So there's constant background negative selection going on. And you know that affects some of these statistics that people are generating, you have over dominance. This is really straightforward, and frankly, I don't think it's that important, but it's sickle cell is an example of over dominance. And over dominance is selection for the heterozygote often. So basically, sickle cell, if you have a mutated copy and a normal copy, you're fitter than the other two. The issue with over dominance is, if you do the math, you can't have too many overdominant positions, or you have too many people with two bad copies, because independent segregation. There's a segregation load. Just think about it. Like, if you had, like, I don't know, 20 over dominant traits, there's going to be a minority of them where you're messed up, probably, you know, and it starts to build up over time. So overdominance is a real thing. It happens, but I don't think it's super common. This is a much bigger deal. Uh, frequency dependent selection and stabilizing selection, they're related. Frequency dependent is describing more of the dynamics, while stabilizing selection just means that the trait value you're going for a midpoint trait value. So imagine you don't want to be too tall. You don't want to be too short. Stabilizing selection keeps you in the middle, kind of, you know, and like, there's, there's men, just random genetic drift and, like, segregation, variation keeps, like, a little variation around the middle. Height, intelligence, maybe those are under balancing selection, stabilizing selection. 
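Editor's sketch: going back to the segregation load argument against having many overdominant loci, the arithmetic can be made concrete under textbook assumptions. The model below assumes each locus sits at its equilibrium allele frequency, that the two homozygotes have fitness 1 - s1 and 1 - s2 relative to the heterozygote, and that loci are independent with multiplicative fitness effects. All names and numbers are hypothetical.

```python
def equilibrium_mean_fitness(s1: float, s2: float) -> float:
    """Mean fitness at one overdominant locus at equilibrium.

    Fitnesses are AA: 1 - s1, Aa: 1, aa: 1 - s2. At equilibrium the
    frequency of A is s2 / (s1 + s2), which gives mean fitness
    1 - s1 * s2 / (s1 + s2).
    """
    if s1 <= 0 or s2 <= 0:
        raise ValueError("both selection coefficients must be positive")
    return 1.0 - (s1 * s2) / (s1 + s2)


def mean_fitness_across_loci(s1: float, s2: float, n_loci: int) -> float:
    """Mean fitness with n_loci independent loci and multiplicative effects."""
    return equilibrium_mean_fitness(s1, s2) ** n_loci


if __name__ == "__main__":
    # Made-up example: both homozygotes pay a 10% fitness cost.
    for n_loci in (1, 20, 100):
        print(n_loci, round(mean_fitness_across_loci(0.1, 0.1, n_loci), 4))
```

With these made-up costs, one locus lowers mean fitness by 5%, but 20 such loci already push it down to roughly a third of the best possible genotype, which is why a genome cannot carry very many of them.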
And you know, often they can be driven by frequency dependence, but not always. But frequency dependences, there's two types. There's positive and negative. Positive frequency dependent is kind of very similar to positive selection, because it just means that the higher your frequency, the more you're selected, right? Arguably, selection on a recessive trait could be positive frequency dependent, because when it's at a really low frequency, the gene isn't expressed very often, so the selection is weak. As it increases in frequency, the gene is expressed more and more as the trait that's selected. And so selection should become stronger at that position, right? So that's kind of a form of positive frequency dependence. A negative frequency dependence is a lot of the immunological variations, where as it increases in frequency, bacterial or viral adaptations occur. And so it's fitness drops. In these cases, if you have a rare mutation, that's often better because there's nothing adapting to it. So these are situations where these regions of the genome are highly diverse, and they're diverse because there's negative frequency dependent selection, where when they get really low in frequency, they get extremely adaptive, and then when they get higher in frequency, their adaptiveness drops again. And so they'll often be cycling of some sort, the HLA loci, you know, human leukocyte antigen. These are immuno loci, these are very, very diverse. And, you know, they show up in this in this paper. I mean no surprise. I mean no surprise at all. I want to mention something also relevant to the molecular genetic evolution aspect hard versus soft sweep, hard sweep, which is mentioned in the paper, is really, like, classic, there's a mutation, there's a single mutation, and that just drags, like, its whole region of the genome. And, you know, it's easy to, like, see its effect in the genome, because it's this one single mutation. 
It's destabilizing region of the genome, creating, like, you know, deviations from all sorts of statistics that jumps up. Soft sweeps are more difficult to detect because it's a situation where you might have a bunch of different mutations that are floating around in the population, and all of a sudden they get selected. And so all of them increase in frequency at the same time, but obviously they start interfering with each other, so the signature of selection in that region is going to be weaker because it's different variants increasing in frequency, right? So, for example, imagine a situation where 90% of the frequency is one allele, the major allele, and then you have, like, I don't know, like, 10 different minor alleles at the 10% what if those 10 different minor alleles have, like, they break the function of the trait, and all of a sudden, that's beneficial. What's going to happen is the 90% is going to start to drop through negative selection against it, and those 10 different minor alleles will start to all rise up, and that's going to be a sweep upward. That's gonna be a soft sweep, because those 10 are those 10 are going to be competing with each other depending on how good they are, maybe. But if they're the same, they're just all going to rise up at the same time, right? And so that's a that's the kind of thing that probably is actually kind of similar to what happens with stabilizing selection on quantitative traits. So think about height. You know, there are lots and lots, like hundreds, 1000s of genetic positions that control height, and there might be selection for tall or short in some populations. Who knows? Well, in that case, it's not selecting for a single variant. It's looking for whatever's there within the population, within the individual. And so a bunch of different variants are going to change in frequency together. And, you know, it's not like, technically, like a soft sweep, because not going from zero to something, it's stabilizing. 
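Editor's sketch: the soft sweep example (one major allele at 90%, ten minor alleles sharing the other 10%, and the minor alleles suddenly becoming beneficial) can be run as a small deterministic model. It assumes a haploid population, no drift, and the same advantage `s` for every minor allele; the function name and parameter values are hypothetical.

```python
def soft_sweep(major_freq: float = 0.9, n_minor: int = 10,
               s: float = 0.05, generations: int = 200) -> list[list[float]]:
    """Allele frequencies per generation; index 0 is the major allele."""
    freqs = [major_freq] + [(1.0 - major_freq) / n_minor] * n_minor
    fitness = [1.0] + [1.0 + s] * n_minor  # every minor allele is equally favored
    history = [freqs]
    for _ in range(generations):
        mean_w = sum(p * w for p, w in zip(freqs, fitness))
        # Standard selection update: each allele is reweighted by relative fitness.
        freqs = [p * w / mean_w for p, w in zip(freqs, fitness)]
        history.append(freqs)
    return history


if __name__ == "__main__":
    history = soft_sweep()
    for gen in (0, 50, 100, 200):
        freqs = history[gen]
        print(gen, "major:", round(freqs[0], 4), "each minor:", round(freqs[1], 4))
```

The major allele is driven close to zero, yet no single minor allele ever exceeds 10%, so no one haplotype dominates the region. That is the intuition for why a soft sweep leaves a weaker footprint than a hard sweep.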
But this dynamic is common where you have polygenic traits, many genetic positions subject to selection, which is harder to detect, obviously, than the hard sweep from a single mutation. That's positive directional selection, right? I talked to you about dN/dS earlier. Within species, there are different ways to detect selection. So when the genomic revolution happened and we had the whole genome, one thing that was really popular was using haplotype structure. A haplotype is basically a sequence of variants in the genome that are correlated because they were often co-inherited together within an individual. Because, remember, your genome, abstractly, is organized into DNA, 3 billion base pairs, but it's two copies, and those two copies are physically organized into 22 autosomes, plus the X chromosome and the Y, depending on if you're male or female, two X's if you're female. Okay, that's how it's packaged. They're organized that way, and recombination swaps them around that way. But they're linear. And so there are really two sets of your genome, right? And those two sets have mutations and variations that are linked together. So, for example, my children have two sets of the genome. One set has a lot of variation from South Asians. That's me. The other set is Northern European, their mom. And so the haplotypes are very distinct. They are going to be very distinct when recombination happens and they have offspring, because the segments are going to be very distinct from each other, because they have different clusters of variations that define their haplotypes. Okay, so a haplotype is a line. It's a segment of the genome defined by correlated markers, correlated variants, usually SNPs. And when you have selection, when you have a hard sweep, the haplotype structure gets totally disrupted from the normal expectation. Usually the correlations across a segment of the genome are pretty small, because recombination broke apart the associations. The genome is swapped back and forth 20 to 30 times, or 20 to 40 times, per generation, and over the generations the correlations disappear. But then you have a hard sweep, and what happens is the whole region of the genome around that mutation that's good hitchhikes along with it. Eventually, over the generations, the segment that's pulled along will get shorter and shorter, because recombination breaks it apart and swaps, right? But you have, for example, lactase persistence, LCT. That gene has a huge haplotype, and that's because it was an extremely strong signature of selection that was very recent.
So those are the two conditions for a long haplotype. Also, there are regions where recombination is suppressed. I don't want to get into that. It's more of a genome architecture and organization issue. But in any case, with these long haplotypes that they were finding using the whole genome, they're like, oh, signature of selection. So where are the long ones? LCT, lactase persistence, is one. Another one is SLC24A5, which causes lighter skin in Northwest Eurasians. OCA2/HERC2 is another one. It's blue eyes, and it's associated with other things, various forms of albinism. So long haplotypes are associated with selection, strong selection and recent selection, right? But the issue is, obviously, that's not going to detect everything. Another way you could do it is to look for outlier SNPs. So, you know, there's one statistic, the population branch statistic, where, okay, you have a population structure that you know, and you know there are certain allele frequency correlations that you're gonna see. So imagine you have a population structure of Europeans, Chinese and Tibetans. And I'm selecting this for a reason. When you generate a phylogenetic tree with the whole genome, the Europeans are like 10 to 20 times more distinct from the two East Asian populations than those two are from each other. But you can look in the genome for deviations from that expectation, like regions of the genome, or SNPs, where one population is distinct from the others, but not in the way you expect. So what that would be is a position where the Chinese cluster with the Europeans and the Tibetans stand apart, right? And so this is looking for outliers, and that's how one of the major altitude adaptations in Tibetans, at EPAS1, was detected. It turns out that's from Denisovans, actually.
So that's one of the reasons why, you know, the Europeans and Han Chinese are similar and the Tibetans are different. But the point is, you're looking for outliers, right? Other ways you could do it are like looking at the site frequency spectrum. I don't think I can describe that really easily. But, you know, obviously you have a situation where you have hundreds of millions of SNPs, of variants. Most of them are very, very rare. Some of them are very, very common, right? And so when you look at the site frequency spectrum, you can try to figure out which might have been selected. Like, why are they common? Is it drift, or is it selection? The site frequency spectrum has a certain expectation, and deviations from it in certain regions of the genome might tell you something, right? I think that's the best I can get out verbally, because you really need to see distributions. What they're doing here, though, is somewhat different. They have thousands and thousands and thousands of samples. Like I said earlier, they have 8433 West Eurasians. So these are the ancient DNA samples that the Reich lab released. And they have 6510 contemporary people. The contemporary people are from, I think, UK Biobank and the 1000 Genomes, and I think the UK Biobank samples were curated so there was good representation of worldwide genetic variation. So in any case, they focus on West Eurasians, because that's where they got the data. You know, there's a lot of them. There are very, very few ancient genomes out of India, very, very few. I mean, it's bad. Anyway, I don't want to talk about that. But in any case, they have all of these ancient and modern people. And so what are they going to do? One thing that you can naively do, and I've done it myself, is just look at the trajectory of the alleles, right?
So, for example, 90% of Western hunter gatherers in Western Europe, Mesolithic hunter gatherers, had blue eyes. And, you know, okay, we don't have pictures of them, but we have the OCA2/HERC2 locus that causes this. I almost said condition. It's not a condition. Some of the people that I love the most have blue eyes. But in any case, this trait is found in Western hunter gatherers. It's very, very dominant. It drops really fast when the Neolithic farmers show up, and that's probably because the Neolithic farmers have very few. I mean, they're not like 100% brown eyed, but their fraction is pretty, pretty high for brown eyes, and then it slowly goes back up, never to 90%, obviously, but, you know, somewhere in the middle. And the intuition there is, okay, you have these three ancestral populations. You have the Western hunter gatherers, you have the Neolithic farmers, and you have the steppe populations, right? And, like, in the browser, they're really partitioning and looking at these three populations in this model. But in any case, all these populations have different frequencies of different things. Blue eyes are high in the Western hunter gatherers. They're lower in the farmers and the steppe people. They're in the middle in modern Europeans. The issue here is it should be considerably lower in modern Europeans, just by looking at the combination of those three populations, just assuming no weird drift effects or whatever. I should do the mental math here, but maybe the frequency of the allele should be, let's say, 10 to 20%, probably closer to 10, but across all of Europe it's closer to, like, 50%. Okay. 
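The "mental math" above is just a weighted average: with no selection or drift, the expected allele frequency in an admixed population is the sum of each source population's frequency times its ancestry share. Here is a minimal Python sketch of that calculation. The ancestry proportions and source frequencies are hypothetical round numbers chosen for illustration, not values from the preprint.

```python
def expected_frequency(ancestry: dict, source_freq: dict) -> float:
    """Expected allele frequency under admixture alone (no selection, no drift)."""
    total = sum(ancestry.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(ancestry[pop] * source_freq[pop] for pop in ancestry)

# Hypothetical ancestry shares for a modern European population (assumption).
ancestry = {"WHG": 0.10, "farmer": 0.55, "steppe": 0.35}
# Hypothetical frequencies of the blue-eye-associated allele in each source (assumption).
source_freq = {"WHG": 0.90, "farmer": 0.05, "steppe": 0.10}

expected = expected_frequency(ancestry, source_freq)
observed = 0.50  # the rough present-day figure quoted in the talk
excess = observed - expected

print(f"expected from admixture alone: {expected:.3f}")
print(f"observed: {observed:.2f}, excess: {excess:.3f}")
```

A gap that large between the admixture expectation and the observed frequency is what makes the naive reading "selection", although drift and sampling still have to be ruled out formally.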
Well, the naive reading of that, just like looking at it visually, would be like, oh, there's been positive selection for blue eyes, or, like, the underlying gene, it could be another trait for the last 10,000 years, because the Western Hunter Gatherer trait persists to a much greater extent than the WHG ancestry across the whole genome. And there are other other traits like this too. Okay, and people have gone through in their papers on this. They've gone through comparing, looking at the trajectories and whatnot. But you know, the problem here is obviously ancient populations are not modern populations. And you know, it's, it's just, it's a little bit ad hoc. So what they did here

    is, and you really need to go to the methods here, this is gonna be hard to describe, so I'm gonna try to make it quick and succinct, but jump to the methods, please. They use a generalized linear mixed model to correct for the population structure. So population structure is a major confounder in a lot of tests for selection, especially when the selection is polygenic, so there are multiple genes, with kind of a weak effect at each locus. People picked up population structure and assumed that was selection. So this happened, for example, with height differences between Northern and Southern Europeans. Those are heritable. Those are genetic. Okay, that's true. There are also population structure differences. And what happened was early methods that tested for selection confused the population structure difference, a heritable population structure difference, with selection, and so inferred a lot stronger selection than there was. I say a lot stronger because often there's still selection, it's just attenuated, right? So I believe with height, there's still some evidence for selection, but it's very attenuated. That means it's much lower, right? And so that's something that they're worried about. Well, they basically have a model where the output is the allele count, so it's not the frequency, but the allele count. And basically what they're doing is, you have a trajectory over time, okay? And what they're looking for is correlated changes within these populations over time with the ancient DNA, because they have 14,000 years, right? And what they're looking for at any given allele, at any given position, is whether there's a noticeable change in the frequency of that allele. And when I say noticeable change, the issue here is there are going to be changes in allele frequency over time when there's admixture between different populations. 
And there are also going to be allele frequency changes due to genetic drift. So sometimes things are just going to be moving randomly. So how do they know that they're not picking up genetic drift? How do they know they're not picking up population structure? So they have a generalized linear mixed model, where the output is this change in allele frequency. Well, actually, there's a coefficient for selection in it, okay? I'm not gonna say the whole model, but basically, the model has, you know, time in there. There's an error parameter in there. There's a matrix in there that's the genotypic covariance between the individuals. So it's basically a matrix that models population structure and relatedness as they're going through the individuals at any given time and across time, and they're looking for patterns that are not going to be explained just by the relatedness, right? So, for example, again with the OCA2/HERC2 SNP, what you're seeing is, initially, there's a crash that's just due to population structure, but then there's selection upward. The frequency goes upward. Well, that's not going to be due to population structure, because the next population that came into Europe, the steppe people, also hardly had blue eyes. So if you didn't have selection, you would have a drop, which is what you see empirically, and then the steppe people would come in, and it would stay flat, right? So you would expect a change that could be explained within the model just by changes in population structure over time. But basically, if you put in that SNP, you can see visually in the browser that it goes back up. That's not due to population structure, right? 
And I can describe this because we know the history of this specific gene really, really well. And that's the thing. This is bringing up a lot of results in the aggregate, but we know a lot of the results already, because people have looked at these specific genes. The way scientists and geneticists used to do this, you know, a while ago, a generation ago, is they would have, like, a natural history of a specific gene. Here you're returning results on basically thousands of genetic positions. And so it's pretty intense. You know, they tried to do some things to check to see if the model was making any sense. So for example, when the model fit was better with directional selection at an allele, what they saw was that the allele was also really enriched in genome-wide associations. Genome-wide associations are looking for the SNPs, the markers, that explain the traits that vary within the population, right? And so, yeah, I think that's enough there. So they independently estimated some of these parameters, in particular the selection parameter, the one you're concerned about, for 9.9 million variants, which is a substantial number. The average human genome has, like, around 5 million variations from the reference. And, you know, the whole human population, I don't think it's got a billion, but it's got hundreds of millions of variants, but most of them are very rare. And, you know, they did, because this team - so Swapan Mallick and Nadin Rohland in particular. 
I don't know if this is true, because I don’t know all the details, but these people are really good with curating and cleaning the data. That's really important. One of the issues, one of the criticisms of earlier work like David Pfeiffer's from this group, I've talked to them about it, was that ancient DNA is crappy data, and you're gonna get weird results from crappy data. You need, like, a lot of sensitivity and power. So they have a huge sample size, okay? So they have increased their power that way. They've also really, really cleaned the data. So these are high quality markers, and this is a group that has a lot of data, right, really high quality markers, and they've tried to cross check their method in a variety of ways, so that they can see that they're getting real results. Let's see. About 2.35% of the variance in allele frequency over time in their model was explained by selection, which seems reasonable. So what's all the rest of the variance? I mean, you know, drift and probably population structure, mixing, you know, just non selective, non adaptive stuff, regular population history, and sampling variance. There's a lot of that. They also calculated polygenic scores. So polygenic scores are simpler than the generalized linear mixed model, which has fixed effects and random effects. I don't want to get into it. Basically, the generalized linear mixed model has an output of the allele count, and it's got an input of all these independent variables that it's trying to control for. And, you know, they're common, I think, in economics, and I see them in statistical genetics all the time. A polygenic score is somewhat simpler, where you just have the result, the odds or a value, and then you have a bunch of variables that explain it. And the variables are the environment, random noise, and then also the genes, right? 
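To make the polygenic score idea concrete, here is a minimal sketch in Python: the genetic part of the score is a sum over SNPs of the per-allele effect size times the number of effect alleles a person carries, and the population mean can be computed from allele frequencies. The SNP names, effect sizes and frequencies are invented for illustration, and this is not the preprint's pipeline.

```python
def polygenic_score(dosages: dict, effects: dict) -> float:
    """Individual score: sum of effect size x effect-allele dosage (0, 1 or 2).

    SNPs missing from `dosages` are treated as dosage 0 (a simplification).
    """
    return sum(effects[snp] * dosages.get(snp, 0) for snp in effects)

def mean_score_from_frequencies(freqs: dict, effects: dict) -> float:
    """Population mean score: expected dosage at each SNP is 2 x allele frequency."""
    return sum(2.0 * freqs[snp] * effects[snp] for snp in effects)

# Hypothetical GWAS effect sizes per effect allele (assumption).
effects = {"snp_a": 0.30, "snp_b": -0.10, "snp_c": 0.05}

# One hypothetical individual.
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
person_score = polygenic_score(person, effects)

# Hypothetical effect-allele frequencies at two time points (assumption).
ancient_freqs = {"snp_a": 0.20, "snp_b": 0.50, "snp_c": 0.40}
modern_freqs = {"snp_a": 0.50, "snp_b": 0.40, "snp_c": 0.45}
shift = (mean_score_from_frequencies(modern_freqs, effects)
         - mean_score_from_frequencies(ancient_freqs, effects))

print(f"individual score: {person_score:.2f}")
print(f"shift in mean score over time: {shift:.3f}")
```

Tracking how the mean score moves across a time transect is the simple version of what is described here. The hard part, which this sketch ignores, is deciding whether a shift is larger than drift and population structure would produce.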
And so that's how they compute polygenic scores. They got polygenic scores, and they were looking at how the polygenic scores also varied over time, right? And this is for the polygenic traits, which was obviously important. They also, yeah, looked at things like the enrichment in heritability, and they did a bunch of simulations. I'm not gonna go into the technicals, but they could not do the calculation simultaneously. Obviously, on any cluster, it just was not possible. So they had to partition it in various ways. So there's a lot of bioinformatic work, cloud computing work. I do have to say

    it's a little bit of an issue for me, in so far as this is many, many years of work. I've been hearing about this. I mean, I don't even know. I think I've been hearing about this for at least four years in terms of the idea for this paper. This has been a long time in coming, and so there's obviously a lot of work. You got to admire that. But this is not going to be easy to replicate. I'm not going to be able to replicate it. I've used some of these methods before, but in small batches. Obviously, I don't have the cloud computing resources, and, you know, I don't have Alkes Price to check this. I don't have Swapan Mallick to, you know, make sure that the data curation is good. You know, the imputation. There's all these things, words that some of you know and some of you don't, but there's a lot of points in here I would like to check. So I guess what I would tell you guys is, I do do, like, you know, some data analysis. I do things like that a lot. It's one thing to read the paper and know the method. It's another thing to do it and develop an intuition. The only person who really gets this preprint, I believe, to any extent, is Ali Akbari, because, you know, just talking to him, his body language, this was exhausting. This is many years of work. And you know, more respect to him. But the problem here is reproducibility. These are very exciting results. Some of them are obviously true. I mean, you find them in other results, you know. But the issue is, okay, how do we reproduce this? Like, you know, we're kind of getting to the point of big science where, you know, Eske Willerslev’s group or something, maybe Johannes Krause, I don't know. I don't know who's gonna be able to reproduce this, because there are other groups, like Rasmus Nielsen's group or Jonathan Pritchard's group. You know, they have kind of the chops to do it, but they do other things as well. 
And you know, it's just, I don't know if they have the resources to replicate this, or if they want to devote one of their grad students or postdocs to this project. So, you know, we have these results, they're great, but, like, how do we reproduce it? Right? That's a major issue. I also want to say, yeah, they've got the sample size, and they tested for things like false positives and other stuff. So the results for selection come out with a Z score, so there's a distribution. And some of the results are very, very high confidence, 99th percentile and stuff, and some of them are not. And, you know, there's a false discovery rate, so with this sort of stuff, if you're really stringent, you're not gonna return a false result, but then you're gonna prune out a lot of true results, right? So it's the ROC curve, you know, the curve of the trade off between true positives and false positives and all that stuff, right? So I've been talking for a while, and I think a lot of you are not super excited, probably because I haven't talked about the traits. So I'll talk about the traits. I'm just gonna go through the traits. So for the really, really high confidence ones, they got, you know, 347 independent loci, with 279 excluding the HLA region. The reason they had so many hits in the HLA region is that the HLA region is, you know, it's like your immune system, and it's so important for natural selection that it's hit all the time. We get sick. Populations get sick. There are variants, there are mutations in your HLA that you share with the chimpanzee that you don't share with your sibling. They're so variable, they're just variable all over the place. 
The HLA loci are also one of the reasons you have tissue matching problems within even your family, because there's just so many positions and the combinatorics work out that, you know, you might match with someone that's in, you know, like on the other side of the world. Now, the probability of matching is way higher within your family. But the issue is, you have one family. Probability of matching within your ethnicity is way higher because that's, you know, family extended. But if your ethnicity is small, you probably, just, like, have a much greater chance if you, you know, sample everybody, right? If you are of a mixed race heritage, you know, it's going to be kind of an issue. So I've heard that people who are Eurasian were for a while, we're going to Thailand to try finding a tissue match, because there's so many half Asian, half Europeans there that, you know, I don't think they have to do that anymore. But my only point is, like, you know, HLA is very diverse, so not, not shocking. This is not a fake, fake result, because that's we see it in other ways. Okay, so when they did the when they reduced the stringency, when they reduced the stringency of their of their statistic, so that, like, there's more false positives. So the false positive is 50% now, when they reduce their stringency to 3.16 they got 10,000 non HLA. 10,361 non HLA loci. That implies that there's about 5000 independent episodes of selection there, right? Because half of them are fake or false. Sorry, I shouldn’t say fake, but the other half are probably true. So that's a lot of selection. One thing that they point out is, if there's selection, how come we don't detect that many hard sweeps? We won't see hard sweeps. Like hard sweeps should be like, 90% allele frequency, 100% allele frequency, all over the place, and like all these like long haplotypes. One thing, one issue is the reason we don't see it is because there's lots of selection they claim. 
Probably that's not hard sweeps. There are sweeps that swept, then stopped. There's balancing selection, frequency dependent selection, selection that reverses direction. So what they're detecting is, over 14,000 years, 5000 independent episodes of selection. But they could go in all different directions. Which means that in the modern statistics, it comes out to be a wash, or it's not detectable, right? So they're arguing that this is a more powerful way of detecting selection, because, you know, in evolutionary genetics, evolutionary biology, you often say evolution really is just the change in allele frequencies over time. This is like the R.A. Fisher formulation, and he kind of founded evolutionary genetics. But basically, this is what they're looking at. They're looking at evolution as we understand it: changes in allele frequency over time. All this stuff about haplotype structure and the site frequency spectrum, that's all fine and well, but I think they're arguing that this is actually a more powerful method in a way. Yeah, so their median selection magnitude, it was around, it looks like, 0.8%, or is it the tag SNPs? I'm gonna skip that. You know, yeah, so it's 0.8%. They had limited power to detect selection coefficients of less than 0.5%, actually. So that makes sense. Selection coefficients around 1% are pretty strong. 10% are crazy. So a lot of selection is pretty low. So that's what you would see. So they put in, like, a bunch of traits. I will talk about some of these. HLA-DQB1, a lot of you have probably seen that in various genetic tests you've taken, it's on chromosome six, micro, micro recognition. There's a SNP that, yeah, has increased in frequency, and it's a tag for a SNP that's related to celiac disease or gluten sensitivity. So there's a mutation that results in a 19 fold increase, right? 
In probability. The selection coefficient is 4.5%, which is high. It went from zero to 20% in the last 4000 years, so it's 2000 BC to now. So, yeah, this is something with agriculture. A pathogen, probably. And this is post Neolithic, right? So this is Bronze Age, or later, and, yeah, so basically celiac disease, or gluten sensitivity, is a reaction to something that happened. So, you know, in evolution, in genetics, you often have pleiotropy, which means that one gene has multiple effects. So whatever it's doing besides gluten sensitivity or celiac is very important, because a selection coefficient of 4.5% is high. Something happened where it wasn't a big deal that you had gluten sensitivity, like, just live with it. Okay? Another thing that they noticed: ABO. Most of you know ABO, you know. So these are usually, I think, modeled as Mendelian inheritance. Like, I'm AO, because I know my mom is A, my dad is B, so they have to be AO and BO, or otherwise I'd be AB, right? In any case, so I'm AO, and that means that I have the A antigen, right? And if I was BO or BB, I'd have the B antigen. And if you're AB, you have both. If you're O, you don't have the antigen. O is the ancestral state, probably, almost certainly it is. And, you know, antigens evolve over time. Like, maybe there were different blood groups a million years ago, I don't know, but A, B and O are the ones we mostly focus on.
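To get a feel for what selection coefficients like 1% or 4.5% mean, here is a minimal Python sketch using the simplest textbook model: deterministic selection in continuous time, where the log-odds of the allele frequency change by s per generation. This is an assumption for illustration only. The preprint's own definition of the selection coefficient and its generation time may differ, so the numbers here are not expected to reproduce the reported values.

```python
import math

YEARS_PER_GENERATION = 29  # assumed generation time, for illustration only

def logit(p: float) -> float:
    """Log-odds of an allele frequency strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def frequency_after(p0: float, s: float, generations: float) -> float:
    """Frequency after some generations when log-odds move by s per generation."""
    x = logit(p0) + s * generations
    return 1.0 / (1.0 + math.exp(-x))

def generations_needed(p_start: float, p_end: float, s: float) -> float:
    """Generations for the allele to move from p_start to p_end under coefficient s."""
    return (logit(p_end) - logit(p_start)) / s

def implied_s(p_start: float, p_end: float, generations: float) -> float:
    """Coefficient implied by a frequency change over a number of generations."""
    return (logit(p_end) - logit(p_start)) / generations

# How long does a rare allele (1%) take to reach 50% at different strengths?
for s in (0.005, 0.01, 0.045, 0.10):
    gens = generations_needed(0.01, 0.50, s)
    print(f"s = {s:.3f}: {gens:7.1f} generations, about {gens * YEARS_PER_GENERATION:,.0f} years")
```

Under this toy model, s around 1% takes a rare allele to 50% in several hundred generations, which is why coefficients of a few percent acting over a few thousand years count as strong.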

    I mean, you know, there are other blood antigens, like Rh factor, you guys know. So it looks like B was a new mutation, which makes sense. A is an older mutation from O, but it shows that B went from zero to 10% across northwest Eurasians, not the whole human population, over the last 6000 years, with a selection coefficient of 3%, and this is a very, very strong signal, and it was correlated with a decrease in A. So, yeah, they have opposite effects on many phenotypes. The optimal balance of the alleles changed. You know, I don't know all the phenotypes, because, you know, they're probably in the citations. I do know if you're O, you had a lower risk of dying from COVID than if you're A or B, okay? And so B is at very high frequency in Eastern Europe, Central Asia and South Asia. It's still a minor allele, but it's a new mutation that arose in central Eurasia, it looks like in the Neolithic, in the last 6000 years. I'm gonna look at the map real quick so I can just tell you guys. Yeah, looking at the map, B does not exist in the New World, because it’s a new mutation. It did not exist in Australia. Its frequency is 20 to 35% in north central India and 20 to 25% in Central Asia, etc. It is found in Africa, not at super high frequencies, but it's there. And in Europe, it really drops when you go east to west. So in Eastern Europe, in Russia, it's 15 to 20%, but in Western Europe, it's five to 10%, right? A is a little bit flipped, but the highest frequencies of A are in Australia and in the northern part of the New World, where, you know, we're talking like well over 50%, and then O is dominant in the New World. It's like 100% in the southern half of North America and in South America. There's nothing else besides O. O is ancestral. 
So that means, we know that the New World Indigenous people, the first Americans, left, or were separated, like 25,000 years ago. So everyone was probably O back then, if I had to guess, because, I mean, their population size wasn't like 10, it was probably at least several hundred. There should have been some other blood groups in there, but there weren't. So everyone was O 20,000 years ago. There is some evidence of Na-Dene and other people coming over, you know, obviously from Siberia, and they're the ones that brought A. You know, like, let's say the Blackfeet Indians, I think a lot of them are A. So anyway. The TCHH locus has a SNP that's a strong predictor of straight hair and male pattern baldness in Europeans. The mutation is rare in Africans and East Asians, but it looks like it has been positively selected. They say it's like EDAR in East Asians. EDAR causes thick straight hair in East Asians, and actually EDAR looks like it's selected against in western Eurasia, because it's been introduced multiple times, and each time it disappears. Like, it was in the Scandinavian hunter gatherers, and it was in the steppe people, but it's gone. It always disappears. EDAR is under negative selection for some reason. I don't know why. And so, yeah, there has been a 1.8% decrease in predisposition to baldness, and the derived variant has been selected against. So this causes straight hair and male pattern baldness in Europeans. So Europeans have gotten, like, a little more curly haired, and male pattern baldness has decreased. And so this is probably something from Western hunter gatherers. If it's, like, the past 7000 years, I think it's from the Western hunter gatherers, probably, and it's decreased in frequency. So basically, you know, derived just means newer, younger, mutated. Sometimes the mutation shows up. It works for a while, then it disappears. 
It goes up and down, right? And that causes, like, a weird selection signature, maybe not detectable with other methods, but because they're looking at the time transect, they can detect these sorts of things, right? TYK2, this is a tuberculosis locus, and it dropped from 10% to 3% in the last 3000 years, a negative 2.3% selection coefficient. But they can identify that there was a positive one from 5500 to 3000 years ago, 2% to 9%, so this is the end of the Neolithic to about the Iron Age. Tuberculosis, this is weird. You know, with this stuff, we can correlate it to this or that. But really, pathogens themselves are their own ecology, and it's just really, really difficult to figure that out. All right, elevated multiple sclerosis risk in Northern Europe is not due to selection on the steppe, okay, so it's a little older than that. And yeah, there's positive selection on the allele from 0% to 18% between 6000 and 2000 years ago. That's from, like, the end of the Neolithic into the Bronze Age. They think that it might have something to do with Yersinia pestis. Maybe, yeah. Because, I mean, these are like reactions. It's antagonistic pleiotropy of some sort. You know, basically something is being adapted to, selected for, but it causes these side effects, right? Rapid adaptation, strong selection, often causes side effects. And those side effects are mollified later through modifier mutations and other things, right? So, you know, I'm writing about this, you'll soon read about it, but Tibetan adaptation to high altitude is way better than Andean adaptation because it has had so much time to adapt. And also, Tibetans borrowed stuff from Denisovans, like the EPAS1 adaptation. You know, Andean adaptations are really crappy, kludgy adaptations that cause problems like, you know, blood clotting and stuff. 
So this is common when you have strong selection initially and slowly, it'll, you know, get better. Oh, okay, so hemochromatosis, which is a cause blood clots in airplanes and stuff like that, positive selection for 5000 to 2000 years ago, and then dropping in frequency, again, not genome wide significant, but yeah, they hypothesize that it protected against Y. pestis. But no, it doesn't look like it makes sense, because there's plagues later. But um, yeah, so the blood clotting thing is weird. Probably some disease related thing. It's found in a it seems like it's from the Yamnaya, though, because, or like Indo Europeans, because it’s found in India as well. So it's associated with those populations. The CCR5 Delta 32 which is the HIV resistance allele that you guys probably know of. Let's see, it was selected between 5000-2000 years ago, going from 2% to 8% so it's too early for medieval pandemic, the black death. But they think Y. pestis was responsible for it. So that's the bubonic plague. And like you know, David Reich has been talking about how it was just incredibly pervasive with the ancient DNA, it might have been resulting, it might have, might have caused a collapse of Neolithic populations, right? Neolithic civilization, really. Okay, cystic fibrosis allele. There's no evidence of selection. It causes male infertility. Yeah, this is weird. So the earliest direct observation is 2200 years ago in Britain. Earliest imputed one is 10,000 years ago in Anatolia, yeah, I don't know. Yeah. It seems like that's weird. So then they talk about directional selection affected complex traits. So complex traits. So complex traits, let's review. You you kind of know what a complex trait is, but complex traits are, you know, like height, intelligence, but also they can be risk of type two diabetes. So the way that works is, there's an odds ratio. 
Your odds of getting type two diabetes are going to be like, you know, it's a complex trait where your odds ratio has a distribution. And so, you know, there'll be a median odds ratio, maybe it's like point two, I don't know. And then there are people at the high end, right? And the distribution, you could convert it, you could transform it to normal, or whatever. My point is, with diseases, these are often risks. And they're odds ratios. The odds are actually the output. Then in other cases, you have something like height, which is easy to measure. Height, body mass index, these are easy measurements. Then you have behavioral things like intelligence. These are endophenotypes, and that's kind of weird, because there's a lot of mediators going on. But these are all complex traits, and complex traits usually involve a complex genetic architecture, which means that there's like hundreds of genetic positions that are controlling variation. And so they computed polygenic score,

    and they looked over time at whether it was going to be explained by drift alone. So when you have a quantitative trait, mutation and drift can shift around the value of the trait without any selection. So, for example, if you have height, mutation probably would drop it just because, you know, well, I mean, dwarfs are extreme cases. But, you know, a mutation will probably drop it, or maybe intelligence, and then drift could push it in either direction. And then, of course, there's background selection also working against the negative mutations. So it's called mutation, selection and drift equilibrium. For these normal traits, that's a model that's there. A lot of the time it works. Sometimes it doesn't. But, you know, different populations are different heights and stuff, and that's quite clearly selection. There's too much evidence now. So they were looking at this, and, yeah, they're worried about the population structure confound, right? So, you know, people are saying the genetic covariance matrix in the GLMM corrects for that, and there are arguments online about this. But they also, instead of using a value for the polygenic risk, just use signs, and that tends to, you know, not be as problematic. I guess I'm using problematic in its technical, real definition, not like, ooh, something I don't like, for, you know, attenuating population structure confounds. Okay, so that's what they did to calculate that. And, let's see what they found, which, actually, I know what they found, and it's probably why you're listening to me. I'm not going to talk about cross trait LD score regression. Just read the paper. They also, for 31 of the 559 traits, had a check of robustness, where they used data from East Asian GWAS. 
So these are West Eurasians, obviously. And so they used East Asian GWAS. And so for height, apparently, you know, they replicated a lot of these in East Asians, right? So you should have more confidence that selection was having an effect. We see a lot of similar selection in East Asians to what we see in Western Eurasians, right? Okay, so skin color, they found a really strong signal. This is Western Eurasians. These are mostly Europeans. These are Europeans, basically. I mean, maybe some aren't, I'm not really sure, I don't know the sample, but they’re Western Eurasians, right? So, a very strong signal, shocking to literally nobody. Some of these were detected before. We didn't have as big of an overview as here, but Iain Mathieson's group, Dan Ju, he's got, you know, a good paper, although, you know, I talked to Dan about that paper a lot. So if you just look, you could tell that there has to be selection. So I looked at ancient DNA from Estonia, and then at modern Estonian fractions of OCA2/HERC2. Ancient Estonians had blue eyes, but not as much as modern Estonians. Why? Could be drift, but, like, really, it's probably selection, right? So you could tell with the pigmentation. I've been telling people this for years: you could tell that Europeans are paler now than they were 2000 years ago, and they're way paler than they were 4000 years ago. Basically, the modern European phenotype of, like, extreme depigmentation is, I mean, it's a recent thing, but it's also maybe continuing. We don't know. We don't know what's driving it. You know, people say sexual selection, but that's hard to account for, or, you know, track. Other people bring up vitamin D synthesis. My point is, this paper, this preprint, this method, I mean, it just definitely confirms what we already know. 
So they say 50% of the shift in the PGS, the polygenic score, is due to change in SLC45A2. So SLC45A2 is really the big difference between Europeans and other populations. SLC24A5 is the more famous one, with the zebrafish, a 2005 paper by Lamason et al. But that one is very common in South Asia and common in the Middle East. I have two white, derived copies of SLC24A5. So it doesn't make you totally white, it just decreases your pigmentation from what you would expect, right? SLC45A2 is not fixed in Europe. Even in Northern Europe, it's not fixed, but it's very close. It's like 90% in Southern Europe. I think it's like 75-80% in Sardinia. That's the lowest I could find. And then it's 99% in Northern Europe. And if you have the ancestral variant, the darker skin variant, you'll be olive skinned. You'll still be kind of white a lot of the time. In the Middle East, the frequency is like 50%, right? So you can have a brunette white complexion. You have a lighter complexion. But that really pale complexion that you see in some northern Europeans, like, let's see, the guy who plays Vision in the Marvel movies. His name, yeah, Paul Bettany. If you look at what Paul Bettany looks like, the paleness of his skin, the paleness of his eyes, the paleness of his hair, you know, there are people like that who live in England, in Northern Europe, and I don't think there would have been very many of them 4000 years ago, 2000 BC, just because that's one end of the extreme. And so anyway, they found 50% of the shift was SLC45A2. One thing I want to point out is South Asians have European-like ancestry from the Sintashta and whatnot. And I noticed really early on, just looking at the fractions and the estimates, that South Asians are darker than they should be. Maybe it's adaptation. 
But one hypothesis I presented, like in 2010, to Nick Patterson, was that selection for lighter skin had not happened to the same extent in the ancestral North Indians when they separated from the other Europeans. And that turns out to be right. I mean, these would be the Fatyanovo-Balanovo people. If you look at the predictions from HIrisPlex, they're probably white, but they're definitely not as pale as the people that live there now in Central Russia or the Baltic. Northern Europeans, northeast Europeans, are considerably paler than the Fatyanovo-Balanovo people were. And so yeah, Indians, if they have more of this Indo-Aryan ancestry, are paler, but they're not as pale as they would be, because Europeans have changed since the Fatyanovo-Balanovo people diverged from their cousins around 4000 years ago. So yeah, 69% of the change in the PGS is due to the top seven loci. So it's a complex trait. It's polygenic, but it's not as polygenic - I did a podcast on pigmentation. You guys should check it out. Just look at the archives. But it's one of those things where it's a power-law distribution, where 50% is SLC45A2 and 70% is the top seven loci. So, you know, OCA2/HERC2. I don't know what the others are, but probably DCM

    maybe melanocortin, like the red hair stuff is happening in there as well. But the tail of it is really complicated, because they had to go to the top 104 loci before the signal of selection disappears, which basically means that there are a lot of little genes around that are also, obviously, controlling this trait. And so that's why you get so much variation within a population, because there are all these genes that are just bouncing around, being recombined in individuals within the population. Moving on to another trait that's a little bit of a surprise to me: type two diabetes risk factors give compelling signals of negative selection. This basically means that the risk for type two diabetes has decreased from what it should be if you just do a naive mixture of WHG, Western hunter-gatherers, farmers and steppe, because these are Western Eurasians, these are Europeans. And as they say in the preprint, there are other papers that look at type two diabetes and a lot of other diseases, and it looks like Western hunter-gatherers would have had them. Well, I mean, Western hunter-gatherers were foragers. They didn't live the farming lifestyle. They were robust. They were different in many ways from later Europeans. And it looks like, yeah, like with Native Americans and indigenous peoples of Australia, where farming had a big effect on them in a negative way, in many ways, the same probably happened to them. So basically, the risk for diabetes is dropping, and also the prediction for body fat percentage and waist circumference, thick waist. All of that is dropping. And so the argument here is just that the thrifty gene hypothesis is vindicated. I think you have to be careful about this, because everybody wants to eat. The body has a certain amount that's necessary. 
And unless you're a Yakut and you want to be mitochondrially inefficient so that you don't get frostbite, I don't see why it would be adaptive for everybody to process calories in a thrifty way. I think something is happening here where the smoothing of agriculture made it so that hoarding of fat was selected against. But there has to be a benefit, and it's not because people got type two diabetes. People do not die of type two diabetes young enough for this to matter, right? So there are a lot of these things that cause diseases when you're old. Doesn't matter, okay? I'm sorry, it just doesn't. Yes, I know, grandma can't do this or that if she dies at 60 or 55 because of type two diabetes. But the reality is that it's just not a strong enough selective effect, I think. So it's not because of type two diabetes. Type two diabetes is a side effect. And the adipose tissue, the lack of deposition of adipose tissue, is a side effect. Basically, what I think is going on here is calculations within the body's homeostasis, the body's metabolism, of, you know, should I expend the calories now to build tissue, or do this or that? Or should I preferentially lay down adipose tissue or other things, or glycogen, I don't know. And it could be that in a forager environment, where there's a windfall, you want to lay down the adipose tissue because you're not going to get a lot to eat for a while, whereas with farmers, if you do it well, you know, famine, maybe every 10 years? I don't know, probably. But the point is, it looks like farmers smoothed the input of calories in a way that there was selection on the metabolism, so you did not have to hoard calories as much by laying down adipose tissue and other sorts of things that cause side effects. Some of these side effects are like type two diabetes. 
That's not the thing that I think people are worried about, or that was causing the problem, though. There's no selection. I don't think there's any selection for type two diabetes. I think that is something that shows up really late, because, well, in the modern world, we live a long time, right? In the premodern world, we did not. So just don't buy the hype, I guess, when it comes to that. Okay, the next trait. They found a lot of negative polygenic selection against psychosis, like bipolar disorder and schizophrenia. And, okay, so I think what's going on here is personality probably changed. We'll get to the Neolithic. And, yeah, I don't know. It's very polygenic. They had to drop 740 loci for the bipolar disorder signal and 726 for the schizophrenia signal to become non-significant. So what does that tell you? Schizophrenia and bipolar disorder are ends of the distribution, and the population is moving along the distribution. Okay, so that means that the selection is probably mixed, probably frequency dependent, but basically, if you have bipolar disorder or schizophrenia, there was strong selection against it in mass societies and villages. That makes sense to me. Schizophrenia does have some correlation with creativity that's robust, from everything I've heard. I don't know what bipolar disorder would do, but a lot of these mental illnesses are just kind of overactivity of theory of mind and other aspects of regular human nature, right? If you're an animal, you don't get depressed, because you don't think about the future. That's what I'm trying to get at. And so these are complicated. I'm not going to give you an adaptive story for what's going on, but I think we can imagine living in a village is different than living in a small foraging band, you know. 
So they say they observe selection for combinations of alleles that are today associated with healthy lifestyles into old age, with faster walking, and against alleles associated with smoking, against alleles that contribute to overall health decline. These are just outputs that are mediated by other things we don't know the details of. Some of you might know better than, you know, even Ali Akbari. I would just say you have to put something here, but it's obviously really complicated. So, on to intelligence and stuff like that, because that's what some of you are curious about. There were clues about this. You also see this in East Asians. So they found scores on intelligence tests increased 0.79 standard deviations, household income 1.11, years of schooling 0.61. Very polygenic. Well, yeah, no, duh. So they did check the correlation of East Asian GWAS effect size measurements with West Eurasian selection. They got a significant correlation, which means that, basically, they don't think it's population structure. Similar things are happening in East Asians that are happening in West Eurasians. And these are the two populations that have, you know, higher IQs and have had dense living for a long time. So take that how you want to take it. It looks like the selection for this sort of stuff, though, this biobehavioral stuff, kind of tailed off at the end of the Neolithic. So a lot of the work was done by the Neolithic, by living in a village, if there was work. So for example, time preference, in terms of, can you forgo, and can you plan? You can imagine a situation where farmers just have to do a lot more of that than foragers, so maybe that's what's going on. Obviously, there were no standardized tests or anything like that. 
But there was more specialization, artisans, other sorts of niches were opening up. And a lot of the Neolithic societies were pretty complicated, like Cucuteni-Trypillia. There's a paper out that they might have had the largest city, well before Sumer. So I think there's room for that. There's not much of a gain during the Bronze Age and later. What does that mean? Well, one thing that I've written about is that cities destroy elites, just because people don't reproduce. They could be like IQ shredders, demography shredders. I think another possible issue that we need to think about is, you know, a guest I had, Michael Muthukrishna, and his mentor, his PhD advisor, Joe Henrich, and others, have talked about the cultural brain, the social brain. And it could be that humans are just outsourcing intelligence now to cultural technologies like literacy, tablets, and also they're distributing it across these large social networks, so that you only need one Isaac Newton to do calculus. You don't need a bunch of them. So it could be that some specialization and niche differentiation might have selected for these biobehavioral traits, but then the extra stuff that's really associated with complex city states and later empires was not driven by biology, but was driven by cultural evolution, cultural selection, right? And also just cultural technologies. So, you know, innovation. You know, lucid technologies did not exist. There were no think tanks, there were no universities. There are all these cultural technologies, cultural innovations that exist to foster innovation, to foster thinking,

    you know, you're here, listening to me ramble on and reading my Substacks. These are all, obviously, not just turning us into von Neumanns, but we have these extra tools that enable us to learn. And I think that's an important thing to note. They talk about how there's been evidence from Iceland that polygenic selection for educational attainment and intelligence has been decreasing. So this kind of makes sense when you look at the social statistics data, where highly educated people after the demographic transition have fewer children. You often see kind of a U-shaped curve, where the middle and upper middle class have the lowest fertility, and the upper class has high fertility. But the issue is there are very few of the upper class, and so it's really the lower classes that are driving the fertility. So, we've been talking for a while. There are all sorts of caveats and complaints that people are making. One of the issues is some people are saying that genetic drift is not accounted for. I think that they would say it was. I mean, I've heard from people that they think the genetic relationship matrix accounts for that, for the population drift. They say most of the allele frequency change is drift. Some of the things here are interesting, because they're just what you would expect. You know, they're enriched in functional regions of the genome, they've been detected in other forms of selection. So a lot of these I believe. I don't know which ones are wrong, but a lot of these are just validating what we already know. You could visually inspect some of these, they're really obvious. I think the polygenic ones are the most controversial ones, because they needed a lot of statistical power. 
So the issue with the polygenic selection papers that were published earlier was that people were skeptical that the data was good, because there's so much crap in the ancient DNA data. And it takes a lot of skill, a lot of experience, and they've really curated a really good data set. So it's getting better. Obviously, the rubber hits the road when you look at siblings and compare siblings, how they differ genetically and how they differ phenotypically. There are all sorts of things that you could do to validate some of these results. Selection is different from just the correlation between a trait and a gene. So the GWAS stuff, the correlation between the trait and the genes, tends to be more robust and easier, and then with selection you've got to be a little bit more careful, because all sorts of weird things can happen there. Yeah, what else? Oh, so I have a friend, I don't like to say his name, so I'm not gonna say his name. But he's a professor of statistical genetics at CUNY, I don't know which CUNY, but anyway, he was complaining about this today. Okay, last name, s, o, u, a, i, a, i, a, okay, if you want to look him up. He was complaining. The big thing is portability of genome-wide associations. So one thing that you see is that SNPs, because of LD structure and whatever, that are good signals in Europeans are not as good signals in non-Europeans, and really bad in Africans. Okay, that's fine. But they were looking at Europeans, they were looking at West Eurasian populations. And the genetic distance between West Eurasian populations is still not nearly the same. So, you know, I need to review this literature. There are two primary reasons that you would have the decay with the SNPs. 
There's a bunch of reasons, but two are primary. One is something called LD structure, where Europeans, well, non-Africans, have huge blocks, long haplotypes of linked variants. And so you can tag a variant because it's next to another one, and it gives you the same signal, and you can cover a large part of the genome that way. The LD structure is different in different populations, and so that can cause some issues with portability. Also Africans, because of their historically large effective breeding population size, effective population size, have really narrow LD bands, which means that you need much denser markers. Basically, one SNP can cover a huge part of the genome for a European or Asian, and it covers a much smaller part of the genome for an African, because the LD is much narrower, right? So it tags a lot less, so it's much more difficult to pick up signals. Okay, so that's a technical thing. If you have whole genomes and large sample sizes, you should be able to fix a lot of that. Another issue is that different SNPs might have different effects in different populations because of gene-environment, gene-gene, gene-genetic background interactions. Anyway, that's a real biological issue, but there's a lot of portability. Most of the variation, as we know, is shared between populations. You know, what is it, like 85% is the stylized fact, only 15% is between populations. And the pairwise FST, you know, I think it's like, well, Western hunter-gatherers, early European farmers and steppe people, they contributed to modern Europeans, and they are part of the out-of-Africa group. So I think this portability criticism is, I don't know, it's legit, it's a real thing. But I think the effect is gonna be smaller than he is implying here, because these are all populations that are out of Africa. 
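
    The link between effective population size and LD block width described above can be sketched with a textbook approximation. This is a rough illustration under stated assumptions, not anything from the preprint: it uses the classic drift-recombination approximation E[r^2] ~ 1 / (1 + 4 * Ne * c), assumes a uniform recombination rate of about 1 cM per Mb, treats the recombination fraction as linear in distance (reasonable only for short distances), and the two effective sizes below are hypothetical placeholders rather than estimates for any real population.

```python
def expected_r2(effective_size, distance_bp, recomb_per_bp=1e-8):
    """Approximate expected LD (r^2) between two sites at drift-recombination balance.

    Uses E[r^2] ~ 1 / (1 + 4 * Ne * c), where c is the recombination fraction
    between the sites. Mutation and sampling noise are ignored. The default
    recomb_per_bp (about 1 cM per Mb) is an assumed genome-wide average.
    """
    c = recomb_per_bp * distance_bp  # linear approximation, short distances only
    return 1.0 / (1.0 + 4.0 * effective_size * c)


if __name__ == "__main__":
    # Hypothetical effective sizes, chosen only to show the direction of the effect.
    for label, ne in [("smaller Ne", 3_000), ("larger Ne", 15_000)]:
        for distance in (10_000, 50_000, 200_000):
            r2 = expected_r2(ne, distance)
            print(f"{label:>10}  {distance:>7} bp  expected r^2 ~ {r2:.3f}")
```

    At any given distance the larger effective size gives a lower expected r^2, so a tag SNP carries information about a shorter stretch of the genome, which is the tagging problem described above.
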
So I've seen estimates for portability decay of like, okay, it predicts 80% from Europeans to South Asians, 60% from Europeans to East Asians, and 30% to Africans. Obviously, 30% to Africans is bad. 60% isn't great, but it's still over 50, and then 80% to South Asians isn't bad either, right? I think the genetic distance between EEF and WHG is 0.1. It's like East Asians, Chinese, to Northern Europeans today. The genetic distance from EEF or WHG to modern Europeans would be lower than that, because they're ancestral populations to them. You know what I'm saying? Yeah. So again, I need to read the supplements. I need to dig into the GLMM. But really, with some of these empirical questions, I think the biggest worry that I have is what I brought up earlier. How are we going to reproduce this? How am I going to dig into this, me, other people, even other researchers, other labs that have resources? This was such a big lift, a massive lift. And yeah, I would say again, let me reiterate, this team is great. This team has some of the great ones. So they're not shabby. There are not gonna be big errors in here, but this stuff needs to be reproduced. Some modifications to the method need to be made to see how robust some of these results are. But I think it's a really good start. Some people are asking, you know, privately, I have some geneticist friends talking, whether this might be the beginning of the day breaking on an understanding of the genetic architecture of complex traits and the arc of natural selection across different populations. Those of you who know, know. Those of you who don't, it's okay. So we'll see. I don't know. I found it very interesting, very fascinating. A lot of stuff to dig into. Check out the browser. I'm gonna link to the browser. 
But if you go ahead and put in the rsIDs, the easiest way to do it would be, if you know a gene, type the gene, like SLC45A2, into Google, and then type SNPedia, and it'll give you the rsIDs. That's the easiest way that I would say to do it. Yeah, I don't know what the easiest way is, I know a lot of the rsIDs myself. But this is a big deal. It's a big deal, and they've been working on it for so long, they wanted to do it right. I'm sure maybe someone else is gonna release something soon. I don't know, but I don't think anyone else has the resources to do anything like this anytime soon. But yeah, bravo. This is super impressive, and I hope you guys enjoyed this podcast. If you're listening this late, please rate and review. I think people do like the monologs, so I'll be doing more monologs as we go forward. And, yeah, I enjoyed talking about this too, by the way. So thank you for allowing me to indulge my passions.

    Even if you and your partner are healthy, there's still a chance your child can develop a serious genetic disease. This is because every embryo has new changes not present in either parent. Most of the time, these are benign, but sometimes they can be catastrophic. Orchid's whole genome embryo reports directly screen the embryo and analyze these de novo genetic mutations. Discuss embryo screening in IVF with a genetics expert.

    This podcast is for kids.

    This is my favorite podcast.