From: M. Taylor Saotome-Westlake Date: Sun, 29 Mar 2020 05:17:21 +0000 (-0700) Subject: Human Diversity review: the item characteristic curve X-Git-Url: http://232903.hjopswx29.asia/source?a=commitdiff_plain;h=45b475715576c56c09a2fbcff62e6d630ba25786;p=Ultimately_Untrue_Thought.git Human Diversity review: the item characteristic curve --- diff --git a/content/drafts/book-review-human-diversity.md b/content/drafts/book-review-human-diversity.md index d72b0ed..23285f6 100644 --- a/content/drafts/book-review-human-diversity.md +++ b/content/drafts/book-review-human-diversity.md @@ -42,7 +42,7 @@ You might think that there wasn't enough _time_ in the 2–5k generations since Another mechanism of recent human evolution is _introgression_: early humans interbred with our Neanderthal and Denisovan "cousins", giving our lineage the chance to "steal" all their good alleles! In contrast to new mutations, which usually die out even when they're beneficial (that 2s rule again), alleles "flowing" from another population keep getting reintroduced, giving them more chances to sweep! -Population differences are important when working with genome-wide association studies, because a model "trained on" one population won't perform as well against the "test set" of a different population. Suppose you do a big study and find a bunch of SNPs that correlate with a trait, like schizophrenia or liking opera. The frequencies of those SNPs for two populations from the same continent (like Japanese and Chinese) will hugely correlate (Pearson's _r_ ≈ 0.97), but for more genetically-distant populations from different continents, the correlation will still be big but not huge (like _r_ ≈ 0.8 or whatever). +Population differences are important when working with genome-wide association studies, because a model "trained on" one population won't perform as well against the ["test set"](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets) of a different population. 
Suppose you do a big study and find a bunch of SNPs that correlate with a trait, like schizophrenia or liking opera. The frequencies of those SNPs for two populations from the same continent (like Japanese and Chinese) will hugely correlate (Pearson's _r_ ≈ 0.97), but for more genetically-distant populations from different continents, the correlation will still be big but not huge (like _r_ ≈ 0.8 or whatever). What do these differences in SNP frequencies mean in practice?? We ... don't know yet. At least some population differences are fairly well-understood: I'd tell you about sickle-cell and lactase persistence, except [then I would have to scream](/2017/Dec/interlude-xi/). There are some cases where we see populations independently evolve different adaptations that solve the same problem: [people living on the plateaus of both Tibet and Peru have both adapted to high altitudes](https://www.pnas.org/content/104/suppl_1/8655.long), but the Tibetans did it by breathing faster and the Peruvians did it with more hemoglobin! @@ -56,17 +56,15 @@ Anyway, it turns out that the effect of the shared environment _C_ is way smalle It's important not to overinterpret the heritability results; there are a bunch of standard caveats that go here that everyone's treatment of the topic needs to include! Heritability is about the _variance_ in phenotypes that can be predicted by _variance_ in genes. This is _not_ the same concept as "controlled by genes." To see this, notice that the trait "number of heads" has a heritability of zero because the variance is zero: all living people have exactly one head. (Siamese twins are two people.) Heritability estimates are also necessarily bound to a particular population in a particular place and time, which can face constraints shaped solely by the environment. 
If you plant half of a batch of seeds in the shade and half in the sun, the variance in the heights of the resulting plants will be associated with variance in genes _within_ each group, but the difference _between_ the groups is solely determined by the sunniness of their environments. Likewise, in a Society with a cruel caste system under which children with red hair are denied internet access, part of the heritability of intellectual achievement is going to come from alleles that code for red hair. Even though (_ex hypothesi_) redheads have the same inherent intellectual potential as everyone else, the heritability computation can't see into worlds that are not our own, which might have vastly different gene–environment correlations. -Old-timey geneticists used to think that they would find small number of "genes for" something, but it turns out that we live in an omnigenetic, pleiotropic world where lots and lots of SNPs each exert a tiny effect on potentially lots and lots of things. I feel like this probably _shouldn't_ have been surprising (genes code for proteins, variation in what proteins get made is going to affect high-level behaviors, but high-level behaviors involve _lots_ of proteins in a super-complicated unpredictable way), but I guess it was. +Old-timey geneticists used to think that they would find a small number of "genes for" something, but it turns out that we live in an omnigenic, pleiotropic world where lots and lots of SNPs each exert a tiny effect on potentially lots and lots of things. I feel like this probably _shouldn't_ have been surprising (genes code for amino-acid sequences, variation in what proteins get made from those amino-acid sequences is going to affect high-level behaviors, but high-level behaviors involve _lots_ of proteins in a super-complicated unpredictable way), but I guess it was. 
Murray's penultimate chapter summarizes the state of a debate between a "Robert Plomin school" and an "Eric Turkheimer school" on the impact and import of polygenic scores, where we tally up all the SNPs someone has that are associated with a trait of interest. The starry-eyed view epitomized by Plomin says that polygenic scores are _super great_ and everyone _and her dog_ should be excited about them: they're causal in only one direction (the trait can't cause the score) and they let us assess risks in individuals before they happen. Clinical psychology will enter a new era of "positive genomics", where we understand how to work with the underlying dimensions along which people vary (including positively), rather than focusing on treated "diagnoses" that people allegedly "have". -The curmudgeonly view epitomized by Turkheimer says that science is about understanding the _causal structure_ of phenomena, and that polygenic scores don't fucking tell us anything. [Divorce is heritable _in the same way_ that intelligence is heritable](http://www.geneticshumanagency.org/gha/the-ubiquity-problem-for-group-differences-in-behavior/), not because there are "divorce genes" in any meaningful biological sense, but because of a "universal, nonspecific genetic pull on everything." +The curmudgeonly view epitomized by Turkheimer says that Science is about understanding the _causal structure_ of phenomena, and that polygenic scores don't fucking tell us anything. 
[Marital status is heritable _in the same way_ that intelligence is heritable](http://www.geneticshumanagency.org/gha/the-ubiquity-problem-for-group-differences-in-behavior/), not because there are "divorce genes" in any meaningful biological sense, but because of a "universal, nonspecific genetic pull on everything": _all other things being equal_, people with more similar genes will make more similar proteins from those similar genes, and therefore end up with more similar phenotypes that interact with the environment in a more similar way, and _eventually_ (the causality flowing "upwards" through many hierarchical levels of organization) this shows up in the divorce statistics of a particular Society in a particular place and time. But this is completely opaque; the real work of Science is in figuring out what all the particular gene variations actually _do_. -Notably, Plomin and Turkheimer aren't actually disagreeing here: it's a difference in emphasis rather than facts. Polygenic scores _don't_ explain mechanisms—but might they end up being useful, and used, anyway? Murray's vision of social science is content to make predictions and "explain variance" while remaining ignorant of ultimate causality. Meanwhile, my cursory understanding (while kicking myself for [_still_](/2018/Dec/untitled-metablogging-26-december-2018/#daphne-koller-and-the-methods) not having put in the hours to get much farther into [_Probabilistic Graphical Models: Principles and Techniques_](https://mitpress.mit.edu/books/probabilistic-graphical-models)) was that you need to understand causality in order to predict what interventions will have what effects [TODO: explain why with example] - -Maybe our feeble state of knowledge is _why_ we don't know how to find reliable large-effect environmental interventions that still yet might exist in the vastness of the space of possible interventions. 
+Notably, Plomin and Turkheimer aren't actually disagreeing here: it's a difference in emphasis rather than facts. Polygenic scores _don't_ explain mechanisms—but might they end up being useful, and used, anyway? Murray's vision of social science is content to make predictions and "explain variance" while remaining ignorant of ultimate causality. Meanwhile, my cursory understanding (while kicking myself for [_still_](/2018/Dec/untitled-metablogging-26-december-2018/#daphne-koller-and-the-methods) not having put in the hours to get much farther into [_Probabilistic Graphical Models: Principles and Techniques_](https://mitpress.mit.edu/books/probabilistic-graphical-models)) was that you _need_ to understand causality in order to predict what interventions will have what effects: variance in rain may be statistically "explained by" variance in mud puddles, but you can't make it rain by turning the hose on. Maybe our feeble state of knowledge is _why_ we don't know how to find reliable large-effect environmental interventions that still yet might exist in the vastness of the space of possible interventions. There are also some appendicies at the back of the book! Appendix 1 (reproduced from, um, one of Murray's earlier books with a coauthor) explains some basic statistics concepts. Appendix 2 ("Sexual Dimorphism in Humans") goes over the prevalence of intersex conditions and gays, and then—so much for this post broadening the [topic scope of this blog](/tag/two-type-taxonomy/)—transgender typology! Murray presents the Blanchard–Bailey–Lawrence–Littman view as fact, which I think is basically _correct_, but a more comprehensive treatment (which I concede may be too much too hope for from a mere Appendix) would have at least _mentioned_ alternative views ([Serano](https://rationalwiki.org/wiki/Intrinsic_Inclinations_Model)? [Veale](/papers/veale-lomax-clarke-identity_defense_model.pdf)?), if only to explain _why_ they're worth dismissing. 
(Contrast to the eight pages in the main text explaining why "But, but, epigenetics!" is worth dismissing.) Then Appendix 3 ("Sex Differences in Brain Volumes and Variance") has tables of brain-size data, and an explanation of the greater-male-variance hypothesis. Cool! @@ -120,14 +118,22 @@ Murray opens the parts of the book about sex and race with acknowledgements of t But this kind of defensive half-measure satisfies no one. From the oblivious-science-nerd perspective—the view that agrees with Murray that "everyone should calm down"—you shouldn't _need_ to genuflect to the memory of some historical injustice before you're allowed to talk about Science. But from the perspective that cares about Justice and not just Truth, an _insincere_ gesture or a strategic concession is all the more dangerous insofar as it could function as camoflage for a nefarious hidden agenda. If your work is explicitly aimed at _destroying the anti-oppression Schelling-point belief_, a few hand-wringing historical interludes and bromides about human equality having no testable implications (!!) aren't going to clear you of the suspicion that you're _doing it on purpose_—trying to destroy the anti-oppression Schelling point in order to oppress, not because anything that can be destroyed by the truth, should be. -And sufficient suspicion makes communication nearly impossible. (If you _know_ someone is lying, their words mean nothing, [not even as the opposite of the truth](https://www.lesswrong.com/posts/qNZM3EGoE5ZeMdCRt/reversed-stupidity-is-not-intelligence).) As far as many of Murray's detractors are concerned, it almost doesn't matter what the text of _Human Diversity_ says, how meticulously researched of a psychology/neuroscience/genetics lit review it is. From their perspective, Murray is "hiding the ball": they're not mad about _this_ book; they're mad about specifically chapters 13 and 14 of a book Murray coauthored twenty-five years ago. 
+And sufficient suspicion makes communication nearly impossible. (If you _know_ someone is lying, their words mean nothing, [not even as the opposite of the truth](https://www.lesswrong.com/posts/qNZM3EGoE5ZeMdCRt/reversed-stupidity-is-not-intelligence).) As far as many of Murray's detractors are concerned, it almost doesn't matter what the text of _Human Diversity_ says, how meticulously researched of a psychology/neuroscience/genetics lit review it is. From their perspective, Murray is "hiding the ball": they're not mad about _this_ book; they're mad about specifically chapters 13 and 14 of a book Murray coauthored twenty-five years ago. (I don't think I'm claiming to be a mind-reader here; the first 20% of [the _New York Times_'s review of _Human Diversity_](https://archive.is/b4xKB) is pretty explicit.) In 1994's _The Bell Curve: Intelligence and Class Structure in American Life_, Murray and coauthor Richard J. Herrnstein argued that a lot of variation in life outcomes is explained by variation in intelligence. Some people think that folk concepts of "intelligence" or being "smart" are ill-defined and therefore not a proper object of scientific study. But that hasn't stopped some psychologists from trying to construct tests purporting to measure an "intelligence quotient" (or _IQ_ for short). It turns out that if you give people a bunch of different mental tests, the results all positively correlate with each other: people who are good at one mental task, like listening to a list of numbers and repeating them backwards ("reverse digit span"), are also good at others, like knowing what words mean ("vocabulary"). There's a lot of fancy linear algebra involved, but basically, you can visualize people's test results as a hyperellipsoid in some high-dimensional space where the dimensions are the different tests. 
(I rely on this ["configuration space"](https://www.lesswrong.com/posts/WBw8dDkAWohFjWQSk/the-cluster-structure-of-thingspace) visual metaphor _so much_ for _so many_ things that when I started [my secret ("secret") gender blog](/), it felt right to put it under a dot-space [TLD](https://en.wikipedia.org/wiki/Top-level_domain).) The longest axis of the hyperellipsoid corresponds to the "_g_ factor" of "general" intelligence—the choice of axis that cuts through the most variance in mental abilities. So Murray and Herrnstein talk about this "intelligence" thingy, and how it's heritable, and how it predicts income, school success, not being a criminal, _&c._, and how this has all sorts of implications for Society and inequality and class structure and stuff. -This _should_ just be more social-science nerd stuff, the sort of thing that would only draw your attention if, like me, you feel bad about not being smart enough to do algebraic topology and want to console yourself by at least knowing about the Science of not being smart enough to do algebraic topology. The reason everyone _and her dog_ is still mad at Charles Murray a quarter century later is Chapter 13, "Ethnic Differences in Cognitive Ability", and Chapter 14, "Ethnic Inequalities in Relation to IQ". So, _apparently_, different ethnic/"racial" groups have different average scores on IQ tests. [Ashkenazi Jews do the best](https://slatestarcodex.com/2017/05/26/the-atomic-bomb-considered-as-hungarian-high-school-science-fair-project/), which is why I sometimes privately joke that the fact that I'm [only 85% Ashkenazi (according to 23andMe)](/images/ancestry_report.png) explains my low IQ. (I'm pretty dumb compared to some of my robot-cult friends.) East Asians do a little better than Europeans/"whites". And—this is the part that no one is happy about—the difference between U.S. whites and U.S. blacks is about Cohen's _d_ ≈ 1. 
(If two groups differ by _d_ = 1 on some measurement that's normally distributed within each group, that means that the mean of the group with the lower average measurement is at the 16th percentile of the group with the higher average measurement, or that a uniformly-randomly selected member of the group with the higher average measurement has a probability of about 0.76 have having a higher measurement than a uniformly-randomly selected member of the group with the lower average measurement.) +This _should_ just be more social-science nerd stuff, the sort of thing that would only draw your attention if, like me, you feel bad about not being smart enough to do algebraic topology and want to console yourself by at least knowing about the Science of not being smart enough to do algebraic topology. The reason everyone _and her dog_ is still mad at Charles Murray a quarter century later is Chapter 13, "Ethnic Differences in Cognitive Ability", and Chapter 14, "Ethnic Inequalities in Relation to IQ". So, _apparently_, different ethnic/"racial" groups have different average scores on IQ tests. [Ashkenazi Jews do the best](https://slatestarcodex.com/2017/05/26/the-atomic-bomb-considered-as-hungarian-high-school-science-fair-project/), which is why I sometimes privately joke that the fact that I'm [only 85% Ashkenazi (according to 23andMe)](/images/ancestry_report.png) explains my low IQ. (I'm pretty dumb compared to some of my [robot-cult](/tag/my-robot-cult/) friends.) East Asians do a little better than Europeans/"whites". And—this is the part that no one is happy about—the difference between U.S. whites and U.S. blacks is about Cohen's _d_ ≈ 1. 
(If two groups differ by _d_ = 1 on some measurement that's normally distributed within each group, that means that the mean of the group with the lower average measurement is at the 16th percentile of the group with the higher average measurement, or that a uniformly-randomly selected member of the group with the higher average measurement has a probability of about 0.76 of having a higher measurement than a uniformly-randomly selected member of the group with the lower average measurement.) It's important not to overinterpret the IQ-scores-by-race results; there are a bunch of standard caveats that go here that everyone's treatment of the topic needs to include. Again, just because variance in a trait is statistically associated with variance in genes _within_ a population, does _not_ mean that differences in that trait _between_ populations are _caused_ by genes: [remember the illustrations about](#heritability-caveats) sun-deprived plants and internet-deprived red-haired children. Group differences in observed tested IQs are entirely compatible with a world in which those differences are entirely due to the environment imposed by an overtly or structurally racist society. Maybe the tests are culturally biased. Maybe people with higher socioeconomic status get more opportunities to develop their intellect, and racism impedes socio-economic mobility. And so on. -The problem is, a lot of the blank-slate-compatible hypotheses for group IQ differences become less compelling when you look into the details. "Maybe the tests are biased", for example, isn't an insurmountable defeater to the entire endeavor of IQ testing—it is _itself_ a falsifiable hypothesis, or can become one if you specify what you mean by "bias" in detail. 
One idea of what it would mean for a test to be _biased_ is if it's partially measuring something other than what it purports to be measuring: if your test measures a combination of "intelligence" and "submission to the hegemonic cultural dictates of the test-maker", then individuals and groups that submit less to your cultural hegemony are going to score worse, and if you _market_ your test as unbiasedly measuring intelligence, then people who believe your marketing copy will be misled into thinking that those who don't submit are dumber than they really are. +The problem is, a lot of the blank-slatey environmentally-caused-differences-only hypotheses for group IQ differences become less compelling when you look into the details. "Maybe the tests are biased", for example, isn't an insurmountable defeater to the entire endeavor of IQ testing—it is _itself_ a falsifiable hypothesis, or can become one if you specify what you mean by "bias" in detail. One idea of what it would mean for a test to be _biased_ is if it's partially measuring something other than what it purports to be measuring: if your test measures a _combination_ of "intelligence" and "submission to the hegemonic cultural dictates of the test-maker", then individuals and groups that submit less to your cultural hegemony are going to score worse, and if you _market_ your test as unbiasedly measuring intelligence, then people who believe your marketing copy will be misled into thinking that those who don't submit are dumber than they really are. But if so, and if not all of your individual test questions are _equally_ loaded on intelligence and cultural-hegemony, then the cultural bias should _show up in the statistics_. 
If some questions are more "fair" and others are relatively more culture-biased, then you would expect the _order of item difficulties_ to differ by culture: the ["item characteristic curve"](/papers/baker-kim-the_item_characteristic_curve.pdf) plotting the probability of getting the biased question "right" as a function of _overall_ test score should differ by culture, with the hegemonic group finding it "easier" and others (answering honestly) finding it "harder". Conversely, if the questions that discriminate most between differently-scoring cultural/ethnic/"racial" groups were the same as the questions that discriminate between younger and older children _within_ each group, that would be the kind of statistical clue you would expect to see if the test was unbiased and the group difference was real. Hypotheses that accept IQ test results as unbiased, but attribute group differences to the environment, also make statistical predictions. Controlling for parental socioeconomic status only cuts the black–white gap by a third. + +[TODO: sentence about sources of variation within/between groups based on Jensen] + +[TODO: sentence about colorism based on https://www.mdpi.com/2624-8611/1/1/17/htm "Skin color is actually only controlled by a small number of alleles, so if you think societal discrimination on skin color causes IQ differences"] + +And so on. + +In mentioning these arguments in passing, I'm definitely _not_ trying to provide a comprehensive lit review on the question. (That's [someone else's blog](https://humanvarieties.org/2019/12/22/the-persistence-of-cognitive-inequality-reflections-on-arthur-jensens-not-unreasonable-hypothesis-after-fifty-years/).) diff --git a/notes/human-diversity-notes.md b/notes/human-diversity-notes.md index a235e7f..4ea6837 100644 --- a/notes/human-diversity-notes.md +++ b/notes/human-diversity-notes.md @@ -1,5 +1,6 @@ * structural oppression and actual differences can both exist at the same time! 
They're not contradicting each other! * I don't know how to build a better world, but my first step is to go a little meta and talk about why we can't talk, and take seriously the possible harms from talking, rather than just asserting that free speech and civil discourse is Actually Good the way the likes of Cofnas/Winegard/Murray do (being a nobody blogger probably helps; I have an excuse) + * women and courage * A few things are actually _worse_ than the ball-hiders make it seem ("treat ppl as individuals" doesn't work; "IQ isn't morally valuable" doesn't work) * Embryo selection looks _really important_; I don't want to give amunition to racists, but I need to talk about that—and the recent Dawkins brouhaha says we can't even talk about that; and the ways I'm worried about eugenics being misused aren't even on the radar * Murray says polygenic scores are like GDP ... I bet Ben and Michael would have something to say about that analogy! @@ -8,6 +9,8 @@ * need to talk about individual differences being non-threatening + + —and the people who claim not to have an agenda are lying. (The most I can credibly claim for myself is that I try to keep my agenda reasonably _minimalist_—and the reader must judge for herself to what extent I succeed.) I think this is sympathetic but [ultimately ineffective](http://zackmdavis.net/blog/2016/08/ineffective-deconversion-pitch/). Clueless [presentist](https://en.wikipedia.org/wiki/Presentism_(literary_and_historical_analysis)) conservatism of the form, "Old-timey patriarchy and white supremacy were Really Bad, but that's over and everything is Fine Now" is unlikely to satisfy readers who _don't_ think everything is Fine Now, and suspect Murray of standing athwart history yelling "Stop!" rather than aspiring to Actual Social Science. @@ -24,7 +27,7 @@ When you "treat individuals as individuals", you do so on the basis of evidence [conditional probability "Emacs shirt" vs. 
"is female", no principled distinction] -The first 20% of the _New York Times_'s review of _Human Diversity_](https://archive.is/b4xKB) is dedicated to casting aspersions on _The Bell Curve_. +is dedicated to casting aspersions on _The Bell Curve_.