False discovery: How not to find the genetic basis of human intelligence

Does a new study really identify genes that determine whether you’ll go to college? Um, no. Photo by velkr0.

Identifying a genetic basis for human intelligence is fraught with huge ethical, social, and political implications. If we knew of gene variants that increased intelligence, would we try to engineer them into our children? Or use them to determine who gets college loans? Or maybe just discourage people carrying the wrong variant from having children? So you’d think that researchers working on that topic would proceed with extra caution, and make sure their conclusions were absolutely iron-clad before submitting results for publication in a scientific journal, and that peer reviewers working for journals in that field would examine the work that much more closely before agreeing to publication.

Yeah, well, if you thought that, you would be wrong.

A paper just published online ahead of print at the journal Culture and Brain claims to have identified genetic markers that (1) differentiate college students from the general population and (2) are significantly associated with cognitive and behavioral traits. Cool, right? That would mean that these markers identify genes that determine whether you make it to college, and how well you do in educational settings generally. In other words, they’re genes that contribute to intelligence.

Again, if you thought that, you’d be wrong. But in that wrongness, you’re in good company, alongside the authors of this paper and, apparently, everyone involved in its peer review and publication.

Out of equilibrium

Here’s what the paper’s authors did to identify these “intelligence” genes. They recruited almost 500 students at Beijing Normal University, took blood samples from them, and gave them all a series of 49 different cognitive and behavioral tests, covering problem solving, memory, language and mathematical ability, and a bunch of other things we generally think of as having to do with intelligence. Using the blood samples, the authors genotyped all of the students at 284 single-nucleotide polymorphism (SNP) markers located in genes with expected connections to brain function—either because they’re involved in producing neurotransmitters, or they’re strongly expressed in the brain.

Next, the authors tested each of the 284 SNPs for deviation from Hardy-Weinberg Equilibrium, or HWE. If you’re not familiar with the concept, here’s my attempt at a brief explanation: HWE boils down to probability.

We all carry two complete sets of genes: one from Dad, one from Mom. So, suppose there’s a spot in the genome where two possible variants (let’s call them A and T) can occur. This is exactly what a SNP is, a single letter of DNA code that differs from person to person. Taking into account the two copies of each gene we carry, every person can have one of three possible diploid genotypes at that single-letter spot: AA, AT, or TT.

If we know how common As and Ts are in the population as a whole, we can estimate how common those three diploid genotypes should be: the frequency of the first allele times the frequency of the second allele. Say you’ve genotyped a sample of people, and you find that 40% of the markers are As (a frequency of 0.4), and 60% are Ts (frequency of 0.6). Then, if the two variants are distributed randomly among all the people you’ve sampled, you’d expect to find 16% (0.4 × 0.4 = 0.16) AA genotypes, 36% (0.6 × 0.6 = 0.36) TT genotypes, and 48% either AT or TA genotypes (0.4 × 0.6 + 0.6 × 0.4 = 0.48).
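For readers who want to check that arithmetic, here is a minimal Python sketch. The function name `hwe_expected` is made up for illustration; the allele frequencies are the ones from the example above.

```python
def hwe_expected(freq_a):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium
    for a two-allele SNP, given the frequency of allele A."""
    freq_t = 1.0 - freq_a
    return {
        "AA": freq_a * freq_a,
        "AT": 2 * freq_a * freq_t,  # AT and TA are the same genotype
        "TT": freq_t * freq_t,
    }


# Allele frequencies from the worked example: A = 0.4, T = 0.6
expected = hwe_expected(0.4)
for genotype, freq in expected.items():
    print(f"{genotype}: {freq:.2f}")
```

The three expected frequencies always sum to 1, which is a quick sanity check on any HWE calculation.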

If the actual frequencies of the three genotypes are close to that expectation, we say the SNP is in Hardy-Weinberg equilibrium, a state named for the two guys who originally deduced all this. Deviations from HWE may occur if, for some reason, people are more likely to mate with people who carry the same genotype, or if the three possible genotypes are associated with having different numbers of children—different fitness, in the evolutionary sense. So a deviation from HWE may mean something is going on at the deviating spot in the genome.
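One common way to decide whether observed genotype counts are “close to that expectation” is a chi-square goodness-of-fit test with one degree of freedom. The sketch below assumes that approach; the paper’s exact test may differ, and the genotype counts are invented for illustration.

```python
import math


def hwe_chi_square(n_aa, n_at, n_tt):
    """Chi-square test for deviation from Hardy-Weinberg equilibrium.

    Assumes a two-allele SNP with both alleles present in the sample.
    Returns the chi-square statistic and its p-value (1 degree of freedom).
    """
    n = n_aa + n_at + n_tt
    freq_a = (2 * n_aa + n_at) / (2 * n)  # allele frequency estimated from the sample
    freq_t = 1.0 - freq_a
    expected = (n * freq_a ** 2, 2 * n * freq_a * freq_t, n * freq_t ** 2)
    observed = (n_aa, n_at, n_tt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the first sample matches HWE, the second has too few heterozygotes
chi2_fit, p_fit = hwe_chi_square(16, 48, 36)
chi2_dev, p_dev = hwe_chi_square(30, 20, 50)
print(f"in equilibrium: chi2 = {chi2_fit:.2f}, p = {p_fit:.3f}")
print(f"deviating:      chi2 = {chi2_dev:.2f}, p = {p_dev:.2g}")
```

Both hypothetical samples have the same allele frequencies (0.4 and 0.6); only the way the alleles are packaged into genotypes differs, and that is all an HWE test measures.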

Of the 284 SNPs, the authors identified 24 with genotype frequencies that show a statistically significant deviation from HWE (in their sample of college students, that is). They also examined HWE for the same SNPs in a sample taken from the general population of Beijing, as part of the 1000 Genomes database of human genetic diversity, and found that all but 2 of the 24 SNPs that violated HWE in the students were within HWE expectations in the comparison sample. They conclude that this means that something about these 24 SNPs sets the college students apart from the broader population of Beijing.

Except this is not how population geneticists calculate genetic differentiation between two groups of people. For that, we usually use a statistic called FST, which essentially calculates the degree to which allele frequencies differ between two groups. That is, if the students are really differentiated from the rest of Beijing at a particular SNP, then we’d expect the frequency of the A allele among the students to be really different from the frequency of A in the other sample. FST is related to deviation from HWE; but it’s not at all the same thing. Fortunately for us all, the authors published all their genotype frequency data as Tables 1 and 2 of the paper. I can check directly to see whether the FST at each locus suggests meaningful genetic differentiation between the students and the comparison sample.

The distribution of FST values calculated from the 24 SNPs. Image by jby.

Possible values for FST range from 0, when there is no difference between the two groups being compared, to 1, when the two groups are completely differentiated. The FST values I calculated from the data tables range from 0.00003 to 0.05432, and half of them are less than 0.002, which is within the range seen for any random sample of genetic markers in other human populations [PDF]. Which is to say, the 24 SNPs identified in this paper are not really that differentiated at all.
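To make the idea of FST concrete, here is a rough Python sketch of one simple formulation for a single two-allele SNP: the proportional drop in expected heterozygosity within groups relative to the pooled sample. This is an assumption for illustration, with equal weighting of the two groups and invented allele frequencies; it is not necessarily the estimator used for the figure above.

```python
def fst_two_groups(freq_1, freq_2):
    """Simple FST for one two-allele SNP in two equally weighted groups.

    freq_1 and freq_2 are the frequencies of the same allele in each group.
    """
    mean_freq = (freq_1 + freq_2) / 2
    het_total = 2 * mean_freq * (1 - mean_freq)  # expected heterozygosity, pooled
    if het_total == 0:
        return 0.0  # allele fixed (or absent) in both groups: no variation to partition
    het_within = (2 * freq_1 * (1 - freq_1) + 2 * freq_2 * (1 - freq_2)) / 2
    return (het_total - het_within) / het_total


# Hypothetical allele frequencies, not values from the paper
fst_identical = fst_two_groups(0.40, 0.40)  # same frequency in both groups
fst_modest = fst_two_groups(0.40, 0.45)     # a five-point frequency difference
fst_fixed = fst_two_groups(1.00, 0.00)      # completely different alleles
print(fst_identical, round(fst_modest, 4), fst_fixed)
```

Under this formulation, even a five-point difference in allele frequency between the groups yields an FST of only about 0.0026, which gives a sense of how little differentiation values below 0.002 represent.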

Uncorrected testing is un-correct

But these markers identified in the study are still associated with cognitive ability, right? Well, brace yourself: there are serious problems with that claim, too. To test for association with cognition, the authors conducted a statistical test asking whether students with each of the three possible genotypes at a given SNP differed in the scores they got on the different cognitive tests. If the difference among genotypes was greater than expected by chance, they concluded that the SNP was associated with the element of intelligence approximated by that particular cognitive test. They identified these “significant” associations using a p-value cutoff of 0.01, which is a technical way of saying that the probability of observing the difference among genotypes simply by chance is less than 1 in 100.

The authors tested for associations of the genotypes at 19 SNPs (excluding 5 that would’ve had too few people with one or more of the three genotypes) with all 49 cognitive tests. They conducted each test using the complete sample of students, and then also the males and females separately, in case there were gender differences in the effects of each SNP. Across all three data sets (total, male, and female), they found 17 significant associations.

Statisticians and regular readers of xkcd will probably already know where this is going.

If you conduct one statistical test using a particular dataset, and see that there’s a 1 in 100 chance of observing the result purely by chance, you can be reasonably sure (99% sure!) that your result isn’t due to chance. However, if you conduct 100 such tests, and only one of them has a p-value of 0.01, then that is quite possibly the one time in 100 the result is pure coincidence. Think of it this way: it’s a safe bet that one roll of a die won’t be a six; but it’s not such a safe bet that if you roll a die six times, you won’t roll a six at least once. In statistics, this is called a multiple testing (or multiple comparisons) problem.
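The die analogy and the 100-tests case are the same calculation: the chance of at least one “hit” in n independent tries is 1 − (1 − p)^n. A short Python sketch, assuming the tries are independent (the helper name is hypothetical):

```python
def prob_at_least_one(per_trial_prob, n_trials):
    """Probability of at least one hit in n independent trials."""
    return 1.0 - (1.0 - per_trial_prob) ** n_trials


# Rolling at least one six in six rolls of a fair die
six_in_six_rolls = prob_at_least_one(1 / 6, 6)
# At least one p < 0.01 fluke across 100 tests with no real effects
fluke_in_100_tests = prob_at_least_one(0.01, 100)

print(f"at least one six in six rolls: {six_in_six_rolls:.3f}")
print(f"at least one fluke in 100 tests: {fluke_in_100_tests:.3f}")
```

Both probabilities come out close to two in three, so a single “significant” result out of 100 tests is more likely than not to show up even when nothing real is going on.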

How many tests did the authors conduct? That would be 49 cognitive measurements × 19 SNPs, or 931 tests on each of the three separate datasets. At p = 0.01, you’d expect them to get somewhat more than 9 “significant” results that aren’t actually significant. And, indeed, for the total dataset, they found 7 significant results; for the male students alone, they found 3; and for the females, 7. That’s exactly what would happen if there were no true associations between the SNP genotypes and the cognitive test results at all.

And, to go all the way back to the beginning, what was the p-value cutoff for the authors’ test of HWE? They considered deviations from HWE significant if the probability of observing the deviation by chance was less than 5%, or p ≤ 0.05. And 5% of 284 SNPs is a bit more than 14. That’s a pretty big chunk of their 24-SNP list.
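The arithmetic in the last two paragraphs can be reproduced in a few lines of Python. The test counts, cutoffs, and observed tallies come from the text above; treating the tests as independent is a simplifying assumption (the cognitive measures are surely correlated), and the helper name is hypothetical.

```python
import math


def null_false_positives(n_tests, alpha):
    """Mean and standard deviation of the number of 'significant' results
    expected when no true effects exist, assuming independent tests."""
    mean = n_tests * alpha
    sd = math.sqrt(n_tests * alpha * (1 - alpha))
    return mean, sd


# Association tests: 49 cognitive measures x 19 SNPs, cutoff p = 0.01
n_tests = 49 * 19
mean, sd = null_false_positives(n_tests, 0.01)

# Significant results reported for each dataset
observed = {"total": 7, "male": 3, "female": 7}
exceeds_null = {name: count > mean for name, count in observed.items()}

# HWE tests: 284 SNPs, cutoff p = 0.05
hwe_flukes, _ = null_false_positives(284, 0.05)

print(f"{n_tests} tests: expect {mean:.2f} false positives (sd {sd:.2f})")
print(f"datasets exceeding that expectation: {exceeds_null}")
print(f"expected chance HWE deviations: {hwe_flukes:.1f} of 284 SNPs")
```

None of the three datasets produces more hits than the no-effect expectation, and the expected number of chance HWE deviations is more than half the size of the 24-SNP list.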

In short, the authors of this paper identified a list of SNPs that supposedly differentiate college students from the general population, using a method that doesn’t actually identify differentiated SNPs. They then conducted a series of tests for association between those SNPs and intelligence-related traits, and didn’t find any more association than expected purely by chance. The list of genes identified this way is literally no better than what you’d get using two spins of a random number generator.

Who cares about methodological correctness, anyway?

What really makes me angry about this paper, though, is this: there are ways to do it right. The authors could have talked to a population geneticist, who would have told them to use FST or a similar measure of genetic differentiation. They could have used any number of methods to correct for the multiple testing problem in their final test for associations. And, in fact, someone must have pointed that second one out to them, because here’s what they write in the final paragraph of the paper:

… we analyzed all significant main effects at the P ≤ 0.01 level, without using more stringent corrections for multiple comparisons. We deemed this as an exploratory study to see if there were any behavioral or cognitive correlates of the SNPs in HWD. These results should provide bases for future confirmatory hypothesis-testing research.

In other words, they’re just fishing around for genes, here, so why should they actually perform a statistically rigorous test? But precisely because they don’t correct for multiple testing, any money spent on “future confirmatory hypothesis-testing research” would be wasted—it might as well start with a random selection of SNPs from the original list the authors chose to examine.
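For a sense of what a correction for multiple testing looks like in practice, here is a minimal Python sketch of two standard options: the Bonferroni correction and the Benjamini-Hochberg false discovery rate procedure. These are illustrations of well-known methods, not anything the paper’s authors did, and the p-values are invented.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * fdr
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from eight tests
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print("uncorrected (p <= 0.05):", sum(p <= 0.05 for p in p_values))
print("Bonferroni:", sum(bonf))
print("Benjamini-Hochberg:", sum(bh))

# With 931 tests and a 0.01 cutoff, the Bonferroni per-test threshold becomes:
bonferroni_threshold_931 = 0.01 / 931
```

With 931 tests, the Bonferroni version of the authors’ 0.01 cutoff works out to roughly 0.00001 per test, which shows how much stronger an association would need to be to survive correction.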

Given the nature of its subject matter, it’s appalling to me that this paper made it through peer review and into a scientific journal. It certainly wouldn’t have made it into a journal whose editors and reviewers understood basic population genetics. If I had to guess, I’d speculate that Culture and Brain doesn’t have any geneticists in its reviewer rolls—the fact that the authors spend a large chunk of their Introduction simply explaining Hardy-Weinberg Equilibrium suggests that their audience is people who don’t know much about the kind of data being presented.

And that’s where we come to the real lesson of this study. It’s getting cheaper and easier to collect genetic data with every passing day—to the point that researchers with no prior expertise or experience with genetic data can now do it. I’m afraid we’re going to see a lot more papers like this one, in the years to come.◼

Reference

Chen C., Chen C., Moyzis R.K., He Q., Lei X., Li J., Zhu B., Xue G. & Dong Q. Genotypes over-represented among college students are linked to better cognitive abilities and socioemotional adjustment, Culture and Brain.

Clark A.G., Nielsen R., Signorovitch J., Matise T.C., Glanowski S., Heil J., Winn-Deen E.S., Holden A.L. & Lai E. (2003). Linkage disequilibrium and inference of ancestral recombination in 538 single-nucleotide polymorphism clusters across the human genome, The American Journal of Human Genetics, 73 (2) 285-300.

Science online, travellin’ yeast edition

Like monarch butterflies? Plant milkweed, stat! Photo by Dave Govoni.
  • This week, at Nothing in Biology Makes Sense! I discuss my latest paper, a study in reconstructing evolutionary relationships with genome-wide data.
  • And, at the Molecular Ecologist: How human migration has shaped the diversity of our domesticated microbes.
  • Yow. Tracking changes in people’s personal microbial communities during a roller-derby.
  • Awww. Bees are better able to remember flowers that offer them caffeinated nectar.
  • Eek. The CDC’s warnings about antibiotic-resistant bacteria are getting scary.
  • Yay! Ambitious plans to genetically engineer a blight-resistant American chestnut are looking promising.
  • Oy. Another meteorite, another claim of fossilized extraterrestrial life.
  • Not good. This year’s overwintering monarch butterfly population is worryingly small.
  • Nifty. For some early birds, feathers on their legs might have formed a second pair of wings.
  • Heh. Seven things that are older than the (creationist) universe.

Nothing in Biology Makes Sense: Making sense of gene tree conflict across an entire genome

The only illustration in The Origin of Species. Image via Wikimedia Commons.

This week at Nothing in Biology Makes Sense, I discuss my latest research paper, which has just been published online ahead of print in Systematic Biology. In it, my coauthors and I use a genome-wide data set to reconstruct relationships among a couple dozen species in the genus Medicago—a data set that proved to be kind of a challenge.

Using that data, we identified some 87,000 individual DNA bases that varied among the sampled species—single-nucleotide polymorphisms, or SNPs. That’s not a lot in terms of actual sequence data—but considering that every one of those 87,000 SNPs is a variable character, and that most of them were probably spread far enough across the genome to have independent evolutionary histories, it contains many more independent “gene trees” than most DNA data sets used to estimate phylogenies.

To learn how we tackled all those gene trees, and what we found when we did, go read the whole thing.◼

Science online, frustrated angiosperms edition

Pollen counts. Photo by jby.

Human evolution, animated

Yes, yes, evolutionary change occurs in populations, not individuals. But this animation does a rather nice job of illustrating those population-wide changes in the lineages closest to modern humans.

Via The Hairpin.◼

The Molecular Ecologist: If genes aren’t independent “beans,” speciation is easier

Threespine sticklebacks are a classic case of speciation caused by natural selection. Photo by wolfpix.

This week at The Molecular Ecologist, my friend and collaborator Chris Smith writes, with two coauthors, about a new study simulating adaptive speciation in the face of gene flow, and the effects of linkage among genes involved in the adaptive divergence:

Models of speciation that involve ongoing gene flow remain controversial because gene flow is expected to homogenize differences between populations. However, genome-level effects may facilitate speciation with gene flow. For example, selection against immigrants may have the effect of reducing realized gene flow, even at loci that are not under divergent selection (Rundle & Nosil 2005). This global reduction in gene flow and increased divergence across the genome due to divergent selection is termed ‘Genome Hitchhiking’ (Feder et al. 2012). Genome hitchhiking may be enhanced by fitness epistasis – multiple loci interacting synergistically to cause reductions in fitness that are greater than selection acting on any one locus.

It turns out that speciation is more probable in models that don’t treat genes like independently evolving beans in a beanbag, bearing out a classic criticism of simple speciation models made most prominently by Ernst Mayr. However, true linkage among the selected genes isn’t necessary, either. All in all, this is an exciting new development for those of us who think natural selection might be important in forming new species, so you should definitely go read the whole thing.◼

Nothing in Biology Makes Sense: For adaptation, environmental change sets the pace

How fast can the environment change, if living populations are to adapt? Photo by susanvg.

This week at Nothing in Biology Makes Sense! Devin Drown looks at a new experimental evolution study of adaptation in response to a changing environment—in this case, bacteria evolving in response to increasing concentrations of an antibiotic.

In the case of a rapidly changing environment, there are only a handful of solutions and most of the test populations go extinct before the mutations occur. For populations that experience a slow increase in the deathly poison, there appear to be many more ways to evolve resistance. What is especially fascinating about this research is that it appears that these pathways to resistance are only available when the environment changes slowly.

The results have significant implications for how we expect natural populations to respond to climate change and other human-caused environmental shifts—but it’s also a mighty cool experiment. Go read all about it.◼

Bully-land

What would it take to eliminate bullying? Photo by Working Word.

In 2006, just about when we were all starting to see the light at the end of the Bush Administration, Sarah Vowell totally rearranged my perspective on U.S. politics:

High school … is the most appropriate metaphor for life in a democratic republic. Because democracy is an idealistic attempt to make life fair. And while high school is the place where you read about the democratic ideal of fairness, it is also the place most of us learn how unfair life really is. Who you are is informed by who you were then. And every nerd has an anecdote or two to tell about how Nerds versus Jocks is not just some epic mythological struggle but a pesky if normal way of life.

Vowell’s essay “The Nerd Voice” (originally published as part of The Partly Cloudy Patriot) starts from the observation that the differences between Presidential candidates Al Gore and George W. Bush are neatly encapsulated in the high school archetypes of Nerd and Jock, and from that spins an entire worldview. Alongside the Nerd are the poor kid, the undocumented kid, the disabled kid, and the gay kid, the Democrats’ patchwork coalition of the unpopular lunch table. Arrayed against them are the Jock’s friends: the rich kid, the casual racist, the popular kid who never has time for less-popular kids, the socially powerful who need not even acknowledge that they have power. Natural-born Republicans, every one.

More than a decade later, it barely counts as a metaphor to invoke the social strata of the schoolyard in reference to politics. The image of the Bully and the Bullied—the only slightly darker angle on the Nerd and the Jock—is routinely conjured by folks on both sides of the political spectrum to directly describe the actions They want to perpetrate against Us. In this atmosphere, where bullying is simultaneously a political issue and a unifying theory of politics even as the political discourse feeds back to shape the interactions of children in the schoolyard, Emily Bazelon’s book Sticks and Stones offers the hope of understanding not just our own high school traumas, or the experiences of children who are bullied today, but the way social power is wielded in American society.

Sticks and Stones grows out of Bazelon’s extensive writing on bullying for Slate. The book is structured around three specific cases: a girl bullied by upperclassmen for (at least initially) picking the wrong haircut; a gay boy in a rural school district; and a girl accused of contributing to a classmate’s suicide. The details of each case study inform Bazelon’s accounts of the others, and all three serve as starting points for discussion of broader context: the history of public schools’ legal responsibilities to protect students from bullying, historical and ongoing social research, and the evolving role of online media in teenagers’ social lives.

Bazelon’s handling of all this material is clear, precise, and cautious, even as she maintains empathy with (almost) everyone concerned in her three case studies. Early on, she establishes a working definition of bullying (drawing on the work of pioneering psychologist Dan Olweus), as “verbal or physical aggression that [is] repeated over time and that [involves] a power differential.” This allows differentiation between bullying and “drama,” or jostling for social status among near-equals.

That can still be a difficult line to draw, as the accounts in Sticks and Stones demonstrate. Interviewing the bullies in her first case study, Bazelon rapidly establishes that what feels like bullying to the victim is perceived, by the bullies, as normal and necessary social interaction. They’re concerned to learn that the bullied girl, Monique, couldn’t handle their taunting, but not that they’d done something inappropriate.

Monique, in the eyes of these girls … hadn’t learned how to play the game; how to mock other kids and be mocked by them. This was the key to scaling the heights of middle school, if that was your goal. If you wanted to be one of the popular kids in Aminah’s mental chart, you had to learn how to trade barbs, to give as good as you got. … “You have to defend yourself,” Gianna told me.

Sticks and Stones also draws a distinction between kids who bully from positions of social or physical strength (think Mitt Romney, the son of a governor and CEO), and those who bully to shore up a precarious position low in the social pecking order. As Bazelon recounts the case of Jacob, a gay kid dealing with anti-gay bullying in a rural New York high school, it emerges that his chief antagonist may fall into the latter category, the “bully-victim.” Jacob’s bully goes after him in a broader social climate in which queer kids are fair game—where a school administrator can shrug off reported harassment by blaming the victim: “To the extent that the child isn’t ready to project their sexuality in a responsible way, the peers may not respond appropriately, either.”

In that context, a socially marginalized boy looking to prove his toughness has an obvious target.

Photo by JLM Photography.

Bazelon also finds that the roles of bully and bullied can shift rapidly, as demonstrated in the experience of Flannery Mullins, one of several students at South Hadley High in Massachusetts who were accused of bullying a classmate, Phoebe Prince, until she committed suicide. (Bazelon wrote extensively about this case for Slate.) However, it emerges that most of what came to be understood as bullying in the wake of Prince’s suicide originated as interactions that could have looked like nothing more than standard high-school drama over dating relationships, until they interacted, tragically, with Prince’s family life and fragile mental health.

Except, what does that say about what we’re willing to consider “standard high-school drama?”

The picture built in Sticks and Stones suggests that although bullying has become strongly associated with particular parts of society—queer kids, most notably, in the era of “It Gets Better”—there is something about bullying that is quite independent of any particular characteristic that may currently attract bullying. That is, to borrow a thought from Tony Kushner, it is conceivable that some future American society might treat queer young people just the same as it treats straight young people—and still allow all its young people to bully and be bullied, as part of the “normal” cost of growing up.

In other words, the problem of bullying is not about who, specifically, suffers the slings and arrows of life at the bottom of the social ladder. It’s about the existence of the ladder, and what we—parents, school staff, peers—allow teenagers to do and say to establish and enforce their places on it. Sticks and Stones surveys efforts to change exactly these things, and while Bazelon’s description of some specific programs seems hopeful, it’s also clear that they require sustained effort by teachers, administrators, students, and parents—and everyone involved must start from the shared realization that a school’s culture needs to change.

And, on some level, cultural acceptance of bullying is not about particular schools (though, of course, some are worse than others). It’s about our expectations for the very experience of high school. Fixing that will take more than local anti-bullying programs, or legalized marriage equality, or even the best anti-bullying laws. It will require Americans to re-examine how we treat each other, and how we treat those less powerful than us, in the schoolyard and beyond.◼

I was able to read Sticks and Stones for free in advance of publication, via NetGalley.

Carnival of Evolution, March 2013

Photo by Franz Patzig.

The 57th edition of the Carnival of Evolution is hosted today at Nothing in Biology Makes Sense! Head over there for a month’s worth of online writing about evolution, scientific history, and the personal experiences of biologists.◼

Science online, sequestered labs edition

A red wolf. Photo by Jim Liestman.