
The monthly compendium of online writing about all things evolution-y is live at Sorting out Science.◼
This week at the Molecular Ecologist, I discuss a new, genome-wide study of genetic differentiation between two closely related species — the collared flycatcher and the pied flycatcher.
Equipped with the core genome sequence, the team collected still more sequence data from ten male flycatchers of each species, and aligned these additional sequences to the genome sequence, identifying millions of sites that vary within the two species, and millions of sites where they share variants. They scanned through all these sites to identify points in the genome where differences between the two small samples of flycatchers were completely fixed — that is, sites where all the collared flycatcher sequences carried one variant, and all the pied flycatcher sequences carried a different variant. The frequency of these fixed differences varied considerably across the genome, but there are dozens of spots where they’re especially concentrated, forming peaks of differentiation.
To learn what all those “islands of divergence” could tell us about how the two flycatchers came to be different species, go read the whole thing.◼
Cross-posted at Nothing in Biology Makes Sense!
In the course of adaptive evolution — evolutionary change via natural selection — gene variants that increase the odds of survival and reproduction become more common in a population as a whole. When we’re only talking about a single gene variant with a strong beneficial effect, that makes for a pretty simple picture: the beneficial variant becomes more and more common with each generation, until everyone in the population carries it, and it’s “fixed.” But when many genes are involved in adaptation, the picture isn’t so simple.
This is because the more genes there are contributing to a trait, the more the trait behaves like a quantitative, not a Mendelian, feature. That is, instead of being a simple question of whether or not an individual has the more useful variant, or allele, at a single gene — like a light switch turned on or off — it becomes possible to add up to the same trait value with different combinations of variants at completely different genes. As a result, advantageous alleles may never become completely fixed in the course of an adaptive evolutionary response to, say, changing environmental conditions.
That principle is uniquely well illustrated by a paper published in the most recent issue of Molecular Ecology, which pairs classic experimental evolution of the fruitfly Drosophila melanogaster with modern high-throughput sequencing to directly observe changes in gene variant frequencies during the course of adaptive evolution. It clearly demonstrates that when many genes contribute to adaptation, fixation is no longer inevitable, or even necessary.
Turning up the heat, homogenizing flies
The authors of the new study, a team from the Institut für Populationsgenetik led by Pablo Orozco-terWengel, conducted what would otherwise be a rather simple experiment in evolutionary change in the laboratory. Starting with fruitflies collected from a wild population in Portugal (yes, Virginia, Drosophila melanogaster has wild populations!) they established three replicate populations of about 1,000 flies, which they put in temperature-controlled conditions somewhat warmer than the original collection location, and allowed them to propagate for 37 generations. Extensive previous work with Drosophila has established that simply moving the flies into a laboratory setting — where they live in bottles, and eat prepared food — exerts natural selection on them, and the increased temperature added a little bit more novelty to the lab environment to make it more likely adaptation would occur.
What makes this experiment different from all that previous experimental evolution of Drosophila, though, is that the coauthors tracked allele frequencies at thousands of markers during the course of those 37 generations of adaptation to the lab. To do this efficiently, they used an approach called “pooled sequencing.”
The principle behind pooled sequencing is that, if all you care about is the relative frequency of a gene variant in a whole population, you don’t need to know the genotype of any specific individual in that population. So to track changes in allele frequency, the team sampled hundreds of flies from the experimental population, and ground them all up together. (The polite, technical term used here is “homogenized.”) They then extracted DNA from this “pooled” sample, and used a high-throughput sequencer to collect millions of reads — short snippets of DNA sequence — out of the pool as a whole.
To extract allele frequencies from all of those sequence reads, the team identified where each read matched the Drosophila melanogaster reference genome. When multiple reads matched to the same location, but differed in one or more DNA nucleotide bases, they identified those bases as variable markers — single-nucleotide polymorphisms, or SNPs. Because the original DNA sample was pooled from many mashed-together flies, the relative frequency of each different variant of a SNP in the Illumina output should reflect the relative frequency of that SNP variant in the population as a whole.
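To make that last step concrete, here is a minimal sketch of estimating allele frequencies at a single site from pooled reads. It is not the authors' pipeline: the function names, the made-up pileup string, and the 5% minor-allele cutoff are all assumptions for illustration, and real analyses also have to filter sequencing errors and account for uneven read depth.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def pooled_allele_frequencies(bases):
    """Estimate allele frequencies at one site from the bases seen in pooled reads.

    `bases` is a hypothetical string (or list) holding the nucleotide that each
    aligned read carries at this position, e.g. "AAAGAAGAAA".
    """
    counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
    depth = sum(counts.values())
    if depth == 0:
        return {}  # no usable reads cover this site
    return {base: n / depth for base, n in counts.items()}


def is_snp(freqs, min_minor_freq=0.05):
    """Call a site variable if the second most common allele passes a cutoff.

    The cutoff is an assumed stand-in for real error filtering.
    """
    ordered = sorted(freqs.values(), reverse=True)
    return len(ordered) > 1 and ordered[1] >= min_minor_freq


site = "AAAGAAGAAA"  # made-up pileup: 8 reads carry A, 2 carry G
freqs = pooled_allele_frequencies(site)
variable = is_snp(freqs)
```

Because the reads come from many flies mashed together, the read-level proportions in `freqs` stand in for the population-level allele frequencies, with no individual genotypes required.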
Using this approach, Orozco-terWengel et al. could track allele frequency changes across more than a million SNP markers by taking these pooled samples from the initial population of flies, then at multiple points during the 37-generation evolutionary experiment. By comparing the allele frequencies in samples taken during the course of adaptation to the allele frequencies in the sample from the starting population, they could identify SNPs that became more common as the population adapted — and, because they had a big sample from across the genome, they could identify those SNPs whose allele frequencies had changed more than would be expected due to genetic drift. They examined samples taken after 15 and 27 generations of evolution, and at the end of the 37-generation experiment.
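One way to picture "more than would be expected due to genetic drift" is to simulate drift alone and ask how big a frequency change it rarely produces. The sketch below uses a plain Wright-Fisher simulation, which is my own stand-in and not necessarily the test the paper used. The number of gene copies, the replicate count, and the 99% quantile are all assumed values.

```python
import random


def drift_final_frequency(p0, n_copies, generations, rng):
    """Simulate neutral Wright-Fisher drift and return the final allele frequency."""
    p = p0
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random from the current pool.
        carriers = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = carriers / n_copies
    return p


def drift_threshold(p0, n_copies, generations, replicates=200, quantile=0.99, seed=1):
    """Frequency increase that simulated neutral drift exceeds only rarely."""
    rng = random.Random(seed)
    changes = sorted(
        drift_final_frequency(p0, n_copies, generations, rng) - p0
        for _ in range(replicates)
    )
    index = min(replicates - 1, int(quantile * replicates))
    return changes[index]


def exceeds_drift(p_start, p_end, n_copies=400, generations=37):
    """True if the observed increase is larger than the simulated drift threshold.

    n_copies=400 is a hypothetical effective number of gene copies, chosen
    only to keep the simulation fast.
    """
    return (p_end - p_start) > drift_threshold(p_start, n_copies, generations)


big_change = exceeds_drift(0.2, 0.95)  # allele goes from 20% to 95% in 37 generations
no_change = exceeds_drift(0.2, 0.2)    # allele frequency stays put
```

A SNP that passes this kind of check is a candidate for selection rather than chance, though a genome-wide scan would also need to correct for the very large number of SNPs being tested.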
Two paths to adaptation
What they found was largely in line with the verbal model I outlined at the beginning of this post. Over the course of experimental evolution, significant increases in allele frequency occurred at thousands of SNPs — suggesting that a great many genes are involved in the process of adaptation to life in the lab. Accordingly, very few of those allele frequency changes (in about 0.5% of the 2,000 SNPs that showed the greatest change from start to finish) represented complete or near-complete fixation.
More interestingly, comparison of allele frequency changes at the 15th generation and at the end of the experiment revealed two major “paths” taken by alleles. In the first case, the SNPs with strongest allele frequency changes by generation 15 all hit a “plateau” in subsequent generations — they didn’t see any significant increase in frequency between generations 15 and 37. In the second case, SNPs with the strongest allele frequency changes by generation 37, the end of the experiment, had increased steadily from the beginning population through the samples taken at the 15th and 27th generation. The SNPs in this second set had not shown significant allele frequency increases by generation 15 — which means the SNPs underlying most of the adaptive change in the first half of the experiment were a completely different set than the SNPs underlying adaptive change in subsequent generations.
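To spell out the two "paths," here is a toy classifier that labels a SNP's trajectory from its allele frequencies at generations 0, 15, 27, and 37. The 0.1 cutoff for a meaningful rise is a hypothetical number of mine, and the example frequencies are invented; the paper identified these groups with statistical tests, not a fixed threshold.

```python
def classify_trajectory(freqs, min_rise=0.1):
    """Label a SNP trajectory from allele frequencies at generations 0, 15, 27, 37.

    `min_rise` is an assumed cutoff for a "meaningful" frequency increase.
    """
    f0, f15, f27, f37 = freqs
    early = f15 - f0  # change during the first 15 generations
    late = f37 - f15  # change from generation 15 to the end
    if early >= min_rise and late < min_rise:
        return "plateau"  # rose quickly, then leveled off
    if early < min_rise and (f37 - f0) >= min_rise and f0 <= f15 <= f27 <= f37:
        return "steady rise"  # slow increase that continues to the end
    return "other"


fast_then_flat = classify_trajectory([0.10, 0.50, 0.52, 0.50])
slow_and_steady = classify_trajectory([0.10, 0.15, 0.25, 0.40])
unchanged = classify_trajectory([0.30, 0.30, 0.31, 0.30])
```

The key point from the study is that these two labels fall on different sets of SNPs: the ones that plateau early are not the ones still climbing at generation 37.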
If it’s already adapted, don’t fix it.
On the one hand, that suggests that Orozco-terWengel et al. managed to capture SNPs with a range of different contributions to the adaptation they observed by the end of the experiment. The SNPs with the biggest contribution showed rapid initial increases in allele frequency, then leveled off; SNPs with weaker effects showed slower, steady increases that continued for the entire experiment. But if it’s that simple, why didn’t the large-effect SNPs show continuing allele frequency change after the midpoint of the experiment?
It may be, as the coauthors speculate, that the two classes of SNPs identified in their experiment are separated by more than just the size of their respective contributions to adaptive change. There could be interactions among the alleles at these SNPs, such as overdominance, in which an individual is most fit when he or she carries two different alleles at a locus, rather than two copies of either allele. Overdominance would explain why most of the SNPs showing rapid initial increases in allele frequency then leveled out at intermediate frequencies.
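The leveling-off that overdominance predicts can be seen in the standard single-locus textbook model, sketched below. This is an illustration and not an analysis from the paper: the selection coefficients `s` and `t` against the two homozygotes are made-up values, and the model ignores drift and interactions among loci. With heterozygote fitness 1 and homozygote fitnesses 1 - s and 1 - t, the allele settles at frequency t / (s + t).

```python
def next_frequency(p, s, t):
    """One generation of selection with heterozygote advantage.

    Genotype fitnesses: AA = 1 - s, Aa = 1, aa = 1 - t (s and t are assumed).
    """
    q = 1.0 - p
    w_AA, w_Aa, w_aa = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # Frequency of A after selection: A copies weighted by their carriers' fitness.
    return p * (p * w_AA + q * w_Aa) / mean_fitness


def trajectory(p0, s, t, generations):
    """Allele frequency of A in each generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s, t))
    return freqs


traj = trajectory(0.05, s=0.2, t=0.3, generations=200)
equilibrium = 0.3 / (0.2 + 0.3)  # predicted resting point, t / (s + t)
```

Starting rare, the allele rises quickly and then stalls near the predicted equilibrium instead of going to fixation, which is the qualitative shape of the "plateau" SNPs.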
So this combination of experimental evolution and modern sequencing technology raises some interesting questions even as it supports a lot of previous thinking about how natural selection acts on traits that are created by the collective action of many genes. It’s an exciting result, and, I hope, inspiration for much more work digging into the details of such “polygenic” adaptation.◼
References
Burke, M. and A. Long. 2012. What paths do advantageous alleles take during short-term evolutionary change? Molecular Ecology 21:4913–4916. DOI: 10.1111/j.1365-294X.2012.05745.x.
Orozco-terWengel, P., M. Kapun, V. Nolte, R. Kofler, T. Flatt and C. Schlötterer. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Molecular Ecology 21:4931–4941. DOI: 10.1111/j.1365-294X.2012.05673.x.
Pavlidis, P., D. Metzler and W. Stephan. 2012. Selective sweeps in multi-locus models of quantitative traits. Genetics 192:225–239. DOI: 10.1534/genetics.112.142547.
Over at Nothing in Biology Makes Sense!, Devin Drown describes an interaction between aphids and a species of wasp that lays its eggs in the aphids so its larvae can eat the aphids alive. A new study tests whether the success of a wasp larva infecting an aphid depends on the specific genetics of the wasp, and of a bacterial symbiont the aphid carries:
The Vorburger group studies a crop pest aphid, Aphis fabae, and its common wasp parasitoid, Lysiphlebus fabarum. The adult parasitoids lay their eggs in unsuspecting aphid hosts. As the parasitoids develop they battle the host’s defenses. Some aphid hosts are also infected with a bacterial symbiont, Hamiltonella defensa, which can provide protection against the parasitoid by releasing bacteriophages that target the parasitoid invader (Vorburger et al. 2009; Vorburger and Gouskov 2011). If the wasp parasitoid can evade all the host defenses then eventually it develops inside the still living aphid. Eventually, as the authors describe in grisly detail:
“metamorphosis takes place within a cocoon spun inside the host’s dried remains, forming a ‘mummy’ from which the adult wasp emerges” (Rouchet and Vorburger 2012).
To learn how Vorburger et al. evaluated the importance of wasp genetics for successfully mummifying aphids, go read the whole thing.◼
This week at Nothing in Biology Makes Sense! Noah Reid takes a cue from Bill Nye the Science Guy and applies information theory to test whether a model of divine intervention fits a simple phylogenetic dataset.
Without getting into the details, we can think of information theoretic criteria for model selection as formally implementing Occam’s Razor: the simplest model with the most explanatory power is to be preferred. By preferring simple models, you guard against overinterpreting data, a pitfall that can make models poor predictors of new observations.
So, I realized as long as we can formulate any mathematical model of “The Hand of God”, rejectable or not, we can compare it to an evolutionary model in this framework. If, as Nye suggests, evolutionary theory is simple and powerful, and creationism is a model of fantastical complexity that doesn’t much improve our understanding of the data, information theory would help us sort that out.
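For a feel of how that comparison works, here is a small sketch using the Akaike Information Criterion (AIC), one common information-theoretic criterion. I am assuming AIC for illustration, since the excerpt does not name the criterion used, and the log-likelihoods and parameter counts below are invented numbers, not results from the post.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L. Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def preferred_model(models):
    """Return the name of the model with the lowest AIC.

    `models` maps a model name to a (log_likelihood, n_params) pair.
    """
    return min(models, key=lambda name: aic(*models[name]))


# Hypothetical fits: the complex model explains the data slightly better
# (higher log-likelihood) but spends 35 extra parameters to do it.
models = {
    "simple": (-1050.0, 5),
    "complex": (-1048.0, 40),
}
scores = {name: aic(*fit) for name, fit in models.items()}
best = preferred_model(models)
```

The penalty of two units per parameter is what implements Occam's Razor here: extra complexity is only rewarded when it buys a big enough improvement in fit.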
If you want to settle the whole evolution-versus-creationism thing once and for all (okay, not really), or just learn how biologists use information theory to select models (really!), go read the whole thing.◼
This week at the collaborative blog Nothing in Biology Makes Sense!, Jon Yoder (my brother) takes a look at the possible evolutionary origins of type II diabetes from his perspective as a medical student:
Currently, around 285 million people worldwide are affected and that number could potentially climb to 430 million by the year 2030. Diabetes also accounts for 12% of all health care expenditure. It is also a highly genetically associated disease, at least Type 2 Diabetes. Now, in type 2 diabetes the individual will have high levels of circulating insulin. Insulin is a key regulator of fat storage. It is released following meals in response to glucose from the meal and stimulates the uptake of that glucose into liver, muscle and fat. It also acts to antagonize other hormones that would break down and use the stored glucose as energy. So, this is where I got to thinking, if there is a gene that is linked evolutionarily to helping survive famine, is there a potential link between such genes and diabetes?
To find out more, go read the whole thing.◼
I’ve got a new post up over at The Molecular Ecologist, discussing a new paper that tries to take a quantitative approach to a phenomenon that keeps turning up in human population genomic datasets, in which genetic data mirrors the geography of the places it was collected.
It’s something of a classic result in human population genomics: Go out and genotype thousands of people at thousands of genetic markers. (This is getting easier to do every day.) Then summarize the genetic variation at your thousands of markers using Principal Components Analysis, which is a method for transforming that genetic data set into values on several statistically independent “PC axes.” Plot the transformed summary values for each of your hundreds of samples on the first two such PC axes, and you’ll probably see that the scatterplot looks strikingly like the map of the places where you collected the samples.
Of course “looks strikingly like” is not a very quantitative statement. To see how the new study deals with that problem, go read the whole thing. And yes, I manage to shoehorn in a reference to the Muppets.◼
The monthly roundup of online writing about descent with modification is online at the Stochastic Scientist. Dig in!◼
And now I present my first “real” post as a contributor at the Molecular Ecologist, a discussion of a new review article pointing out that population geneticists aren’t doing a great job dealing with one of the best-known patterns in population genetics, isolation by distance, or IBD. You may recall that I discussed IBD in a more historical context way back in the day on this very website. It’s simply a pattern in which populations located close to each other are more genetically similar than populations farther away from each other, which arises because most critters (or their seeds, or larvae, or pollen) are less likely to move longer distances. But IBD can be conflated with a number of other patterns population geneticists often try to detect:
So let’s say you’ve collected genetic data from sites on either side of a line you think might be biologically significant—a pretty standard-issue population genetics study. You run your data through Structure, and find two clusters of collection sites that line up pretty well with that Line of Hypothesized Biological Significance. As a followup, you conduct an AMOVA with the collection sites grouped according to their placement by Structure, and you find that the clusters explain a significant fraction of the total genetic variation in your data set. Therefore, you conclude that the LHBS is, in fact, a significant barrier to dispersal.
Except that as we’ve just discussed, everything you’ve just found could be a consequence of simple IBD plus the fact that you’ve structured your sampling so that your LHBS happens to bisect the landscape you’re studying. And just to add to the frustration, even if you’d started out by testing for IBD before you started with all of the tests for population structure, a significant result in a Mantel test for IBD wouldn’t necessarily mean that population structure wasn’t there.
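For readers who haven't met it, the Mantel test mentioned above asks whether two distance matrices, here geographic and genetic, are correlated, and judges significance by shuffling the site labels. The sketch below is a bare-bones version with assumed function names and an invented six-site example; real analyses would normally use an established statistics package.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def upper_triangle(matrix, order):
    """Pairwise distances above the diagonal, with sites taken in `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(geo, gen, permutations=999, seed=1):
    """Return (correlation, one-sided permutation p-value) for two distance matrices."""
    rng = random.Random(seed)
    identity = list(range(len(geo)))
    geo_vals = upper_triangle(geo, identity)
    observed = pearson(geo_vals, upper_triangle(gen, identity))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel the sites in the genetic matrix
        if pearson(geo_vals, upper_triangle(gen, order)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Made-up example: six sites along a line, genetic distance proportional to geography.
positions = [0, 1, 2, 3, 4, 5]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.1 * d for d in row] for row in geo]
r, p_value = mantel_test(geo, gen)
```

A significant result here tells you that genetic distance tracks geographic distance, but as the passage above notes, it does not rule out additional population structure layered on top of simple IBD.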
To find out how the author of the new review article suggests we deal with the complications outlined above, go read the whole thing.◼
Just up at the collaborative science blog Nothing in Biology Makes Sense!: Sarah Hird takes down the “paleo diet” trend, which is based on eating what we think our ancestors ate before the invention of agriculture. Readers of D&T will recognize some of the points Sarah makes:
… this assumes that no evolution has occurred since the advent of agriculture. This is demonstrably false. One example of post-agricultural evolution is the human lactase gene, which breaks down lactose, the dominant sugar in milk. In ancestral humans this gene was turned off after infancy; those humans would have been “lactose-intolerant”. Most humans of European descent now have a mutation that keeps that gene turned on their entire lives. Not surprisingly, this gene spread throughout Europe at approximately the same time cattle were domesticated. There are other known examples of agricultural dietary adaptation, and doubtless more to be discovered. If we are going to use evolution to justify our dietary choices, why throw out the last 10,000 years of it?
That’s just a taste (heh) of Sarah’s objections; for the full case against trying to eat like a hunter-gatherer, you’ll need to go read the whole thing.◼