Following up on last week’s post about uncovering hidden species using DNA diversity (or “DNA barcoding”), an open-access paper in this week’s issue of PNAS demonstrates a potentially significant glitch in the system: mitochondrial pseudogenes.
The original DNA barcoding concept is straightforward, if not uncontroversial – use a standard DNA sequence marker to identify (“barcode”) species that might be challenging to ID otherwise, or that were not previously known to be separate species. The proposed standard marker is a mitochondrial gene that codes for the protein cytochrome oxidase I (COI), which varies quite a bit between animal species (though it wouldn’t work for plants, whose mitochondrial DNA mutates very rarely). The lab where I work has used COI for a lot of studies in yucca moths, though not barcoding per se.
One potential problem with barcoding is that sequencing any gene in one species using procedures derived from another species is always a bit risky. DNA sequencing relies on primers, short snippets of DNA that bind to a region near the target gene as part of the reaction that makes lots of copies of that gene for analysis (this is called PCR, for polymerase chain reaction). The easiest way to get sequence data for a new species is to try to use primers from a close relative – if there aren’t any mutations at the primer site, they should carry over. But mutation happens, and it can definitely happen at primer sites.
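To make the primer-site idea concrete, here is a toy sketch in Python. It is not how PCR chemistry actually works: it treats primer binding as a simple string match, ignores strand complementarity, and every sequence and function name in it is made up for illustration.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_primer_site(template, primer, max_mismatches=0):
    """Return the index of the first window of `template` that matches
    `primer` with at most `max_mismatches` differences, or None."""
    n = len(primer)
    for i in range(len(template) - n + 1):
        if count_mismatches(template[i:i + n], primer) <= max_mismatches:
            return i
    return None


# Hypothetical toy sequences, not real COI data.
primer = "GGTCAACAAATC"
species_a = "TTA" + "GGTCAACAAATC" + "ATAAAGATATTGG"  # primer site intact
species_b = "TTA" + "GGTCAGCAAATC" + "ATAAAGATATTGG"  # one mutation in the site

site_a = find_primer_site(species_a, primer)                           # found
site_b = find_primer_site(species_b, primer)                           # not found
site_b_relaxed = find_primer_site(species_b, primer, max_mismatches=1)  # found
```

The primer carries over to species A but fails on species B unless one mismatch is tolerated. The same logic cuts the other way for pseudogenes: any stretch of DNA that still looks enough like the primer site can be picked up, whether or not it is the gene you meant to sequence.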
Primer site mutations are a minor problem compared to pseudogenes, the focus of the new paper by Song et al. Pseudogenes are a result of gene duplication, a mutation in which an extra copy of a gene is accidentally created during DNA replication. Because it’s redundant, the extra copy can absorb mutations that destroy its function without harming individuals who carry it. The duplicate is then “junk DNA,” free to accumulate mutations – a pseudogene. (Gene duplication is also one way that new proteins and gene functions can evolve – but that’s beyond the scope of the present post.) A primer site mutation just means that primers from one species won’t work on another, but a pseudogene might still bind to primers. And then you can get sequence data from the pseudogene instead of the target gene.
DNA barcoding identifies species based on how many mutations have accumulated since they split from a common ancestor; a pseudogene, which mutates faster, can make two samples look further apart than they are. So barcoding studies that accidentally use pseudogenes may identify two species where only one exists. Song et al. use data on mitochondrial pseudogenes in insects and crustaceans to argue that pseudogenes are both common and unpredictable. They also perform barcoding on grasshoppers and crustaceans using data “contaminated” with pseudogenes and data without – unsurprisingly, pseudogenes inflated the number of species detected by barcoding. Although Song et al. suggest a few ways to reduce the odds of interference from pseudogenes, they conclude that there is no way to completely eliminate this problem.
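A rough way to see how a pseudogene inflates the species count is to group sequences by pairwise distance. The Python sketch below is my own simplification, not the analysis Song et al. ran: it uses the proportion of differing sites as the distance, lumps together any sequences within a cutoff, and counts the resulting groups. The 3% cutoff, the sequences, and the helper names are all assumptions chosen for illustration.

```python
def p_distance(a, b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def count_clusters(seqs, threshold):
    """Join any two sequences whose distance is <= threshold
    (single linkage) and return the number of resulting groups."""
    names = list(seqs)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links up to the representative of n's group.
        while parent[n] != n:
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if p_distance(seqs[a], seqs[b]) <= threshold:
                parent[find(a)] = find(b)
    return len({find(n) for n in names})


def mutate(seq, positions):
    """Substitute a different base at each listed position (toy helper)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Hypothetical 100-base "barcodes" from ONE species; not real data.
individual_1 = "ACGTTGCAAC" * 10
individual_2 = mutate(individual_1, [5])                  # 1% different
pseudogene = mutate(individual_1, range(0, 100, 10))      # 10% different

THRESHOLD = 0.03  # illustrative cutoff, an assumption for this sketch

clean = {"ind1": individual_1, "ind2": individual_2}
contaminated = dict(clean, ind3_pseudogene=pseudogene)

species_clean = count_clusters(clean, THRESHOLD)
species_contaminated = count_clusters(contaminated, THRESHOLD)
```

With only the true gene sequences, the two individuals fall into a single group. Add one pseudogene sequence from the same species and the count goes up to two, even though no second species is present.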
Last week’s paper by Smith and colleagues showed the importance of species identification for conservationists, ecologists, and evolutionary biologists. This new result suggests that DNA barcoding may not be the best way to identify species.
P.D.N. Hebert, A. Cywinska, S.L. Ball, J.R. deWaard (2003). Biological identifications through DNA barcodes. Proc. Royal Society B, 270 (1512), 313-21. DOI: 10.1098/rspb.2002.2218
H. Song, J.E. Buhay, M.F. Whiting, K.A. Crandall (2008). Many species in one: DNA barcoding overestimates the number of species when nuclear mitochondrial pseudogenes are coamplified. PNAS, 105 (36), 13486-91. DOI: 10.1073/pnas.0803076105