Matthew Moulton and Dr. Michael Whiting, Department of Biology
Introduction
DNA barcoding is a method of species identification based on sequencing a short mitochondrial DNA fragment of Cytochrome Oxidase I (COI). A database of over 500,000 of these COI fragments, known as “barcodes,” has been established for nearly 38,000 species and is currently being used to rapidly assign individuals to known species and to highlight potential new species. However, the ability of DNA barcoding to correctly diagnose species is limited by the presence of nuclear mitochondrial pseudogenes (numts). Numts can be co-amplified with the mitochondrial ortholog when universal primers are used, which can lead to incorrect species identification and an overestimation of the number of species. Some researchers have proposed that using more specific primers may help eliminate numt co-amplification, but the efficacy of this method has not been thoroughly tested.
Objectives
The purpose of this study was to address the following questions: What is the effect of using more specific primers on eliminating numt co-amplification in DNA barcoding? To what extent do primers of differing specificity amplify different types of numts? To what degree can quality control measures correctly identify and remove numts before barcoding analyses? To what extent are numts present within Orthoptera? We discuss our findings in relation to these questions and their implications for DNA barcoding.
Materials and Methods
Taxon sampling
We selected 11 taxa from Orthoptera to study the extent to which COI numts are present within the order. This sampling represents 11 lineages from both Ensifera and Caelifera whose mitochondrial genomes have been partially or completely sequenced. DNA was extracted from representatives of the 11 lineages, and voucher specimens were deposited in the Insect Genomics Collection at Brigham Young University. Four polyneopteran taxa were selected as outgroups for phylogenetic analyses.
PCR, sequencing and cloning
Based on all published and unpublished sequences of orthopteran COI, we designed Orthoptera-specific primers to match orthopteran mitochondrial orthologs more closely than universal primers do, and taxon-specific primers to match the mitochondrial ortholog of each taxon perfectly. All primers designed for this study anneal in the same position as the Folmer primers on the mitochondrial genome. The Folmer region of COI was amplified via PCR using Elongase Enzyme Mix (Invitrogen Corporation) with primers of differing specificity, and the products were cloned using the TOPO TA Cloning Kit (Invitrogen). DNA from each cloned colony was amplified via PCR, sequenced using BigDye chain-terminating chemistry (Applied Biosystems Incorporated), and fractionated on an automated sequencing machine at BYU (ABI3730xl, Applied Biosystems Incorporated).
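To make the idea of primer specificity concrete, the following minimal Python sketch counts mismatches between a primer and a candidate binding site. It is an illustration only, not the primer design procedure used in this study. The primer shown is the commonly cited sequence of the Folmer forward primer LCO1490 (it should be checked against the original publication before reuse). The ortholog and numt binding sites are hypothetical and were built by hand for the example, as are the helper names `primer_mismatches` and `substitute`. Both primer and site are assumed to be written 5' to 3' on the same strand.

```python
# IUPAC nucleotide codes mapped to the bases each one permits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer: str, site: str) -> int:
    """Count positions where the template base is not permitted by the primer."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have the same length")
    return sum(t not in IUPAC[p] for p, t in zip(primer.upper(), site.upper()))


def substitute(seq: str, changes: dict) -> str:
    """Return a copy of seq with the bases at the given 0-based positions replaced."""
    chars = list(seq)
    for position, base in changes.items():
        chars[position] = base
    return "".join(chars)


# Commonly cited LCO1490 sequence (assumption: verify before reuse).
universal_primer = "GGTCAACAAATCATAAAGATATTGG"

# Hypothetical binding sites, invented for this example only.
ortholog_site = substitute(universal_primer, {2: "A", 16: "G"})
numt_site = substitute(ortholog_site, {10: "C"})

# A taxon-specific primer is a perfect match to the ortholog by construction.
taxon_primer = ortholog_site

for name, primer in [("universal", universal_primer), ("taxon-specific", taxon_primer)]:
    print(name,
          "ortholog mismatches:", primer_mismatches(primer, ortholog_site),
          "numt mismatches:", primer_mismatches(primer, numt_site))
```

In this toy case the taxon-specific primer matches the ortholog perfectly but is still only one mismatch away from the numt, which suggests why increased specificity may reduce numt co-amplification without necessarily preventing it.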
Sequence analysis and phylogenetic methods
Sequence data were compiled in Sequencher 4.7 (Gene Codes Corporation), and contigs were assembled to identify unique haplotypes. Each haplotype was queried with BLAST to identify cloning error, and only haplotypes that matched insect DNA were included in this study. We quantified indels, stop codons, and point mutations by comparing each haplotype to its mitochondrial ortholog reference sequence. We identified putative orthologs, heteroplasmy, and numts from the nucleotide and amino acid sequence data. We assembled a total of seventeen datasets to explore how primer specificity might affect data generation and subsequent phylogenetic analyses. All datasets were aligned using MUSCLE and analyzed in parsimony, Bayesian, and neighbor-joining frameworks. We also calculated haplotype sequence divergence in MEGA 4 and determined the number of clusters that would be considered unique species under the DNA barcoding standard of ≥3% nucleotide sequence divergence.
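The threshold-based species count described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the MEGA 4 analysis itself. It uses uncorrected p-distance (the proportion of differing sites) and single-linkage clustering, both of which are choices made for this example. The function names and the toy haplotypes are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping columns where either sequence has a gap or an N."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else 0.0


def count_clusters(haplotypes: dict, threshold: float = 0.03) -> int:
    """Single-linkage clustering: haplotypes diverging by less than
    `threshold` are merged; each remaining cluster would be counted
    as a separate species under the threshold rule."""
    parent = {name: name for name in haplotypes}

    def find(name):
        # Follow parent links to the cluster representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(haplotypes, 2):
        if p_distance(haplotypes[a], haplotypes[b]) < threshold:
            parent[find(a)] = find(b)
    return len({find(name) for name in haplotypes})


def mutate(seq: str, positions) -> str:
    """Toy helper: swap the base at each listed position."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    chars = list(seq)
    for i in positions:
        chars[i] = swap[chars[i]]
    return "".join(chars)


reference = "ACGT" * 25  # hypothetical 100 bp ortholog
haplotypes = {
    "ortholog": reference,
    "variant_1pct": mutate(reference, [10]),                # 1% divergent
    "numt_5pct": mutate(reference, [0, 20, 40, 60, 80]),    # 5% divergent
}
print(count_clusters(haplotypes))
```

With these toy data, all three haplotypes could come from one individual, yet the 5% divergent haplotype falls outside the 3% threshold and is counted as a second cluster.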
Results
In this study, we found that numts can be co-amplified in all eleven taxa using standard barcoding primers. Increased primer specificity helped reduce numt co-amplification but did not eliminate it in all species tested. We also found that a number of numts have no stop codons or indels, making them difficult to distinguish from mitochondrial orthologs and calling the efficacy of barcoding quality control measures into question (Fig. 1). Our findings suggest that the taxonomic impact of numt co-amplification is large and that more caution is necessary to identify and eliminate numts when using DNA barcoding for species identification.
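The limitation of quality control noted above can be illustrated with a minimal Python sketch of a numt screen based on stop codons and frameshift indels. It is a simplified stand-in for such checks, not the procedure used in this study. It assumes the invertebrate mitochondrial genetic code (NCBI translation table 5, in which only TAA and TAG are stop codons), a known reading frame, and sequences aligned and trimmed to the same region. All function names and sequences are hypothetical.

```python
import re

# Stop codons under the invertebrate mitochondrial code (NCBI table 5).
# TGA encodes tryptophan in this code, so it is not treated as a stop.
MITO_STOPS = {"TAA", "TAG"}


def has_stop_codon(seq: str, frame: int = 0) -> bool:
    """True if any complete in-frame codon of the ungapped sequence is a stop."""
    s = seq.upper().replace("-", "")
    return any(s[i:i + 3] in MITO_STOPS for i in range(frame, len(s) - 2, 3))


def frameshift_gaps(aligned_hap: str, aligned_ref: str) -> list:
    """Lengths of internal gap runs (in either sequence) that are not
    multiples of three and would therefore shift the reading frame."""
    lengths = []
    for s in (aligned_hap, aligned_ref):
        # Leading and trailing gaps are ignored as alignment padding.
        lengths += [len(m.group()) for m in re.finditer(r"-+", s.strip("-"))]
    return [n for n in lengths if n % 3 != 0]


def flag_numt(aligned_hap: str, aligned_ref: str, frame: int = 0) -> list:
    """Return the reasons a haplotype looks like a numt (empty if none)."""
    reasons = []
    if has_stop_codon(aligned_hap, frame):
        reasons.append("stop codon")
    if frameshift_gaps(aligned_hap, aligned_ref):
        reasons.append("frameshift indel")
    return reasons


reference = "ATGGCAGGATTCGCC"      # hypothetical ortholog fragment
with_stop = "ATGGCATAATTCGCC"      # GGA -> TAA introduces a stop codon
with_indel = "ATGGCA-GATTCGCC"     # 1 bp deletion shifts the frame
cryptic = "ATGGCAGGACTCGCC"        # point mutation only

for name, hap in [("with_stop", with_stop), ("with_indel", with_indel), ("cryptic", cryptic)]:
    print(name, flag_numt(hap, reference))
```

The third haplotype differs from the reference only by a point mutation, so it passes both checks; a numt of this kind would not be removed by a screen of this type.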
Discussion
Implications for DNA barcoding analyses and future research
When DNA barcoding works correctly, the Folmer region is sequenced from an individual, and that individual is assigned to a single species based on the similarity of its sequence to known barcode sequences in a database. In our analyses, we find that the numt haplotypes generated from a single individual are diverse enough that the individual might mistakenly be diagnosed as multiple species. Although some have suggested that using more specific primers could eliminate misidentification of species due to numts (Song et al., 2008), we show that this claim cannot be substantiated for these species. Numts are a major obstacle for DNA barcoding analyses, and the more we search for them, the more common they appear to be. To understand the extent to which numts are present in organisms, we suggest that in-depth surveys of numts be performed across diverse lineages, in the hope that this will lead to sound methods for eliminating numt co-amplification. We also suggest that proponents of DNA barcoding acknowledge the large taxonomic impact that numt co-amplification can have (Hebert et al., 2004a) and seek ways to eliminate misleading results due to numts.
References
- Hebert PD, Penton EH, Burns JM, Janzen DH, Hallwachs W (2004) Proceedings of the National Academy of Sciences USA 101, 14812-14817.
- Song H, Buhay JE, Whiting MF, Crandall KA (2008) Proceedings of the National Academy of Sciences USA 105, 13486-13491.