Barcode of Life Initiative: Early Success
The first phase in developing a DNA barcoding system for eukaryotic life has been to demonstrate, through several complementary approaches, that COI can be used to identify animal species. In the first explicit discussion and empirical test of the animal DNA barcoding concept, Hebert et al. (2003a) created COI “profiles” at three different taxonomic levels: 1) for each of the seven dominant animal phyla, using 100 COI amino acid sequences obtained from the GenBank database, each from a different family and representing all available classes; 2) for eight of the largest insect orders, using 100 COI amino acid sequences, each from a single representative of a different family; and 3) for 200 closely related species of moths covering five families, collected around Guelph, Ontario, using COI DNA sequences from single individuals. Roughly 100 additional COI amino acid sequences from species not included in the original datasets were then used to test whether these species could be assigned to the correct phylum (level 1 test) or insect order (level 2 test) from the COI sequence alone. In these initial analyses, 96% of test species were correctly assigned at the phylum level and 100% of insects were placed correctly at the ordinal level (Hebert et al. 2003a). Most importantly, a comparison of 150 new COI DNA sequences from additional individual moths against the species-level profile (level 3 test) showed that 100% of individuals could be correctly identified once their species’ barcode was available.
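The logic of the species-level (level 3) test can be illustrated with a minimal sketch: a query sequence is compared against a library of reference barcodes and assigned to the closest one. This is a simplified nearest-match stand-in, not the profile method used by Hebert et al. (2003a). The function names, the 20 bp toy sequences, and the distance threshold are all hypothetical, and the sequences are assumed to be already aligned and of equal length.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def identify(query: str, library: dict[str, str], threshold: float = 0.02):
    """Return (species, distance) for the closest reference barcode.

    If even the closest reference is more divergent than `threshold`,
    return (None, distance) to signal that no confident match exists.
    The default threshold is an illustrative assumption, not a published cutoff.
    """
    best_species, best_dist = min(
        ((species, p_distance(query, ref)) for species, ref in library.items()),
        key=lambda pair: pair[1],
    )
    if best_dist > threshold:
        return None, best_dist
    return best_species, best_dist


# Hypothetical toy reference library (20 bp each, not real COI data).
library = {
    "species_A": "ATGCATGCATGCATGCATGC",
    "species_B": "ATGCTTGCAAGCATGGATGC",
    "species_C": "TTGCATCCATGGATGCAAGC",
}

# Toy query that differs from species_A at a single site (the last one).
query = "ATGCATGCATGCATGCATGA"

# With sequences this short, one mismatch is already 5% divergence,
# so a looser threshold is passed for the demonstration.
species, distance = identify(query, library, threshold=0.10)
print(species, distance)
```

With real barcodes of several hundred base pairs, a single mismatch corresponds to a far smaller divergence, which is why the toy example has to relax the threshold to obtain a match.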
Recognizing that a combination of high interspecific and low intraspecific COI sequence divergence is crucial for the implementation of a DNA barcoding system, Hebert et al. (2003b) carried out a follow-up study in which they performed more than 13,000 pairwise comparisons between species, based on COI sequences from 2,238 animal species (447 genera, 11 phyla) in GenBank. They found that about 80% of these species pairs showed >8% sequence divergence, and that more than 98% of pairs exhibited >2% divergence. By contrast, individuals from the same species exhibited a level of sequence variation averaging less than 0.3%. This striking difference between inter- and intraspecific variation has since been confirmed with new COI sequence data collected in targeted analyses of specific invertebrate and vertebrate taxa, including birds, fishes, collembolans, and various insect groups (Hebert et al. 2004a; Hogg and Hebert 2004; Hebert et al., unpublished). These results provided a firm empirical underpinning for the DNA barcoding approach, and made it clear that the principle should be put into practice. In one of the largest studies to date, Hebert et al. (2004a) generated DNA barcodes for nearly half of all bird species living in North America. The Hebert lab has also performed successful “blind” tests, in which a collaborator provides only a small piece of tissue, to determine whether correct species designations can be obtained using the barcode alone. An analysis using more than 4,000 specimens from 500 species of tropical insects answered this question strongly in the affirmative (Hebert et al., unpublished).
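The contrast between intraspecific and interspecific divergence can be sketched as a simple calculation over labeled sequences. The example below is only an illustration of the comparison, assuming pre-aligned sequences of equal length and uncorrected percent divergence; the function names and toy data are hypothetical and do not reproduce the analysis of Hebert et al. (2003b).

```python
from itertools import combinations
from statistics import mean


def percent_divergence(a: str, b: str) -> float:
    """Uncorrected percent of differing sites between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return 100.0 * sum(1 for x, y in zip(a, b) if x != y) / len(a)


def split_divergences(records):
    """Compare every pair of records and sort the divergences into two lists.

    `records` is a list of (species_label, aligned_sequence) tuples.
    Returns (intraspecific, interspecific) lists of percent divergence.
    """
    intra, inter = [], []
    for (sp1, seq1), (sp2, seq2) in combinations(records, 2):
        d = percent_divergence(seq1, seq2)
        (intra if sp1 == sp2 else inter).append(d)
    return intra, inter


# Hypothetical toy data: two individuals from each of two species (20 bp each).
records = [
    ("sp_A", "ATGCATGCATGCATGCATGC"),
    ("sp_A", "ATGCATGCATGCATGCATGC"),
    ("sp_B", "ATGCTTGCAAGCATGGATGC"),
    ("sp_B", "ACGCTTGCAAGCATGGATGC"),
]

intra, inter = split_divergences(records)
print("mean intraspecific divergence (%):", mean(intra))
print("mean interspecific divergence (%):", mean(inter))

# Fraction of between-species pairs exceeding a 2% divergence cutoff.
above_2 = sum(d > 2.0 for d in inter) / len(inter)
print("fraction of interspecific pairs > 2%:", above_2)
```

A barcoding system is workable when the interspecific values sit well above the intraspecific ones, as in the pattern reported above.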
Interestingly, several analyses carried out by the barcoding group have proven to be “double blind” tests, in which not even the taxonomic expert was aware of the total species diversity present in the sample. That is to say, the barcoding studies have revealed the existence of previously unrecognized cryptic species in groups as different as birds and butterflies (Hebert et al. 2004a,b). In a particularly remarkable example, barcoding analyses showed that one neotropical skipper butterfly “species”, recognized since 1775 as Astraptes fulgerator based on adult morphology, actually consists of at least 10 species. Although the adults are indistinguishable, subsequent study based on these barcoding results revealed major differences in caterpillar morphology and food plant affinity (Hebert et al. 2004b).
It is safe to say that the animal barcoding effort has been more successful, and has advanced far more rapidly, than could have been anticipated. The number of available COI sequences has doubled over the past year, and is expected to increase five-fold over the next year. Indeed, this rate of accumulation continues to accelerate: whereas the earliest studies involving a few hundred analyses (Hebert et al. 2003a) took over a year to complete, a recent survey of 1,000 fish specimens representing 1% of all known fish species was completed in only 10 days. This rapid acceleration in the capacity for animal barcoding has made it possible to launch projects that will generate comprehensive barcode libraries for the two largest groups of vertebrates (birds and fishes), and to begin extending the overall barcoding program to groups other than animals.