Canadian Centre for DNA Barcoding

Barcode of Life Initiative: Barcoding Basics

U.S. patent #2,612,994 was issued on October 7th, 1952, for a “Classifying Apparatus and Method” invented by Joseph Woodland and Bernard Silver in the late 1940s. Their system of four white lines on a dark background was later developed into the Universal Product Code (UPC) in the early 1970s (Brown 1997). Today, UPCs are used by more than one million companies operating in over 100 countries; about five billion items are scanned each day, generating estimated annual savings of $17 billion in the grocery industry alone (Brown 2001). Product details must still be entered into regulated databases overseen by specialized agencies, but barcodes have had a revolutionary impact on industry by relieving each barcode user (e.g., a grocery clerk) of the need to identify thousands of different products from memory or through a time-consuming manual look-up procedure.

Modern UPC symbols use an 11-digit series of lines, each representing one of 10 numerals, for a total of 10^11 unique combinations. Each position in a DNA sequence has only four possible “numerals,” but a stretch of only 15 nucleotides nonetheless provides 4^15 (> 1 billion) possible combinations. Even if only third codon positions evolve in a sufficiently random way to generate these combinations, it is necessary to sample a stretch of just 45 nucleotides to achieve the same combinatorial diversity. In contrast to the UPC system, in which a unique commercial barcode (which need differ by only one digit from other barcodes) is consciously assigned to each product, distinctive DNA barcodes are generated through the accumulation of random mutations between reproductively isolated groups of organisms. However, at a modest rate of sequence evolution of 2% per million years, a 600 bp segment of DNA will, in theory, provide 12 diagnostic nucleotide differences between any two species that have been separated for only one million years (Hebert et al. 2003a). In practical terms, a DNA barcoding program involves determining and comparing the nucleotide sequences of several hundred base pairs from a particular gene region to provide an immediate diagnosis of species. As with UPC barcodes, this will require the assembly of a comprehensive library linking each barcode sequence to a particular item: in this case, the name of a species described previously through traditional taxonomic approaches.
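The arithmetic in the preceding paragraph can be checked with a minimal Python sketch. The function names are hypothetical and chosen only for illustration. The sketch assumes that the 2% figure is the pairwise divergence between two lineages per million years, which is the reading that yields the 12 differences quoted above. The two short sequences at the end are invented toy strings, not real barcode data.

```python
def combinations(alphabet_size: int, length: int) -> int:
    """Number of distinct codes of a given length over a given alphabet."""
    return alphabet_size ** length


def expected_differences(segment_bp: int,
                         divergence_per_myr: float,
                         myr_separated: float) -> float:
    """Expected nucleotide differences between two lineages.

    Assumption: divergence_per_myr is the pairwise divergence rate
    (fraction of sites differing per million years of separation).
    """
    return segment_bp * divergence_per_myr * myr_separated


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# 11 decimal digits versus 15 nucleotides drawn from 4 bases.
upc_codes = combinations(10, 11)
dna_codes = combinations(4, 15)

# If only third codon positions vary freely, 15 variable sites span 15 codons.
nucleotides_needed = 15 * 3

# 600 bp at 2% divergence per million years, one million years of separation.
diagnostic_sites = expected_differences(600, 0.02, 1.0)

# Toy aligned fragments (invented for illustration only).
toy_differences = count_differences("ACGTACGT", "ACGTTCGA")

if __name__ == "__main__":
    print(f"UPC combinations: {upc_codes:,}")
    print(f"15-nucleotide combinations: {dna_codes:,}")
    print(f"Nucleotides needed using third codon positions: {nucleotides_needed}")
    print(f"Expected diagnostic differences: {diagnostic_sites:.0f}")
    print(f"Differences between toy fragments: {toy_differences}")
```

A real comparison would first align the sequences and would have to handle gaps and ambiguous bases, which this sketch deliberately leaves out.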

Thus far, work on animals has employed a standard region of the mitochondrial genome to provide species-specific DNA barcodes. Mitochondrial DNA is advantageous because it is present in several copies per mitochondrion, which in turn number in the hundreds per eukaryotic cell. This high copy number makes it far simpler to recover mitochondrial genes than their nuclear counterparts from small amounts of tissue or when DNA preservation is poor. To maximize efficiency, our barcoding efforts with animals have focused on the development of a system based on a single sequence read: a 648 base pair region near the 5’ end of the ubiquitous cytochrome c oxidase I (COI) gene.