Mitochondrial DNA Part A: DNA Mapping, Sequencing, and Analysis

ISSN: 2470-1394 (Print) 2470-1408 (Online) Journal homepage: http://www.tandfonline.com/loi/imdn21

Does a global DNA barcoding gap exist in Annelida?

Sebastian Kvist

To cite this article: Sebastian Kvist (2016) Does a global DNA barcoding gap exist in Annelida?, Mitochondrial DNA Part A, 27:3, 2241-2252, DOI: 10.3109/19401736.2014.984166

To link to this article: http://dx.doi.org/10.3109/19401736.2014.984166

Published online: 28 Nov 2014.


http://informahealthcare.com/mdn
ISSN: 2470-1394 (print), 2470-1408 (electronic)

Mitochondrial DNA Part A, 2016; 27(3): 2241–2252
© 2014 Informa UK Ltd. DOI: 10.3109/19401736.2014.984166

FULL LENGTH RESEARCH PAPER

Does a global DNA barcoding gap exist in Annelida?

Sebastian Kvist

Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA

Abstract

Accurate identification of unknown specimens by means of DNA barcoding is contingent on the presence of a DNA barcoding gap, among other factors, as its absence may result in dubious specimen identifications – false negatives or positives. Whereas the utility of DNA barcoding would be greatly reduced in the absence of a distinct and sufficiently sized barcoding gap, the limits of intraspecific and interspecific distances are seldom thoroughly inspected across comprehensive sampling. The present study aims to illuminate this aspect of barcoding in a comprehensive manner for the animal phylum Annelida. All cytochrome c oxidase subunit I sequences (cox1 gene; the chosen region for zoological DNA barcoding) present in GenBank for Annelida, as well as for “Polychaeta”, “Oligochaeta”, and Hirudinea separately, were downloaded and curated for length, coverage and potential contaminations. The final datasets consisted of 9782 (Annelida), 5545 (“Polychaeta”), 3639 (“Oligochaeta”), and 598 (Hirudinea) cox1 sequences and these were either (i) used as is in an automated global barcoding gap detection analysis or (ii) further analyzed for genetic distances, separated into bins containing intraspecific and interspecific comparisons and plotted in a graph to visualize any potential global barcoding gap. Over 70 million pairwise genetic comparisons were made and results suggest that although there is a tendency towards separation, no distinct or sufficiently sized global barcoding gap exists in any of the datasets, rendering future barcoding efforts at risk of erroneous specimen identifications (but local barcoding gaps may still exist, allowing for the identification of specimens at lower taxonomic ranks). This seems to be especially true for earthworm taxa, which account for fully 35% of the total number of interspecific comparisons that show 0% divergence.

Keywords

Annelida, cytochrome c oxidase subunit I, DNA barcoding, DNA barcoding gap, specimen identification

History

Received 2 October 2014
Revised 29 October 2014
Accepted 1 November 2014
Published online 28 November 2014

Introduction

A sufficiently sized DNA barcoding gap, i.e. the difference in genetic distance between the highest intraspecific variation and lowest interspecific divergence within a study group, is a prerequisite for accurate and effective DNA barcoding (e.g. DeSalle et al., 2005; Hebert et al., 2003a,b; Meier et al., 2008; Meyer & Paulay, 2005; Stoeckle, 2003; Wiemers & Fiedler, 2007). Specifically, genetic distances below a certain threshold would suggest taxonomic uniformity among samples, whereas values above another threshold would indicate taxonomic distinctness (Wiemers & Fiedler, 2007). In instances where these thresholds overlap, users of DNA barcoding will not be able to distinguish between intra- and interspecific distances and, consequently, specimen identifications will be prone to error. Because of its preponderance, the existence and magnitude of a barcoding gap have long been tested for in several animal groups, and the resulting hypotheses are often irreconcilable. On one hand, Hebert et al. (2003b) showed that most animal groups present a sufficiently sized barcoding gap while, on the other hand, more recent studies suggest either a complete absence or an insufficiently sized barcoding gap in different taxonomic groups (e.g. Burns et al., 2007; Chapple & Ritchie, 2013; Elias et al., 2007; Meier et al., 2006; Meyer & Paulay, 2005; Wiemers & Fiedler, 2007). Beyond the standard distance-based methods often employed by barcoders, it is important to note that several character-based methods also exist that, instead, take into consideration the presence of discrete synapomorphic nucleotide polymorphisms to infer specimen identities [e.g. CAOS (Sarkar et al., 2002); DNA-BAR (DasGupta et al., 2005); BRONX (Little, 2011); BLOG (Weitschek et al., 2013)].
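The definition of the barcoding gap given above can be made concrete with a minimal sketch. It assumes that distances are already available as two plain lists of percentages; the function name and the toy values are illustrative only and do not come from the study.

```python
def barcoding_gap(intra, inter):
    """Lowest interspecific distance minus highest intraspecific distance.

    A positive value means the two distributions are disjunct (a gap exists);
    zero or a negative value means they touch or overlap.
    """
    return min(inter) - max(intra)


# Toy distances in percent (made up for illustration).
clean_intra = [0.0, 0.4, 1.2, 1.9]
clean_inter = [8.5, 12.0, 20.3]
gap = barcoding_gap(clean_intra, clean_inter)

# A case with overlap: some conspecific pairs are very divergent and
# some heterospecific pairs are identical.
overlap = barcoding_gap([0.0, 3.0, 11.0], [0.0, 9.0, 25.0])

print(f"gap = {gap:.1f}, overlap case = {overlap:.1f}")
```

Because the statistic depends only on the two extreme values, a single mislabelled sequence is enough to erase an otherwise clear gap, which is why reference identifications matter so much in this kind of analysis.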

Accurate specimen identification, when reliant on a barcoding gap, can be hampered in several different ways. Beyond the obvious effects that population dynamics, such as periods of rapid radiation, may have on the presence and size of a barcoding gap, incorrect identification of the reference specimens in the target database will result in incorrect taxonomic assignment of the query specimen. Moreover, a de facto absence of a barcoding gap will suggest to the investigator that the specimen under study could pertain to several different species, or an erroneous sequence ID could be assigned with high confidence due to the lack of taxonomic coverage within the database. Whereas incorrect reference specimen identifications seem to be present across several animal groups (e.g. Kvist et al., 2014; Trewick, 2008), our knowledge regarding the presence or absence of a barcoding gap across major animal lineages is still limited. One factor potentially limiting such an investigation is that a precise assessment requires extremely thorough sampling of the study-group. That is, a full appreciation of the genetic variation within barcodes [for zoological barcoding, this is typically a 658 basepair region of the cytochrome c oxidase subunit I (cox1) gene, colloquially known as the “Folmer-region” (Folmer et al., 1994)] for a given group cannot emerge until we approach full taxonomic sampling. This is not to say that a global barcoding gap is a necessity for DNA barcoding to work for lower taxonomic groups – local barcoding gaps will still allow for accurate identification of specimens if some prior taxonomic knowledge of the study group can be ascertained such that the specimens can be a priori solidly placed within (e.g.) a family or genus. To date, however, the vast majority of taxonomic groups have yet to be comprehensively sampled and scrutinized for the presence or absence of a barcoding gap.

Correspondence: Sebastian Kvist, Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA. E-mail: [email protected]

One of these unsurveyed groups is the phylum Annelida, which currently consists of over 17,000 recognized species (Zhang, 2011), placing it among the larger invertebrate phyla. The phylum is divided into two classes: the paraphyletic “Polychaeta”, including about two-thirds of the known annelid diversity, and Clitellata (including the paraphyletic subclass “Oligochaeta” and the monophyletic subclass Hirudinea), which accounts for the remaining one-third of the phylum (Erseus, 2005; Kvist & Siddall, 2013; Rouse & Pleijel, 2001; Struck et al., 2011; Weigert et al., 2014). Despite the relatively high species diversity within the phylum, and the ecological and economic importance of the group (Shain, 2009), only an estimated 2013 species were represented by a cox1 gene barcode in GenBank in 2013 (Kvist, 2013). Whereas several studies have shown that DNA barcoding is a promising tool for specimen identification within lower taxonomic ranks of Annelida (e.g. Bely & Weisblat, 2006; De Wit & Erseus, 2010; Hebert et al., 2003b; Kvist et al., 2010; Luttikhuizen & Dekker, 2010; Oceguera-Figueroa et al., 2010), there is a notable paucity of data in regard to the existence of a phylum-, class-, and subclass-wide barcoding gap – a necessity for users of DNA barcoding. To shed light on the utility and suitability of this tool for specimen identification within Annelida, the present study aims to comprehensively sample cox1 gene sequences across the phylum as a whole and for each of the three major taxonomic groupings, scrutinize these for potential contaminations and other anomalies, and assess the presence or absence of a barcoding gap across Annelida, as well as “Polychaeta”, “Oligochaeta”, and Hirudinea.

Methods

A detailed schematic of the workflow used in the current study is presented in Figure 1 and all the final datasets, including all partitions, are available as Supplementary material. Cytochrome c oxidase subunit I sequences were downloaded on 29 July 2014 from GenBank by sending the results (in FASTA format) of the following search queries to separate files: “txid6340[Organism:exp] cox1[Gene]”, “txid6341[Organism:exp] cox1[Gene]”, “txid6381[Organism:exp] cox1[Gene]” and “txid6403[Organism:exp] cox1[Gene]”. This resulted in 13,003 individual sequences for Annelida, 7640 sequences for “Polychaeta”, 4678 sequences for “Oligochaeta”, and 685 sequences for Hirudinea (Table 1). Thereafter, the four datasets were further manipulated using TextWrangler (Bare Bones Software Inc., North Chelmsford, MA) in combination with Microsoft Excel (Microsoft Corporation, Redmond, WA). To enable comparisons of verified data, and to ensure intraspecific and interspecific comparisons, all unverified sequences were removed; i.e. those that included the term “UNVERIFIED”, and so were sequences with imprecise taxonomic labels (e.g. “Gen.” and/or “sp.”). However, sequences with taxonomic labels including “cf.” were assumed to be correctly annotated. Note that these instances only composed a fraction of the full dataset (n = 106, 0.8% of the Annelida dataset). In addition, full mitochondrial genomic sequences were removed as these were assumed to be already present in the dataset as simple cox1 gene data, and sequences with a length below 400 basepairs (bp) were also removed.
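The label-based and length-based exclusion rules described above were applied in TextWrangler and Excel in the study. The following is only a rough scripted equivalent, written under assumptions: the header patterns used to spot imprecise labels and complete mitochondrial genomes are guesses at how such records are labelled, and the accessions in the demo are invented.

```python
import re

MIN_LENGTH = 400  # sequences shorter than this many bp are dropped

# Assumed pattern for imprecise labels: "sp." or "Gen." as a separate token.
# "cf." is deliberately not matched, since such labels were retained.
IMPRECISE = re.compile(r"\b(sp|gen)\.", re.IGNORECASE)


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def keep_record(header, seq):
    """Apply the exclusion rules to one record; True means the record stays."""
    if "UNVERIFIED" in header:
        return False
    if IMPRECISE.search(header):
        return False
    if "complete genome" in header.lower():  # assumed wording for mitogenomes
        return False
    return len(seq) >= MIN_LENGTH


# Invented records: X1 and X4 pass, the others fail one rule each.
demo = "\n".join([
    ">X1 Hirudo medicinalis cox1", "ACGT" * 150,
    ">X2 UNVERIFIED: Hirudo medicinalis cox1", "ACGT" * 150,
    ">X3 Lumbricus sp. ABC-2014 cox1", "ACGT" * 150,
    ">X4 Lumbricus cf. terrestris cox1", "ACGT" * 150,
    ">X5 Eisenia fetida cox1", "ACGT" * 50,
])
kept = [h.split()[0] for h, s in parse_fasta(demo) if keep_record(h, s)]
print(kept)
```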

Potential bacterial contaminants (see Siddall et al., 2009) were then controlled for by a BLASTn search of the annelid cox1 gene sequences against a previously used (Kvist et al., 2011) dataset consisting of 50 complete bacterial genomes from most of the different classes of bacteria. In total, 14 sequences found significant matches against the bacterial dataset during the BLASTn search. However, three of these (GU902050 Enchytraeus christenseni Dozsa-Farkas, 1992, AY519464 Lumbriculus variegatus Muller, 1774, and GU453374 Lumbriculus variegatus) found non-self hits against Annelida with at least one order of magnitude lower e-values and they were, therefore, considered to be of annelid origin. The remaining 11 sequences were purged from the dataset.

Sequence curation

Directionality of the sequences was recovered through iterative alignment of between 900 and 999 taxon segments of the datasets employing MAFFT ver. 7 (Katoh & Standley, 2013); the directions of the sequences were plotted against sequences that had been a priori determined to be in the 5′–3′ direction (e.g. GU902030 Achaeta aberrans Nielsen & Christensen, 1961 for the entire Annelida dataset and JQ908698 Allolobophora chlorotica Savigny, 1826 for the oligochaete dataset). The rationale behind the segmentation of the datasets was that the online version of MAFFT only provides a maximum of 999 directionality plots. Sequences with incorrect directionality were reverse complemented using the Sequence Manipulation Suite (Stothard, 2000) and each of them was re-BLASTed (using BLASTn) against GenBank to confirm the correct direction of the complement. Equal directionality was also cross-controlled by re-aligning all corrected segments. In addition, sequences that completely lacked overlapping coverage for any nucleotide (those that were sequenced for a different region) were removed from the dataset. The so-called “Folmer-region” (i.e. the region between the universal primers suggested by Folmer et al. [1994]; LCO1490 [5′-GGTCAACAAATCATAAAGATATTGG-3′] and HCO2198 [5′-TAAACTTCAGGGTGACCAAAAAATCA-3′]) has emerged as the most commonly amplified region for barcoding purposes. The region covers a 658 bp stretch close to the 5′-region of the cox1 gene (Folmer et al., 1994). Because of the wide use of the “Folmer-region” and, therefore, the high presence of this region within the dataset, sequences were truncated to cover only this region in order to elevate matrix occupancy in the dataset; note that several other primer pairs exist that amplify part of, or the full, Folmer-region and that usage of these did not exclude the sequences from the analyses performed here. The truncation was carried out using Mesquite v.2.5 (Maddison & Maddison, 2010). Regions with homopolymer N’s were removed from the dataset when these positions were not shared by any other nucleotide for any of the other taxa. Sequences showing oddities in the alignment were BLASTed (using BLASTn) against GenBank nr; in cases where the best non-self hit (including sequences from the same project) was against a different phylum, that sequence was removed.
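Reverse complementation, which the study carried out with the Sequence Manipulation Suite, is simple enough to sketch directly. The snippet below is a generic illustration rather than that tool's code; it handles IUPAC ambiguity codes and leaves alignment gaps untouched, and it uses the HCO2198 primer quoted above as input.

```python
# Translation table covering the four bases and the IUPAC ambiguity codes,
# in both upper and lower case. Any other character (e.g. "-") is left as is.
COMPLEMENT = str.maketrans(
    "ACGTRYKMSWBDHVNacgtrykmswbdhvn",
    "TGCAYRMKSWVHDBNtgcayrmkswvhdbn",
)


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


HCO2198 = "TAAACTTCAGGGTGACCAAAAAATCA"  # primer sequence as given in the text
hco_rc = reverse_complement(HCO2198)
print(hco_rc)
```

HCO2198 is the reverse primer, so its reverse complement is the motif one would expect to see near the 3′ end of the Folmer-region in a sequence that is already in the 5′–3′ direction.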

Genetic distances

MAFFT ver. 7 was used to align sequences within each of the datasets (Annelida, “Polychaeta”, “Oligochaeta”, and Hirudinea) separately by employing the FFT-NS-1 strategy (the only sufficiently memory-effective method for this many sequences) with a gap opening penalty of 3.0, the 20PAM/K = 2 scoring matrix, and an offset value of 0.0. Mesquite was then used to assemble the final nexus files. Uncorrected p distances (see Srivathsan & Meier, 2012) between the sequences were calculated in PAUP* v.4.0d98 (Swofford, 2003), under the function of minimal evolution, ignoring gaps for affected sites, constraining branch lengths to be non-negative, with equal rates for variable sites, and estimating variation for all substitutions. The files were saved as “one-column” and UNIX language was used to split each file into two separate files on the basis of containing intraspecific or interspecific values (Figure 1). Because of the large size of the resulting files containing interspecific values, these were further split into separate files, each consisting of 1,000,000 lines (Figure 1). The resulting files were managed in Microsoft Excel where genetic distances were rounded to the closest half percentage (i.e. 0.25–0.74% was rounded to 0.5%), and each instance of a half or whole integer was counted. All counts were thereafter assembled into single master files and graphs were created to visualize the results.

Figure 1. Workflow used in the present study, including commands used (when appropriate) and the results gained. Note that analyses in the uppermost darker shading regard sequence curation, whereas those in the bottommost lighter shading regard genetic distance analyses.
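The distance, splitting and binning steps described above can be sketched in a few lines. This is not the PAUP*, UNIX and Excel pipeline used in the study: the p distance here simply skips any site where either sequence lacks an unambiguous base, which is an assumption about gap handling, and the species labels and sequences in the demo are invented.

```python
import math
from collections import Counter
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p distance in percent over sites where both have A/C/G/T."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    return 100.0 * diffs / compared if compared else float("nan")


def round_half_percent(d):
    """Round to the closest 0.5 %, so that 0.25-0.74 % becomes 0.5 %."""
    return math.floor(d * 2 + 0.5) / 2


def bin_distances(aligned):
    """Count rounded distances separately for intra- and interspecific pairs.

    `aligned` is a list of (species_label, aligned_sequence) tuples.
    """
    intra, inter = Counter(), Counter()
    for (sp1, s1), (sp2, s2) in combinations(aligned, 2):
        d = p_distance(s1, s2)
        if math.isnan(d):
            continue  # no comparable sites for this pair
        target = intra if sp1 == sp2 else inter
        target[round_half_percent(d)] += 1
    return intra, inter


# Invented toy alignment: two sequences of species "A", one of species "B".
toy = [
    ("A", "ACGTACGTAC"),
    ("A", "ACGTACGTAT"),
    ("B", "TTGAACGTAC"),
]
intra_bins, inter_bins = bin_distances(toy)
print(dict(intra_bins), dict(inter_bins))
```

Counting into bins rather than storing every distance keeps memory use flat, which matters when the number of pairs grows quadratically with the number of sequences.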

To assess to what extent a potential lack of a barcoding gap is due to outlier values, a randomly chosen subset of equal numbers of intraspecific and interspecific comparisons was also further processed in the same manner as the full dataset (Table 1).
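The text does not say how the random subset was drawn. One plausible reading, sketched below purely as an assumption, is sampling without replacement of the same number of values from each of the two sets; the seed, function name and toy values are illustrative.

```python
import random


def equal_subsample(intra, inter, n, seed=0):
    """Draw n values without replacement from each list of distances."""
    if n > min(len(intra), len(inter)):
        raise ValueError("n exceeds the smaller of the two sets")
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(intra, n), rng.sample(inter, n)


# Toy distances in percent (made up for illustration).
intra = [0.0, 0.5, 1.0, 2.5, 11.0]
inter = [0.0, 9.5, 18.0, 22.5, 27.0, 30.5, 41.0]
sub_intra, sub_inter = equal_subsample(intra, inter, 3)
print(sub_intra, sub_inter)
```

Equalizing the two sample sizes puts both distributions on the same vertical scale, so the intraspecific tail is no longer dwarfed by the far more numerous interspecific comparisons.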

ABGD

As a complement to the manual calculations, the software Automatic Barcode Gap Discovery (ABGD, Ecol & Evolutionary Biol, Eugene, OR; Puillandre et al., 2012) was employed to check the distribution and size of a potential barcoding gap. For this purpose, a standalone executable version of the software was run locally for each of the four datasets with the following settings: pmin = 0.001, pmax = 0.9, simple distances, number of bins = 100 (Figure 1). Because ABGD does not handle ambiguous data, all such occurrences were changed to gaps in the ABGD dataset.
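The conversion of ambiguous data to gaps mentioned above could be scripted as follows. This is a minimal sketch that assumes "ambiguous" means any character other than A, C, G, T or an existing gap; the study does not state which tool performed the replacement.

```python
import re

# Anything that is not an unambiguous base or an existing gap character.
AMBIGUOUS = re.compile(r"[^ACGTacgt\-]")


def mask_ambiguous(seq):
    """Replace ambiguity codes (N, R, Y, ?, ...) with gap characters."""
    return AMBIGUOUS.sub("-", seq)


masked = mask_ambiguous("ACGNRT-YA")
print(masked)
```

The replacement is one character for one character, so the alignment length and column positions are preserved.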

Note that the overall scheme presented here does not take into account taxonomic synonyms, misidentifications or subsequent errata to the taxonomic labels, nor does it pretend to equalize global and local DNA barcoding gaps. However, it does give a broad perspective on the limits of DNA barcoding within the phylum, as it pertains to the presence or absence of a global DNA barcoding gap.

Results

At a surprisingly high rate, sequences that lacked overlapping coverage for any basepair in the curated dataset found matches against non-annelid taxa when the full-length sequences were re-blasted against GenBank. The most typical scenario involved top hits against taxa included in the same study (presumably sequenced under the same conditions), followed by hits against non-annelid taxa; commonly insects or molluscs. For example, numerous sequences from the same sequencing campaign and pertaining to the polychaete genus Boccardia Carazzi, 1893 showed only a 22 bp overlap with the Folmer-region and were therefore removed. When BLASTing the full-length sequences, however, the best non-self hit that was not involved in the same project was against the carid crustacean genus Macrobrachium Spence Bate, 1868. Moreover, the best-scoring non-self hits for several specimens labeled as Boccardia proboscidea Hartman, 1940 and Boccardia syrtis (Rainer, 1973) were Vermivora crissalis (Salvin & Godman, 1889) (Vertebrata: Aves: Parulidae) and Altinote neleus (Latreille, 1811) (Arthropoda: Insecta: Nymphalidae), respectively. In total, 230 full-length sequences were removed from the dataset due to potential or obvious contamination, and several more were removed due to insufficient overlap between sequences. The final annelid dataset included 9782 individual cox1 gene sequences and the alignment occupied 673 sites. The final polychaete dataset (5545 sequences), oligochaete dataset (3639 sequences), and hirudinean dataset (598 sequences) occupied 670, 661, and 660 aligned sites, respectively.

Intraspecific variation and interspecific divergence – Annelida

The genetic distance analysis of the full annelid dataset was contingent on a total of 47,838,871 comparisons; over 47 million were interspecific comparisons and almost 455,000 were intraspecific comparisons (Table 1). The average uncorrected p distance ± standard deviation within the entire dataset was 27.39% ± 6.68; the average interspecific distance was 27.60% ± 6.31, whereas the average intraspecific distance was 4.89% ± 5.86. Moreover, the interspecific distance values ranged from 0 to 53%, and the equivalent range for intraspecific distance values was 0–37% (Table 2). While these ranges are uncommonly large for both sets of comparisons, it is important to note that the median value for interspecific distances was 20.06% and, for intraspecific distances, the same value was 3.56%.
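The comparison totals are what one would expect if every sequence were compared once with every other sequence, i.e. n(n − 1)/2 pairs for n sequences. The short check below, which uses only the sequence counts and comparison counts reported in Table 1, is offered as a sanity check under that assumption.

```python
def n_pairs(n):
    """Number of unordered pairs among n sequences."""
    return n * (n - 1) // 2


# Curated sequence counts, and (interspecific, intraspecific) comparison
# counts, as reported in Table 1.
curated = {"Annelida": 9782, "Polychaeta": 5545,
           "Oligochaeta": 3639, "Hirudinea": 598}
reported = {
    "Annelida": (47_384_083, 454_788),
    "Polychaeta": (15_068_510, 302_230),
    "Oligochaeta": (6_469_412, 149_929),
    "Hirudinea": (175_855, 2_648),
}

for name, n in curated.items():
    inter, intra = reported[name]
    print(name, n_pairs(n), inter + intra, n_pairs(n) == inter + intra)

total = sum(n_pairs(n) for n in curated.values())
print("all datasets:", total)
```

The grand total across the four datasets is slightly above 70 million, in line with the figure given in the Abstract.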

Intraspecific variation and interspecific divergence – “Polychaeta”

A total of 15,370,740 comparisons (over 15 million interspecific and about 300,000 intraspecific comparisons) resulted from the genetic variation analysis of the polychaete dataset (Table 1). The average uncorrected p distance within the dataset was 29.46% ± 6.42; the average interspecific distance was 29.97% ± 5.29, whereas the average intraspecific distance was 3.92% ± 5.67. Whereas the interspecific and intraspecific distance values ranged from 0 to 52% and 0 to 37% (Table 2), respectively, the median value for interspecific distances was 22.42% and, for intraspecific distances, the same value was 0.79%.

Table 1. Results from the sequence curation steps performed for each of the four datasets. Values indicate the number of sequences used in each step of the analyses and the number of comparisons resulting from those analyses.

Dataset      Sequences  Sequences       Interspecific  Intraspecific  Average no. of intraspecific  No. of randomly
                        after curation  comparisons    comparisons    comparisons per unique        chosen comparisons
                                                                      taxonomic label
Annelida     13,003     9782            47,384,083     454,788        367                           200,000
Polychaeta   7640       5545            15,068,510     302,230        476                           200,000
Oligochaeta  4678       3639            6,469,412      149,929        372                           100,000
Hirudinea    685        598             175,855        2648           13                            2000

Table 2. Results from the sequence comparisons for each of the four datasets, including standard deviations.

Dataset      Average variation  Average interspecific  Average intraspecific  Range interspecific  Range intraspecific
             (total)            distance               distance               distances (%)        distances (%)
Annelida     27.39% ± 6.68      27.60% ± 6.31          4.89% ± 5.86           0–53                 0–37
Polychaeta   29.46% ± 6.42      29.97% ± 5.29          3.92% ± 5.67           0–52                 0–37
Oligochaeta  21.10% ± 3.38      24.72% ± 2.47          6.83% ± 5.77           0–36                 0–25
Hirudinea    21.61% ± 4.85      21.90% ± 4.23          2.27% ± 3.96           0–35.5               0–26

Intraspecific variation and interspecific divergence – “Oligochaeta”

For the oligochaete dataset, 6,619,341 comparisons were used as the basis for the genetic variation analysis; almost 6.5 million were interspecific comparisons and about 150,000 were intraspecific comparisons (Table 1). The average uncorrected p distance ± standard deviation within the entire dataset was 21.10% ± 3.38; the average interspecific distance was 24.72% ± 2.47, whereas the average intraspecific distance was 6.83% ± 5.77. The interspecific distance values ranged from 0 to 36%, the intraspecific distance values ranged from 0 to 25% (Table 2), and the median value for interspecific and intraspecific distances was 19.90% and 0.23%, respectively.

Intraspecific variation and interspecific divergence – Hirudinea

The genetic distance analysis of the hirudinean dataset was based on a total of 178,503 comparisons, almost 176,000 interspecific and about 2600 intraspecific comparisons (Table 1). The average uncorrected p distance ± standard deviation within the entire dataset was 21.61% ± 4.85; the average interspecific distance was 21.90% ± 4.23, whereas the average intraspecific distance was 2.27% ± 3.96. Also, the interspecific distance values ranged from 0 to 35.5%, and the intraspecific distance values ranged from 0 to 26% (Table 2); the median value for interspecific distances was 27.03% and, for intraspecific distances, the same value was 0.68%.

Barcoding gap

Because of the differences in the numbers of interspecific and intraspecific distance comparisons (e.g. 47 million versus 454,000 in the full annelid dataset), at first glance, the resulting graphs seem to illustrate barcoding gaps between approximately 3 and 15% genetic distance (Figures 2A, 3A, 4A, and 5A). However, more focused views of the gaps themselves reveal both an overlap between intraspecific and interspecific distance values and continuity of distance values such that no clear gaps exist (Figures 2B, 3B, 4B, and 5B). Moreover, numerous anomalies exist in the datasets; in particular, in the full Annelida dataset, 1384 interspecific comparisons (0.0029% of all interspecific comparisons) show 0% distance and fully 121,750 intraspecific comparisons (26.77% of all intraspecific comparisons) show distance values of 10% or above (see Annelida_Master.xlsx in Supplementary material). In fact, 192,254 intraspecific comparisons (42.27%) show distance values above 2% in the Annelida dataset, and 1,675,355 interspecific comparisons (3.54%) show values below 20%; a one order of magnitude higher interspecific divergence than intraspecific variation is assumed to be necessary for accurate DNA barcoding, and an upper limit of 2% for intraspecific variation has been commonly found within other animal groups (Hebert et al., 2003b). It is worth noting, however, that the “2%-rule” has rarely been applied to annelid barcoding specifically – it is used here as an arbitrary cutoff value by which the level of variation across the entire dataset can be crudely interpreted.
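The percentages quoted above are plain shares of the comparison totals in Table 1. The sketch below recomputes them from the reported counts and also shows, on invented bin counts, how a share at or above a cutoff could be read off a table of binned distances; the helper names and the toy bins are assumptions.

```python
def share(count, total):
    """Percentage that `count` makes up of `total`."""
    return 100.0 * count / total


def percent_at_or_above(binned, threshold):
    """Share (%) of comparisons whose binned distance is >= threshold.

    `binned` maps a distance value in percent to a number of comparisons.
    """
    total = sum(binned.values())
    hits = sum(c for d, c in binned.items() if d >= threshold)
    return 100.0 * hits / total


# Counts reported for the full Annelida dataset.
INTRA_TOTAL, INTER_TOTAL = 454_788, 47_384_083
intra_10_plus = share(121_750, INTRA_TOTAL)    # intraspecific, 10% or above
intra_above_2 = share(192_254, INTRA_TOTAL)    # intraspecific, above 2%
inter_zero = share(1_384, INTER_TOTAL)         # interspecific at 0%
inter_below_20 = share(1_675_355, INTER_TOTAL) # interspecific below 20%

# Invented bins, only to show the cutoff helper in use.
toy_bins = {0.0: 50, 1.5: 30, 2.0: 15, 10.5: 5}
toy_share = percent_at_or_above(toy_bins, 2.0)

print(f"{intra_10_plus:.2f} {intra_above_2:.2f} "
      f"{inter_zero:.4f} {inter_below_20:.2f} {toy_share:.1f}")
```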

Similar distributions of distances are found in the smaller datasets containing only values relating to each of “Polychaeta”, “Oligochaeta”, and Hirudinea. In each of these cases, no clear-cut disjunction between intraspecific and interspecific distance values exists and the tendency towards a gap (at about 5–15% divergence; see Figures 3B, 4B, and 5B) is flanked on both sides by both intraspecific and interspecific distance values. Only the hirudinean dataset presents a discontinuity between distance values that could possibly be interpreted as a barcoding gap, whereas both the polychaete and the oligochaete datasets show generous overlap between intraspecific and interspecific divergence values, as well as a lack of discontinuity. Analysis of the polychaete dataset resulted in 91,415 intraspecific comparisons (30.24%) showing 2% divergence or above and 1049 interspecific comparisons (0.0070%) showed 0% divergence. For “Oligochaeta”, 103,607 intraspecific comparisons (69.10%) resulted in distance values of 2% or above and 322 interspecific comparisons (0.0049%) showed 0% divergence. The same analysis of values for Hirudinea resulted in 957 intraspecific comparisons (36.14%) showing 2% divergence or above and 18 interspecific comparisons (0.01%) showed 0% divergence.

Figure 2. Result from the manual calculations of intraspecific (red) and interspecific (blue) cox1 gene distances within the 9782-sequence full annelid dataset. (A) Full view of the chart with the y-axis set above the upper limit of the number of comparisons within the dataset; (B) enlarged view of the barcoding gap region with the y-axis set to a maximum of 10,000 comparisons. Note the absence of a disjunction between intraspecific and interspecific distances (the lack of a global barcoding gap), which is further discussed in the text.

Interestingly, there are second peaks present in the graphs resulting from the analyses of the Annelida and “Polychaeta” datasets, starting at about 40.5% divergence and ending at about 48.5%. Because of the rather smooth primary peaks, which also constitute orders of magnitude more comparisons, the second peaks could be an indication that one or both of the sequences involved in those comparisons have non-annelid origins or possess anomalies in the nucleotide sequences. Because great care was taken to avoid reverse complementation of sequences, it is unlikely that this factor plays a role, although such a scenario would likely result in similar topologies to those shown in Figures 2 and 3. Additionally, these peaks could denote sequences that include synonymous substitutions in the second codon position (assuming that synonymous substitutions in first and third positions occur more frequently in unison across taxa such that comparisons of those sequences lie further to the left in the graphs) but a control for this lies outside the scope of the present paper.

Figure 3. Result from the manual calculations of intraspecific (red) and interspecific (blue) cox1 gene distances within the 5545-sequence polychaete dataset. (A) Full view of the chart with the y-axis set above the upper limit of the number of comparisons within the dataset; (B) enlarged view of the barcoding gap region with the y-axis set to a maximum of 10,000 comparisons. Note the absence of a disjunction between intraspecific and interspecific distances (the lack of a global barcoding gap), which is further discussed in the text.

Figure 4. Result from the manual calculations of intraspecific (red) and interspecific (blue) cox1 gene distances within the 3639-sequence oligochaete dataset. (A) Full view of the chart with the y-axis set above the upper limit of the number of comparisons within the dataset; (B) enlarged view of the barcoding gap region with the y-axis set to a maximum of 10,000 comparisons. Note the absence of a disjunction between intraspecific and interspecific distances (the lack of a global barcoding gap), which is further discussed in the text.

For the randomized analyses (consisting of 200,000 interspecific comparisons and 200,000 intraspecific comparisons for the annelid and polychaete datasets, 100,000 intraspecific and interspecific comparisons for the oligochaete dataset and 2000 interspecific and intraspecific comparisons for the hirudinean dataset; Table 1), much the same patterns are evinced in that sufficiently sized barcoding gaps are conspicuously lacking. As opposed to the analyses of the full datasets, however, the interspecific distance values behave more like what would be expected when taxonomic labels are correctly assigned (Figures 6–9). More to the point, the number of interspecific comparisons approaches zero at distance values below 15%. On the contrary, in all four datasets, a high percentage of intraspecific distance comparisons show values between 10 and 15%, which does not agree well with prior findings on intraspecific distances. As such, it seems as though the lack of barcoding gaps for the datasets is not fully contingent on outlier values, as the randomized datasets also lack distinct and sufficiently sized barcoding gaps.

The distance distribution resulting from the ABGD analysis of the full annelid dataset (Figure 10A) suggests a tendency towards a gap between divergence estimations of 3–14%. However, ABGD agrees with the above calculations in that no distinct gap exists among the distances; again, the number of comparisons that show

Figure 5. Result from the manual calculations of intraspecific (red) and interspecific (blue) cox1 gene distances within the 598-sequence hirudinean dataset. (A) Full view of the chart with the y-axis set above the upper limit of the number of comparisons within the dataset; (B) enlarged view of the barcoding gap region with the y-axis set to a maximum of 1000 comparisons. Note the absence of a disjunction between intraspecific and interspecific distances (the lack of a global barcoding gap), which is further discussed in the text.

Figure 6. Result from the randomized trial using 200,000 comparisons for each of intraspecific (red) and interspecific (blue) cox1 gene distances from the full annelid dataset. (A) Full view of the chart with the y-axis set above the upper limit of the number of comparisons within the dataset; (B) enlarged view of the barcoding gap region with the y-axis set to a maximum of 10,000 comparisons.

DOI: 10.3109/19401736.2014.984166 Does a DNA barcoding gap exist in Annelida? 2247


distances between 3 and 14% is greatly reduced. Much like the empirical calculations, the highest number of comparisons is reported for genetic distances around 30%. Note here that ABGD and PAUP* deal with leading and trailing gaps (as well as the few present indels) in different manners, which is likely the reason why the highest values in the graph do not correspond to the highest values found by the manual calculations. In relation to this, ABGD analyses of each of the polychaete, oligochaete, and hirudinean datasets suggest the presence of a gap between approximately 30 and 60% (Figure 10B–D). This gap does not correspond to a barcoding gap (i.e. difference between highest intraspecific and lowest interspecific distances) but, rather, is likely an artifact of the gap handling strategy employed by ABGD and is merely a gap within the range of interspecific comparison values. Although the parameters involved in the resulting odd bimodal distributions are still unclear, the empirical values mentioned above testify to the fact that intraspecific comparison values flank both sides of the “gaps” present between ~2% and 10%.
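To make the point about gap handling concrete, the sketch below shows how the same pair of aligned sequences can yield different distances depending on whether gapped sites are skipped or counted as differences. It does not reproduce the actual behavior of ABGD or PAUP*; the function `p_distance`, its `gaps` parameter, and the two toy sequences are hypothetical and serve only to illustrate the general effect.

```python
def p_distance(a, b, gaps="ignore"):
    """Uncorrected p-distance between two aligned sequences.

    gaps="ignore":   sites with a gap in either sequence are skipped.
    gaps="mismatch": a gap opposite a base counts as one difference.
    Sites where both sequences have a gap are always skipped.
    """
    diffs = sites = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        if x == "-" or y == "-":
            if gaps == "mismatch":
                diffs += 1
                sites += 1
            continue
        sites += 1
        diffs += x != y
    return diffs / sites if sites else float("nan")


# Hypothetical pair: the first sequence starts two positions late.
seq1 = "--ACGTAC"
seq2 = "TTACGAAC"
print(p_distance(seq1, seq2, gaps="ignore"))    # 1 difference over 6 sites
print(p_distance(seq1, seq2, gaps="mismatch"))  # 3 differences over 8 sites
```

With only two leading gap positions the distance more than doubles under the second treatment, which is the kind of shift that could move peaks in a distance histogram without any change in the underlying sequences.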

Discussion

It can be stated with certainty that when considering the cox1 gene sequences deposited in GenBank that meet the criteria enforced here, a strict and sufficiently sized global DNA barcoding gap does not exist within Annelida, nor do more

Figure 7. Result from the randomized trial using 200,000 comparisons for each of intraspecific (red) and interspecific (blue) cox1 gene distances from the polychaete dataset. (A) Full view of the chart with the y-axis set above the upper limit of the number of comparisons within the dataset; (B) enlarged view of the barcoding gap region with the y-axis set to a maximum of 10,000 comparisons.

Figure 8. Result from the randomized trial using 100,000 comparisons for each of intraspecific (red) and interspecific (blue) cox1 gene distances from the oligochaete dataset. (A) Full view of the chart with the y-axis set above the upper limit of the number of comparisons within the dataset; (B) enlarged view of the barcoding gap region with the y-axis set to a maximum of 5000 comparisons.


local gaps exist within “Polychaeta”, “Oligochaeta”, or Hirudinea, separately (this does not necessarily suggest that a barcoding gap does not exist at lower taxonomic ranks). This is due both to the lack of lower limits for the interspecific divergence values, which reach zero in fully 1384 comparisons in the full annelid dataset, and to an upper limit of intraspecific variation that far exceeds previously reported values for several organismal groups. Indeed, intraspecific divergence values >10% within the current scheme occur in 26.50% of the comparisons within the annelid dataset (the same values are 20.70% for “Polychaeta”, 38.56% for “Oligochaeta”, and only 5.74% for Hirudinea). Assuming that a global barcoding gap actually does exist within the annelid dataset, the most obvious reason for the uncommonly high intraspecific variations and uncommonly low interspecific divergences is that the taxonomic labels associated with the sequences are incorrect. If so, numerous comparisons labeled here as interspecific would in fact be intraspecific comparisons and vice versa (but perhaps at a much lower frequency). Given the purported lower distance between congeners and conspecifics than between species of different genera, families, classes and orders, and assuming that taxonomic identifications become increasingly more dubious when considering closely related species, it is likely that the distances between the confused taxonomic labels would fall within the range of where the barcoding gap is regularly reported (between about 2 and 10%). However, assuming that the nucleotide sequences are correct and uncontaminated in the current datasets, a distinct and sufficiently sized barcoding gap would still not exist due to the continuum of the distances, regardless of whether they denote an interspecific or intraspecific distance (Figures 2–5). In other words, wrongfully inferred species IDs would only switch the red and blue colored bars in Figures 2–5, but would not move the bars to the right or left. Conversely, if the nucleotide sequences are incorrectly called or if contaminations exist within the datasets (great care was taken to avoid this – see the Material and Methods section), then contemporary surveys on the presence or absence of a barcoding gap will surely involve a series of false negatives or positives. Given the size of the full annelid dataset used here (9782 cox1 gene sequences), it is more than likely that some of the sequences stem from non-annelid taxa and were not caught in the filtering process of the current scheme. That these would account for all distance values between 2 and 10% within each of the datasets is, however, highly unlikely.
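The argument that incorrect labels only recolor the bars, without moving them, can be illustrated with a small sketch. Everything here is hypothetical (the functions `classify` and `fraction_above`, the four toy specimens, and their distances); it simply shows that relabeling a specimen changes which pool a comparison falls into while leaving the combined set of distance values untouched.

```python
def classify(distances, labels):
    """Split pairwise distances into intraspecific and interspecific pools.

    distances: dict mapping an (i, j) index pair to a distance value.
    labels:    list of species labels, indexed like the specimens.
    """
    intra = sorted(d for (i, j), d in distances.items() if labels[i] == labels[j])
    inter = sorted(d for (i, j), d in distances.items() if labels[i] != labels[j])
    return intra, inter


def fraction_above(values, threshold):
    """Share of values strictly greater than the threshold."""
    return sum(v > threshold for v in values) / len(values) if values else 0.0


# Hypothetical distances among four specimens (indices 0-3).
distances = {(0, 1): 0.01, (0, 2): 0.12, (1, 2): 0.13,
             (0, 3): 0.30, (1, 3): 0.31, (2, 3): 0.29}

labels_as_deposited = ["sp1", "sp1", "sp1", "sp2"]
labels_revised = ["sp1", "sp1", "sp3", "sp2"]  # specimen 2 treated as a separate species

intra_a, inter_a = classify(distances, labels_as_deposited)
intra_b, inter_b = classify(distances, labels_revised)

print(fraction_above(intra_a, 0.10))  # two of three intraspecific values exceed 10%
print(fraction_above(intra_b, 0.10))  # none do after relabeling
print(sorted(intra_a + inter_a) == sorted(intra_b + inter_b))  # same values overall
```

In this toy case the values 0.12 and 0.13 remain in the distribution under either labeling, so the continuum of distances persists even though the share of high intraspecific values changes.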

Notwithstanding the above, there is still a tendency towards separation of interspecific and intraspecific distances in each of the datasets, as suggested by Hebert et al. (2003b), and this is manifest between about 3% and 15% distance within the entire pool of sequences and within each of the smaller datasets. Because of the relatively lower number of comparisons that fall within this range of distance values, the tendency towards a gap is especially conspicuous in Figures 2(A), 3(A), 4(A), and 5(A), in which the ranges of the y-axes (number of comparisons) are only constrained by the actual number of comparisons (as opposed to the focused views of Figures 2(B), 3(B), 4(B), and 5(B), in which the number of comparisons on the y-axes was manually constrained).

The effects of an inadequate barcoding gap

Whereas the lack of a global barcoding gap across Annelida, “Polychaeta”, “Oligochaeta”, and Hirudinea may encumber future barcoding efforts that use a significant taxonomic breadth of specimens, studies may reveal that DNA barcoding is still a promising tool for lower taxonomic ranks within the phylum due to the presence of local barcoding gaps. Indeed, several investigations already report this. For example, distinct barcoding gaps have been recovered within several oligochaetous clitellate taxa (e.g. Decaens et al., 2013; De Wit & Erseus, 2010; Erseus & Kvist, 2007; Huang et al., 2007; Martinsson et al., 2013; but see also Chang et al., 2009), hirudineans (e.g. Bely & Weisblat, 2006; Kvist et al., 2010; Oceguera-Figueroa et al., 2010, 2011; Siddall & Budinoff, 2005), as well as various polychaete taxa (e.g. Carr et al., 2011; Heimeier et al., 2010; Schuller, 2011; Zhou et al., 2010). This raises the question of which groups of annelids prove refractory to DNA barcoding by virtue of lacking a distinct DNA barcoding gap. Within the full annelid dataset, 434 polychaete, oligochaete, and hirudinean taxa with different taxonomic labels (i.e. putatively different species) display intraspecific variation between 2 and 14%, suggesting that the lack of a barcoding gap may be widespread across taxa and not confined to any

Figure 9. Result from the randomized trial using 2000 comparisons for each of intraspecific (red) and interspecific (blue) cox1 gene distances from the hirudinean dataset. (A) Full view of the chart with the y-axis set above the upper limit of the number of comparisons within the dataset; (B) enlarged view of the barcoding gap region with the y-axis set to a maximum of 200 comparisons.


particular group. Alternatively, the bimodal distribution in intraspecific distances seen in the full annelid dataset (Figure 2) may indicate the presence of cryptic lineages within the dataset (see below). Regardless of this, because several interspecific comparisons display distance values of 0%, no global threshold exists that would enable avoidance of false negatives or positives when identifying completely unknown annelid specimens by means of DNA barcoding. Although the instances of 0% interspecific distance involve each of polychaetes, oligochaetes and hirudineans, a surprisingly high proportion of them seem to concern earthworm taxa (35% of the interspecific comparisons that show zero distance; see Supplementary material), especially the genera Amynthas Kinberg, 1867, Aporrectodea Orley, 1885, Lumbricus Linnaeus, 1758, Eisenia Michaelsen, 1900, and Dendrobaena Eisen, 1874. Notably, species of these genera seem prone to morphological misidentifications due to the lack of a rigorous taxonomic basis and, occasionally, this is reflected in changes to their generic associations (Blakemore, 2006). Fully 95% of the interspecific comparisons that resulted in 0% distance and that involve earthworm taxa are comparisons between congeneric species. This could mean that rather than a local barcoding gap being absent within these specific groups (an absence of a global gap would remain across Annelida as a whole), their confusing taxonomy is leading to incorrect taxonomic labels in GenBank. For some groups of earthworms, it has already been shown that very high intraspecific distances occur within a “species” (James et al., 2010; Novo et al., 2009, 2010; Perez-Losada et al., 2012; Richard et al., 2010) or even within clonal groups (Fernandez et al., 2011). This would only account for part of the reasoning behind the lack of a barcoding gap, however, and it is still difficult to explain the remaining factors affecting its absence. Minimally, the results presented here illustrate the importance of BLASTing sequences prior to any meaningful analysis, such that potentially misidentified specimens or sequence contaminations can be more readily detected.
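The claim that 0% interspecific distances rule out any global threshold can be checked with a short sketch. The functions `barcoding_gap` and `threshold_errors`, the decision rule, and the toy distance values are hypothetical; the gap is computed as defined in the text (lowest interspecific minus highest intraspecific distance).

```python
def barcoding_gap(intra, inter):
    """Lowest interspecific minus highest intraspecific distance.

    A strict barcoding gap exists only if the result is positive.
    """
    return min(inter) - max(intra)


def threshold_errors(intra, inter, threshold):
    """Apply the rule 'distance <= threshold means same species'.

    Returns (false_splits, false_lumps): intraspecific pairs above the
    threshold and interspecific pairs at or below it.
    """
    false_splits = sum(d > threshold for d in intra)
    false_lumps = sum(d <= threshold for d in inter)
    return false_splits, false_lumps


# Hypothetical distances; note the interspecific comparison at 0%.
intra = [0.0, 0.01, 0.12]
inter = [0.0, 0.08, 0.25, 0.30]

print(barcoding_gap(intra, inter))           # negative: the ranges overlap
print(threshold_errors(intra, inter, 0.02))  # one error of each kind

# No non-negative threshold avoids lumping the 0% interspecific pair.
print(all(threshold_errors(intra, inter, t)[1] >= 1 for t in (0.0, 0.02, 0.05, 0.10)))
```

Because a distance cannot be negative, an interspecific comparison at 0% is misclassified under every possible threshold, which is the situation the text describes for the full annelid dataset.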

Even before DNA barcoding gained traction in the scientific community, Stoeckle (2003) recognized that “The validity of DNA bar coding [. . .] depends on establishing reference sequences from taxonomically confirmed specimens”. This idea was further supported in several papers (e.g. Collins & Cruickshank, 2013; Elias et al., 2007; Hebert & Gregory, 2005; Kvist et al., 2010; Moritz & Cicero, 2004; Will & Rubinoff, 2004) and seems to remain a major point of contention against the logical basis of DNA barcoding. In relation to this, the present study shows that one of the following, if not both, must be true: (i) numerous “annelid” cox1 gene sequences deposited in GenBank are associated with an erroneous taxonomic label or (ii) a DNA barcoding gap does not exist within Annelida, or in “Polychaeta”, “Oligochaeta”, or Hirudinea, separately. As mentioned above, however, the complete lack of a gap across the graphs shown here would necessitate that all sequences that

Figure 10. Result from the Automatic Barcode Gap Discovery software (ABGD). (A) Full annelid dataset, (B) polychaete dataset, (C) oligochaete dataset, and (D) hirudinean dataset. Although the number of comparisons within the region where a barcoding gap is regularly reported is much lower than the adjacent peaks, they never reach 0 comparisons in any of the charts. In addition, the ‘barcoding gap’ is flanked on both sides by both intraspecific and interspecific comparisons (cf. Figures 2–5). Note also that the gap between approximately 30 and 60% in (B–D) does not correspond to a barcoding gap but, rather, is likely an artifact of the gap handling strategy employed by ABGD and is merely a gap within the range of interspecific comparison values (further discussed in the text).


fall within the supposed range of a barcoding gap were contaminations by non-annelid taxa, if a barcoding gap were to exist within the phylum. This would mean that fully 130,034 intraspecific comparisons (those falling between 3 and 14% in the full annelid dataset) would stem from contaminations of one or both of the sequences during the DNA extraction process; William of Ockham would likely have advocated in favor of (ii).

Potential pitfalls

The overall scheme employed in the present study does not exhaustively investigate the underlying causes for the lack of a global barcoding gap within Annelida, “Polychaeta”, “Oligochaeta”, and Hirudinea. There are several reasons why both global and more local gaps may be lacking across these datasets and these may be driven by both biotic and abiotic factors. For instance, localized or globalized recent bursts of rapid radiation may result in similar nucleotide compositions between populations that are undergoing speciation (i.e. without obvious gene flow patterns). It has already been suggested that rapid radiation within Annelida may be the cause of the lack of support in basal nodes of the phylogeny of the group (Bleidorn et al., 2003; Rousset et al., 2007; Struck et al., 2002), and it is possible that some of the data supporting the lack of a barcoding gap are also caused by rapid radiation events. As a result, the presence of cryptic species within the dataset may be higher than evinced by the number of unique taxonomic labels. As an example, Novo et al. (2010) proposed five cryptic species to be present within a single morphotype of a hormogastrid earthworm, yet these are all denoted by the same taxonomic label in GenBank and, consequently, in the data used here. If these sequences were to be separated into five distinct species groups, local intraspecific variation within each of these would likely be much lower than if the sequences were treated as a single species. Alas, it lies outside the scope of the present study to track each of the multitudes of species involved in odd intraspecific comparison values back to the original mention of their potential cryptic nature (and to investigate fully the morphological characters of the specimens). Besides, as mentioned above, even if all instances of cryptic speciation were fully revealed, the lack of a barcoding gap would still endure, as the number of comparisons in the region where a barcoding gap is regularly reported never reaches zero in Figures 2–5. It is more than likely, however, that the overall lack of a barcoding gap found here for Annelida, “Polychaeta”, “Oligochaeta”, and Hirudinea would reverse when considering smaller taxonomic units and more local barcoding gaps. The differences in rates of genetic evolution between different taxa likely exacerbate the situation, and can even potentially be the sole reason behind the lack of a phylum-wide global barcoding gap.

Finally, there is a potential pitfall related to undersampling of species and populations in the dataset. Kvist (2013) showed that only 2013 species of annelids were represented by cox1 gene sequences in GenBank, suggesting that at least 15,000 species still remain to be barcoded (see Zhang, 2011). Seeing as one of the underlying assumptions of DNA barcoding is that global speciation rates are fairly constant, undersampling should not present a major obstacle under the current scheme – patterns of intraspecific variations and interspecific divergences should be similar across the 15,000 unsampled species. However, if comprehensive sampling were achievable, and if the sequences derived from it reported a wide global barcoding gap, the tendency towards a gap (see Figures 2–10) would become considerably more powerful in illustrating the utility of DNA barcoding within Annelida and its subdivisions. In particular, it seems as though the major lack of data stems from the undersampling of intraspecific variation values as compared with interspecific divergence values.

To this end, it seems as important to produce population-level barcode data as to cover the taxonomic breadth within Annelida, in order to fully evaluate the credibility of the tendency towards a barcoding gap within this ecologically and economically important taxon.

Acknowledgements

The author thanks Sophie Brouillet for providing valuable assistance with ABGD and Alejandro Oceguera-Figueroa, Sarah Lemer, Gonzalo Giribet, and an anonymous reviewer for providing important feedback that benefited earlier versions of this manuscript.

Declaration of interest

The author thanks the Wenner-Gren Foundations, Helge Ax:son Johnson’s Foundation, the Olle Engkvist Byggmastare Foundation, and the Sixten Gemzeus Foundation for financial support.

References

Bely AE, Weisblat DA. (2006). Lessons from leeches: A call for DNA barcoding in the lab. Evol Dev 8:491–501.

Blakemore RJ. (2006). Cosmopolitan earthworms – An eco-taxonomic guide to the peregrine species of the world. Kippax, Australia: VermEcology.

Bleidorn C, Vogt L, Bartolomaeus T. (2003). New insights into polychaete phylogeny (Annelida) inferred from 18S rDNA sequences. Mol Phyl Evol 29:279–88.

Burns JM, Janzen DH, Hajibabaei M, Hallwachs W, Hebert PDN. (2007). DNA barcodes of closely related (but morphologically and ecologically distinct) species of skipper butterflies (Hesperiidae) can differ by only one to three nucleotides. J Lepid Soc 61:138–53.

Carr CM, Hardy SM, Brown TM, Macdonald TA, Hebert PD. (2011). A tri-oceanic perspective: DNA barcoding reveals geographic structure and cryptic diversity in Canadian polychaetes. PLoS One 6:e22232.

Chang CH, Rougerie R, Chen JH. (2009). Identifying earthworms through DNA barcodes: Pitfalls and promise. Pedobiologia 52:171–80.

Chapple DG, Ritchie PA. (2013). A retrospective approach to testing the DNA barcoding method. PLoS One 8:e77882.

Collins RA, Cruickshank RH. (2013). The seven deadly sins of DNA barcoding. Mol Ecol Res 13:969–75.

DasGupta B, Konwar KM, Mandoiu II, Shvartsman AA. (2005). DNA-BAR: Distinguisher selection for DNA barcoding. Bioinformatics 21:3424–6.

Decaens T, Porco D, Rougerie R, Brown GG, James SW. (2013). Potential of DNA barcoding for earthworm research in taxonomy and ecology. Appl Soil Ecol 65:35–42.

DeSalle R, Egan MG, Siddall M. (2005). The unholy trinity: Taxonomy, species delimitation and DNA barcoding. Philos T Roy Soc B 360:1905–16.

De Wit P, Erseus C. (2010). Genetic variation and phylogeny of Scandinavian species of Grania (Annelida: Clitellata: Enchytraeidae), with the discovery of a cryptic species. J Zool Syst Evol Res 48:285–93.

Elias M, Hill RI, Willmott KR, Dasmahapatra KK, Brower AV, Mallet J, Jiggins CD. (2007). Limited performance of DNA barcoding in a diverse community of tropical butterflies. Proc R Soc Lond B 274:2881–9.

Erseus C. (2005). Phylogeny of oligochaetous Clitellata. Hydrobiologia 535/536:357–72.

Erseus C, Kvist S. (2007). COI variation in Scandinavian marine species of Tubificoides (Annelida: Clitellata: Tubificidae). J Mar Biol Assoc UK 87:1121–6.

Fernandez R, Almodovar A, Novo M, Gutierrez M, Díaz Cosín D. (2011). A vagrant clone in a peregrine species: Phylogeography, high clonal diversity and geographical distribution in the earthworm Aporrectodea trapezoides (Duges, 1828). Soil Biol Biochem 43:2085–93.

Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek RC. (1994). DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol Mar Biol Biotechnol 3:294–9.

Hebert PDN, Cywinska A, Ball SL, deWaard JR. (2003a). Biological identifications through DNA barcodes. Proc R Soc Lond B 270:313–21.


Hebert PD, Gregory TR. (2005). The promise of DNA barcoding for taxonomy. Syst Biol 54:852–9.

Hebert PDN, Ratnasingham S, deWaard JR. (2003b). Barcoding animal life: Cytochrome c oxidase subunit 1 divergences among closely related species. Proc R Soc Lond B 270:S96–9.

Heimeier D, Lavery S, Sewell MA. (2010). Using DNA barcoding and phylogenetics to identify Antarctic invertebrate larvae: Lessons from a large scale study. Mar Genomics 3:165–77.

Huang J, Xu Q, Sun ZJ, Tang GL, Su ZY. (2007). Identifying earthworms through DNA barcodes. Pedobiologia 51:301–9.

James SW, Porco D, Decaens T, Richard B, Rougerie R, Erseus C. (2010). DNA barcoding reveals cryptic diversity in Lumbricus terrestris L., 1758 (Clitellata): Resurrection of L. herculeus (Savigny, 1826). PLoS One 5:e15629.

Katoh K, Standley DM. (2013). MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol Biol Evol 30:772–80.

Kvist S. (2013). Barcoding in the dark?: A critical view of the sufficiency of zoological DNA barcoding databases and a plea for broader integration of taxonomic knowledge. Mol Phylogenet Evol 69:39–45.

Kvist S, Laumer C, Junoy J, Giribet G. (2014). A further contribution to the phylogeny, systematics and DNA barcoding of Nemertea. Invert Syst 28:287–308.

Kvist S, Narechania A, Oceguera-Figueroa A, Fuks B, Siddall ME. (2011). Phylogenomics of Reichenowia parasitica, an alphaproteobacterial endosymbiont of the freshwater leech Placobdella parasitica. PLoS One 6:e28192.

Kvist S, Oceguera-Figueroa A, Siddall ME, Erseus C. (2010). Barcoding, types and the Hirudo files: Using information content to critically evaluate the identity of DNA barcodes. Mitochondrial DNA 21:198–205.

Kvist S, Siddall ME. (2013). Phylogenomics of Annelida revisited: A cladistic approach using genome-wide expressed sequence tag data mining and examining the effects of missing data. Cladistics 29:435–48.

Little DP. (2011). DNA barcode sequence identification incorporating taxonomic hierarchy and within taxon variability. PLoS One 6:e20552.

Luttikhuizen PC, Dekker R. (2010). Pseudo-cryptic species Arenicola defodiens and Arenicola marina (‘Polychaeta’: Arenicolidae) in Wadden Sea, North Sea and Skagerrak: Morphological and molecular variation. J Sea Res 63:17–23.

Maddison WP, Maddison DR. (2010). Mesquite: A modular system for evolutionary analysis version 2.5. Available at: http://mesquiteproject.org (Accessed 12 July 2014).

Martinsson S, Achurra A, Svensson M, Erseus C. (2013). Integrative taxonomy of the freshwater worm Rhyacodrilus falciformis s.l. (Clitellata: Naididae), with the description of a new species. Zool Scr 42:612–22.

Meier R, Shiyang K, Vaidya G, Ng PK. (2006). DNA barcoding and taxonomy in Diptera: A tale of high intraspecific variability and low identification success. Syst Biol 55:715–28.

Meier R, Zhang G, Ali F. (2008). The use of mean instead of smallest interspecific distances exaggerates the size of the “barcoding gap” and leads to misidentification. Syst Biol 57:809–13.

Meyer CP, Paulay G. (2005). DNA barcoding: Error rates based on comprehensive sampling. PLoS Biol 3:e422.

Moritz C, Cicero C. (2004). DNA barcoding: Promise and pitfalls. PLoS Biol 2:1529–31.

Novo M, Almodovar A, Díaz-Cosín D. (2009). High genetic divergence of hormogastrid earthworms (Annelida, Oligochaeta) in the central Iberian Peninsula: Evolutionary and demographic implications. Zool Scr 38:537–52.

Novo M, Almodovar A, Fernandez R, Trigo D, Díaz Cosín D. (2010). Cryptic speciation of hormogastrid earthworms revealed by mitochondrial and nuclear data. Mol Phylogenet Evol 56:507–12.

Oceguera-Figueroa A, Leon-Regagnon V, Siddall ME. (2010). DNA barcoding reveals Mexican diversity within the freshwater leech genus Helobdella (Annelida: Glossiphoniidae). Mitochondrial DNA 21:24–9.

Oceguera-Figueroa A, Phillips AJ, Pacheco-Chaves B, Reeves WK, Siddall ME. (2011). Phylogeny of macrophagous leeches (Hirudinea, Clitellata) based on molecular data and evaluation of the barcoding locus. Zool Scr 40:194–203.

Perez-Losada M, Bloch R, Breinholt JW, Pfenninger M, Domínguez J. (2012). Taxonomic assessment of Lumbricidae (Oligochaeta) earthworm genera using DNA barcodes. Eur J Soil Biol 48:41–7.

Puillandre N, Lambert A, Brouillet S, Achaz G. (2012). ABGD, Automatic Barcode Gap Discovery for primary species delimitation. Mol Ecol 21:1864–77.

Richard B, Decaens T, Rougerie R, James SW, Porco D, Hebert PDN. (2010). Re-integrating earthworm juveniles into soil biodiversity studies: Species identification through DNA barcoding. Mol Ecol Res 10:606–14.

Rouse GW, Pleijel F. (2001). Polychaetes. New York, NY: Oxford University Press.

Rousset V, Pleijel F, Rouse GW, Erseus C, Siddall ME. (2007). A molecular phylogeny of annelids. Cladistics 23:41–63.

Sarkar IN, Planet PJ, Bael TE, Stanley SE, Siddall M, DeSalle R, Figurski DH. (2002). Characteristic attributes in cancer microarrays. J Biomed Inform 35:111–22.

Schuller M. (2011). Evidence for a role of bathymetry and emergence in speciation in the genus Glycera (Glyceridae, ‘Polychaeta’) from the deep Eastern Weddell Sea. Polar Biol 34:549–64.

Shain DH. (2009). Annelids in modern biology. New York, NY, USA: John Wiley and Sons.

Siddall ME, Budinoff RB. (2005). DNA-barcoding evidence for widespread introductions of a leech from the South American Helobdella triserialis complex. Conserv Genet 6:467–72.

Siddall ME, Fontanella FM, Watson SC, Kvist S, Erseus C. (2009). Barcoding bamboozled by bacteria: Convergence to metazoan mitochondrial primer targets by marine microbes. Syst Biol 58:445–51.

Srivathsan A, Meier R. (2012). On the inappropriate use of Kimura-2-parameter (K2P) divergences in the DNA-barcoding literature. Cladistics 28:190–4.

Stoeckle M. (2003). Taxonomy, DNA, and the bar code of life. BioScience 53:796–7.

Stothard P. (2000). The Sequence Manipulation Suite: JavaScript programs for analyzing and formatting protein and DNA sequences. Biotechniques 28:1102–4.

Struck T, Hessling R, Purschke G. (2002). The phylogenetic position of the Aelosomatidae and Parergordrilidae, two enigmatic oligochaete-like taxa of the ‘Polychaeta’, based on molecular data from 18S rDNA sequences. J Zool Syst Evol Res 40:155–63.

Struck TH, Paul C, Hill N, Hartmann S, Hosel C, Kube M, Lieb B, et al. (2011). Phylogenomic analyses unravel annelid evolution. Nature 471:95–8.

Swofford DL. (2003). PAUP*. Phylogenetic analysis using parsimony (*and Other Methods). Version 4. Sunderland, MA, USA: Sinauer Associates.

Trewick SA. (2008). DNA barcoding is not enough: Mismatch of taxonomy and genealogy in New Zealand grasshoppers (Orthoptera: Acrididae). Cladistics 24:240–54.

Weigert A, Helm C, Meyer M, Nickel B, Arendt D, Hausdorf B, Santos SR, et al. (2014). Illuminating the base of the annelid tree using transcriptomics. Mol Biol Evol 31:1391–401.

Weitschek E, Van Velzen R, Felici G, Bertolazzi P. (2013). BLOG 2.0: A software for character-based species classification with DNA barcode sequences. What it does, how to use it? Mol Ecol Res 13:1043–6.

Wiemers M, Fiedler K. (2007). Does the DNA barcoding gap exist? – A case study in blue butterflies (Lepidoptera: Lycaenidae). Front Zool 4:1–16.

Will KW, Rubinoff D. (2004). Myth of the molecule: DNA barcodes for species cannot replace morphology for identification and classification. Cladistics 20:47–55.

Zhang Z-Q. (2011). Animal biodiversity: An introduction to higher-level classification and taxonomic richness. Zootaxa 3148:7–12.

Zhou H, Zhang Z, Chen H, Sun R, Wang H, Guo L, Pan H. (2010). Integrating a DNA barcoding project with an ecological survey: A case study on temperate intertidal polychaete communities in Qingdao, China. Chin J Oceanol Limnol 28:899–910.

Supplementary material available online
