TY - DATA
T1 - Supporting data for "Filling reference gaps via assembling DNA barcodes using high-throughput sequencing - moving toward barcoding the world"
AU - Chengran, Zhou
AU - Chentao, Yang
AU - Shanlin, Liu
AU - Xin, Zhou
DO - 10.5524/100363
UR - http://gigadb.org/dataset/100363
AB - Over the past decade, biodiversity scientists have dedicated tremendous effort to constructing DNA reference barcodes for rapid species registration and identification. Although the analytical cost of standard DNA barcoding has been significantly reduced since the early 2000s, further dramatic reductions in barcoding costs are unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints on barcoding cost have not only led to unbalanced barcoding efforts around the globe, but have also prevented high-throughput sequencing (HTS)-based taxonomic identification from applying binomial species names, which provide crucial links to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length COI barcodes from pooled PCR amplicons generated from individual specimens. The new pipeline generated accurate barcode sequences comparable to Sanger standards, even for haplotypes of the same species that differed from each other by only a few nucleotides. Additionally, the new pipeline was much more sensitive in recovering low-quantity amplicons. The HIFI-Barcode pipeline successfully recovered barcodes from over 78% of the PCR reactions that did not show clear bands on the electrophoresis gel. Moreover, sequencing results from the single-molecule sequencing platform PacBio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline can provide an improved solution for producing full-length reference barcodes at about 1/10 of the current cost, enabling the construction of comprehensive barcode libraries for local fauna and offering a feasible direction for DNA barcoding of global biomes.
KW - Genomic
KW - Metabarcoding
KW - dna-barcoding
KW - high-throughput sequencing
KW - coi
KW - pcr
KW - gap-filling
PY - 2017
PB - GigaScience Database
LA - en
ER -