TY  - DATA
T1  - Supporting data for "Deep whole-genome sequencing of 90 Han Chinese genomes"
AU  - Tianming Lan
AU  - Haoxiang Lin
AU  - Wenjuan Zhu
AU  - Tellier, Laurent Christian Asker Melchior
AU  - Mengcheng Yang
AU  - Liu, Xin
AU  - Wang, Jun
AU  - Wang, Jian
AU  - Huanming Yang
AU  - Xu, Xun
AU  - Xiaosen Guo
DO  - 10.5524/100302
UR  - http://gigadb.org/dataset/100302
AB  - Next-generation sequencing provides high-resolution insight into human genetic information. However, previous studies have focused primarily on low-coverage data because of the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call accurately from low-coverage data. Deep sequencing provides an optimal solution to the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, so important causal variants are sometimes missed. For Han Chinese populations, the majority of variants have been discovered from low-coverage data from the 1000 Genomes Project. However, high-coverage whole-genome sequencing data are limited for any population, and a large number of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at high depth (~80X) of 90 unrelated individuals of Chinese ancestry, drawn from the 1000 Genomes Project samples, including 45 North Han Chinese and 45 South Han Chinese samples. Of these 90 individuals, 83 have not been sequenced by the 1000 Genomes Project. We have identified 12,568,804 single nucleotide polymorphisms, 2,074,734 short InDels and 26,142 structural variations from these 90 samples.
Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7,007,685 novel variants with low frequency (defined as minor allele frequency less than 5%), including 5,816,839 SNPs, 1,172,919 InDels, and 17,927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. Compared to the 1000 Genomes Project, this Han Chinese deep sequencing data enhances characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement for the 1000 Genomes Project, as well as for other human genome projects.
KW  - Genomic
KW  - High-coverage Whole-genome Sequencing
KW  - Han Chinese genomes
KW  - Denovo assembly
KW  - Genetic variations
PY  - 2017
PB  - GigaScience Database
LA  - en
ER  - 