TY  - DATA
T1  - Examplar data demonstrating the improvement of genome assembly and annotation by using AGOUTI
AU  - Zhang, Simo V
AU  - Zhuo, Luting
AU  - Hahn, Matthew W
DO  - 10.5524/100195
UR  - http://gigadb.org/dataset/100195
AB  - Genomes sequenced using short-read, next-generation sequencing technologies are error-filled and fragmented into thousands of small contigs. These incomplete and fragmented assemblies lead to errors in gene identification, such that single genes spread across multiple contigs are annotated as separate gene models. Such biases can confound inferences about the number of genes within species, as well as gene gain and loss between species. We present AGOUTI (Annotated Genome Optimization Using Transcriptome Information), a tool that uses RNA-seq data to simultaneously combine contigs into scaffolds and fragmented gene models into single models. We show that AGOUTI improves both the contiguity of genome assemblies and the accuracy of gene annotation, providing updated versions of each as output. Running AGOUTI on a simulated dataset, we show that it is highly accurate and that it achieves higher accuracy and contiguity compared to other existing methods. Here we provide the software, available free of charge under the MIT license, as well as the synthetic dataset for reuse and reproducibility. For the most recent updates to the software please refer to the GitHub page.
KW  - Genomic
KW  - Transcriptomic
KW  - Software
PY  - 2016
PB  - GigaScience Database
LA  - en
ER  -