Revealing complete complex KIR haplotypes phased by long-read sequencing technology

2017 ◽  
Vol 18 (3) ◽  
pp. 127-134 ◽  
Author(s):  
D Roe ◽  
C Vierra-Green ◽  
C-W Pyo ◽  
K Eng ◽  
R Hall ◽  
...  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Edwin A. Solares ◽  
Yuan Tao ◽  
Anthony D. Long ◽  
Brandon S. Gaut

Abstract Background: Despite marked recent improvements in long-read sequencing technology, the assembly of diploid genomes remains a difficult task. A major obstacle is distinguishing between alternative contigs that represent highly heterozygous regions. If primary and secondary contigs are not properly identified, the primary assembly will overrepresent both the size and complexity of the genome, which complicates downstream analyses such as scaffolding. Results: Here we illustrate a new method, which we call HapSolo, that identifies secondary contigs and defines a primary assembly based on multiple pairwise contig alignment metrics. HapSolo evaluates candidate primary assemblies using BUSCO scores and then distinguishes among candidate assemblies using a cost function. The cost function can be defined by the user but by default considers the number of missing, duplicated and single BUSCO genes within the assembly. HapSolo performs hill climbing to minimize cost over thousands of candidate assemblies. We illustrate the performance of HapSolo on genome data from three species: the Chardonnay grape (Vitis vinifera), with a genome of 490 Mb, a mosquito (Anopheles funestus; 200 Mb) and the thorny skate (Amblyraja radiata; 2650 Mb). Conclusions: HapSolo rapidly identified candidate assemblies that yield improvements in assembly metrics, including decreased genome size and improved N50 scores. Contig N50 scores improved by 35%, 9% and 9% for Chardonnay, mosquito and the thorny skate, respectively, relative to unreduced primary assemblies. The benefits of HapSolo were amplified by downstream analyses, which we illustrated by scaffolding with Hi-C data. We found, for example, that prior to the application of HapSolo, only 52% of the Chardonnay genome was captured in the largest 19 scaffolds, corresponding to the number of chromosomes. After the application of HapSolo, this value increased to ~84%. The improvements for the mosquito’s largest three scaffolds, representing the number of chromosomes, were from 61 to 86%, and the improvement was even more pronounced for thorny skate. We compared the scaffolding results to assemblies that were based on PurgeDups for identifying secondary contigs, with generally superior results for HapSolo.
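The search described in this abstract (score each candidate assembly with a BUSCO-based cost, then hill-climb toward lower cost) can be sketched roughly as follows. This is a minimal illustration, not HapSolo's implementation: the cost formula in `busco_cost`, the two generic alignment thresholds, and the `toy_evaluate` stand-in that maps thresholds to BUSCO counts are all assumptions made for the example. A real run would build a candidate assembly for each threshold set and read its BUSCO counts.

```python
import random


def busco_cost(missing, duplicated, single):
    """Hypothetical cost: missing and duplicated genes relative to single-copy genes."""
    if single == 0:
        return float("inf")
    return (missing + duplicated) / single


def toy_evaluate(thresholds, total_genes=1000):
    """Made-up stand-in for scoring a candidate assembly (assumption, not real data).

    Low thresholds purge too many contigs (more missing genes); high thresholds
    purge too few (more duplicated genes).
    """
    a, b = thresholds
    missing = round(100 * (1 - a) ** 2 + 100 * (1 - b) ** 2)
    duplicated = round(100 * a ** 2 + 100 * b ** 2)
    single = total_genes - missing - duplicated
    return busco_cost(missing, duplicated, single)


def hill_climb(evaluate, start, step=0.05, iterations=1000, seed=0):
    """Random-restart-free hill climbing: keep a candidate only if it lowers the cost."""
    rng = random.Random(seed)
    best = tuple(start)
    best_cost = evaluate(best)
    for _ in range(iterations):
        # Perturb every threshold slightly and clamp it to the [0, 1] range.
        candidate = tuple(
            min(1.0, max(0.0, x + rng.uniform(-step, step))) for x in best
        )
        cost = evaluate(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    start = (0.9, 0.9)
    best, best_cost = hill_climb(toy_evaluate, start)
    print("start cost:", toy_evaluate(start))
    print("best thresholds:", best, "cost:", best_cost)
```

Because a candidate is accepted only when its cost is strictly lower, the returned cost can never exceed the starting cost; whether the search reaches a global minimum depends on the cost surface and the step size.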


2019 ◽  
Vol 159 ◽  
pp. 138-147 ◽  
Author(s):  
Alexander Lim ◽  
Bryan Naidenov ◽  
Haley Bates ◽  
Karyn Willyerd ◽  
Timothy Snider ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Nisha Kanwar ◽  
Celia Blanco ◽  
Irene A. Chen ◽  
Burckhard Seelig

Abstract Advances in sequencing technology have allowed researchers to sequence DNA with greater ease and at decreasing costs. The main developments have focused on either sequencing many short sequences or fewer large sequences. Methods for sequencing mid-sized sequences of 600–5,000 bp are currently less efficient. For example, the PacBio Sequel I system yields ~100,000–300,000 reads with an accuracy per base pair of 90–99%. We sought to sequence several DNA populations of ~870 bp in length with a sequencing accuracy of 99% and to the greatest depth possible. We optimised a simple, robust method to concatenate genes of ~870 bp five times and then sequenced the resulting DNA of ~5,000 bp by PacBio SMRT long-read sequencing. Our method improved upon previously published concatenation attempts, leading to a greater sequencing depth, high-quality reads and limited sample preparation at little expense. We applied this efficient concatenation protocol to sequence nine DNA populations from a protein engineering study. The improved method is accompanied by a simple and user-friendly analysis pipeline, DeCatCounter, to sequence medium-length sequences efficiently at one-fifth of the cost.
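The "one-fifth of the cost" figure follows from simple arithmetic, sketched below under an idealizing assumption that every read contains exactly five complete gene copies. The function names are hypothetical; the read counts and the copy number are taken from the abstract.

```python
READS_PER_RUN = (100_000, 300_000)  # approximate read yield quoted in the abstract
COPIES_PER_READ = 5                 # genes concatenated five times per molecule


def gene_copies_per_run(reads, copies_per_read=COPIES_PER_READ):
    """Gene sequences recovered per run if each read carries all copies (idealized)."""
    return reads * copies_per_read


def relative_cost_per_copy(copies_per_read=COPIES_PER_READ):
    """Cost per gene sequence relative to sequencing one gene per read."""
    return 1 / copies_per_read


if __name__ == "__main__":
    low, high = (gene_copies_per_run(r) for r in READS_PER_RUN)
    print(f"gene copies per run: {low:,} to {high:,}")
    print(f"relative cost per copy: {relative_cost_per_copy():.2f}")
```

Five units of ~870 bp account for about 4,350 bp of the ~5,000 bp product; the abstract does not say what makes up the remainder.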


2019 ◽  
Vol 47 (1) ◽  
pp. 23-32 ◽  
Author(s):  
Yann Fichou ◽  
Isabelle Berlivet ◽  
Gaëlle Richard ◽  
Christophe Tournamille ◽  
Lilian Castilho ◽  
...  

Background: In the novel era of blood group genomics, (re-)defining reference gene/allele sequences of blood group genes has become an important goal, both for diagnostic and research purposes. As novel, powerful sequencing technologies are available, we sought to investigate the variability encountered in the three most common alleles of ACKR1, the gene encoding the clinically relevant Duffy antigens, at the haplotype level by a long-read sequencing approach. Materials and Methods: After long-range PCR amplification spanning the whole ACKR1 gene locus (∼2.5 kilobases), amplicons generated from 81 samples with known genotypes were sequenced in a single read by using the Pacific Biosciences (PacBio) single molecule, real-time (SMRT) sequencing technology. Results: High-quality sequencing reads were obtained for the 162 alleles (accuracy >0.999). Twenty-two nucleotide variations reported in databases were identified, defining 19 haplotypes: four, eight, and seven haplotypes in 46 ACKR1*01, 63 ACKR1*02, and 53 ACKR1*02N.01 alleles, respectively. Discussion: Overall, we have defined a subset of reference alleles by third-generation (long-read) sequencing. This technology, which provides a “longitudinal” overview of the loci of interest (several thousand base pairs) and is complementary to the second-generation (short-read) next-generation sequencing technology, is of critical interest for resolving novel, rare, and null alleles.


2019 ◽  
Author(s):  
Dhaivat Joshi ◽  
Shunfu Mao ◽  
Sreeram Kannan ◽  
Suhas Diggavi

Abstract Motivation: Efficient and accurate alignment of DNA/RNA sequence reads to each other or to a reference genome/transcriptome is an important problem in genomic analysis. Nanopore sequencing has emerged as a major sequencing technology and many long-read aligners have been designed for aligning nanopore reads. However, the high error rate makes accurate and efficient alignment difficult. Properly utilizing the noise and error characteristics inherent in the sequencing process can play a vital role in constructing a robust aligner. In this paper, we design QAlign, a pre-processor that can be used with any long-read aligner for aligning long reads to a genome/transcriptome or to other long reads. The key idea in QAlign is to convert the nucleotide reads into discretized current levels that capture the error modes of the nanopore sequencer before running them through a sequence aligner. Results: We show that QAlign is able to improve alignment rates from around 80% up to 90% with nanopore reads when aligning to the genome. We also show that QAlign improves the average overlap quality by 9.2%, 2.5% and 10.8% in three real datasets for read-to-read alignment. Read-to-transcriptome alignment rates are improved from 51.6% to 75.4% and 82.6% to 90% in two real datasets. Availability: https://github.com/joshidhaivat/QAlign.git
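The key idea stated above, converting nucleotides into discretized current levels before alignment, can be illustrated with a toy sketch. Everything numeric here is an assumption: the k-mer length `K`, the per-base values in `TOY_BASE_LEVEL`, and the bin thresholds are invented for the example and are not a real nanopore pore model or QAlign's actual parameters.

```python
K = 3  # toy k-mer length (assumption for this sketch)

# Invented per-base contributions; NOT a real nanopore current model.
TOY_BASE_LEVEL = {"A": 80.0, "C": 95.0, "G": 85.0, "T": 110.0}


def toy_current(kmer):
    """Return a made-up mean current for a k-mer (average of the base values)."""
    return sum(TOY_BASE_LEVEL[base] for base in kmer) / len(kmer)


def quantize(value, thresholds):
    """Return the index of the bin that `value` falls into."""
    return sum(value >= t for t in thresholds)


def to_levels(read, thresholds=(90.0, 100.0)):
    """Convert a nucleotide string into one discretized level per k-mer."""
    return [
        quantize(toy_current(read[i:i + K]), thresholds)
        for i in range(len(read) - K + 1)
    ]


if __name__ == "__main__":
    for read in ("ACGT", "GCGT", "TTTT"):
        print(read, to_levels(read))
```

In this toy model A and G have similar values, so `ACGT` and `GCGT` map to the same level sequence: a substitution between bases with similar currents disappears after quantization, which is the kind of error mode such a representation is meant to absorb. The level sequences would then be handed to an ordinary aligner.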


2021 ◽  
Vol 8 ◽  
Author(s):  
Douglas M. M. Soares ◽  
Samir V. F. Atum ◽  
Etelvino J. H. Bechara ◽  
João C. Setubal ◽  
Cassius V. Stevani ◽  
...  

2020 ◽  
Author(s):  
Radwa A. Hanafy ◽  
Britny Johnson ◽  
Noha H. Youssef ◽  
Mostafa S. Elshahed

Abstract The anaerobic gut fungi (AGF, Neocallimastigomycota) reside in the alimentary tracts of herbivores where they play a central role in the breakdown of ingested plant material. Accurate assessment of AGF diversity has been hampered by inherent deficiencies of the internal transcribed spacer 1 (ITS1) region as a phylogenetic marker. Here, we report on the development and implementation of the D1/D2 region of the large ribosomal subunit (D1/D2 LSU) as a novel marker for assessing AGF diversity in culture-independent surveys. Sequencing a 1.4–1.5 kbp amplicon encompassing the ITS1-5.8S rRNA-ITS2-D1/D2 LSU region in the ribosomal RNA locus from fungal strains and environmental samples generated a reference D1/D2 LSU database for all cultured AGF genera, as well as the majority of candidate genera encountered in prior ITS1-based diversity surveys. Subsequently, a D1/D2 LSU-based diversity survey using long-read PacBio SMRT sequencing technology was conducted on fecal samples from 21 wild and domesticated herbivores. Twenty-eight genera and candidate genera were identified in the 17.7 K sequences obtained, including multiple novel lineages that were predominantly, but not exclusively, identified in wild herbivores. Associations between certain AGF genera and animal lifestyle or animal host family were observed. Finally, to address the current paucity of AGF isolates, concurrent isolation efforts utilizing multiple approaches to maximize recovery yielded 216 isolates belonging to twelve different genera, several of which have no prior cultured representatives. Our results establish the utility of D1/D2 LSU and PacBio sequencing for AGF diversity surveys, and the culturability of a wide range of AGF taxa, and demonstrate that wild herbivores represent a yet-untapped reservoir of AGF diversity.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Patrick Driguez ◽  
Salim Bougouffa ◽  
Karen Carty ◽  
Alexander Putra ◽  
Kamel Jabbari ◽  
...  

Abstract Currently, different sequencing platforms are used to generate plant genomes and no workflow has been properly developed to optimize time, cost, and assembly quality. We present LeafGo, a complete de novo plant genome workflow that starts from tissue and produces genomes with modest laboratory and bioinformatic resources in approximately 7 days, using one long-read sequencing technology. LeafGo is optimized with ten different plant species, three of which are used to generate high-quality chromosome-level assemblies without any scaffolding technologies. Finally, we report the diploid genomes of Eucalyptus rudis and E. camaldulensis and the allotetraploid genome of Arachis hypogaea.


2018 ◽  
Author(s):  
Theodore S. Kalbfleisch ◽  
Edward S. Rice ◽  
Michael S. DePriest ◽  
Brian P. Walenz ◽  
Matthew S. Hestand ◽  
...  

Abstract EquCab2, a high-quality reference genome for the domestic horse, was released in 2007. Since then, it has served as the foundation for nearly all genomic work done in equids. Recent advances in genomic sequencing technology and computational assembly methods have allowed scientists to improve reference assemblies of large animal and plant genomes in terms of contiguity and composition. In 2014, the equine genomics research community began a project to improve the reference sequence for the horse, building upon the solid foundation of EquCab2 and incorporating new short-read data, long-read data, and proximity ligation data. The result, EquCab3, is presented here. The count of non-N bases in the incorporated chromosomes is improved from 2.33 Gb in EquCab2 to 2.41 Gb in EquCab3. Contiguity has also been improved nearly 40-fold, with a contig N50 of 4.5 Mb, and scaffold contiguity has been enhanced to the point where all but one of the 32 chromosomes consists of a single scaffold.
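For readers unfamiliar with the contig N50 statistic quoted in these abstracts, the following sketch computes it under the standard definition: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. The contig lengths in the example are invented purely for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where the running sum covers half the assembly.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    primary = [10, 8, 6]                 # invented lengths, arbitrary units
    redundant = [5, 5, 4, 4, 3, 3]       # invented short, redundant contigs
    print("N50 with redundant contigs:", n50(primary + redundant))
    print("N50 of primary contigs only:", n50(primary))
```

In this toy case, removing the short redundant contigs raises N50 from 6 to 8 without changing any of the long contigs, which shows why the statistic is sensitive to both contiguity and redundancy in an assembly.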

