HAPHPIPE: Haplotype Reconstruction and Phylodynamics for Deep Sequencing of Intra-Host Viral Populations

Molecular Biology and Evolution

10.1093/molbev/msaa315

2020

Author(s):

Matthew L Bendall

Keylie M Gibson

Margaret C Steiner

Uzma Rentia

Marcos Pérez-Losada

...

Keyword(s):

Deep Sequencing

De Novo

Consensus Sequence

Haplotype Reconstruction

Consensus Sequences

Genome Wide

Genomic Regions

Next Generation Sequencing (NGS)

NGS Data

Generation Sequencing

Abstract: Deep sequencing of viral populations using next generation sequencing (NGS) offers opportunities to understand and investigate evolution, transmission dynamics, and population genetics. Currently, the standard practice for processing NGS data to study viral populations is to summarize all the observed sequences from a sample as a single consensus sequence, thus discarding valuable information about the intra-host viral molecular epidemiology. Furthermore, existing analytical pipelines may analyze only the genomic regions involved in drug resistance and thus are not suited to full viral genome analysis. Here we present HAPHPIPE, a HAplotype and PHylodynamics PIPEline for genome-wide assembly of viral consensus sequences and haplotypes. The HAPHPIPE protocol includes modules for quality trimming, error correction, de novo assembly, alignment, and haplotype reconstruction. The resulting consensus sequences, haplotypes, and alignments can be further analyzed using a variety of phylogenetic and population genetic software. HAPHPIPE is designed to provide users with a single pipeline to rapidly analyze sequences from viral populations generated by NGS platforms and to provide quality output properly formatted for downstream evolutionary analyses.
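
The difference between a single consensus sequence and the intra-host diversity it discards can be illustrated with a minimal sketch. The Python below assumes the reads are already aligned to a common coordinate system and padded with '-' where they do not cover a position; it builds a simple majority-rule consensus and lists the minor variants that this consensus throws away. The function names and toy reads are hypothetical, and this is not HAPHPIPE's own consensus or haplotype-reconstruction code.

```python
from collections import Counter


def majority_consensus(aligned_reads, min_depth=1):
    """Majority-rule consensus of equal-length aligned reads.

    '-' marks a position a read does not cover; positions with fewer than
    `min_depth` covering reads are written as 'N'.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for i in range(length):
        counts = Counter(read[i] for read in aligned_reads if read[i] != "-")
        if sum(counts.values()) < min_depth:
            consensus.append("N")
        else:
            # most_common(1) returns the first-seen base when counts tie
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


def minor_variants(aligned_reads):
    """List (position, base, frequency) for every non-majority base."""
    if not aligned_reads:
        return []
    variants = []
    for i in range(len(aligned_reads[0])):
        counts = Counter(read[i] for read in aligned_reads if read[i] != "-")
        depth = sum(counts.values())
        # Skip the majority base; every other base is lost in the consensus
        for base, n in counts.most_common()[1:]:
            variants.append((i, base, n / depth))
    return variants


reads = ["ACGTA", "ACGTT", "ACCTT", "-CGTT"]
print(majority_consensus(reads))  # ACGTT
print(minor_variants(reads))      # [(2, 'C', 0.25), (4, 'A', 0.25)]
```

Both minority bases, each carried by a quarter of the reads, are invisible in the consensus string, which is the information loss the abstract describes.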

Validation of Variant Assembly Using HAPHPIPE with Next-Generation Sequence Data from Viruses

Viruses

10.3390/v12070758

2020

Vol 12 (7)

pp. 758

Cited By ~ 1

Author(s):

Keylie M. Gibson

Margaret C. Steiner

Uzma Rentia

Matthew L. Bendall

Marcos Pérez-Losada

...

Keyword(s):

De Novo

Sequence Data

Consensus Sequence

Sequence Assembly

Next Generation

Consensus Sequences

Bioinformatic Tools

HIV gp120

Next Generation Sequencing (NGS)

NGS Data

Next-generation sequencing (NGS) offers a powerful opportunity to identify low-abundance, intra-host viral sequence variants, yet the focus of many bioinformatic tools on consensus sequence construction has precluded a thorough analysis of intra-host diversity. To take full advantage of the resolution of NGS data, we developed HAplotype PHylodynamics PIPEline (HAPHPIPE), an open-source tool for the de novo and reference-based assembly of viral NGS data that supports both consensus sequence assembly and the quantification of intra-host variation through haplotype reconstruction. We validate and compare the consensus sequence assembly methods of HAPHPIPE to those of two alternative software packages, HyDRA and Geneious, using simulated HIV and empirical HIV, HCV, and SARS-CoV-2 datasets. Our validation methods included read mapping, genetic distance, and genetic diversity metrics. In simulated NGS data, HAPHPIPE generated pol consensus sequences significantly closer to the true consensus sequence than those produced by HyDRA and Geneious, and performed comparably to Geneious for HIV gp120 sequences. Furthermore, using empirical data from multiple viruses, we demonstrate that HAPHPIPE can analyze larger sequence datasets due to its greater computational speed. Therefore, we contend that HAPHPIPE provides a more user-friendly platform than other currently available options for users with and without bioinformatics experience to implement current best practices for viral NGS assembly.

The effect of variant interference on de novo assembly for viral deep sequencing

10.1101/815480

2019

Cited By ~ 1

Author(s):

Christina J. Castro

Rachel L. Marine

Edward Ramos

Terry Fei Fan Ng

Keyword(s):

Deep Sequencing

De Novo

GC Content

Read Length

Viral Genomes

Minor Variant

Main Driver

Next Generation Sequencing (NGS)

Viral Sequences

Generation Sequencing

Abstract: Viruses have high mutation rates and generally exist as a mixture of variants in biological samples. Next-generation sequencing (NGS) has surpassed Sanger sequencing for generating long viral sequences, yet how variants affect NGS de novo assembly remains largely unexplored. Our results from >15,000 simulated experiments showed that the presence of variants can turn an assembly of one genome into tens to thousands of contigs. This “variant interference” (VI) is highly consistent, is reproducible with the ten most widely used de novo assemblers, and occurs independently of genome length, read length, and GC content. The main driver of VI is the pairwise identity between viral variants. These findings were further supported by in silico simulations, in which selective removal of minor-variant reads from clinical datasets allowed the “rescue” of full viral genomes from fragmented contigs. These results call for careful interpretation of contigs and contig numbers from de novo assembly in viral deep sequencing.
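
Because the abstract identifies pairwise identity between variants as the main driver of VI, a minimal sketch of that quantity may help. The Python below assumes the variant sequences are already aligned and, as a labeled assumption, ignores columns where either sequence has a gap ('-') rather than counting them as mismatches. The sequences are hypothetical toy examples, not data from the study.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b):
    """Fraction of identical bases over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are excluded, not counted as mismatches
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return matches / compared


# Toy aligned variants of the same short region (hypothetical)
variants = {
    "v1": "ACGTACGTAC",
    "v2": "ACGTACGTTC",
    "v3": "ACCTACGAAC",
}
for (name_a, seq_a), (name_b, seq_b) in combinations(variants.items(), 2):
    print(name_a, name_b, round(pairwise_identity(seq_a, seq_b), 2))
# v1 v2 0.9
# v1 v3 0.8
# v2 v3 0.7
```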

HAPDeNovo: a haplotype-based approach for filtering and phasing de novo mutations in linked read sequencing data

10.1101/220830

2017

Cited By ~ 1

Author(s):

Xin Zhou

Serafim Batzoglou

Arend Sidow

Lu Zhang

Keyword(s):

False Positive

De Novo

False Positives

Sequencing Data

De Novo Mutations

Congenital Diseases

Genome Wide

Next Generation Sequencing (NGS)

NGS Data

Haplotype Information

Background: De novo mutations (DNMs) are associated with neurodevelopmental and congenital diseases, and their detection can contribute to understanding disease pathogenicity. However, accurate detection is challenging because of their small number relative to the genome-wide false positives in next generation sequencing (NGS) data. Software such as DeNovoGear and TrioDeNovo has been developed to detect DNMs, but at good sensitivity these tools still produce many false positive calls.

Results: To address this challenge, we develop HAPDeNovo, a program that leverages phasing information from linked read sequencing to remove false positive DNMs from candidate lists generated by DNM-detection tools. Short reads from each phasing block are allocated to one of the two haplotypes, and a haploid genotype is then generated for each putative DNM. HAPDeNovo removes variants that are called as heterozygous in one of the haplotypes because they are almost certainly false positives. Our experiments on 10X Chromium linked read sequencing trio data reveal that HAPDeNovo eliminates 80% to 99% of false positives regardless of how large the candidate DNM set is.

Conclusions: HAPDeNovo leverages the haplotype information from linked read sequencing to remove spurious false positive DNMs effectively, and it increases the accuracy of DNM detection dramatically without sacrificing sensitivity.
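
The filtering rule described above (discard a candidate DNM when either haplotype's haploid genotype call is heterozygous) can be sketched as follows. The reasoning is that a single haplotype carries one allele at a site, so mixed read support on one haplotype points to an error. This is a simplified illustration, not HAPDeNovo's implementation: the per-haplotype read counts, the `haploid_call` helper and its `min_fraction` threshold, and the candidate site names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HaplotypeReads:
    """Reads supporting each allele at one site, from a single phased haplotype."""
    ref: int
    alt: int


def haploid_call(reads, min_fraction=0.2):
    """Call 'REF', 'ALT', 'HET' (mixed support), or 'NOCALL' for one haplotype.

    `min_fraction` is a hypothetical threshold: minority support above it
    makes the haploid call heterozygous.
    """
    depth = reads.ref + reads.alt
    if depth == 0:
        return "NOCALL"
    alt_fraction = reads.alt / depth
    if alt_fraction <= min_fraction:
        return "REF"
    if alt_fraction >= 1 - min_fraction:
        return "ALT"
    return "HET"


def filter_candidate_dnms(candidates, min_fraction=0.2):
    """Keep candidate sites where neither haplotype's haploid call is 'HET'."""
    kept = []
    for site, (hap1, hap2) in candidates.items():
        calls = {haploid_call(hap1, min_fraction), haploid_call(hap2, min_fraction)}
        if "HET" not in calls:
            kept.append(site)
    return kept


candidates = {
    "chr1:1000": (HaplotypeReads(ref=0, alt=12), HaplotypeReads(ref=15, alt=0)),
    "chr2:2000": (HaplotypeReads(ref=6, alt=7), HaplotypeReads(ref=14, alt=0)),
}
print(filter_candidate_dnms(candidates))  # ['chr1:1000']
```

The second candidate is dropped because its first haplotype shows roughly balanced support for both alleles, which a single haplotype should not.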

miPIE: NGS-based Prediction of miRNA Using Integrated Evidence

10.1101/405357

2018

Author(s):

R.J. Peace

M. Sheikh Hassani

J.R. Green

Keyword(s):

De Novo

Genomic Sequence

Prediction Performance

Data Sets

miRNA Prediction

Individual Contributions

The Individual

Next Generation Sequencing (NGS)

NGS Data

Generation Sequencing

Abstract: Methods for the de novo identification of microRNA (miRNA) have been developed using a range of sequence-based features. With the increasing availability of next generation sequencing (NGS) transcriptome data, there is a need for miRNA identification that integrates both NGS transcript expression-based patterns and advanced genomic sequence-based methods. While miRDeep2 does examine the predicted secondary structure of putative miRNA sequences, it does not leverage many of the sequence-based features used in state-of-the-art de novo methods. Meanwhile, other NGS-based methods, such as miRanalyzer, place an emphasis on sequence-based features without leveraging advanced expression-based features reflecting miRNA biosynthesis. This represents an opportunity to combine the strengths of NGS-based analysis with recent advances in de novo sequence-based miRNA prediction. Here we develop a method, microRNA Prediction using Integrated Evidence (miPIE), which integrates both expression-based and sequence-based features to achieve significantly improved miRNA prediction performance. Feature selection identifies the 20 most discriminative features, 3 of which reflect strictly expression-based information. Evaluation using precision-recall curves, for six NGS data sets representing six diverse species, demonstrates substantial improvements in prediction performance compared to miRDeep2 and miRanalyzer. The individual contributions of expression-based and sequence-based features are also examined, and we demonstrate that their combination is more effective than either alone.
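
The evaluation above relies on precision-recall curves. As a generic sketch (not miPIE's evaluation code), the Python below computes one precision/recall point per distinct score threshold, predicting a candidate as a miRNA when its score is at or above the threshold. The scores and labels are made-up toy values.

```python
def precision_recall_points(scores, labels):
    """Return (threshold, precision, recall) for each distinct score threshold.

    A candidate is predicted positive when its score >= threshold;
    labels are 1 for true miRNAs and 0 otherwise.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one positive label")
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        # tp + fp >= 1 because the threshold equals at least one score
        points.append((threshold, tp / (tp + fp), tp / total_positives))
    return points


# Toy classifier scores and true labels (made up for illustration)
scores = [0.9, 0.8, 0.7, 0.6, 0.4]
labels = [1, 1, 0, 1, 0]
for t, p, r in precision_recall_points(scores, labels):
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

Lowering the threshold can only raise recall, while precision may rise or fall, which is why the full curve is more informative than any single operating point.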

Transcriptional Identification of Related Proteins in the Immune System of the Crayfish Procambarus clarkii

High-Throughput

10.3390/ht7030026

2018

Vol 7 (3)

pp. 26

Cited By ~ 1

Author(s):

Gabina Calderón-Rosete

Juan González-Barrios

Manuel Lara-Lozano

Celia Piña-Leyva

Leonardo Rodríguez-Sosa

Keyword(s):

Immune System

De Novo

Procambarus clarkii

Freshwater Crayfish

Future Studies

Consensus Sequences

Next Generation Sequencing (NGS)

Related Proteins

Generation Sequencing

The freshwater crayfish Procambarus clarkii is an animal model employed for physiological and immunological studies and is also of great economic importance in aquaculture. Although the species is easy to rear, a high percentage of its production is lost annually to infectious diseases. Currently, genetic information about the immune system of crustaceans is limited. Therefore, we used Next Generation Sequencing (NGS) to obtain the transcriptome of the abdominal nerve cord of P. clarkii and to identify proteins that participate in the immune system. The reads were assembled de novo, and consensus sequences longer than 3000 nucleotides were selected for analysis. The RNA transcript sequences were edited for annotation and submitted to the GenBank database of the National Center for Biotechnology Information (NCBI). We compiled a list of the sequences' accession numbers, organized by the putative immune system pathway in which each protein participates. In this work, we report 80 immune-system-related proteins identified from the crayfish transcriptome, 74 of which are reported for P. clarkii for the first time. We hope that these sequences will contribute significantly to future studies of the immune system in crustaceans.

Histoimmunogenetics Markup Language 1.0: Reporting Next Generation Sequencing-based HLA and KIR Genotyping

10.1101/014951

2015

Author(s):

Robert P Milius

Michael Heuer

Daniel Valiga

Kathryn J Doroschak

Caleb J. Kennedy

...

Keyword(s):

Next Generation Sequencing

Data Exchange

Consensus Sequence

Markup Language

Next Generation

Multiple Group

Specific Priming

Next Generation Sequencing (NGS)

NGS Data

Generation Sequencing

We present an electronic format for exchanging data for HLA and KIR genotyping with extensions for next-generation sequencing (NGS). This format addresses NGS data exchange by refining the Histoimmunogenetics Markup Language (HML) to conform to the proposed Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines (miring.immunogenomics.org). Our refinements of HML include two major additions. First, NGS is supported by new XML structures to capture additional NGS data and metadata required to produce a genotyping result, including analysis-dependent (dynamic) and method-dependent (static) components. A full genotype, consensus sequence, and the surrounding metadata are included directly, while the raw sequence reads and platform documentation are externally referenced. Second, genotype ambiguity is fully represented by integrating Genotype List Strings, which use a hierarchical set of delimiters to represent allele and genotype ambiguity in a complete and accurate fashion. HML also continues to enable the transmission of legacy methods (e.g. site-specific oligonucleotide, sequence-specific priming, and sequence based typing (SBT)), adding features such as allowing multiple group-specific sequencing primers, and fully leveraging techniques that combine multiple methods to obtain a single result, such as SBT integrated with NGS.
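
The hierarchical set of Genotype List String delimiters mentioned above can be illustrated with a small recursive parser. This sketch assumes the five GL String operators in order of precedence, from outermost to innermost: `^` (between loci), `|` (between possible genotypes), `+` (between the two gene copies of a genotype), `~` (between alleles phased on one haplotype), and `/` (between ambiguous alleles). It is not part of HML or any GL String library, it performs no validation of allele names, and the function names are hypothetical.

```python
# GL String operators, from the outermost (lowest precedence) to the innermost level
GL_OPERATORS = ("^", "|", "+", "~", "/")


def parse_gl_string(gl_string, level=0):
    """Split a GL String into nested lists, one nesting level per operator.

    Level 0 lists loci ('^'), level 1 alternative genotypes ('|'),
    level 2 gene copies ('+'), level 3 phased alleles on a haplotype ('~'),
    and level 4 ambiguous alleles ('/').
    """
    if level == len(GL_OPERATORS):
        return gl_string  # a single allele name
    return [parse_gl_string(part, level + 1)
            for part in gl_string.split(GL_OPERATORS[level])]


def alleles(node):
    """Flatten a parsed GL String into its allele names, in order."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in alleles(child)]


parsed = parse_gl_string("HLA-A*01:01/HLA-A*01:02+HLA-A*24:02")
print(alleles(parsed))    # ['HLA-A*01:01', 'HLA-A*01:02', 'HLA-A*24:02']
print(len(parsed[0][0]))  # 2 gene copies in the single genotype
```

The parser keeps every level even when it has a single element, so each allele always sits at the same depth; this makes downstream code predictable at the cost of some verbosity.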

Minimum Information for Reporting Next Generation Sequence Genotyping (MIRING): Guidelines for Reporting HLA and KIR Genotyping via Next Generation Sequencing

10.1101/015230

2015

Author(s):

Steven J. Mack

Robert P Milius

Benjamin D Gifford

Jürgen Sauter

Jan Hofmann

...

Keyword(s):

Next Generation Sequencing

Consensus Sequence

Primary Data

Genotype Data

Next Generation

Minimum Information

Structured Information

Next Generation Sequencing (NGS)

NGS Data

Generation Sequencing

The development of next-generation sequencing (NGS) technologies for HLA and KIR genotyping is rapidly advancing knowledge of genetic variation of these highly polymorphic loci. NGS genotyping is poised to replace older methods for clinical use, but standard methods for reporting and exchanging these new, high quality genotype data are needed. The Immunogenomic NGS Consortium, a broad collaboration of histocompatibility and immunogenetics clinicians, researchers, instrument manufacturers and software developers, has developed the Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines. MIRING is a checklist that specifies the content of NGS genotyping results as well as a set of messaging guidelines for reporting the results. A MIRING message includes five categories of structured information (message annotation, reference context, full genotype, consensus sequence and novel polymorphism) and references to three categories of accessory information (NGS platform documentation, read processing documentation and primary data). These eight categories of information ensure the long-term portability and broad application of these NGS data for all current histocompatibility and immunogenetics use cases. In addition, MIRING can be extended to allow the reporting of genotype data generated using pre-NGS technologies. Because genotyping results reported using MIRING are easily updated in accordance with reference and nomenclature databases, MIRING represents a bold departure from previous methods of reporting HLA and KIR genotyping results, which have provided static and less-portable data. More information about MIRING can be found online at miring.immunogenomics.org.
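
The eight MIRING categories listed above can be expressed as a simple completeness check. This sketch is hypothetical: it represents a message as two sets of category names (those included directly and those referenced externally), which is not the MIRING or HML message format, and it treats every category as required, which is a simplification.

```python
# The five categories a MIRING message includes as structured information,
# and the three categories of accessory information it references externally.
STRUCTURED = (
    "message annotation",
    "reference context",
    "full genotype",
    "consensus sequence",
    "novel polymorphism",
)
REFERENCED = (
    "NGS platform documentation",
    "read processing documentation",
    "primary data",
)


def missing_categories(included, referenced):
    """Return (missing structured, missing referenced) category names, in checklist order.

    `included` and `referenced` are sets of category names; this
    representation is a hypothetical simplification for illustration.
    """
    missing_structured = [c for c in STRUCTURED if c not in included]
    missing_referenced = [c for c in REFERENCED if c not in referenced]
    return missing_structured, missing_referenced


included = {"message annotation", "reference context", "full genotype", "consensus sequence"}
referenced = {"NGS platform documentation", "read processing documentation"}
print(missing_categories(included, referenced))
# (['novel polymorphism'], ['primary data'])
```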

Patient Derived Xenografts for Genome-Driven Therapy of Osteosarcoma

Cells

10.3390/cells10020416

2021

Vol 10 (2)

pp. 416

Author(s):

Lorena Landuzzi

Maria Cristina Manara

Pier-Luigi Lollini

Katia Scotlandi

Keyword(s):

Clinical Trials

Tumor Heterogeneity

Functional Studies

NGS Data Analysis

The Many

Orthotopic Xenografts

Next Generation Sequencing (NGS)

NGS Data

Generation Sequencing

Somatic Copy Number Alterations

Osteosarcoma (OS) is a rare malignant primary tumor of mesenchymal origin affecting bone. It is characterized by a complex genotype, mainly due to the high frequency of chromothripsis, which leads to multiple somatic copy number alterations and structural rearrangements. Any effort to design genome-driven therapies must therefore consider such high inter- and intra-tumor heterogeneity. Accordingly, many laboratories and international networks are developing and sharing OS patient-derived xenografts (OS PDX) to broaden the availability of models that reproduce the complex clinical heterogeneity of OS. OS PDXs, and new cell lines derived from PDXs, faithfully preserve tumor heterogeneity, genetic, and epigenetic features and are thus valuable tools for predicting drug responses. Here, we review recent achievements concerning OS PDXs, summarizing the methods used to obtain ectopic and orthotopic xenografts and to fully characterize these models. The availability of OS PDXs across the many international PDX platforms and their possible use in PDX clinical trials are also described. We recommend the coupling of next-generation sequencing (NGS) data analysis with functional studies in OS PDXs, as well as the setup of OS PDX clinical trials and co-clinical trials, to enhance the predictive power of experimental evidence and to accelerate the clinical translation of effective genome-guided therapies for this aggressive disease.

Mining and Development of Novel SSR Markers Using Next Generation Sequencing (NGS) Data in Plants

Molecules

10.3390/molecules23020399

2018

Vol 23 (2)

pp. 399

Cited By ~ 41

Author(s):

Sima Taheri

Thohirah Lee Abdullah

Mohd Yusop

Mohamed Hanafi

Mahbod Sahebi

...

Keyword(s):

Next Generation Sequencing

SSR Markers

Next Generation

Next Generation Sequencing (NGS)

NGS Data

Generation Sequencing

Appendix A: Common File Types Used in Next-Generation Sequencing (NGS) Data Analysis

Next-Generation Sequencing Data Analysis

10.1201/b19532-20

2016

pp. 199-202

Keyword(s):

Data Analysis

Next Generation Sequencing

Next Generation

NGS Data Analysis

Next Generation Sequencing (NGS)

NGS Data

Generation Sequencing
