ploidyNGS: Visually exploring ploidy with Next Generation Sequencing data

Mapping Intimacies ◽

10.1101/086488 ◽

2016 ◽

Cited By ~ 2

Author(s):

Renato Augusto Corrêa dos Santos ◽

Gustavo Henrique Goldman ◽

Diego Mauricio Riaño-Pachón

Keyword(s):

Next Generation Sequencing Data ◽

Sequencing Data ◽

Ploidy Levels ◽

Short Read ◽

Link Type ◽

Model Free ◽

Ngs Data ◽

Free Open Source ◽

Generation Sequencing ◽

General Public License

Abstract Summary: ploidyNGS is a model-free, open source tool to visualize and explore ploidy levels in a newly sequenced genome, exploiting short read data. We tested ploidyNGS using both simulated and real NGS data of the model yeast Saccharomyces cerevisiae. ploidyNGS allows the identification of the ploidy level of a newly sequenced genome in a visual way. Availability and implementation: ploidyNGS is available under the GNU General Public License (GPL) at https://github.com/diriano/ploidyNGS. ploidyNGS is implemented in Python and [email protected]
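As a rough illustration of the kind of signal a visual ploidy check can rely on, the sketch below bins per-site allele proportions into a histogram. It is not the ploidyNGS implementation: the input format (one dict of base counts per position), the function names and the `min_depth` cutoff are all assumptions made for this example. The underlying expectation is general genetics rather than anything stated in the abstract: in a diploid, heterozygous sites cluster near 0.5, while in a triploid they cluster near 1/3 and 2/3.

```python
from collections import Counter


def allele_proportions(pileup_counts, min_depth=10):
    """Yield the read proportion of each allele at sites with 2+ alleles.

    pileup_counts: iterable of dicts mapping base -> read count at one
    position (a hypothetical input format chosen for this sketch).
    """
    for counts in pileup_counts:
        depth = sum(counts.values())
        observed = [c for c in counts.values() if c > 0]
        # Skip shallow positions and positions showing a single allele.
        if depth < min_depth or len(observed) < 2:
            continue
        for c in observed:
            yield c / depth


def histogram(proportions, bins=10):
    """Count proportions per equal-width bin over [0, 1]."""
    hist = Counter()
    for p in proportions:
        hist[min(int(p * bins), bins - 1)] += 1
    return [hist[i] for i in range(bins)]


# Toy data: two diploid-like heterozygous sites and one homozygous site.
toy = [
    {"A": 10, "G": 10},
    {"C": 12, "T": 8},
    {"A": 20},
]
props = sorted(allele_proportions(toy))
hist = histogram(props, bins=10)
```

With real data the counts would come from a read alignment, and the histogram would be plotted rather than inspected as a list.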

Detection of somatic structural variants from short-read next-generation sequencing data

10.1101/840751 ◽

2019 ◽

Author(s):

Tingting Gong ◽

Vanessa M Hayes ◽

Eva KF Chan

Keyword(s):

Next Generation Sequencing ◽

Cancer Genomics ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Structural Variants ◽

Sequencing Data ◽

Short Read ◽

Factors Affecting ◽

Ngs Data ◽

Generation Sequencing

Abstract Somatic structural variants (SVs) play a significant role in cancer development and evolution, but are notoriously more difficult to detect than small variants from short-read next-generation sequencing (NGS) data. This is due to a combination of challenges attributed to the purity of tumour samples, tumour heterogeneity, limitations of short-read information from NGS, and sequence alignment ambiguities. In spite of active development of SV detection tools (callers) over the past few years, each method has inherent advantages and limitations. In this review, we highlight some of the important factors affecting somatic SV detection and compare the performance of eight commonly used SV callers. In particular, we focus on the extent of change in sensitivity and precision for detecting different SV types and size ranges from samples with differing variant allele frequencies and sequencing depths of coverage. We highlight the reasons why some SV callers perform well in some settings but not others, allowing our evaluation findings to be extended beyond the eight SV callers examined in this paper. As the importance of large structural variants becomes increasingly recognised in cancer genomics, this paper provides a timely review of some of the most impactful factors influencing somatic SV detection and guidance on selecting an appropriate SV caller.

Detection of somatic structural variants from short-read next-generation sequencing data

Briefings in Bioinformatics ◽

10.1093/bib/bbaa056 ◽

2020 ◽

Cited By ~ 1

Author(s):

Tingting Gong ◽

Vanessa M Hayes ◽

Eva K F Chan

Keyword(s):

Next Generation Sequencing ◽

Cancer Genomics ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Structural Variants ◽

Sequencing Data ◽

Short Read ◽

Factors Affecting ◽

Ngs Data ◽

Generation Sequencing

Abstract Somatic structural variants (SVs), which are variants that typically impact >50 nucleotides, play a significant role in cancer development and evolution but are notoriously more difficult to detect than small variants from short-read next-generation sequencing (NGS) data. This is due to a combination of challenges attributed to the purity of tumour samples, tumour heterogeneity, limitations of short-read information from NGS and sequence alignment ambiguities. In spite of active development of SV detection tools (callers) over the past few years, each method has inherent advantages and limitations. In this review, we highlight some of the important factors affecting somatic SV detection and compare the performance of seven commonly used SV callers. In particular, we focus on the extent of change in sensitivity and precision for detecting different SV types and size ranges from samples with differing variant allele frequencies and sequencing depths of coverage. We highlight the reasons why some SV callers perform well in some settings but not others, allowing our evaluation findings to be extended beyond the seven SV callers examined in this paper. As the importance of large SVs becomes increasingly recognized in cancer genomics, this paper provides a timely review of some of the most impactful factors influencing somatic SV detection that should be considered when choosing SV callers.
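The two metrics named in this abstract, sensitivity and precision, can be computed from a caller's true positives, false positives and false negatives. The sketch below uses the standard definitions; the function name and the counts are hypothetical and are not taken from the review.

```python
def precision_sensitivity(true_positives, false_positives, false_negatives):
    """Return (precision, sensitivity) for one caller on one truth set."""
    called = true_positives + false_positives   # everything the caller reported
    real = true_positives + false_negatives     # everything in the truth set
    precision = true_positives / called if called else 0.0
    sensitivity = true_positives / real if real else 0.0
    return precision, sensitivity


# Hypothetical counts: 80 correct calls, 20 spurious calls, 120 missed SVs.
precision, sensitivity = precision_sensitivity(80, 20, 120)
```

Comparing these two numbers across SV types, size ranges, allele frequencies and depths is what makes a caller's trade-offs visible: here most calls are correct, but most true SVs are missed.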

NGSremix: A software tool for estimating pairwise relatedness between admixed individuals from next-generation sequencing data

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab174 ◽

2021 ◽

Author(s):

Anne Krogh Nøhr ◽

Kristian Hanghøj ◽

Genis Garcia Erill ◽

Zilong Li ◽

Ida Moltke ◽

...

Keyword(s):

Next Generation Sequencing ◽

Genetic Research ◽

Likelihood Estimation ◽

Software Tool ◽

Estimation Methods ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Ngs Data ◽

Generation Sequencing

Abstract Estimation of relatedness between pairs of individuals is important in many genetic research areas. When estimating relatedness, it is important to account for admixture if this is present. However, the methods that can account for admixture are all based on genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data from which genotypes are called with high uncertainty. Here we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate, all of which perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ as multi-threaded software and is freely available on GitHub at https://github.com/KHanghoj/NGSremix.

Improved variant discovery through local re-alignment of short-read next-generation sequencing data using SRMA

Genome Biology ◽

10.1186/gb-2010-11-10-r99 ◽

2010 ◽

Vol 11 (10) ◽

Cited By ~ 53

Author(s):

Nils Homer ◽

Stanley F Nelson

Keyword(s):

Next Generation Sequencing ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Short Read ◽

Variant Discovery ◽

Generation Sequencing

PERHAPS: Paired-End short Reads-based HAPlotyping from next-generation Sequencing data

Briefings in Bioinformatics ◽

10.1093/bib/bbaa320 ◽

2020 ◽

Author(s):

Jie Huang ◽

Stefano Pallotti ◽

Qianling Zhou ◽

Marcus Kleber ◽

Xiaomeng Xin ◽

...

Keyword(s):

Next Generation Sequencing ◽

Snp Array ◽

Simple Approach ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Short Read ◽

Array Data ◽

Short Reads ◽

Generation Sequencing

Abstract The identification of rare haplotypes may greatly expand our knowledge in the genetic architecture of both complex and monogenic traits. To this aim, we developed PERHAPS (Paired-End short Reads-based HAPlotyping from next-generation Sequencing data), a new and simple approach to directly call haplotypes from short-read, paired-end Next Generation Sequencing (NGS) data. To benchmark this method, we considered the APOE classic polymorphism (*1/*2/*3/*4), since it represents one of the best examples of functional polymorphism arising from the haplotype combination of two Single Nucleotide Polymorphisms (SNPs). We leveraged the big Whole Exome Sequencing (WES) and SNP-array data obtained from the multi-ethnic UK BioBank (UKBB, N=48,855). By applying PERHAPS, based on piecing together the paired-end reads according to their FASTQ-labels, we extracted the haplotype data, along with their frequencies and the individual diplotype. Concordance rates between WES directly called diplotypes and the ones generated through statistical pre-phasing and imputation of SNP-array data are extremely high (>99%), either when stratifying the sample by SNP-array genotyping batch or self-reported ethnic group. Hardy-Weinberg Equilibrium tests and the comparison of obtained haplotype frequencies with the ones available from the 1000 Genome Project further supported the reliability of PERHAPS. Notably, we were able to determine the existence of the rare APOE*1 haplotype in two unrelated African subjects from UKBB, supporting its presence at appreciable frequency (approximatively 0.5%) in the African Yoruba population. Despite acknowledging some technical shortcomings, PERHAPS represents a novel and simple approach that will partly overcome the limitations in direct haplotype calling from short read-based sequencing.

Detection of FLT3 Internal Tandem Duplication in Targeted, Short-Read-Length, Next-Generation Sequencing Data

Journal of Molecular Diagnostics ◽

10.1016/j.jmoldx.2012.08.001 ◽

2013 ◽

Vol 15 (1) ◽

pp. 81-93 ◽

Cited By ~ 81

Author(s):

David H. Spencer ◽

Haley J. Abel ◽

Christina M. Lockwood ◽

Jacqueline E. Payton ◽

Philippe Szankasi ◽

...

Keyword(s):

Next Generation Sequencing ◽

Tandem Duplication ◽

Read Length ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Internal Tandem Duplication ◽

Short Read ◽

Flt3 Internal Tandem Duplication ◽

Generation Sequencing ◽

Short Read Length

A Novel Method to Detect Bias in Short Read NGS Data

Journal of Integrative Bioinformatics ◽

10.1515/jib-2017-0025 ◽

2017 ◽

Vol 14 (3) ◽

Cited By ~ 1

Author(s):

Jamie Alnasir ◽

Hugh P. Shanahan

Keyword(s):

Biological Significance ◽

Gc Content ◽

Next Generation Sequencing Data ◽

Data Sets ◽

Sequencing Data ◽

Data Set ◽

Short Read ◽

Novel Method ◽

Type Data ◽

Ngs Data

Abstract Detecting sources of bias in transcriptomic data is essential to determine signals of biological significance. We outline a novel method to detect sequence-specific bias in short read Next Generation Sequencing data. This is based on determining intra-exon correlations between specific motifs. This requires a mild assumption that short reads sampled from specific regions of the same exon will be correlated with each other. This has been implemented on Apache Spark and used to analyse two D. melanogaster eye-antennal disc data sets generated at the same laboratory. The wild type data set in Drosophila indicates a variation due to motif GC content that is more significant than that found due to exon GC content. The software is available online and could be applied for cross-experiment transcriptome data analysis in eukaryotes.

WBFQC: A new approach for compressing next-generation sequencing data splitting into homogeneous streams

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972001850018x ◽

2018 ◽

Vol 16 (05) ◽

pp. 1850018 ◽

Cited By ~ 1

Author(s):

Sanjeev Kumar ◽

Suneeta Agarwal ◽

Ranvijay

Keyword(s):

Next Generation Sequencing ◽

Genomic Data ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Compression Technique ◽

Compression Algorithms ◽

Ngs Data ◽

And Storage ◽

Generation Sequencing

Genomic data now play a vital role in a number of fields, such as personalized medicine, forensics, drug discovery, sequence alignment and agriculture. With the advancements in next-generation sequencing (NGS) technology and the reduction in its cost, these data are growing exponentially. NGS data are being generated more rapidly than they can be meaningfully analyzed. Thus, there is much scope for developing novel data compression algorithms to facilitate data analysis along with data transfer and storage directly. An innovative compression technique is proposed here to address the problem of transmission and storage of large NGS data. This paper presents a lossless, non-reference-based FastQ file compression approach that segregates the data into three different streams and then applies appropriate and efficient compression algorithms to each. Experiments show that the proposed approach (WBFQC) outperforms other state-of-the-art approaches for compressing NGS data in terms of compression ratio (CR) and compression and decompression time. It also has random access capability over compressed genomic data. An open source FastQ compression tool is also provided here (http://www.algorithm-skg.com/wbfqc/home.html).
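The idea of splitting a FastQ file into homogeneous streams before compression can be sketched as follows. This is not the WBFQC algorithm: the abstract does not say which three streams it uses or which compressor is applied to each, so the choice of identifiers, bases and quality strings, the use of `zlib` for every stream, and all function names are assumptions for illustration. The round trip is lossless only under the further assumption that each separator line is a bare `+`.

```python
import zlib


def split_streams(fastq_text):
    """Split 4-line FastQ records into identifier, base and quality streams."""
    lines = fastq_text.strip().split("\n")
    ids, seqs, quals = [], [], []
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        ids.append(header)
        seqs.append(seq)
        quals.append(qual)
    return ["\n".join(stream) for stream in (ids, seqs, quals)]


def compress_streams(fastq_text):
    """Compress each stream on its own (zlib stands in for a tuned codec)."""
    return [zlib.compress(s.encode("ascii")) for s in split_streams(fastq_text)]


def decompress_streams(blobs):
    """Rebuild the FastQ text, assuming every separator line was a bare '+'."""
    ids, seqs, quals = (
        zlib.decompress(b).decode("ascii").split("\n") for b in blobs
    )
    records = []
    for header, seq, qual in zip(ids, seqs, quals):
        records.extend([header, seq, "+", qual])
    return "\n".join(records) + "\n"


toy_fastq = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nFFFF\n"
blobs = compress_streams(toy_fastq)
restored = decompress_streams(blobs)
```

Keeping the streams separate lets each one be handled by a method suited to its alphabet and statistics, which is the motivation the abstract gives for the split.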

Benchmark of tools for CNV detection from NGS panel data in a genetic diagnostics context

10.1101/850958 ◽

2019 ◽

Author(s):

José Marcos Moreno-Cabrera ◽

Jesús del Valle ◽

Elisabeth Castellanos ◽

Lidia Feliubadaló ◽

Marta Pineda ◽

...

Keyword(s):

Copy Number Variants ◽

Hereditary Diseases ◽

Next Generation Sequencing Data ◽

Genetic Diagnostics ◽

Screening Methods ◽

Sequencing Data ◽

Targeted Next Generation Sequencing ◽

Highly Sensitive ◽

Ngs Data ◽

Generation Sequencing

Abstract Motivation: Although germline copy number variants (CNVs) are the genetic cause of multiple hereditary diseases, detecting them from targeted next-generation sequencing (NGS) data remains a challenge. Existing tools perform well for large CNVs but struggle with single and multi-exon alterations. The aim of this work is to evaluate CNV calling tools working on gene panel NGS data with CNVs up to single-exon resolution and their suitability as a screening step before orthogonal confirmation in genetic diagnostics strategies. Results: Five tools (DECoN, CoNVaDING, panelcn.MOPS, ExomeDepth and CODEX2) were tested against four genetic diagnostics datasets (495 samples, 231 CNVs), using the default and sensitivity-optimized parameters. Most tools were highly sensitive and specific, but the performance was dataset-dependent. In our in-house datasets, DECoN and panelcn.MOPS with optimized parameters showed enough sensitivity to be used as screening methods in genetic diagnostics. Availability: Benchmarking-optimization code is freely available at https://github.com/TranslationalBioinformaticsIGTP/CNVbenchmarkeR.

CoverView: a sequence quality evaluation tool for next generation sequencing data

Wellcome Open Research ◽

10.12688/wellcomeopenres.14306.1 ◽

2018 ◽

Vol 3 ◽

pp. 36 ◽

Cited By ~ 5

Author(s):

Márton Münz ◽

Shazia Mahamdallie ◽

Shawn Yost ◽

Andrew Rimmer ◽

Emma Poyastro-Pearson ◽

...

Keyword(s):

Quality Control ◽

Next Generation Sequencing ◽

Quality Evaluation ◽

Reference Sample ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Evaluation Tool ◽

Link Type ◽

Generation Sequencing

Quality assurance and quality control are essential for robust next generation sequencing (NGS). Here we present CoverView, a fast, flexible, user-friendly quality evaluation tool for NGS data. CoverView processes mapped sequencing reads and user-specified regions to report depth of coverage, base and mapping quality metrics with increasing levels of detail from a chromosome-level summary to per-base profiles. CoverView can flag regions that do not fulfil user-specified quality requirements, allowing suboptimal data to be systematically and automatically presented for review. It also provides an interactive graphical user interface (GUI) that can be opened in a web browser and allows intuitive exploration of results. We have integrated CoverView into our accredited clinical cancer predisposition gene testing laboratory that uses the TruSight Cancer Panel (TSCP). CoverView has been invaluable for optimisation and quality control of our testing pipeline, providing transparent, consistent quality metric information and automatic flagging of regions that fall below quality thresholds. We demonstrate this utility with TSCP data from the Genome in a Bottle reference sample, which CoverView analysed in 13 seconds. CoverView uses data routinely generated by NGS pipelines, reads standard input formats, and rapidly creates easy-to-parse output text (.txt) files that are customised by a simple configuration file. CoverView can therefore be easily integrated into any NGS pipeline. CoverView and detailed documentation for its use are freely available at github.com/RahmanTeamDevelopment/CoverView/releases and www.icr.ac.uk/CoverView
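The flagging step described in this abstract, where regions that miss user-specified quality requirements are marked for review, can be illustrated with a small sketch. It does not use CoverView's code, configuration format or output files: the input structure, the threshold names and their default values are hypothetical.

```python
from statistics import mean


def flag_regions(regions, min_depth=20, min_mean_mapq=30):
    """Return names of regions that fail either (hypothetical) threshold.

    regions: dict mapping region name -> {"depths": [...], "mapqs": [...]},
    holding per-base depth and mapping quality values for that region.
    """
    flagged = []
    for name, metrics in regions.items():
        lowest_depth = min(metrics["depths"])
        mean_mapq = mean(metrics["mapqs"])
        # A region is flagged if any base is too shallow or the average
        # mapping quality across the region is too low.
        if lowest_depth < min_depth or mean_mapq < min_mean_mapq:
            flagged.append(name)
    return flagged


toy_regions = {
    "GENE1_exon1": {"depths": [45, 50, 48], "mapqs": [60, 60, 59]},
    "GENE1_exon2": {"depths": [30, 12, 25], "mapqs": [60, 58, 60]},
    "GENE2_exon1": {"depths": [40, 41, 39], "mapqs": [20, 25, 30]},
}
flagged = flag_regions(toy_regions)
```

In this toy input the second region fails on depth and the third on mapping quality, so both would be listed for manual review.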
