GLASS: assisted and standardized assessment of gene variations from Sanger sequence trace data

Mapping Intimacies ◽

10.1101/088401 ◽

2016 ◽

Author(s):

Karol Pal ◽

Vojtech Bystry ◽

Tomas Reigl ◽

Martin Demko ◽

Adam Krejci ◽

...

Keyword(s):

Sanger Sequencing ◽

Reference Method ◽

Supplementary Information ◽

Sequence Variant ◽

Supplementary Data ◽

Sequencing Data ◽

Manual Inspection ◽

Sanger Sequence ◽

Variant Detection ◽

Sequence Trace

Abstract Motivation Sanger sequencing remains the reference method for sequence variant detection, especially in a clinical setting. However, chromatogram interpretation often requires manual inspection and in some cases considerable expertise. Additionally, variant reporting and nomenclature are typically left to the user, which can lead to inconsistencies. Results We introduce GLASS, a tool built to assist with the assessment of gene variations in Sanger sequencing data. Critically, it provides a standardized variant output as recommended by the Human Genome Variation Society. Availability The program is freely available online at http://bat.infspire.org/genomepd/glass/[email protected], [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

SVIM-asm: Structural variant detection from haploid and diploid genome assemblies

10.1101/2020.10.27.356907 ◽

2020 ◽

Author(s):

David Heller ◽

Martin Vingron

Keyword(s):

Genetic Information ◽

Source Code ◽

Supplementary Information ◽

Supplementary Data ◽

Diploid Genome ◽

Insertions And Deletions ◽

Structural Variant ◽

Sequencing Technologies ◽

Variant Detection ◽

Genome Assemblies

Abstract Motivation With the availability of new sequencing technologies, the generation of haplotype-resolved genome assemblies up to chromosome scale has become feasible. These assemblies capture the complete genetic information of both parental haplotypes, increase structural variant (SV) calling sensitivity and enable direct genotyping and phasing of SVs. Yet, existing SV callers are designed for haploid genome assemblies only, do not support genotyping or detect only a limited set of SV classes. Results We introduce our method SVIM-asm for the detection and genotyping of six common classes of SVs from haploid and diploid genome assemblies. Compared against the only other existing SV caller for diploid assemblies, DipCall, SVIM-asm detects more SV classes and reached higher F1 scores for the detection of insertions and deletions on two recently published assemblies of the HG002 individual. Availability and Implementation SVIM-asm has been implemented in Python and can be easily installed via bioconda. Its source code is available at github.com/eldariont/[email protected] Supplementary information Supplementary data are available online.
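
The F1 score used for the comparison above is the harmonic mean of precision and recall. A minimal sketch of the calculation follows; the counts are hypothetical and are not results from the paper.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    if true_positives == 0:
        # No correct calls: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Hypothetical call counts for one SV class, for illustration only.
example_f1 = f1_score(true_positives=90, false_positives=10, false_negatives=30)
```

With these counts precision is 0.9 and recall is 0.75, so a caller that misses many true variants is penalized even when most of its calls are correct.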

popSTR2 enables clinical and population-scale genotyping of microsatellites

Bioinformatics ◽

10.1093/bioinformatics/btz913 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2269-2271 ◽

Cited By ~ 4

Author(s):

Snædis Kristmundsdottir ◽

Hannes P Eggertsson ◽

Gudny A Arnadottir ◽

Bjarni V Halldorsson

Keyword(s):

Population Based ◽

Supplementary Information ◽

Supplementary Data ◽

Clinical Sequencing ◽

Manual Inspection ◽

Repeat Expansions ◽

Population Scale

Abstract Summary popSTR2 is an update and augmentation of our previous work ‘popSTR: a population-based microsatellite genotyper’. To make genotyping sensitive to inter-sample differences, we supply a kernel to estimate sample-specific slippage rates. For clinical sequencing purposes, a panel of known pathogenic repeat expansions is provided along with a script that scans and flags for manual inspection markers indicative of a pathogenic expansion. Like its predecessor, popSTR2 allows for joint genotyping of samples at a population scale. We now provide a binning method that makes the microsatellite genotypes more amenable to analysis within standard association pipelines and can increase association power. Availability and implementation https://github.com/DecodeGenetics/popSTR. Supplementary information Supplementary data are available at Bioinformatics online.

schex avoids overplotting for large single-cell RNA-sequencing datasets

Bioinformatics ◽

10.1093/bioinformatics/btz907 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2291-2292 ◽

Cited By ~ 1

Author(s):

Saskia Freytag ◽

Ryan Lister

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

R Package ◽

Supplementary Information ◽

Supplementary Data ◽

Sequencing Data ◽

Single Cell Rna Sequencing

Abstract Summary Due to the scale and sparsity of single-cell RNA-sequencing data, traditional plots can obscure vital information. Our R package schex overcomes this by implementing hexagonal binning, which has the additional advantages of improving speed and reducing storage for resulting plots. Availability and implementation schex is freely available from Bioconductor via http://bioconductor.org/packages/release/bioc/html/schex.html and its development version can be accessed on GitHub via https://github.com/SaskiaFreytag/schex. Supplementary information Supplementary data are available at Bioinformatics online.
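
To make the idea of hexagonal binning concrete, the sketch below assigns 2D points (for example, cells in a reduced-dimension embedding) to hexagonal cells and counts the points per cell. It is written in Python rather than R and does not use or reproduce the schex API; the function names and the pointy-top hexagon layout are assumptions.

```python
import math
from collections import Counter

def hex_cell(x: float, y: float, size: float) -> tuple[int, int]:
    """Map a point to axial (q, r) coordinates of a pointy-top hexagon grid."""
    # Fractional axial coordinates; `size` is the centre-to-corner distance.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Round in cube coordinates, where the three components sum to zero.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    # Recompute the component with the largest rounding error.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)

def hexbin_counts(points, size: float) -> Counter:
    """Count how many points fall into each hexagonal cell."""
    return Counter(hex_cell(x, y, size) for x, y in points)

# Made-up coordinates: two nearby points and one distant point.
counts = hexbin_counts([(0.0, 0.0), (0.1, 0.1), (10.0, 0.0)], size=1.0)
```

A plot then needs one glyph per occupied cell instead of one per point, which is where the reduction in overplotting and storage comes from.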

phylogenize: correcting for phylogeny reveals genes associated with microbial distributions

Bioinformatics ◽

10.1093/bioinformatics/btz722 ◽

2019 ◽

Vol 36 (4) ◽

pp. 1289-1290

Author(s):

Patrick H Bradley ◽

Katherine S Pollard

Keyword(s):

Community Composition ◽

Human Microbiome ◽

Human Microbiome Project ◽

Shotgun Sequencing ◽

Supplementary Information ◽

Phylogenetic Comparative Methods ◽

Supplementary Data ◽

Sequencing Data ◽

Phylogenetic Regression ◽

Project Data

Abstract Summary Phylogenetic comparative methods are powerful but presently under-utilized ways to identify microbial genes underlying differences in community composition. These methods help to identify functionally important genes because they test for associations beyond those expected when related microbes occupy similar environments. We present phylogenize, a pipeline with web, QIIME 2 and R interfaces that allows researchers to perform phylogenetic regression on 16S amplicon and shotgun sequencing data and to visualize results. phylogenize applies broadly to both host-associated and environmental microbiomes. Using Human Microbiome Project and Earth Microbiome Project data, we show that phylogenize draws similar conclusions from 16S versus shotgun sequencing and reveals both known and candidate pathways associated with host colonization. Availability and implementation phylogenize is available at https://phylogenize.org and https://bitbucket.org/pbradz/phylogenize. Supplementary information Supplementary data are available at Bioinformatics online.

Quantification of aneuploidy in targeted sequencing data using ASCETS

Bioinformatics ◽

10.1093/bioinformatics/btaa980 ◽

2020 ◽

Author(s):

Liam F Spurr ◽

Mehdi Touat ◽

Alison M Taylor ◽

Adrian M Dubuc ◽

Juliann Shih ◽

...

Keyword(s):

Copy Number ◽

Large Scale ◽

Genomic Analysis ◽

Targeted Sequencing ◽

Supplementary Information ◽

Supplementary Data ◽

Sequencing Data ◽

Copy Number Changes ◽

Panel Sequencing ◽

Chromosome Level

Abstract Summary The expansion of targeted panel sequencing efforts has created opportunities for large-scale genomic analysis, but tools for copy-number quantification on panel data are lacking. We introduce ASCETS, a method for the efficient quantitation of arm and chromosome-level copy-number changes from targeted sequencing data. Availability and implementation ASCETS is implemented in R and is freely available to non-commercial users on GitHub: https://github.com/beroukhim-lab/ascets, along with detailed documentation. Supplementary information Supplementary data are available at Bioinformatics online.

sangeranalyseR: simple and interactive analysis of Sanger sequencing data in R

10.1101/2020.05.18.102459 ◽

2020 ◽

Author(s):

Kuan-Hao Chao ◽

Kirston Barton ◽

Sarah Palmer ◽

Robert Lanfear

Keyword(s):

Sanger Sequencing ◽

Reference Sequence ◽

Supplementary Information ◽

File Format ◽

Bioconductor Package ◽

Sequencing Data ◽

Interactive Analysis ◽

Link Type ◽

Online Documentation ◽

Wide Range

Abstract Summary sangeranalyseR is an interactive R/Bioconductor package and two associated Shiny applications designed for analysing Sanger sequencing data from the ABIF file format in R. It allows users to go from loading reads to saving aligned contigs in a few lines of R code. sangeranalyseR provides a wide range of options for a number of commonly performed actions including read trimming, detecting secondary peaks, viewing chromatograms, and detecting indels using a reference sequence. All parameters can be adjusted interactively either in R or in the associated Shiny applications. sangeranalyseR comes with extensive online documentation, and outputs detailed interactive HTML reports. Availability and implementation sangeranalyseR is implemented in R and released under an MIT license. It is available for all platforms on Bioconductor (https://bioconductor.org/packages/sangeranalyseR) and on Github (https://github.com/roblanf/sangeranalyseR)[email protected] Supplementary information Documentation at https://sangeranalyser.readthedocs.io/.

GPress: a framework for querying general feature format (GFF) files and expression files in a compressed form

Bioinformatics ◽

10.1093/bioinformatics/btaa604 ◽

2020 ◽

Vol 36 (18) ◽

pp. 4810-4812

Author(s):

Qingxi Meng ◽

Idoia Ochoa ◽

Mikel Hernaez

Keyword(s):

Single Cell ◽

Data Streams ◽

General Feature ◽

Supplementary Information ◽

Storage Space ◽

Supplementary Data ◽

Rna Seq ◽

Sequencing Data ◽

General Feature Format ◽

Original File

Abstract Motivation Sequencing data are often summarized at different annotation levels for further analysis, generally using the general feature format (GFF) or its descendants, gene transfer format (GTF) and GFF3. Existing utilities for accessing these files, like gffutils and gffread, do not focus on reducing the storage space, significantly increasing it in some cases. We propose GPress, a framework for querying GFF files in a compressed form. GPress can also incorporate and compress expression files from both bulk and single-cell RNA-Seq experiments, supporting simultaneous queries on both the GFF and expression files. In brief, GPress applies transformations to the data which are then compressed with the general lossless compressor BSC. To support queries, GPress compresses the data in blocks and creates several index tables for fast retrieval. Results We tested GPress on several GFF files of different organisms, and showed that it achieves on average a 61% reduction in size with respect to gzip (the current de facto compressor for GFF files) while being able to retrieve all annotations for a given identifier or a range of coordinates in a few seconds (when run on a common laptop). In contrast, gffutils provides faster retrieval but doubles the size of the GFF files. When additionally linking an expression file, we show that GPress can reduce its size by more than 68% when compared to gzip (for both bulk and single-cell RNA-Seq experiments), while still retrieving the information within seconds. Finally, applying BSC to the data streams generated by GPress instead of to the original file shows a size reduction of more than 44% on average. Availability and implementation GPress is freely available at https://github.com/qm2/gpress. Supplementary information Supplementary data are available at Bioinformatics online.
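
The general idea of compressing records in blocks and keeping an index for retrieval can be sketched as follows. This is not GPress's format or code: zlib stands in for the BSC compressor, the record layout is simplified, and all function names are hypothetical.

```python
import zlib

def build_blocks(records, block_size: int):
    """Compress (identifier, line) records in fixed-size blocks and index them.

    Returns the list of compressed blocks and a dict mapping each identifier
    to the set of block numbers that contain it.
    """
    blocks = []
    index = {}
    for block_no, start in enumerate(range(0, len(records), block_size)):
        chunk = records[start:start + block_size]
        for identifier, _ in chunk:
            index.setdefault(identifier, set()).add(block_no)
        # Store the identifier as the first tab-separated field of each line.
        payload = "\n".join(f"{identifier}\t{line}" for identifier, line in chunk)
        blocks.append(zlib.compress(payload.encode("utf-8")))
    return blocks, index

def query(blocks, index, identifier: str):
    """Return all lines for an identifier, decompressing only relevant blocks."""
    hits = []
    for block_no in sorted(index.get(identifier, ())):
        text = zlib.decompress(blocks[block_no]).decode("utf-8")
        for stored in text.split("\n"):
            record_id, _, line = stored.partition("\t")
            if record_id == identifier:
                hits.append(line)
    return hits

# Made-up annotation records, for illustration only.
records = [
    ("gene1", "chr1\t100\t200"),
    ("gene2", "chr1\t300\t400"),
    ("gene1", "chr1\t150\t180"),
    ("gene3", "chr2\t10\t50"),
]
blocks, index = build_blocks(records, block_size=2)
gene1_lines = query(blocks, index, "gene1")
```

Smaller blocks mean less data to decompress per query but usually a worse compression ratio, which is the trade-off any block-indexed scheme has to balance.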

Structural variant analysis for linked-read sequencing data with gemtools

Bioinformatics ◽

10.1093/bioinformatics/btz239 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4397-4399 ◽

Cited By ~ 2

Author(s):

S U Greer ◽

H P Ji

Keyword(s):

Supplementary Information ◽

Supplementary Data ◽

Structural Variants ◽

Sequencing Data ◽

Structural Variant ◽

Single Dna Molecules ◽

Long Reads ◽

Depth Analysis ◽

Basic Functions ◽

Variant Analysis

Abstract Summary Linked-read sequencing generates synthetic long reads which are useful for the detection and analysis of structural variants (SVs). The software associated with 10x Genomics linked-read sequencing, Long Ranger, generates the essential output files (BAM, VCF, SV BEDPE) necessary for downstream analyses. However, performing downstream analyses requires users to customize their own tools to handle the unique features of linked-read sequencing data. Here, we describe gemtools, a collection of tools for the downstream and in-depth analysis of SVs from linked-read data. Gemtools uses the barcoded aligned reads and the megabase-scale phase blocks to determine haplotypes of SV breakpoints and delineate complex breakpoint configurations at the resolution of single DNA molecules. The gemtools package is a suite of tools that provides the user with the flexibility to perform basic functions on their linked-read sequencing output in order to address even more questions. Availability and implementation The gemtools package is freely available for download at: https://github.com/sgreer77/gemtools. Supplementary information Supplementary data are available at Bioinformatics online.

FQSqueezer: k-mer-based compression of sequencing data

10.1101/559807 ◽

2019 ◽

Cited By ~ 1

Author(s):

Sebastian Deorowicz

Keyword(s):

Data Compression ◽

State Of The Art ◽

Genomic Data ◽

General Purpose ◽

Supplementary Information ◽

Supplementary Data ◽

Sequencing Data ◽

Partial Matching ◽

Supplementary Material ◽

Better Than

Abstract Motivation The amount of genomic data that needs to be stored is huge. Therefore, it is not surprising that a lot of work has been done in the field of specialized data compression of FASTQ files. The existing algorithms are, however, still imperfect and the best tools produce quite large archives. Results We present FQSqueezer, a novel compression algorithm for sequencing data able to process single- and paired-end reads of variable lengths. It is based on ideas from the well-known prediction by partial matching and dynamic Markov coder algorithms from the world of general-purpose compressors. The compression ratios are often tens of percent better than those offered by the state-of-the-art tools. Availability and Implementation https://github.com/refresh-bio/[email protected] Supplementary information Supplementary data are available at publisher’s Web site.
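
Prediction-by-partial-matching style compressors rely on context statistics: how often each symbol has followed a given preceding context. The sketch below builds such counts for fixed-length k-mer contexts and predicts the most likely next base. It only illustrates the general principle and is not FQSqueezer's algorithm; the function names and the single fixed context length are assumptions.

```python
from collections import Counter, defaultdict

def train_context_model(reads, k: int):
    """Count, for every k-mer context, which base follows it in the reads."""
    model = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k):
            context = read[i:i + k]
            next_base = read[i + k]
            model[context][next_base] += 1
    return model

def predict_next(model, context: str):
    """Return the most frequently observed base after `context`, or None."""
    counts = model.get(context)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Made-up reads, for illustration only.
model = train_context_model(["ACGTACGA", "ACGT"], k=3)
predicted = predict_next(model, "ACG")
unseen = predict_next(model, "TTT")
```

In an actual compressor the counts would drive an entropy coder, so that well-predicted bases cost fewer bits; that step is omitted here.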

SOAPTyping: an open-source and cross-platform tool for Sanger sequence-based typing for HLA class I and II alleles

10.1101/674648 ◽

2019 ◽

Author(s):

Yong Zhang ◽

Yongsheng Chen ◽

Huixin Xu ◽

Junbin Fang ◽

Zijian Zhao ◽

...

Keyword(s):

Open Source ◽

Sanger Sequencing ◽

Human Leukocyte ◽

Hla Class I ◽

Hla Typing ◽

Class I ◽

Supplementary Information ◽

Accurate Identification ◽

Sanger Sequence ◽

Cross Platform

Abstract Summary The human leukocyte antigen (HLA) gene family plays a key role in the immune response and thus is crucial in many biomedical and clinical settings. Sanger sequencing, the gold standard technology for HLA typing, enables accurate identification of HLA alleles with high resolution. However, currently only commercial software such as UType, SBT-Assign and SBTEngine, rather than any open-source tool, can be applied to perform HLA typing based on Sanger sequencing. To fill the gap, we developed a stand-alone, open-source and cross-platform software, known as SOAPTyping, for Sanger-based typing of HLA class I and II alleles. Availability and implementation SOAPTyping is implemented in C++ with the Qt framework and is supported on Windows, Mac and Linux. Source code and detailed documentation are accessible via the project GitHub page: https://github.com/BGI-flexlab/[email protected] Supplementary information Supplementary data are available at Bioinformatics online.
