GMPR: A novel normalization method for microbiome sequencing data

GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data

10.7287/peerj.preprints.3417v2 ◽

2017 ◽

Author(s):

Li Chen ◽

James Reeve ◽

Lujun Zhang ◽

Shengbing Huang ◽

Jun Chen

Keyword(s):

Normalization Method ◽

Rna Seq ◽

Sequencing Data ◽

Data Simulation ◽

Vast Number ◽

Number Of Zeros ◽

Normalization Methods ◽

Under Sampling ◽

Microbiome Data ◽

Sequencing Data Analysis

Normalization is the first critical step in microbiome sequencing data analysis used to account for variable library sizes. Current RNA-Seq based normalization methods that have been adapted for microbiome data fail to consider the unique characteristics of microbiome data, which contain a vast number of zeros due to the physical absence or under-sampling of the microbes. Normalization methods that specifically address the zero inflation remain largely undeveloped. Here we propose GMPR - a simple but effective normalization method - for zero-inflated sequencing data such as microbiome data. Simulation studies and real datasets analyses demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa.


GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data

10.7287/peerj.preprints.3417 ◽

2018 ◽

Author(s):

Li Chen ◽

James Reeve ◽

Lujun Zhang ◽

Shengbing Huang ◽

Xuefeng Wang ◽

...

Keyword(s):

Normalization Method ◽

Rna Seq ◽

Sequencing Data ◽

Data Simulation ◽

Vast Number ◽

Number Of Zeros ◽

Normalization Methods ◽

Under Sampling ◽

Microbiome Data ◽

Sequencing Data Analysis

Normalization is the first critical step in microbiome sequencing data analysis used to account for variable library sizes. Current RNA-Seq based normalization methods that have been adapted for microbiome data fail to consider the unique characteristics of microbiome data, which contain a vast number of zeros due to the physical absence or under-sampling of the microbes. Normalization methods that specifically address the zero inflation remain largely undeveloped. Here we propose GMPR - a simple but effective normalization method - for zero-inflated sequencing data such as microbiome data. Simulation studies and real datasets analyses demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa.


GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data

10.7287/peerj.preprints.3417v1 ◽

2017 ◽

Author(s):

Li Chen ◽

James Reeve ◽

Lujun Zhang ◽

Shengbing Huang ◽

Jun Chen

Keyword(s):

Normalization Method ◽

Rna Seq ◽

Sequencing Data ◽

Data Simulation ◽

Vast Number ◽

Number Of Zeros ◽

Normalization Methods ◽

Under Sampling ◽

Microbiome Data ◽

Sequencing Data Analysis

Normalization is the first critical step in microbiome sequencing data analysis used to account for variable library sizes. Current RNA-Seq based normalization methods that have been adapted for microbiome data fail to consider the unique characteristics of microbiome data, which contain a vast number of zeros due to the physical absence or under-sampling of the microbes. Normalization methods that specifically address the zero inflation remain largely undeveloped. Here we propose GMPR - a simple but effective normalization method - for zero-inflated sequencing data such as microbiome data. Simulation studies and real datasets analyses demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa.


GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data

PeerJ ◽

10.7717/peerj.4600 ◽

2018 ◽

Vol 6 ◽

pp. e4600 ◽

Cited By ~ 45

Author(s):

Li Chen ◽

James Reeve ◽

Lujun Zhang ◽

Shengbing Huang ◽

Xuefeng Wang ◽

...

Keyword(s):

Geometric Mean ◽

Normalization Method ◽

Rna Seq ◽

Sequencing Data ◽

Data Simulation ◽

Vast Number ◽

Number Of Zeros ◽

Normalization Methods ◽

Under Sampling ◽

Microbiome Data

Normalization is the first critical step in microbiome sequencing data analysis used to account for variable library sizes. Current RNA-Seq based normalization methods that have been adapted for microbiome data fail to consider the unique characteristics of microbiome data, which contain a vast number of zeros due to the physical absence or under-sampling of the microbes. Normalization methods that specifically address the zero-inflation remain largely undeveloped. Here we propose geometric mean of pairwise ratios—a simple but effective normalization method—for zero-inflated sequencing data such as microbiome data. Simulation studies and real datasets analyses demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa.
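
The abstract names the method but does not spell out the computation. As a minimal sketch, assuming that each sample is compared with every other sample through the median of count ratios over the taxa that are non-zero in both, and that a sample's size factor is the geometric mean of those medians: the function name `gmpr_size_factors`, the exclusion of self-comparisons, and the toy count table are illustrative assumptions, not taken from the paper or its software.

```python
import math
from statistics import median


def gmpr_size_factors(counts):
    """Sketch of geometric-mean-of-pairwise-ratios size factors.

    counts: list of samples, each a list of taxon counts (same taxon order).
    Returns one size factor per sample (NaN if a sample shares no
    non-zero taxa with any other sample).
    """
    n = len(counts)
    factors = []
    for i in range(n):
        log_medians = []
        for j in range(n):
            if i == j:
                continue  # assumption: skip the trivial self-comparison
            # Use only taxa observed (non-zero) in both samples, so zeros
            # never enter a ratio.
            ratios = [a / b for a, b in zip(counts[i], counts[j]) if a > 0 and b > 0]
            if ratios:
                log_medians.append(math.log(median(ratios)))
        if log_medians:
            # Geometric mean of the pairwise medians, computed on the log scale.
            factors.append(math.exp(sum(log_medians) / len(log_medians)))
        else:
            factors.append(float("nan"))
    return factors


# Hypothetical toy table: 3 samples x 4 taxa, with zeros.
counts = [
    [10, 0, 20, 30],
    [20, 40, 0, 60],
    [5, 10, 10, 0],
]
size_factors = gmpr_size_factors(counts)
```

Dividing each sample's counts by its size factor would then give normalized counts. Because every ratio uses only taxa present in both samples of a pair, a taxon that is zero in many samples can still contribute through the pairs where it is observed.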


GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data

10.7287/peerj.preprints.3417v3 ◽

2018 ◽

Author(s):

Li Chen ◽

James Reeve ◽

Lujun Zhang ◽

Shengbing Huang ◽

Xuefeng Wang ◽

...

Keyword(s):

Normalization Method ◽

Rna Seq ◽

Sequencing Data ◽

Data Simulation ◽

Vast Number ◽

Number Of Zeros ◽

Normalization Methods ◽

Under Sampling ◽

Microbiome Data ◽

Sequencing Data Analysis

Normalization is the first critical step in microbiome sequencing data analysis used to account for variable library sizes. Current RNA-Seq based normalization methods that have been adapted for microbiome data fail to consider the unique characteristics of microbiome data, which contain a vast number of zeros due to the physical absence or under-sampling of the microbes. Normalization methods that specifically address the zero inflation remain largely undeveloped. Here we propose GMPR - a simple but effective normalization method - for zero-inflated sequencing data such as microbiome data. Simulation studies and real datasets analyses demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa.


Retinitis pigmentosa is associated with shifts in the gut microbiome

Scientific Reports ◽

10.1038/s41598-021-86052-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Oksana Kutsyr ◽

Lucía Maestre-Carballa ◽

Mónica Lluesma-Gomez ◽

Manuel Martinez-Garcia ◽

Nicolás Cuenca ◽

...

Keyword(s):

Retinitis Pigmentosa ◽

Gut Microbiome ◽

Functional Decline ◽

Retinal Disease ◽

Photoreceptor Cells ◽

Amplicon Sequencing ◽

Rrna Gene ◽

Sequencing Data ◽

Alpha And Beta Diversity ◽

Potential Biomarker

The gut microbiome is known to influence the pathogenesis and progression of neurodegenerative diseases. However, there has been relatively little focus upon the implications of the gut microbiome in retinal diseases such as retinitis pigmentosa (RP). Here, we investigated changes in gut microbiome composition linked to RP, by assessing both retinal degeneration and gut microbiome in the rd10 mouse model of RP as compared to control C57BL/6J mice. In rd10 mice, retinal responsiveness to flashlight stimuli and visual acuity had deteriorated relative to those observed in age-matched control mice. This functional decline in dystrophic animals was accompanied by photoreceptor loss, morphologic anomalies in photoreceptor cells and retinal reactive gliosis. Furthermore, 16S rRNA gene amplicon sequencing data showed a microbial gut dysbiosis with differences in alpha and beta diversity at the genera, species and amplicon sequence variants (ASV) levels between dystrophic and control mice. Remarkably, four fairly common ASV in healthy gut microbiome belonging to Rikenella spp., Muribaculaceae spp., Prevotellaceae UCG-001 spp., and Bacilli spp. were absent in the gut microbiome of retinal disease mice, while Bacteroides caecimuris was significantly enriched in mice with RP. The results indicate that retinal degenerative changes in RP are linked to relevant gut microbiome changes. The findings suggest that microbiome shifting could be considered as a potential biomarker and therapeutic target for retinal degenerative diseases.


Pisces: An Accurate and Versatile Variant Caller for Somatic and Germline Next-Generation Sequencing Data

10.1101/291641 ◽

2018 ◽

Cited By ~ 1

Author(s):

Tamsen Dunn ◽

Gwenn Berry ◽

Dorothea Emig-Agius ◽

Yu Jiang ◽

Serena Lei ◽

...

Keyword(s):

Next Generation Sequencing ◽

Gene Mutations ◽

Variant Calling ◽

Amplicon Sequencing ◽

Supplementary Information ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Ras Gene ◽

Generation Sequencing

Motivation: Next-Generation Sequencing (NGS) technology is transitioning quickly from research labs to clinical settings. The diagnosis and treatment selection for many acquired and autosomal conditions necessitate a method for accurately detecting somatic and germline variants, suitable for the clinic. Results: We have developed Pisces, a rapid, versatile and accurate small variant calling suite designed for somatic and germline amplicon sequencing applications. Pisces accuracy is achieved by four distinct modules, the Pisces Read Stitcher, Pisces Variant Caller, the Pisces Variant Quality Recalibrator, and the Pisces Variant Phaser. Each module incorporates a number of novel algorithmic strategies aimed at reducing noise or increasing the likelihood of detecting a true variant. Availability: Pisces is distributed under an open source license and can be downloaded from https://github.com/Illumina/Pisces. Pisces is available on the BaseSpace™ SequenceHub as part of the TruSeq Amplicon workflow and the Illumina Ampliseq Workflow. Pisces is distributed on Illumina sequencing platforms such as the MiSeq™, and is included in the Praxis™ Extended RAS Panel test which was recently approved by the FDA for the detection of multiple RAS gene mutations. Contact: [email protected]. Supplementary information: Supplementary data are available online.


SMIXnorm: Fast and Accurate RNA-Seq Data Normalization for Formalin-Fixed Paraffin-Embedded Samples

Frontiers in Genetics ◽

10.3389/fgene.2021.650795 ◽

2021 ◽

Vol 12 ◽

Author(s):

Shen Yin ◽

Xiaowei Zhan ◽

Bo Yao ◽

Guanghua Xiao ◽

Xinlei Wang ◽

...

Keyword(s):

Mixture Model ◽

Normalization Method ◽

Superior Performance ◽

Rna Seq ◽

Web Based ◽

Normalization Methods ◽

Formalin Fixed Paraffin ◽

Ffpe Samples ◽

Formalin Fixed Paraffin Embedded ◽

Formalin Fixed

RNA-sequencing (RNA-seq) provides a comprehensive quantification of transcriptomic activities in biological samples. Formalin-Fixed Paraffin-Embedded (FFPE) samples are collected as part of routine clinical procedure, and are the most widely available biological sample format in medical research and patient care. Normalization is an essential step in RNA-seq data analysis. A number of normalization methods, though developed for RNA-seq data from fresh frozen (FF) samples, can be used with FFPE samples as well. The only extant normalization method specifically designed for FFPE RNA-seq data, MIXnorm, has been shown to outperform these normalization methods, but at the cost of a complex mixture model and a high computational burden. It is therefore important to adapt MIXnorm for simplicity and computational efficiency while maintaining superior performance. Furthermore, it is critical to develop an integrated tool that performs commonly used normalization methods for both FF and FFPE RNA-seq data. We developed a new normalization method for FFPE RNA-seq data, named SMIXnorm, based on a two-component mixture model that is simplified relative to MIXnorm to facilitate computation. The expression levels of expressed genes are modeled by normal distributions without truncation, and those of non-expressed genes are modeled by zero-inflated Poisson distributions. The maximum likelihood estimates of the model parameters are obtained by a nested Expectation-Maximization algorithm with a less complicated latent variable structure, and closed-form updates are available within each iteration. Real data applications and simulation studies show that SMIXnorm greatly reduces computing time compared to MIXnorm, without sacrificing performance. More importantly, we developed a web-based tool, RNA-seq Normalization (RSeqNorm), that offers a simple workflow to compute normalized RNA-seq data for both FFPE and FF samples. It includes SMIXnorm and MIXnorm for FFPE RNA-seq data, together with five commonly used normalization methods for FF RNA-seq data. Users can easily upload a raw RNA-seq count matrix and select one of the seven normalization methods to produce a downloadable normalized expression matrix for any downstream analysis. The R package is available at https://github.com/S-YIN/RSEQNORM. The web-based tool, RSeqNorm, is available at http://lce.biohpc.swmed.edu/rseqnorm with no restriction to use or redistribute.
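
For readers unfamiliar with the zero-inflated Poisson distribution mentioned in this abstract, its standard textbook form is given below. The symbols $\pi$ (probability of a structural zero) and $\lambda$ (Poisson rate) are generic notation assumed here for illustration. The abstract does not state how SMIXnorm parameterizes the distribution.

```latex
P(Y = 0) = \pi + (1 - \pi)\, e^{-\lambda},
\qquad
P(Y = k) = (1 - \pi)\, \frac{\lambda^{k} e^{-\lambda}}{k!},
\quad k = 1, 2, \dots
```

Setting $\pi = 0$ recovers the ordinary Poisson distribution, so $\pi$ measures the excess of zeros beyond what a Poisson count with rate $\lambda$ would produce.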


Microbial Communities on Plastic Polymers in the Mediterranean Sea

Frontiers in Microbiology ◽

10.3389/fmicb.2021.673553 ◽

2021 ◽

Vol 12 ◽

Author(s):

Annika Vaksmaa ◽

Katrin Knittel ◽

Alejandro Abdala Asbun ◽

Maaike Goudriaan ◽

Andreas Ellrott ◽

...

Keyword(s):

Microbial Communities ◽

Mediterranean Sea ◽

Amplicon Sequencing ◽

Polymer Surfaces ◽

Rrna Gene ◽

Sequencing Data ◽

Microbial Biofilms ◽

Degrading Bacteria ◽

The Mediterranean

Plastic particles in the ocean are typically covered with microbial biofilms, but it remains unclear whether distinct microbial communities colonize different polymer types. In this study, we analyzed microbial communities forming biofilms on floating microplastics in a bay of the island of Elba in the Mediterranean Sea. Raman spectroscopy revealed that the plastic particles mainly comprised polyethylene (PE), polypropylene (PP), and polystyrene (PS), of which polyethylene and polypropylene particles were typically brittle and featured cracks. Fluorescence in situ hybridization and imaging by high-resolution microscopy revealed dense microbial biofilms on the polymer surfaces. Amplicon sequencing of the 16S rRNA gene showed that the bacterial communities on all plastic types consisted mainly of the orders Flavobacteriales, Rhodobacterales, Cytophagales, Rickettsiales, Alteromonadales, Chitinophagales, and Oceanospirillales. We found significant differences in the biofilm community composition on PE compared with PP and PS (on OTU and order level), which shows that different microbial communities colonize specific polymer types. Furthermore, the sequencing data also revealed a higher relative abundance of archaeal sequences on PS in comparison with PE or PP. We also found a high occurrence, up to 17% of all sequences, of different hydrocarbon-degrading bacteria on all investigated plastic types. However, their functioning in the plastic-associated biofilm and potential role in plastic degradation need further assessment.


Noise-cancelling repeat finder: uncovering tandem repeats in error-prone long-read sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btz484 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4809-4811 ◽

Cited By ~ 8

Author(s):

Robert S Harris ◽

Monika Cechova ◽

Kateryna D Makova

Keyword(s):

Tandem Repeats ◽

Error Rates ◽

Superior Performance ◽

Supplementary Information ◽

Whole Genome Sequencing Data ◽

Dna Repeats ◽

Sequencing Data ◽

Heat Shock Stress ◽

Noise Cancelling ◽

Long Read

Summary: Tandem DNA repeats can be sequenced with long-read technologies, but cannot be accurately deciphered due to the lack of computational tools taking high error rates of these technologies into account. Here we introduce Noise-Cancelling Repeat Finder (NCRF) to uncover putative tandem repeats of specified motifs in noisy long reads produced by Pacific Biosciences and Oxford Nanopore sequencers. Using simulations, we validated the use of NCRF to locate tandem repeats with motifs of various lengths and demonstrated its superior performance as compared to two alternative tools. Using real human whole-genome sequencing data, NCRF identified long arrays of the (AATGG)n repeat involved in heat shock stress response. Availability and implementation: NCRF is implemented in C, supported by several Python scripts, and is available in bioconda and at https://github.com/makovalab-psu/NoiseCancellingRepeatFinder. Supplementary information: Supplementary data are available at Bioinformatics online.
