Proteus: an R package for downstream analysis of MaxQuant output

Mapping Intimacies ◽

10.1101/416511 ◽

2018 ◽

Cited By ~ 9

Author(s):

Marek Gierlinski ◽

Francesco Gastaldello ◽

Chris Cole ◽

Geoffrey J. Barton

Keyword(s):

Mass Spectrometry ◽

Differential Expression Analysis ◽

Simulated Data ◽

R Package ◽

Data Exploration ◽

Label Free ◽

Interactive Analysis ◽

Quality Checks ◽

Downstream Analysis ◽

Selection Of

Abstract: Proteus is a package for downstream analysis of MaxQuant evidence data in the R environment. It provides tools for peptide and protein aggregation, quality checks, data exploration and visualisation. Interactive analysis is implemented in the Shiny framework, where individual peptides or proteins may be examined in the context of a volcano plot. Proteus performs differential expression analysis with the well-established tool limma, which offers robust treatment of missing data, frequently encountered in label-free mass-spectrometry experiments. We demonstrate on real and simulated data that limma results in improved sensitivity over random imputation combined with a t-test as implemented in the popular package Perseus. Embedding Proteus in R provides access to a wide selection of statistical and graphical tools for further analysis and reproducibility by scripting. Availability and implementation: The open-source R package, including example data and tutorials, is available to install from GitHub (https://github.com/bartongroup/proteus).


Data-based RNA-seq Simulations by Binomial Thinning

10.1101/758524 ◽

2019 ◽

Cited By ~ 1

Author(s):

David Gerard

Keyword(s):

Theoretical Model ◽

Single Cell ◽

Differential Expression Analysis ◽

Simulated Data ◽

Real Data ◽

Theoretical Models ◽

Simulation Method ◽

R Package ◽

Rna Seq ◽

Ideal Model

Abstract: With the explosion in the number of methods designed to analyze bulk and single-cell RNA-seq data, there is a growing need for approaches that assess and compare these methods. The usual technique is to compare methods on data simulated according to some theoretical model. However, as real data often exhibit violations of theoretical models, this can result in unsubstantiated claims of a method’s performance. Rather than generate data from a theoretical model, in this paper we develop methods to add signal to real RNA-seq datasets. Since the resulting simulated data are not generated from an unrealistic theoretical model, they exhibit realistic (annoying) attributes of real data. This lets RNA-seq methods developers assess their procedures in non-ideal (model-violating) scenarios. Our procedures may be applied to both single-cell and bulk RNA-seq. We show that our simulation method results in more realistic datasets and can alter the conclusions of a differential expression analysis study. We also demonstrate our approach by comparing various factor analysis techniques on RNA-seq datasets. Our tools are available in the seqgendiff R package on the Comprehensive R Archive Network: https://cran.r-project.org/package=seqgendiff.
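The idea of adding signal to real counts by binomial thinning can be illustrated with a minimal Python sketch. This is not the seqgendiff API: the function name `add_two_group_signal`, its arguments, and the two-group design are assumptions made for illustration only. Each real count is subsampled with a binomial draw whose retention probability encodes the desired log2 fold change, so counts can only go down and all other properties of the real data are left in place.

```python
import numpy as np


def add_two_group_signal(counts, group, log2_fc, rng):
    """Add a two-group effect to real counts by binomial thinning.

    counts  : (genes x samples) array of non-negative integer counts
    group   : length-samples array of 0/1 group labels (hypothetical design)
    log2_fc : length-genes array of target log2 fold changes (group 1 vs 0)
    rng     : numpy.random.Generator
    """
    counts = np.asarray(counts)
    group = np.asarray(group, dtype=float)
    log2_fc = np.asarray(log2_fc, dtype=float)

    # log2 multiplier the signal would apply to each gene in each sample.
    shift = np.outer(log2_fc, group)

    # Thinning can only remove reads, so rescale each gene so that its
    # largest multiplier is exactly 1 (retention probability <= 1).
    shift = shift - shift.max(axis=1, keepdims=True)
    keep_prob = 2.0 ** shift

    # Each read is kept independently with probability keep_prob.
    return rng.binomial(counts, keep_prob)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    real = rng.poisson(50, size=(3, 6))        # stand-in for a real count matrix
    labels = np.array([0, 0, 0, 1, 1, 1])
    effects = np.array([1.0, 0.0, -1.0])       # up, null, down
    print(add_two_group_signal(real, labels, effects, rng))
```

Genes with a zero effect are returned unchanged, which is what makes the thinned data useful as a benchmark: the null genes keep every "annoying" feature of the original data.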


scRMD: imputation for single cell RNA-seq data via robust matrix decomposition

Bioinformatics ◽

10.1093/bioinformatics/btaa139 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3156-3161 ◽

Cited By ~ 9

Author(s):

Chong Chen ◽

Changjing Wu ◽

Linjie Wu ◽

Xiaochen Wang ◽

Minghua Deng ◽

...

Keyword(s):

Data Analysis ◽

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Matrix Decomposition ◽

Transcriptome Profiling ◽

R Package ◽

Supplementary Information ◽

Downstream Analysis

Abstract. Motivation: Single-cell RNA-sequencing (scRNA-seq) technology enables whole-transcriptome profiling at single-cell resolution and holds great promise in many biological and medical applications. Nevertheless, scRNA-seq often fails to capture expressed genes, leading to the prominent dropout problem. These dropouts cause many problems in downstream analysis, such as a significant increase in noise, power loss in differential expression analysis and obscuring of gene-to-gene or cell-to-cell relationships. Imputation of these dropout values can be beneficial in scRNA-seq data analysis. Results: In this article, we model the dropout imputation problem as robust matrix decomposition. This model has minimal assumptions and allows us to develop a computationally efficient imputation method called scRMD. Extensive data analysis shows that scRMD can accurately recover the dropout values and help to improve downstream analysis such as differential expression analysis and clustering analysis. Availability and implementation: The R package scRMD is available at https://github.com/XiDsLab/scRMD. Supplementary information: Supplementary data are available at Bioinformatics online.


High-Throughput Screening by Mass Spectrometry: Comparison with the Scintillation Proximity Assay with a Focused-File Screen of AKT1/PKBα

Journal of Biomolecular Screening ◽

10.1177/1087057107300647 ◽

2007 ◽

Vol 12 (4) ◽

pp. 473-480 ◽

Cited By ~ 34

Author(s):

Andrea K. Quercia ◽

William A. Lamarr ◽

Jayhyuk Myung ◽

Can C. Özbal ◽

James A. Landro ◽

...

Keyword(s):

Mass Spectrometry ◽

High Throughput ◽

High Throughput Screening ◽

Kinase Inhibitors ◽

Label Free ◽

False Negatives ◽

Scintillation Proximity Assay ◽

Therapy Target ◽

Hit Identification ◽

Selection Of

Mass spectrometry is an emerging format for label-free high-throughput screening. The main limitation of mass spectrometry is throughput, due to the requirement to purify samples prior to ionization. Here the authors compare an automated high-throughput mass spectrometry (HTMS) system (RapidFire™) with the scintillation proximity assay (SPA). The cancer therapy target AKT1/PKBα was screened against a focused library of kinase inhibitors and IC50 values determined for all compounds that exhibit > 50% inhibition. A selection of additional compounds that exhibited ≤ 50% inhibition in the primary screen was chosen as controls to confirm inactives. The selection of compounds is expected to identify common actives, common inactives, false positives, and false negatives. Agreement is found between HTMS and SPA in terms of primary hit identification and hit confirmation. (Journal of Biomolecular Screening 2007:473-480)


A peptide-level multiple imputation strategy accounting for the different natures of missing values in proteomics data

10.1101/2020.05.29.122770 ◽

2020 ◽

Author(s):

Q. Giai Gianetto ◽

S. Wieczorek ◽

Y. Couté ◽

T. Burger

Keyword(s):

Multiple Imputation ◽

Missing Values ◽

Simulated Data ◽

R Package ◽

Label Free ◽

Proteomics Data ◽

Quantitative Mass Spectrometry ◽

Missing Completely At Random ◽

Peptide Level ◽

Multiple Imputation Method

Abstract. Motivation: Quantitative mass spectrometry-based proteomics data are characterized by high rates of missing values, which may be of two kinds: missing completely-at-random (MCAR) and missing not-at-random (MNAR). Despite the numerous imputation methods available in the literature, none accounts for this duality, because doing so would require diagnosing the missingness mechanism behind each missing value. Results: A multiple imputation strategy is proposed by combining MCAR-devoted and MNAR-devoted imputation algorithms. First, we propose an estimator for the proportion of MCAR values and show it is asymptotically unbiased under assumptions adapted to label-free proteomics data. This allows us to estimate the number of MCAR values in each sample and to take into account the nature of missing values through an original multiple imputation method. We evaluate this approach on simulated data and show that it outperforms traditionally used imputation algorithms. Availability: The proposed methods are implemented in the R package imp4p (available on CRAN; Giai Gianetto (2020)), which is itself accessible through Prostar.


Normalization of Mass Spectrometry Data (NOMAD)

10.1101/105783 ◽

2017 ◽

Author(s):

Carl Murie ◽

Brian Sandri ◽

Timothy J. Griffin ◽

Christine Wendt ◽

Ola Larsson

Keyword(s):

Mass Spectrometry ◽

Reference Sample ◽

R Package ◽

Mass Spectrometry Data ◽

Bioconductor Package ◽

Computationally Efficient ◽

Itraq Reagent ◽

Downstream Analysis ◽

Larger Sample ◽

Model Approach

Abstract. Motivation: iTRAQ reagent-based mass spectrometry (MS) is a commonly used technology for identification and quantification of proteins in biological samples. Such studies are often performed over multiple MS runs, potentially introducing MS run bias that could affect downstream analysis. iTRAQ MS data have therefore commonly been normalized using a reference sample which is included in each MS run. We show, however, that such normalization does not efficiently remove systematic MS run bias. A linear model approach was previously proposed to improve on the reference normalization approach but does not scale computationally to larger data. Here we describe the NOMAD (normalization of mass spectrometry data) R package, which implements a computationally efficient ANOVA normalization approach with protein assembly functionality. Results: NOMAD provides the same advantages as the linear regression solution but is more computationally efficient, which allows superior scaling to larger sample sizes. Moreover, NOMAD efficiently removes bias, which allows for valid comparisons across MS runs. Availability: NOMAD is available as a Bioconductor package.


genomalicious: serving up a smorgasbord of R functions for population genomic analyses

10.1101/667337 ◽

2019 ◽

Author(s):

Joshua A. Thia ◽

Cynthia Riginos

Keyword(s):

Source Code ◽

R Package ◽

Allele Frequencies ◽

Data Exploration ◽

Programming Environments ◽

Population Genomic ◽

Snp Data ◽

Genomic Analyses ◽

Selection Of ◽

R Functions

Abstract: Turning SNP data into biologically meaningful results requires considerable computational acrobatics, including importing, exporting, and manipulating data among different analytical packages and programming environments, and finding ways to visualise results for data exploration and presentation. We introduce genomalicious, an R package designed to provide a selection of functions for population genomicists to simply, intuitively, and flexibly guide SNP data through their analytical pipelines, within and outside R. Moreover, researchers using pooled allele frequencies, or individually sequenced genotypes, are sure to find functions that accommodate their tastes in genomalicious. The source code for this package is freely available on GitHub.


rearrvisr: an R package to detect, classify, and visualize genome rearrangements

10.1101/2020.06.25.170522 ◽

2020 ◽

Author(s):

Dorothea Lindtke ◽

Sam Yeaman

Keyword(s):

Simulated Data ◽

R Package ◽

Genome Rearrangements ◽

It Projects ◽

Data Set ◽

Single Genome ◽

A Genome ◽

Downstream Analysis ◽

Novel Algorithm

Abstract: The identification of genome rearrangements is of direct relevance for understanding their potential impacts on evolution and disease. However, available methods that detect or visualize rearrangements from deviations in gene order do not map them onto a genome of interest, complicating downstream analysis. In this work, we present rearrvisr, an R package that implements a novel algorithm for the identification and classification of rearrangements. In contrast to other software, it projects rearrangements onto a single genome, facilitating the localization of rearranged regions and estimation of their extent. We show that our tool achieves high precision and recall scores on simulated data, and illustrate the utility of our method by applying it to a data set generated from publicly available Drosophila genomes. The package is freely available from GitHub (https://github.com/dorolin/rearrvisr) and can be installed directly from R.


Age-Dependent Hippocampal Proteomics in the APP/PS1 Alzheimer Mouse Model: A Comparative Analysis with Classical SWATH/DIA and directDIA Approaches

Cells ◽

10.3390/cells10071588 ◽

2021 ◽

Vol 10 (7) ◽

pp. 1588

Author(s):

Sophie J. F. van der Spek ◽

Miguel A. Gonzalez-Lozano ◽

Frank Koopmans ◽

Suzanne S. M. Miedema ◽

Iryna Paliukhovich ◽

...

Keyword(s):

Mouse Model ◽

Neurodegenerative Disorder ◽

Differential Expression Analysis ◽

Amyloid Β ◽

Label Free ◽

Proteomics Data ◽

Protein Levels ◽

Data Independent Acquisition ◽

Downstream Analysis ◽

Synaptic Pruning

Alzheimer’s disease (AD) is the most common neurodegenerative disorder in the human population, for which there is currently no cure. The cause of AD is unknown; however, the toxic effects of amyloid-β (Aβ) are believed to play a role in its onset. To investigate this, we examined changes in global protein levels in a hippocampal synaptosome fraction of the Amyloid Precursor Protein swe/Presenilin 1 dE9 (APP/PS1) mouse model of AD at 6 and 12 months of age (moa). Data independent acquisition (DIA), or Sequential Window Acquisition of all THeoretical fragment-ion (SWATH), was used for a quantitative label-free proteomics analysis. We first assessed the usefulness of a recently improved directDIA workflow as an alternative to conventional DIA data analysis using a project-specific spectral library. Subsequently, we applied directDIA to the 6- and 12-moa APP/PS1 datasets and applied the Mass Spectrometry Downstream Analysis Pipeline (MS-DAP) for differential expression analysis and candidate discovery. We observed most regulation at 12-moa, in particular of proteins involved in Aβ homeostasis and microglial-dependent processes, like synaptic pruning and the immune response, such as APOE, CLU and C1QA-C. All proteomics data are available via ProteomeXchange with identifier PXD025777.
