Discover True Association Rates in Multi-protein Complex Proteomics Data Sets

Author(s):  
Changyu Shen ◽  
Lang Li ◽  
Jake Yue Chen


F1000Research ◽
2018 ◽  
Vol 7 ◽  
pp. 741 ◽  
Author(s):  
Kevin Rue-Albrecht ◽  
Federico Marini ◽  
Charlotte Soneson ◽  
Aaron T.L. Lun

Data exploration is critical to the comprehension of large biological data sets generated by high-throughput assays such as sequencing. However, most existing tools for interactive visualisation are limited to specific assays or analyses. Here, we present the iSEE (Interactive SummarizedExperiment Explorer) software package, which provides a general visual interface for exploring data in a SummarizedExperiment object. iSEE is directly compatible with many existing R/Bioconductor packages for analysing high-throughput biological data, and provides useful features such as simultaneous examination of (meta)data and analysis results, dynamic linking between plots, and code tracking for reproducibility. We demonstrate the utility and flexibility of iSEE by applying it to explore a range of real transcriptomics and proteomics data sets.


2021 ◽  
Author(s):  
Benbo Gao ◽  
Jing Zhu ◽  
Soumya Negi ◽  
Xinmin Zhang ◽  
Stefka Gyoneva ◽  
...  

Abstract
Summary: We developed Quickomics, a feature-rich R Shiny-powered tool to enable biologists to fully explore complex omics data and perform advanced analysis in an easy-to-use interactive interface. It covers a broad range of secondary and tertiary analytical tasks after primary analysis of omics data is completed. Each functional module is equipped with customized configurations and generates both interactive and publication-ready high-resolution plots to uncover biological insights from data. The modular design makes the tool extensible with ease.
Availability: Researchers can experience the functionalities with their own data or with demo RNA-Seq and proteomics data sets by using the app hosted at http://quickomics.bxgenomics.com and following the tutorial at https://bit.ly/3rXIyhL. The source code, under the GPLv3 license, is provided at https://github.com/interactivereport/. Contact: [email protected], [email protected].
Supplementary information: Supplementary materials are available at https://bit.ly/37HP17g.


2021 ◽  
Author(s):  
Andrew J Kavran ◽  
Aaron Clauset

Abstract
Background: Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation.
Results: We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or "filtered" to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may first be decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy by up to 43% compared to using unfiltered data.
Conclusions: Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogeneous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters for applications in systems biology.
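The core idea of a network filter, blending each measurement with a sign-aware average of its network neighbors, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the adjacency matrix, edge signs, and blending weight `alpha` are all assumptions for the example.

```python
import numpy as np

def network_filter(values, adjacency, signs=None, alpha=0.5):
    """Denoise a vector of measurements by blending each node's value
    with the (sign-aware) mean of its network neighbors.

    values:    (n,) noisy measurements, one per node
    adjacency: (n, n) binary symmetric adjacency matrix
    signs:     (n, n) matrix of +1 (correlated) / -1 (anti-correlated)
               edge signs; defaults to all +1
    alpha:     weight given to the neighborhood mean
    """
    values = np.asarray(values, dtype=float)
    n = len(values)
    if signs is None:
        signs = np.ones((n, n))
    filtered = values.copy()
    for i in range(n):
        nbrs = np.flatnonzero(adjacency[i])
        if nbrs.size == 0:
            continue  # isolated node: keep the raw measurement
        # flip anti-correlated neighbors before averaging them in
        nbr_mean = np.mean(signs[i, nbrs] * values[nbrs])
        filtered[i] = (1 - alpha) * values[i] + alpha * nbr_mean
    return filtered
```

Module-wise filtering, as described in the abstract, would amount to partitioning the node set first and calling a (possibly different) filter on each block of the adjacency matrix.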


2018 ◽  
Vol 17 ◽  
pp. 117693511877108 ◽  
Author(s):  
Min Wang ◽  
Steven M Kornblau ◽  
Kevin R Coombes

Principal component analysis (PCA) is one of the most common techniques in the analysis of biological data sets, but applying PCA raises 2 challenges. First, one must determine the number of significant principal components (PCs). Second, because each PC is a linear combination of genes, it rarely has a biological interpretation. Existing methods to determine the number of PCs are either subjective or computationally extensive. We review several methods and describe a new R package, PCDimension, that implements additional methods, the most important being an algorithm that extends and automates a graphical Bayesian method. Using simulations, we compared the methods. Our newly automated procedure is competitive with the best methods when considering both accuracy and speed and is the most accurate when the number of objects is small compared with the number of attributes. We applied the method to a proteomics data set from patients with acute myeloid leukemia. Proteins in the apoptosis pathway could be explained using 6 PCs. By clustering the proteins in PC space, we were able to replace the PCs by 6 “biological components,” 3 of which could be immediately interpreted from the current literature. We expect this approach combining PCA with clustering to be widely applicable.
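One of the simpler rules in this family for choosing the number of significant PCs is the broken-stick model, followed by clustering of the protein loadings in the retained PC space. The sketch below illustrates that two-step idea with scikit-learn; it is not the PCDimension package nor the automated Bayesian algorithm the authors emphasize, and the cluster count is a free choice here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def broken_stick_components(X):
    """Choose the number of significant PCs via the broken-stick model:
    keep the leading components whose explained-variance fraction
    exceeds the expectation under a random partition of total variance."""
    pca = PCA().fit(X)
    frac = pca.explained_variance_ratio_
    p = len(frac)
    # broken-stick expectations: b_k = (1/p) * sum_{i=k..p} 1/i
    bstick = np.array([np.sum(1.0 / np.arange(k, p + 1)) / p
                       for k in range(1, p + 1)])
    d = 0
    for above in frac > bstick:
        if not above:
            break  # stop at the first component below expectation
        d += 1
    return max(d, 1), pca

def cluster_proteins(pca, d, n_clusters):
    """Group attributes (e.g. proteins) by their coordinates in the
    retained PC space, mirroring the replacement of PCs by clustered
    'biological components'."""
    loadings = pca.components_[:d].T  # one row per attribute
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=0).fit_predict(loadings)
```

The broken-stick rule is fully automatic and cheap, which is the trade-off space the abstract discusses; the paper's graphical Bayesian approach addresses the subjectivity of scree-plot inspection more directly.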


2018 ◽  
Vol 17 (4) ◽  
pp. 1547-1558 ◽  
Author(s):  
Salvador Martínez-Bartolomé ◽  
J. Alberto Medina-Aunon ◽  
Miguel Ángel López-García ◽  
Carmen González-Tejedo ◽  
Gorka Prieto ◽  
...  

2020 ◽  
Author(s):  
Weiping Ma ◽  
Sunkyu Kim ◽  
Shrabanti Chowdhury ◽  
Zhi Li ◽  
Mi Yang ◽  
...  

Abstract: Deep proteomics profiling using labelled LC-MS/MS experiments has proven powerful for studying complex diseases. However, due to the dynamic nature of discovery mass spectrometry, the generated data contain a substantial fraction of missing values. This poses great challenges for data analysis, as many tools, especially those for high-dimensional data, cannot handle missing values directly. To address this problem, the NCI-CPTAC Proteogenomics DREAM Challenge was carried out to develop effective imputation algorithms for labelled LC-MS/MS proteomics data through crowd learning. The resulting algorithm, DreamAI, is based on an ensemble of six different imputation methods. The imputation accuracy of DreamAI, as measured by correlation, is about 15%-50% greater than that of existing tools among less abundant proteins, which are more likely to be missing in proteomics data sets. This new tool enhances data analysis capabilities in proteomics research.
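The ensemble principle behind DreamAI, combining the outputs of several base imputers rather than trusting any single one, can be illustrated with a toy version. The three base imputers below are stand-ins chosen for the sketch; DreamAI ensembles six specific algorithms with its own weighting, which this does not reproduce.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

def ensemble_impute(X):
    """Toy ensemble imputation: run several base imputers on the same
    incomplete matrix and average their completed outputs. Observed
    entries are left untouched by every base imputer, so they survive
    the averaging unchanged."""
    imputers = [
        SimpleImputer(strategy="mean"),
        SimpleImputer(strategy="median"),
        KNNImputer(n_neighbors=2),
    ]
    completed = [imp.fit_transform(X) for imp in imputers]
    return np.mean(completed, axis=0)
```

Averaging hedges against any one imputer's bias, which matters most for the low-abundance proteins the abstract highlights, where single-method errors are largest.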


2021 ◽  
Vol 22 (17) ◽  
pp. 9650
Author(s):  
Miranda L. Gardner ◽  
Michael A. Freitas

Analysis of differential abundance in proteomics data sets requires careful application of missing value imputation. Missing abundance values vary widely when performing comparisons across different sample treatments. For example, one would expect a consistent rate of "missing at random" (MAR) across batches of samples and varying rates of "missing not at random" (MNAR) depending on the inherent differences between sample treatments within the study. A missing value imputation strategy must therefore be selected that accounts for both MAR and MNAR simultaneously. Two important issues must be considered when deciding on an imputation strategy: (1) when it is appropriate to impute data, and (2) how to choose a method that reflects the combination of MAR and MNAR occurring in an experiment. This paper provides an evaluation of missing value imputation strategies used in proteomics and presents a case for hybrid left-censored missing value imputation approaches that can handle the MNAR problem common to proteomics data.
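A hybrid left-censored scheme can be sketched as a two-branch rule: gaps concentrated in a protein are treated as MNAR and drawn from the low-abundance tail, while scattered gaps are treated as MAR and filled by a neighbor-based method. This is an illustrative sketch only; the MNAR threshold, the pseudo detection limit, and the noise scale are all assumptions, not values from the paper.

```python
import numpy as np
from sklearn.impute import KNNImputer

def hybrid_impute(X, mnar_threshold=0.5, rng_seed=0):
    """Illustrative hybrid MAR/MNAR imputation (not the paper's exact
    method): proteins (columns) missing in more than `mnar_threshold`
    of samples are treated as MNAR and drawn from a left-censored
    normal near a pseudo detection limit; the remaining gaps are
    treated as MAR and filled by kNN imputation."""
    X = np.asarray(X, dtype=float).copy()
    rng = np.random.default_rng(rng_seed)
    miss_rate = np.isnan(X).mean(axis=0)      # per-protein missing rate
    mnar_cols = np.flatnonzero(miss_rate > mnar_threshold)
    obs = X[~np.isnan(X)]
    lod = np.quantile(obs, 0.01)              # pseudo detection limit
    scale = 0.1 * obs.std() + 1e-12           # narrow left-tail spread
    for j in mnar_cols:
        rows = np.isnan(X[:, j])
        X[rows, j] = rng.normal(lod, scale, rows.sum())
    # remaining missing values assumed MAR -> neighbor-based fill
    return KNNImputer(n_neighbors=2).fit_transform(X)
```

The key design point is that the two mechanisms get different replacement distributions: MNAR values are pushed toward the detection limit rather than toward the observed mean, which a single global imputer cannot do.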


2018 ◽  
Author(s):  
Li Chen ◽  
Bai Zhang ◽  
Michael Schnaubelt ◽  
Punit Shah ◽  
Paul Aiyetan ◽  
...  

Abstract: Rapid development and wide adoption of mass spectrometry-based proteomics technologies have empowered scientists to study proteins and their modifications in complex samples on a large scale. This progress has also created unprecedented challenges for individual labs to store, manage and analyze proteomics data, both in the cost of proprietary software and high-performance computing and in the long processing times that discourage the on-the-fly changes of data processing settings required in exploratory and discovery analysis. We developed an open-source, cloud computing-based pipeline, MS-PyCloud, with graphical user interface (GUI) support, for LC-MS/MS data analysis. The major components of this pipeline include data file integrity validation, MS/MS database search for spectral assignment, false discovery rate estimation, protein inference, determination of protein post-translational modifications, and quantitation of specific (modified) peptides and proteins. To ensure the transparency and reproducibility of data analysis, MS-PyCloud includes open-source software tools with comprehensive testing and versioning for spectrum assignments. Leveraging public cloud computing infrastructure via Amazon Web Services (AWS), MS-PyCloud scales seamlessly with analysis demand to achieve fast and efficient performance. Application of the pipeline to large-scale iTRAQ/TMT LC-MS/MS data sets demonstrated the effectiveness and high performance of MS-PyCloud. The software can be downloaded at: https://bitbucket.org/mschnau/ms-pycloud/downloads/
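One pipeline step named in the abstract, false discovery rate estimation, is conventionally done with the target-decoy strategy, sketched below. This illustrates the general technique, not MS-PyCloud's implementation; the function name and the q-value convention used here are assumptions for the example.

```python
def target_decoy_fdr(scores, is_decoy):
    """Estimate q-values for peptide-spectrum matches with the classic
    target-decoy strategy: at each score threshold,
    FDR ~= (#decoys above threshold) / (#targets above threshold),
    and the q-value is the minimum FDR over all thresholds at least
    as permissive as the match's own score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    targets = decoys = 0
    fdr = [0.0] * len(scores)
    for i in order:                   # walk down from the best score
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)
    qvals = [0.0] * len(scores)
    running_min = float("inf")
    for i in reversed(order):         # enforce monotone q-values
        running_min = min(running_min, fdr[i])
        qvals[i] = running_min
    return qvals
```

Filtering matches at, say, `q <= 0.01` then yields the familiar 1% PSM-level FDR cutoff.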

