Phoenix Enhancer: proteomics data mining using clustered spectra

Mapping Intimacies

10.1101/846303

2019

Author(s): Mingze Bai, Chunyuan Qin, Kunxian Shu, Johannes Griss, Yasset Perez-Riverol, ...

Keyword(s): Supplementary Information, Proteomics Data, Web Based, Tandem Mass Spectra, Link Type, False Discovery, Cluster Data, Public Data, Visualization Tools, Individual Dataset

Abstract

Motivation: Spectrum clustering has been used to enhance proteomics data analysis. Using clusters of identified spectra, some originally unidentified spectra can potentially be identified, and individual peptide identifications can be evaluated to find potential mis-identifications. The Phoenix Enhancer provides an infrastructure to analyze tandem mass spectra and the corresponding peptides in the context of previously identified public data. It is built on PRIDE Cluster data and a newly developed pipeline, and provides four functionalities:

i) evaluating the original peptide identifications in an individual dataset to find low-confidence peptide spectrum matches (PSMs) that could correspond to mis-identifications;
ii) providing confidence scores for all originally identified PSMs to help users evaluate their quality (complementary to a global false discovery rate);
iii) identifying potential new PSMs for originally unidentified spectra; and
iv) providing a collection of browsing and visualization tools to analyze and export the results.

In addition to the web-based service, the code is open source and easy to redeploy on local computers using Docker containers.

Availability: The Phoenix Enhancer service is available at http://enhancer.ncpsb.org. All source code is freely available on GitHub (https://github.com/phoenix-cluster/) and can be deployed in Cloud and HPC environments. Contact: [email protected]

Supplementary information: Supplementary data are available online.
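The sketch below illustrates the cluster-based evaluation described above. It assumes each cluster is a list of spectra, where each spectrum either carries an original peptide identification or none. It applies a simple consensus rule: if enough identified spectra agree on one peptide, members that disagree are flagged as suspicious, and unidentified members become candidate new PSMs. The `Spectrum` class, the `review_cluster` function and both thresholds are hypothetical illustrations, not the Phoenix Enhancer or PRIDE Cluster API. The real pipeline's scoring is likely more elaborate.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Spectrum:
    spectrum_id: str
    peptide: Optional[str]  # None when the original search left the spectrum unidentified


def cluster_consensus(cluster: List[Spectrum]) -> Tuple[Optional[str], float]:
    """Most common peptide among identified spectra and the fraction of them that carry it."""
    peptides = [s.peptide for s in cluster if s.peptide is not None]
    if not peptides:
        return None, 0.0
    peptide, count = Counter(peptides).most_common(1)[0]
    return peptide, count / len(peptides)


def review_cluster(
    cluster: List[Spectrum], min_ratio: float = 0.7, min_identified: int = 3
) -> Dict[str, str]:
    """Label each spectrum against its cluster's consensus (hypothetical thresholds)."""
    peptide, ratio = cluster_consensus(cluster)
    n_identified = sum(s.peptide is not None for s in cluster)
    reliable = peptide is not None and ratio >= min_ratio and n_identified >= min_identified
    labels = {}
    for s in cluster:
        if not reliable:
            labels[s.spectrum_id] = "undecided"      # cluster too small or too mixed to judge
        elif s.peptide is None:
            labels[s.spectrum_id] = "new_candidate"  # may be explained by the consensus peptide
        elif s.peptide == peptide:
            labels[s.spectrum_id] = "consistent"
        else:
            labels[s.spectrum_id] = "suspicious"     # disagrees with a confident cluster
    return labels
```

The rule requires both a minimum consensus ratio and a minimum number of identified spectra. This keeps small or mixed clusters from producing confident-looking labels.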

Quickomics: exploring omics data in an intuitive, interactive and informative manner

10.1101/2021.01.19.427296

2021

Author(s): Benbo Gao, Jing Zhu, Soumya Negi, Xinmin Zhang, Stefka Gyoneva, ...

Keyword(s): Modular Design, Functional Module, Supplementary Information, Data Sets, Omics Data, Proteomics Data, Primary Analysis, Link Type, R Shiny, Advanced Analysis

Abstract

Summary: We developed Quickomics, a feature-rich, R Shiny-powered tool that enables biologists to fully explore complex omics data and perform advanced analyses in an easy-to-use interactive interface. It covers a broad range of secondary and tertiary analytical tasks once primary analysis of the omics data is complete. Each functional module is equipped with customized configurations and generates both interactive and publication-ready high-resolution plots to uncover biological insights from the data. The modular design makes the tool easy to extend.

Availability: Researchers can explore the functionalities with their own data, or with demo RNA-Seq and proteomics data sets, by using the app hosted at http://quickomics.bxgenomics.com and following the tutorial at https://bit.ly/3rXIyhL. The source code is provided under the GPLv3 license at https://github.com/interactivereport/[email protected], [email protected]

Supplementary information: Supplementary materials are available at https://bit.ly/37HP17g.

PyRice: a Python package for querying Oryza Sativa databases

10.1101/2020.04.20.049742

2020

Author(s): Quan Do, Ho Bich Hai, Pierre Larmande

Keyword(s): Oryza Sativa, Heterogeneous Data, Supplementary Information, Web Based, Domain Experts, Link Type, Heterogeneous Data Sources, Query System, Gene Information, Python Package

Abstract

Summary: Gene information for Oryza sativa is currently spread across various heterogeneous online data sources. Access methods are also diverse (mostly web interfaces, sometimes query APIs) and are not always straightforward for domain experts. The challenge is to collect information quickly from these sources and combine it logically to facilitate scientific research. We developed PyRice, a Python package that provides a unified programming API to query all supported databases at the same time with consistent output. PyRice has a modular design and implements a smart query system that adapts to the available computing resources to optimize query speed. As a result, PyRice is easy to use and produces intuitive results.

Availability and implementation: https://github.com/SouthGreenPlatform/PyRice

Documentation: https://[email protected]

License: MIT

Supplementary information: Supplementary data are available online.
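PyRice queries several databases at once and returns one consistent result. That idea can be sketched with Python's standard thread pool. The sketch assumes each database is wrapped in a callable that takes a gene identifier and returns a dictionary. `query_all`, the two demo sources and the worker-count heuristic are hypothetical; they are not PyRice's actual API or its "smart query" logic.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional

# Hypothetical type: a source is any callable mapping a gene ID to a dict of fields.
Source = Callable[[str], Dict[str, Any]]


def query_all(gene_id: str, sources: Dict[str, Source],
              max_workers: Optional[int] = None) -> Dict[str, Any]:
    """Query every source concurrently and merge the answers into one dict keyed by source name."""
    if max_workers is None:
        # Web queries are I/O-bound, so allow more threads than cores, but never more than sources.
        max_workers = max(1, min(len(sources), (os.cpu_count() or 1) * 2))
    merged: Dict[str, Any] = {"gene_id": gene_id}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fetch, gene_id) for name, fetch in sources.items()}
        for name, future in futures.items():
            try:
                merged[name] = future.result()
            except Exception as exc:  # one failing database should not sink the whole query
                merged[name] = {"error": str(exc)}
    return merged


def demo_source(gene_id: str) -> Dict[str, Any]:
    """Hypothetical stand-in for a real database client."""
    return {"symbol": "demo-" + gene_id}


def failing_source(gene_id: str) -> Dict[str, Any]:
    """Simulates a database that times out."""
    raise TimeoutError("timeout")
```

Threads fit this task because web queries spend most of their time waiting on the network. For example, `query_all("LOC_Os01g01010", {"demo": demo_source, "slow": failing_source})` (the gene ID is just an example string) returns the demo record under `"demo"`. Under `"slow"` it returns an error entry instead of failing outright.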

Proteomics identifies signal peptide features determining the substrate specificity in human Sec62/Sec63-dependent ER protein import

10.1101/867762

2019

Author(s): Stefan Schorr, Duy Nguyen, Sarah Haßdenteufel, Nagarjuna Nagaraj, Adolfo Cavalié, ...

Keyword(s): Signal Peptide, Mammalian Cells, Protein Import, Supplementary Information, Signal Peptides, Proteomics Data, Amino Terminal, Link Type, Sec61 Channel

Abstract: In mammalian cells, one-third of all polypeptides are integrated into the membrane or translocated into the lumen of the endoplasmic reticulum (ER) via the Sec61 channel. The Sec61 complex facilitates ER import of most precursor polypeptides, while the Sec61-associated Sec62/Sec63 complex supports ER import in a substrate-specific manner. So far, the precursors found to depend on the Sec62/Sec63 complex in vitro were mainly posttranslationally imported ones, plus two cotranslationally imported precursors (ERj3 and prion protein). We therefore determined the rules for engagement of Sec62/Sec63 in ER import in intact human cells, using a recently established unbiased proteomics approach. In addition to confirming ERj3, we identified twenty-two novel Sec62/Sec63 substrates under these in vivo-like conditions. As a common feature, these previously unknown substrates share signal peptides with a comparatively longer but less hydrophobic H-region and lower C-region polarity. Further analyses with four substrates, and ERj3 in particular, revealed that one combination of features is decisive for the Sec62/Sec63 requirement: a slowly gating signal peptide and a downstream, translocation-disruptive cluster of positively charged amino acid residues. In the case of ERj3, these features were found to be responsible for an additional BiP requirement and to correlate with sensitivity towards the Sec61-channel inhibitor CAM741. Thus, the human Sec62/Sec63 complex may support Sec61-channel opening for precursor polypeptides with slowly gating signal peptides. It may do so either by direct interaction with the cytosolic amino-terminal peptide of Sec61α, or via recruitment of BiP and its interaction with the ER-lumenal loop 7 of Sec61α. These novel insights into the mechanism of human ER protein import contribute to our understanding of the etiology of SEC63-linked polycystic liver disease.

Databases: The mass spectrometry proteomics data have been deposited in the ProteomeXchange Consortium via the PRIDE partner repository (http://www.ebi.ac.uk/pride/archive/projects/Identifiers) with the dataset identifiers PXD008178, PXD011993, and PXD012078. Supplementary information was deposited at Mendeley Data under DOI 10.17632/6s5hn73jcv.1 (http://dx.doi.org/10.17632/6s5hn73jcv.1).
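The signal peptide features named in this abstract can be computed as in the following sketch. These are H-region length and hydrophobicity, C-region polarity, and positive charges downstream of the signal peptide. The sketch assumes the region boundaries are already known (for example, from a signal peptide predictor). It uses the Kyte-Doolittle hydropathy scale and an assumed residue set for "polar". The study's own scales and region definitions may differ, and `signal_peptide_features` is a hypothetical helper.

```python
from typing import Dict

# Kyte-Doolittle hydropathy values per residue (one common hydrophobicity scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
POLAR = set("DEKRHNQST")  # assumed definition of "polar" for the C-region
POSITIVE = set("KR")


def signal_peptide_features(seq: str, h_start: int, h_end: int, cleavage: int,
                            window: int = 10) -> Dict[str, float]:
    """Features of one precursor; boundaries are 0-based, end-exclusive indices from the caller."""
    if not 0 <= h_start < h_end <= cleavage <= len(seq):
        raise ValueError("expected 0 <= h_start < h_end <= cleavage <= len(seq)")
    h_region = seq[h_start:h_end]
    c_region = seq[h_end:cleavage]
    downstream = seq[cleavage:cleavage + window]  # residues right after the signal peptide
    return {
        "h_length": len(h_region),
        "h_hydrophobicity": sum(KYTE_DOOLITTLE[aa] for aa in h_region) / len(h_region),
        "c_polarity": sum(aa in POLAR for aa in c_region) / len(c_region) if c_region else 0.0,
        "downstream_positive": sum(aa in POSITIVE for aa in downstream),
    }
```

As an example, take the toy sequence `"MRK" + "L" * 8 + "GSA" + "DEKRRKAA"` with `h_start=3, h_end=11, cleavage=14`. Its H-region is eight leucines (mean hydropathy 3.8). One of the three C-region residues counts as polar, and four of the downstream residues are K or R.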

CellProfiler Analyst: interactive data exploration, analysis, and classification of large biological image sets

10.1101/057976

2016

Cited By ~ 1

Author(s): D. Dao, A. N. Fraser, J. Hung, V. Ljosa, S. Singh, ...

Keyword(s): Supervised Machine Learning, Supplementary Information, Link Type, Learning Capabilities, Visualization Tools, Interactive Data, Supplementary Text, Microsoft Windows, Benchmarking Performance

Abstract

Summary: CellProfiler Analyst allows the exploration and visualization of image-based data, together with the classification of complex biological phenotypes, via an interactive user interface designed for biologists and data scientists. CellProfiler Analyst 2.0, completely rewritten in Python, builds on these features and adds enhanced supervised machine learning capabilities (in Classifier). It also adds visualization tools for getting an overview of an experiment (Plate Viewer and Image Gallery).

Availability: CellProfiler Analyst 2.0 is free and open source, available at http://www.cellprofiler.org/releases and from GitHub (https://github.com/CellProfiler/CellProfiler-Analyst) under the BSD license. It is available as a packaged application for Mac OS X and Microsoft Windows and can be compiled for Linux. We implemented an automatic build process that supports nightly updates and regular release cycles for the [email protected]

Supplementary information: Supplementary Text 1: Manual to CellProfiler Analyst; updated versions are available at CellProfiler.org/CPA. Supplementary Data 1: Benchmarking performance of classifiers in CPA 2.0 versus CPA 1.0.

LiveStories

Journal of the Medical Library Association JMLA

10.5195/jmla.2018.409

2018

Vol 106 (2)

Author(s): Kelli Yakabu, Andrea Ball

Keyword(s): Data Visualization, Web Based, Public Data, Visualization Tools, Interactive Data, Data Library, Drag And Drop

LiveStories is a web-based storytelling platform that is equipped with interactive data visualization tools, drag-and-drop publishing, and its own public data library.

PatternLab for proteomics: a tool for differential shotgun proteomics

BMC Bioinformatics

10.1186/1471-2105-9-316

2008

Vol 9 (1)

Cited By ~ 109

Author(s): Paulo C Carvalho, Juliana SG Fischer, Emily I Chen, John R Yates, Valmir C Barbosa

Keyword(s): Experimental Data, Shotgun Proteomics, Support Vector, Proteomics Data, Tandem Mass Spectra, New Methods, False Discovery, Counting Data, Natural Support, Open Issues

Abstract

Background: A goal of proteomics is to distinguish between states of a biological system by identifying protein expression differences. Liu et al. demonstrated a method to perform semi-relative protein quantitation in shotgun proteomics data. It correlates the number of tandem mass spectra obtained for each protein, or "spectral count", with the protein's abundance in a mixture. However, two issues have remained open: how to normalize spectral counting data and how to efficiently pinpoint differences between profiles. Moreover, Chen et al. recently showed how to increase the number of identified proteins in shotgun proteomics by analyzing samples with different MS-compatible detergents while performing proteolytic digestion. This approach introduced new challenges from a data analysis perspective, since replicate readings are not acquired.

Results: To address the open issues above, we present a program termed PatternLab for proteomics. This program implements existing strategies and adds two new methods to pinpoint differences in protein profiles. The first method, ACFold, addresses experiments with fewer than three replicates from each state, or with assays acquired by different protocols as described by Chen et al. ACFold uses a combined criterion based on expression fold changes, the AC test, and the false discovery rate, and can supply a "bird's-eye view" of differentially expressed proteins. The other method addresses experimental designs with multiple readings from each state. It is referred to as nSVM (natural support vector machine) because of its roots in evolutionary computing and in statistical learning theory. Our observations suggest that nSVM's niche comprises projects that select a minimum set of proteins for classification purposes, for example, the development of an early detection kit for a given pathology. We demonstrate the effectiveness of each method on experimental data and compare them with existing strategies.

Conclusion: PatternLab offers easy, unified access to a variety of feature selection and normalization strategies, each with its own niche. Additionally, graphing tools are available to aid in the analysis of high-throughput experimental data. PatternLab is available at http://pcarvalho.com/patternlab.
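This abstract mentions two ingredients: normalizing spectral counts, and testing count differences between two states. Both can be sketched in a few functions. The sketch assumes that "the AC test" refers to the Audic-Claverie test for digital counts, and it uses simple total-count normalization. It is not PatternLab's implementation. The way ACFold combines this test with fold changes and the false discovery rate is not reproduced here.

```python
import math
from typing import Dict


def normalize_counts(counts: Dict[str, int]) -> Dict[str, float]:
    """Total-count normalization: divide each protein's spectral count by the run's total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("run has no spectral counts")
    return {protein: c / total for protein, c in counts.items()}


def ac_probability(x: int, y: int, n1: float, n2: float) -> float:
    """Audic-Claverie P(y | x): probability of y counts in run 2 given x counts in run 1.

    n1 and n2 are the run sizes (e.g. total spectral counts); computed in log space for stability.
    """
    r = n2 / n1
    log_p = (
        y * math.log(r)
        + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
        - (x + y + 1) * math.log1p(r)
    )
    return math.exp(log_p)


def ac_upper_tail(x: int, y: int, n1: float, n2: float) -> float:
    """One-sided p-value P(Y >= y | x), i.e. evidence that the protein is higher in run 2."""
    below = sum(ac_probability(x, k, n1, n2) for k in range(y))
    return max(0.0, 1.0 - below)
```

Working in log space with `math.lgamma` avoids overflowing factorials for proteins with large spectral counts.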

Controlling false discoveries in Bayesian gene networks with lasso regression p-values

10.1101/288217

2018

Cited By ~ 1

Author(s): Lingfei Wang, Tom Michoel

Keyword(s): Bayesian Networks, Gene Networks, Network Inference, Supplementary Information, Lasso Regression, Link Type, False Discovery, Systematic Biases, Empirical Tests, False Discoveries

Abstract

Motivation: Bayesian networks can represent directed gene regulations and are therefore favored over co-expression networks. However, hardly any Bayesian network study addresses the false discovery control (FDC) of network edges. This leads to low accuracy, because inconsistent false discovery levels within the same study introduce systematic biases.

Results: We designed four empirical tests to examine the FDC of Bayesian networks built with three p-value-based lasso regression variable selection methods: two existing ones and one that we developed. Our method, lassopv, computes p-values for the critical regularization strength at which a predictor starts to contribute to lasso regression. Using null and Geuvadis datasets, we find that lassopv achieves optimal FDC in Bayesian gene networks, whereas existing methods produce defective p-values. The FDC concept and tests extend to most network inference scenarios and can guide the design and improvement of new and existing methods. Our novel lasso-based variable selection method also enables FDC on other datasets and questions, even beyond network inference and computational biology.

Availability: Lassopv is implemented in R and freely available at https://github.com/lingfeiwang/lassopv and https://cran.r-project.org/[email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
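The "critical regularization strength" at which a predictor starts to contribute can be approximated numerically. One way is to fit the lasso along a decreasing grid of penalties and record where each coefficient first becomes non-zero. The pure-Python sketch below does this with coordinate descent and warm starts. The grid size and the 1% lower bound are arbitrary assumptions, and the data are assumed centred (no intercept). Lassopv's subsequent step of turning these values into p-values is omitted. This is an illustration, not the lassopv package, which is written in R.

```python
from typing import List, Optional


def soft_threshold(z: float, lam: float) -> float:
    """Proximal operator of lam * |b|."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float,
             beta: Optional[List[float]] = None, n_iter: int = 200) -> List[float]:
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1 (centre X and y first)."""
    n, p = len(X), len(X[0])
    beta = list(beta) if beta is not None else [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (j's own contribution added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            delta = soft_threshold(rho, lam) / col_sq[j] - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] += delta
    return beta


def critical_lambdas(X: List[List[float]], y: List[float],
                     n_grid: int = 50, min_ratio: float = 0.01) -> List[float]:
    """First (largest) grid lambda at which each predictor is non-zero; 0.0 = never entered."""
    n, p = len(X), len(X[0])
    lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p))
    # Log-spaced grid from lam_max down to min_ratio * lam_max, warm-starting each fit.
    grid = [lam_max * min_ratio ** (k / (n_grid - 1)) for k in range(n_grid)]
    crit = [0.0] * p
    beta: Optional[List[float]] = None
    for lam in grid:
        beta = lasso_cd(X, y, lam, beta)
        for j in range(p):
            if crit[j] == 0.0 and beta[j] != 0.0:
                crit[j] = lam
    return crit
```

Predictors that never enter along the grid get 0.0. Each recorded value is the first grid point below the true critical strength, so it slightly underestimates that strength; a finer grid tightens the estimate.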

PBrowse: A web-based platform for real-time collaborative exploration of genomic data

10.1101/068049

2016

Author(s): Peter S. Szot, Andrian Yang, Xin Wang, Uwe Röhm, Koon Ho Wong, ...

Keyword(s): Real Time, Genomic Data, File Sharing, Genome Browser, Human Interaction, Supplementary Information, Web Based, Link Type, Multiple Users, Wide Range

Abstract

Summary: The central task of a genome browser is to enable easy visual exploration of large genomic data to gain biological insight. Most existing genome browsers were designed for data exploration by individual users. A few allow limited forms of collaboration among multiple users, such as file sharing and wiki-style collaborative editing of gene annotations. Our premise is that sharing genome browser views in real time enables the exchange of ideas and insight in a collaborative project, thus harnessing the wisdom of the crowd. PBrowse is a parallel-access, real-time collaborative, web-based genome browser that provides both an integrated real-time collaborative platform and a comprehensive file sharing system. PBrowse also supports real-time track comments and integrated group chat to facilitate interactive discussion among multiple users. Through the Distributed Annotation Server protocol, PBrowse can easily access a wide range of publicly available genomic data, such as the ENCODE data sets. PBrowse has redesigned user management and data management, and a novel collaborative layer based on Biodalliance. We argue that it represents a paradigm shift: from seeing the genome browser merely as a data visualization tool, to seeing it as a tool that enables real-time human-human interaction and knowledge exchange in a collaborative setting.

Availability: PBrowse is available at http://pbrowse.victorchang.edu.au, and its source code is available under the open source BSD 3 license at http://github.com/VCCRI/[email protected]

Supplementary information: A supplementary video demonstrating the collaborative features of PBrowse is available at https://www.youtube.com/watch?v=ROvKXZoXiIc.
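Real-time sharing of browser views comes down to broadcasting each navigation event to the other members of a session. The following in-memory sketch shows that pattern with hypothetical `View` and `Session` classes. PBrowse itself is a web application built on Biodalliance. A real deployment would deliver these events over a network connection rather than through Python callbacks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class View:
    chrom: str
    start: int
    end: int


Listener = Callable[[str, View], None]  # called with (user who moved, new view)


@dataclass
class Session:
    """In-memory shared browsing session (hypothetical, not PBrowse's architecture)."""
    listeners: Dict[str, Listener] = field(default_factory=dict)
    history: List[Tuple[str, View]] = field(default_factory=list)

    def join(self, user: str, listener: Listener) -> None:
        self.listeners[user] = listener
        if self.history:  # bring late joiners to the current shared view
            listener(*self.history[-1])

    def navigate(self, user: str, view: View) -> None:
        if view.start >= view.end:
            raise ValueError("start must be < end")
        self.history.append((user, view))
        for member, listener in self.listeners.items():
            if member != user:  # the initiating browser already shows this view
                listener(user, view)
```

Replaying the latest view to late joiners keeps everyone on the same locus without a separate synchronization step.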

LipidFinder 2.0: advanced informatics pipeline for lipidomics discovery applications

10.1101/2020.08.16.250878

2020

Author(s): Jorge Alvarez-Jarreta, Patricia R.S. Rodrigues, Eoin Fahy, Anne O’Connor, Anna Price, ...

Keyword(s): Open Access, Real Data, Supplementary Information, Supplementary Data, Scatter Plot, Lipid Profiling, Link Type, False Discovery, Assess Data Quality, Lipid Structures

Abstract: We present LipidFinder 2.0, which incorporates four new modules that apply artefact filters and remove lipid and contaminant stacks, in-source fragments and salt clusters. It also adds a new isotope deletion method that is significantly more sensitive than available open-access alternatives. We also incorporate a novel false discovery rate (FDR) method, based on a target-decoy strategy, that allows users to assess data quality. A renewed lipid profiling method is introduced, which searches three different databases from LIPID MAPS and returns bulk lipid structures only. A lipid category scatter plot with a color-blind-friendly palette is also included. An API to XCMS Online is available in LipidFinder’s online version. Using real data, we show that LipidFinder 2.0 provides a significant improvement in non-lipid metabolite filtering and lipid profiling compared with available tools.

Availability: LipidFinder 2.0 is freely available at https://github.com/ODonnell-Lipidomics/LipidFinder and http://lipidmaps.org/resources/tools/[email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
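The target-decoy idea behind the FDR method mentioned above is commonly implemented by counting the decoy and target hits that pass a score threshold. The sketch below uses that generic estimate: decoys passing divided by targets passing. This abstract does not describe LipidFinder 2.0's exact formula, scores or decoy construction, so treat these helper functions as hypothetical.

```python
from typing import Optional, Sequence


def target_decoy_fdr(targets: Sequence[float], decoys: Sequence[float], threshold: float) -> float:
    """Estimated FDR at a score threshold: decoy hits / target hits at or above it."""
    n_target = sum(s >= threshold for s in targets)
    n_decoy = sum(s >= threshold for s in decoys)
    return n_decoy / n_target if n_target else 0.0  # no accepted targets, nothing to be wrong about


def threshold_for_fdr(targets: Sequence[float], decoys: Sequence[float],
                      max_fdr: float = 0.05) -> Optional[float]:
    """Lowest target score whose estimated FDR stays within max_fdr (None if none qualifies)."""
    best = None
    for t in sorted(set(targets), reverse=True):
        if target_decoy_fdr(targets, decoys, t) <= max_fdr:
            best = t
    return best
```

This raw estimate need not decrease monotonically as the threshold is lowered, so many tools convert it to q-values before choosing a cutoff.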

DECONbench: a benchmarking platform dedicated to deconvolution methods for tumor heterogeneity quantification

10.1101/2020.06.06.131482

2020

Author(s): Clémentine Decamps, Alexis Arnaud, Florent Petitprez, Mira Ayadi, Aurélia Baurès, ...

Keyword(s): Computational Methods, Tumor Heterogeneity, Supplementary Information, Web Based, Additional Information, New Methods, Link Type, Benchmark Datasets, Supplementary Material, And Performance

Abstract

Motivation: Quantification of tumor heterogeneity is essential to better understand cancer progression and to adapt therapeutic treatments to patient specificities.

Results: We present DECONbench, a web-based application to benchmark computational methods dedicated to quantifying cell-type heterogeneity in cancer. DECONbench includes benchmark datasets, computational methods and performance evaluation. It also allows users to submit new methods.

Availability and implementation: DECONbench is hosted on the open source CodaLab competition platform. It is freely available at https://competitions.codalab.org/competitions/23660.

Supplementary information: Additional information is available online and on our website: https://cancer-heterogeneity.github.io/deconbench.html.
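Performance evaluation in a deconvolution benchmark amounts to comparing each method's estimated cell-type proportions with the known ones. The sketch below scores submissions by mean absolute error and ranks them. The metric and the `leaderboard` helper are assumptions for illustration, since the abstract does not say which scores DECONbench uses.

```python
from typing import Dict, List, Sequence, Tuple

Proportions = Sequence[Sequence[float]]  # one row per sample, one column per cell type


def mean_absolute_error(truth: Proportions, estimate: Proportions) -> float:
    """Average |true - estimated| proportion over all samples and cell types."""
    if len(truth) != len(estimate):
        raise ValueError("different number of samples")
    errors: List[float] = []
    for t_row, e_row in zip(truth, estimate):
        if len(t_row) != len(e_row):
            raise ValueError("different number of cell types")
        errors.extend(abs(t - e) for t, e in zip(t_row, e_row))
    return sum(errors) / len(errors)


def leaderboard(truth: Proportions, submissions: Dict[str, Proportions]) -> List[Tuple[str, float]]:
    """Rank submitted methods from lowest to highest error."""
    scores = {name: mean_absolute_error(truth, est) for name, est in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

Mean absolute error keeps scores in the same units as the proportions. Other choices, such as per-cell-type correlation, would reward different behavior.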
