Phoenix Enhancer: proteomics data mining using clustered spectra

2019 ◽  
Author(s):  
Mingze Bai ◽  
Chunyuan Qin ◽  
Kunxian Shu ◽  
Johannes Griss ◽  
Yasset Perez-Riverol ◽  
...  

Abstract
Motivation: Spectrum clustering has been used to enhance proteomics data analysis: some originally unidentified spectra can potentially be identified, and individual peptides can be evaluated against clusters of identified spectra to find potential mis-identifications. The Phoenix Enhancer provides an infrastructure to analyze tandem mass spectra and the corresponding peptides in the context of previously identified public data. Based on PRIDE Cluster data and a newly developed pipeline, four functionalities are provided: i) evaluate the original peptide identifications in an individual dataset, to find low-confidence peptide spectrum matches (PSMs) which could correspond to mis-identifications; ii) provide confidence scores for all originally identified PSMs, to help users evaluate their quality (complementary to a global false discovery rate); iii) identify potential new PSMs for originally unidentified spectra; and iv) provide a collection of browsing and visualization tools to analyze and export the results. In addition to the web-based service, the code is open source and easy to re-deploy on local computers using Docker containers.
Availability: The Phoenix Enhancer service is available at http://enhancer.ncpsb.org. All source code is freely available on GitHub (https://github.com/phoenix-cluster/) and can be deployed in the Cloud and HPC.
Supplementary information: Supplementary data are available online.
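The idea of judging a PSM against a cluster of previously identified spectra can be illustrated with a minimal Python sketch. This is not the Phoenix Enhancer scoring pipeline, which the abstract does not describe; the function names, the status labels, and the `min_ratio` cut-off of 0.7 are all assumptions chosen for illustration. The sketch takes the peptide assigned to most spectra in a cluster as the consensus and compares a PSM with it.

```python
from collections import Counter


def cluster_consensus(cluster_peptides):
    """Return (most common peptide, fraction of spectra assigned to it)."""
    if not cluster_peptides:
        raise ValueError("cluster has no identified spectra")
    peptide, n = Counter(cluster_peptides).most_common(1)[0]
    return peptide, n / len(cluster_peptides)


def evaluate_psm(psm_peptide, cluster_peptides, min_ratio=0.7):
    """Classify one spectrum against the cluster it falls into.

    psm_peptide is the peptide originally assigned to the spectrum,
    or None if the spectrum was unidentified.
    min_ratio is a hypothetical cut-off, not a published default.
    """
    consensus, ratio = cluster_consensus(cluster_peptides)
    if ratio < min_ratio:
        status = "ambiguous"        # cluster itself is not consistent
    elif psm_peptide is None:
        status = "new_candidate"    # unidentified spectrum, consensus suggested
    elif psm_peptide == consensus:
        status = "supported"        # original identification agrees
    else:
        status = "conflict"         # possible mis-identification
    return status, consensus, ratio


if __name__ == "__main__":
    cluster = ["PEPTIDEK"] * 8 + ["PEPTLDEK"] * 2
    print(evaluate_psm("PEPTIDEK", cluster))
    print(evaluate_psm("PEPTLDEK", cluster))
    print(evaluate_psm(None, cluster))
```

The three statuses loosely mirror functionalities i) to iii) listed in the abstract: flagging doubtful PSMs, attaching a confidence value to identified ones, and proposing identifications for unidentified spectra.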

2021 ◽  
Author(s):  
Benbo Gao ◽  
Jing Zhu ◽  
Soumya Negi ◽  
Xinmin Zhang ◽  
Stefka Gyoneva ◽  
...  

Abstract
Summary: We developed Quickomics, a feature-rich R Shiny-powered tool that enables biologists to fully explore complex omics data and perform advanced analysis in an easy-to-use interactive interface. It covers a broad range of secondary and tertiary analytical tasks after primary analysis of omics data is completed. Each functional module is equipped with customized configurations and generates both interactive and publication-ready high-resolution plots to uncover biological insights from data. The modular design makes the tool easy to extend.
Availability: Researchers can try the functionalities with their own data or with demo RNA-Seq and proteomics data sets by using the app hosted at http://quickomics.bxgenomics.com and following the tutorial at https://bit.ly/3rXIyhL. The source code is provided under the GPLv3 license at https://github.com/interactivereport/.
Supplementary information: Supplementary materials are available at https://bit.ly/37HP17g.


2020 ◽  
Author(s):  
Quan Do ◽  
Ho Bich Hai ◽  
Pierre Larmande

Abstract
Summary: Currently, gene information for Oryza sativa is scattered across heterogeneous online data sources. Moreover, methods of access are also diverse, mostly web-based and sometimes query APIs, which might not always be straightforward for domain experts. The challenge is to collect information quickly from these applications and combine it logically, to facilitate scientific research. We developed a Python package named PyRice, a unified programming API to access all supported databases at the same time with consistent output. PyRice's design is modular and implements a smart query system that adapts to the available computing resources to optimize query speed. As a result, PyRice is easy to use and produces intuitive results.
Availability and implementation: https://github.com/SouthGreenPlatform/PyRice
Documentation: https://[email protected]
License: MIT
Supplementary information: Supplementary data are available online.


2019 ◽  
Author(s):  
Stefan Schorr ◽  
Duy Nguyen ◽  
Sarah Haßdenteufel ◽  
Nagarjuna Nagaraj ◽  
Adolfo Cavalié ◽  
...  

Abstract: In mammalian cells, one-third of all polypeptides are integrated into the membrane or translocated into the lumen of the endoplasmic reticulum (ER) via the Sec61-channel. While the Sec61-complex facilitates ER-import of most precursor polypeptides, the Sec61-associated Sec62/Sec63-complex supports ER-import in a substrate-specific manner. So far, mainly posttranslationally imported precursors and the two cotranslationally imported precursors of ERj3 and prion protein were found to depend on the Sec62/Sec63-complex in vitro. Therefore, we determined the rules for engagement of Sec62/Sec63 in ER-import in intact human cells using a recently established unbiased proteomics approach. In addition to confirming ERj3, we identified twenty-two novel Sec62/Sec63-substrates under these in vivo-like conditions. As a common feature, those previously unknown substrates share signal peptides with a comparatively longer but less hydrophobic H-region and lower C-region polarity. Further analyses with four substrates, and ERj3 in particular, revealed the combination of a slowly-gating signal peptide and a downstream translocation-disruptive positively charged cluster of amino acid residues as decisive for the Sec62-/Sec63-requirement. In the case of ERj3, these features were found to be responsible for an additional BiP-requirement and to correlate with sensitivity towards the Sec61-channel inhibitor CAM741. Thus, the human Sec62/Sec63-complex may support Sec61-channel opening for precursor polypeptides with slowly-gating signal peptides by direct interaction with the cytosolic amino-terminal peptide of Sec61α or via recruitment of BiP and its interaction with the ER-lumenal loop 7 of Sec61α. These novel insights into the mechanism of human ER protein import contribute to our understanding of the etiology of SEC63-linked Polycystic Liver Disease.
Databases: The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (http://www.ebi.ac.uk/pride/archive/projects/Identifiers) with the dataset identifiers PXD008178, PXD011993, and PXD012078. Supplementary information was deposited at Mendeley Data under the DOI 10.17632/6s5hn73jcv.1 (http://dx.doi.org/10.17632/6s5hn73jcv.1).


2016 ◽  
Author(s):  
D. Dao ◽  
A. N. Fraser ◽  
J. Hung ◽  
V. Ljosa ◽  
S. Singh ◽  
...  

Abstract
Summary: CellProfiler Analyst allows the exploration and visualization of image-based data, together with the classification of complex biological phenotypes, via an interactive user interface designed for biologists and data scientists. CellProfiler Analyst 2.0, completely rewritten in Python, builds on these features and adds enhanced supervised machine learning capabilities (in Classifier), as well as visualization tools to overview an experiment (Plate Viewer and Image Gallery).
Availability: CellProfiler Analyst 2.0 is free and open source, available at http://www.cellprofiler.org/releases and from GitHub (https://github.com/CellProfiler/CellProfiler-Analyst) under the BSD license. It is available as a packaged application for Mac OS X and Microsoft Windows and can be compiled for Linux. We implemented an automatic build process that supports nightly updates and regular release cycles.
Supplementary information: Supplementary Text 1: Manual to CellProfiler Analyst; updated versions are available at CellProfiler.org/CPA. Supplementary Data 1: Benchmarking performance of classifiers in CPA 2.0 versus CPA 1.0.


2018 ◽  
Vol 106 (2) ◽  
Author(s):  
Kelli Yakabu ◽  
Andrea Ball

LiveStories is a web-based storytelling platform that is equipped with interactive data visualization tools, drag-and-drop publishing, and its own public data library.


2008 ◽  
Vol 9 (1) ◽  
Author(s):  
Paulo C Carvalho ◽  
Juliana SG Fischer ◽  
Emily I Chen ◽  
John R Yates ◽  
Valmir C Barbosa

Abstract
Background: A goal of proteomics is to distinguish between states of a biological system by identifying protein expression differences. Liu et al. demonstrated a method to perform semi-relative protein quantitation in shotgun proteomics data by correlating the number of tandem mass spectra obtained for each protein, or "spectral count", with its abundance in a mixture; however, two issues have remained open: how to normalize spectral counting data and how to efficiently pinpoint differences between profiles. Moreover, Chen et al. recently showed how to increase the number of identified proteins in shotgun proteomics by analyzing samples with different MS-compatible detergents while performing proteolytic digestion. The latter introduced new challenges from the data analysis perspective, since replicate readings are not acquired.
Results: To address the open issues above, we present a program termed PatternLab for proteomics. This program implements existing strategies and adds two new methods to pinpoint differences in protein profiles. The first method, ACFold, addresses experiments with fewer than three replicates from each state or with assays acquired by different protocols, as described by Chen et al. ACFold uses a combined criterion based on expression fold changes, the AC test, and the false-discovery rate, and can supply a "bird's-eye view" of differentially expressed proteins. The other method addresses experimental designs having multiple readings from each state and is referred to as nSVM (natural support vector machine) because of its roots in evolutionary computing and in statistical learning theory. Our observations suggest that nSVM's niche comprises projects that select a minimum set of proteins for classification purposes; for example, the development of an early detection kit for a given pathology. We demonstrate the effectiveness of each method on experimental data and compare them with existing strategies.
Conclusion: PatternLab offers easy, unified access to a variety of feature selection and normalization strategies, each having its own niche. Additionally, graphing tools are available to aid in the analysis of high-throughput experimental data. PatternLab is available at http://pcarvalho.com/patternlab.
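The two open issues named in the abstract, normalizing spectral counts and comparing profiles, can be illustrated with a small Python sketch. It shows one common normalization (dividing each protein's count by the total count of its run) followed by a log2 fold change; it is not PatternLab's ACFold, which additionally combines the AC test and a false-discovery rate. The function names and the pseudocount of 0.5 are assumptions made for this example.

```python
import math


def normalize_total(counts):
    """Divide each protein's spectral count by the run's total count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("profile contains no spectral counts")
    return {protein: c / total for protein, c in counts.items()}


def log2_fold_changes(state_a, state_b, pseudocount=0.5):
    """log2(state_b / state_a) per protein on total-normalized counts.

    The pseudocount (a hypothetical value) is added to every raw count
    so that proteins missing from one state do not cause division by zero.
    """
    proteins = sorted(set(state_a) | set(state_b))
    a = normalize_total({p: state_a.get(p, 0) + pseudocount for p in proteins})
    b = normalize_total({p: state_b.get(p, 0) + pseudocount for p in proteins})
    return {p: math.log2(b[p] / a[p]) for p in proteins}


if __name__ == "__main__":
    control = {"P1": 30, "P2": 10, "P3": 5}
    treated = {"P1": 10, "P2": 30}
    for protein, fc in log2_fold_changes(control, treated).items():
        print(f"{protein}\t{fc:+.2f}")
```

Normalizing before taking the ratio keeps a run that simply produced more spectra overall from looking like a uniform up-regulation of every protein.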


2018 ◽  
Author(s):  
Lingfei Wang ◽  
Tom Michoel

Abstract
Motivation: Bayesian networks can represent directed gene regulations and therefore are favored over co-expression networks. However, hardly any Bayesian network study concerns the false discovery control (FDC) of network edges, leading to low accuracies due to systematic biases from inconsistent false discovery levels in the same study.
Results: We design four empirical tests to examine the FDC of Bayesian networks from three p-value based lasso regression variable selections: two existing and one we originate. Our method, lassopv, computes p-values for the critical regularization strength at which a predictor starts to contribute to lasso regression. Using null and Geuvadis datasets, we find that lassopv obtains optimal FDC in Bayesian gene networks, whilst existing methods have defective p-values. The FDC concept and tests extend to most network inference scenarios and will guide the design and improvement of new and existing methods. Our novel variable selection method with lasso regression also allows FDC on other datasets and questions, even beyond network inference and computational biology.
Availability: Lassopv is implemented in R and freely available at https://github.com/lingfeiwang/lassopv and https://cran.r-project.org/.
Supplementary information: Supplementary data are available at Bioinformatics online.


2016 ◽  
Author(s):  
Peter S. Szot ◽  
Andrian Yang ◽  
Xin Wang ◽  
Uwe Röhm ◽  
Koon Ho Wong ◽  
...  

Abstract
Summary: The central task of a genome browser is to enable easy visual exploration of large genomic data to gain biological insight. Most existing genome browsers were designed for data exploration by individual users, while a few allow some limited forms of collaboration among multiple users, such as file sharing and wiki-style collaborative editing of gene annotations. Our work’s premise is that sharing genome browser views instantaneously in real time enables the exchange of ideas and insight in a collaborative project, thus harnessing the wisdom of the crowd. PBrowse is a parallel-access real-time collaborative web-based genome browser that provides both an integrated, real-time collaborative platform and a comprehensive file sharing system. PBrowse also allows real-time track comments and has integrated group chat to facilitate interactive discussion among multiple users. Through the Distributed Annotation Server protocol, PBrowse can easily access a wide range of publicly available genomic data, such as the ENCODE data sets. We argue that PBrowse, with its re-designed user management, data management and novel collaborative layer based on Biodalliance, represents a paradigm shift from seeing the genome browser merely as a tool for data visualisation to a tool that enables real-time human-human interaction and knowledge exchange in a collaborative setting.
Availability: PBrowse is available at http://pbrowse.victorchang.edu.au, and its source code is available under the open source BSD 3 license at http://github.com/VCCRI/.
Supplementary information: A supplementary video demonstrating the collaborative features of PBrowse is available at https://www.youtube.com/watch?v=ROvKXZoXiIc.


2020 ◽  
Author(s):  
Jorge Alvarez-Jarreta ◽  
Patricia R.S. Rodrigues ◽  
Eoin Fahy ◽  
Anne O’Connor ◽  
Anna Price ◽  
...  

Abstract: We present LipidFinder 2.0, incorporating four new modules that apply artefact filters, remove lipid and contaminant stacks, in-source fragments and salt clusters, and a new isotope deletion method which is significantly more sensitive than available open-access alternatives. We also incorporate a novel false discovery rate (FDR) method, utilizing a target-decoy strategy, which allows users to assess data quality. A renewed lipid profiling method is introduced which searches three different databases from LIPID MAPS and returns bulk lipid structures only, along with a lipid category scatter plot with a color-blind-friendly palette. An API interface with XCMS Online is available in LipidFinder’s online version. We show using real data that LipidFinder 2.0 provides a significant improvement in non-lipid metabolite filtering and lipid profiling, compared to available tools.
Availability: LipidFinder 2.0 is freely available at https://github.com/ODonnell-Lipidomics/LipidFinder and http://lipidmaps.org/resources/tools/.
Supplementary information: Supplementary data are available at Bioinformatics online.
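A target-decoy strategy estimates the FDR from how many decoy (known-false) matches pass a score threshold relative to target matches. The Python sketch below shows the generic estimate, decoys divided by targets; the abstract does not give LipidFinder 2.0's exact formulation, so the function names, the score convention (higher is better) and the example data are assumptions.

```python
def fdr_at_threshold(scores, is_decoy, threshold):
    """Estimate FDR as (#decoys passing) / (#targets passing)."""
    targets = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
    decoys = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)
    if targets == 0:
        # Nothing but decoys passes: treat as fully false; nothing passes: 0.
        return 1.0 if decoys else 0.0
    return min(1.0, decoys / targets)


def lowest_threshold_for_fdr(scores, is_decoy, max_fdr=0.05):
    """Return the most permissive score threshold whose estimated FDR
    does not exceed max_fdr, or None if no threshold qualifies."""
    for threshold in sorted(set(scores)):
        if fdr_at_threshold(scores, is_decoy, threshold) <= max_fdr:
            return threshold
    return None


if __name__ == "__main__":
    # Hypothetical match scores; True marks a match against the decoy set.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    is_decoy = [False, False, False, True, False, True]
    print(fdr_at_threshold(scores, is_decoy, 0.4))
    print(lowest_threshold_for_fdr(scores, is_decoy, max_fdr=0.05))
```

Scanning thresholds from the lowest score upwards returns the cut-off that keeps the most matches while staying within the requested FDR.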


2020 ◽  
Author(s):  
Clémentine Decamps ◽  
Alexis Arnaud ◽  
Florent Petitprez ◽  
Mira Ayadi ◽  
Aurélia Baurès ◽  
...  

Abstract
Motivation: Quantification of tumor heterogeneity is essential to better understand cancer progression and to adapt therapeutic treatments to patient specificities.
Results: We present DECONbench, a web-based application to benchmark computational methods dedicated to quantifying cell-type heterogeneity in cancer. DECONbench includes benchmark datasets, computational methods and performance evaluation. It allows submission of new methods.
Availability and implementation: DECONbench is hosted on the open source CodaLab competition platform. It is freely available at https://competitions.codalab.org/competitions/23660.
Supplementary information: Additional information is available online and on our website: https://cancer-heterogeneity.github.io/deconbench.html.

