Robust summarization and inference in proteome-wide label-free quantification

Mapping Intimacies ◽

10.1101/668863 ◽

2019 ◽

Cited By ~ 1

Author(s):

Adriaan Sticker ◽

Ludger Goeminne ◽

Lennart Martens ◽

Lieven Clement

Keyword(s):

Data Analysis ◽

State Of The Art ◽

Downstream Processing ◽

Superior Performance ◽

Model Complexity ◽

Model Parameters ◽

Label Free ◽

Specific Effects ◽

Modular Analysis ◽

Downstream Analysis

Label-Free Quantitative mass spectrometry based workflows for differential expression (DE) analysis of proteins impose important challenges on the data analysis due to peptide-specific effects and context dependent missingness of peptide intensities. Peptide-based workflows, like MSqRob, test for DE directly from peptide intensities and outperform summarization methods which first aggregate MS1 peptide intensities to protein intensities before DE analysis. However, these methods are computationally expensive, often hard to understand for the non-specialised end-user, and do not provide protein summaries, which are important for visualisation or downstream processing. In this work, we therefore evaluate state-of-the-art summarization strategies using a benchmark spike-in dataset and discuss why and when these fail compared to the state-of-the-art peptide based model, MSqRob. Based on this evaluation, we propose a novel summarization strategy, MSqRobSum, which estimates MSqRob’s model parameters in a two-stage procedure circumventing the drawbacks of peptide-based workflows. MSqRobSum maintains MSqRob’s superior performance, while providing useful protein expression summaries for plotting and downstream analysis. Summarising peptide to protein intensities considerably reduces the computational complexity, the memory footprint and the model complexity, and makes it easier to disseminate DE inferred on protein summaries. Moreover, MSqRobSum provides a highly modular analysis framework, which provides researchers with full flexibility to develop data analysis workflows tailored towards their specific applications.

Robust Summarization and Inference in Proteome-wide Label-free Quantification

Molecular & Cellular Proteomics ◽

10.1074/mcp.ra119.001624 ◽

2020 ◽

Vol 19 (7) ◽

pp. 1209-1219

Author(s):

Adriaan Sticker ◽

Ludger Goeminne ◽

Lennart Martens ◽

Lieven Clement

Keyword(s):

Data Analysis ◽

State Of The Art ◽

Downstream Processing ◽

Superior Performance ◽

Model Complexity ◽

Model Parameters ◽

Label Free ◽

Specific Effects ◽

Modular Analysis ◽

Downstream Analysis

Label-Free Quantitative mass spectrometry based workflows for differential expression (DE) analysis of proteins impose important challenges on the data analysis because of peptide-specific effects and context dependent missingness of peptide intensities. Peptide-based workflows, like MSqRob, test for DE directly from peptide intensities and outperform summarization methods which first aggregate MS1 peptide intensities to protein intensities before DE analysis. However, these methods are computationally expensive, often hard to understand for the non-specialized end-user, and do not provide protein summaries, which are important for visualization or downstream processing. In this work, we therefore evaluate state-of-the-art summarization strategies using a benchmark spike-in dataset and discuss why and when these fail compared with the state-of-the-art peptide based model, MSqRob. Based on this evaluation, we propose a novel summarization strategy, MSqRobSum, which estimates MSqRob's model parameters in a two-stage procedure circumventing the drawbacks of peptide-based workflows. MSqRobSum maintains MSqRob's superior performance, while providing useful protein expression summaries for plotting and downstream analysis. Summarizing peptide to protein intensities considerably reduces the computational complexity, the memory footprint and the model complexity, and makes it easier to disseminate DE inferred on protein summaries. Moreover, MSqRobSum provides a highly modular analysis framework, which provides researchers with full flexibility to develop data analysis workflows tailored toward their specific applications.
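The two-stage idea described in the abstract (first summarise peptide intensities per protein, then run DE analysis on the summaries) can be illustrated with a small sketch. The abstract does not state which estimator MSqRobSum uses, so the code below swaps in Tukey's median polish as a simple robust stand-in; the function name `median_polish_summary` and the toy matrix are hypothetical and not taken from the paper.

```python
import numpy as np

def median_polish_summary(log_intensity, n_iter=10):
    """Summarise a peptides x samples matrix of log2 intensities for one
    protein into one value per sample (assumed stand-in: median polish).
    Missing intensities are encoded as NaN and ignored by the medians."""
    resid = np.array(log_intensity, dtype=float)
    overall = 0.0
    row = np.zeros(resid.shape[0])  # peptide-specific effects
    col = np.zeros(resid.shape[1])  # sample-specific effects
    for _ in range(n_iter):
        # Sweep out peptide (row) medians.
        row_med = np.nanmedian(resid, axis=1)
        resid -= row_med[:, None]
        row += row_med
        delta = np.median(col)
        col -= delta
        overall += delta
        # Sweep out sample (column) medians.
        col_med = np.nanmedian(resid, axis=0)
        resid -= col_med[None, :]
        col += col_med
        delta = np.median(row)
        row -= delta
        overall += delta
    # Protein-level summary per sample: overall level plus sample effect.
    return overall + col

# Toy protein: 3 peptides, 4 samples, one missing intensity.
peptide_effect = np.array([0.0, -1.0, 2.0])
sample_level = np.array([10.0, 10.0, 11.0, 11.0])
toy = peptide_effect[:, None] + sample_level[None, :]
toy[1, 3] = np.nan
summary = median_polish_summary(toy)
```

Because the peptide effects are swept out before the sample effects are read off, the summary is not distorted by which peptides happen to be observed in a given sample, which is the kind of peptide-specific effect the abstract refers to. The second stage (DE testing on `summary`) is not shown.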

Active Learning in the Geometric Block Model

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5772 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3641-3648 ◽

Cited By ~ 1

Author(s):

Eli Chien ◽

Antonia Tulino ◽

Jaime Llorca

Keyword(s):

Active Learning ◽

Community Detection ◽

Random Graphs ◽

State Of The Art ◽

Superior Performance ◽

Model Parameters ◽

Block Model ◽

Stochastic Block Model ◽

Synthetic Datasets ◽

Community Detection Problems

The geometric block model is a recently proposed generative model for random graphs that is able to capture the inherent geometric properties of many community detection problems, providing more accurate characterizations of practical community structures compared with the popular stochastic block model. Galhotra et al. recently proposed a motif-counting algorithm for unsupervised community detection in the geometric block model that is proved to be near-optimal. They also characterized the regimes of the model parameters for which the proposed algorithm can achieve exact recovery. In this work, we initiate the study of active learning in the geometric block model. That is, we are interested in the problem of exactly recovering the community structure of random graphs following the geometric block model under arbitrary model parameters, by possibly querying the labels of a limited number of chosen nodes. We propose two active learning algorithms that combine the use of motif-counting with two different label query policies. Our main contribution is to show that sampling the labels of a vanishingly small fraction of nodes (sub-linear in the total number of nodes) is sufficient to achieve exact recovery in the regimes under which the state-of-the-art unsupervised method fails. We validate the superior performance of our algorithms via numerical simulations on both real and synthetic datasets.
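As a rough illustration of the motif-counting ingredient mentioned above, the sketch below counts, for every edge, how many neighbours its two endpoints share (that is, how many triangles the edge closes). This is only the counting step under an assumed edge-list input; the decision thresholds and the label query policies from the paper are not given in the abstract and are not reproduced here. The function name `common_neighbor_counts` is hypothetical.

```python
def common_neighbor_counts(edges):
    """Return {edge: number of common neighbours of its endpoints}
    for an undirected graph given as a list of (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    # A common neighbour of u and v closes a triangle on the edge (u, v).
    return {(u, v): len(adjacency[u] & adjacency[v]) for u, v in edges}

# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
counts = common_neighbor_counts([(0, 1), (1, 2), (0, 2), (2, 3)])
```

In a geometric graph, nodes in the same community tend to be close and therefore share many neighbours, so such counts carry information about community membership; how the counts are thresholded is left to the original work.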

Self-supervised contrastive learning for integrative single cell RNA-seq data analysis

10.1101/2021.07.26.453730 ◽

2021 ◽

Author(s):

Wenkai Han ◽

Yuqi Cheng ◽

Jiayang Chen ◽

Huawen Zhong ◽

Zhihang Hu ◽

...

Keyword(s):

Data Analysis ◽

Single Cell ◽

Biological Diversity ◽

Mononuclear Cells ◽

Single Cells ◽

Data Representation ◽

Superior Performance ◽

Peripheral Blood Mononuclear ◽

Technical Noise ◽

Downstream Analysis

Single-cell RNA-sequencing (scRNA-seq) has become a powerful tool to reveal the complex biological diversity and heterogeneity among cell populations. However, the technical noise and bias of the technology still have negative impacts on the downstream analysis. Here, we present a self-supervised Contrastive LEArning framework for scRNA-seq (CLEAR) profile representation and the downstream analysis. CLEAR overcomes the heterogeneity of the experimental data with a specifically designed representation learning task and thus can handle batch effects and dropout events. In the task, the deep learning model learns to pull together the representations of similar cells while pushing apart distinct cells, without manual labeling. It achieves superior performance on a broad range of fundamental tasks, including clustering, visualization, dropout correction, batch effect removal, and pseudo-time inference. The proposed method successfully identifies and illustrates inflammatory-related mechanisms in a COVID-19 disease study with 43,695 single cells from peripheral blood mononuclear cells. Further experiments to process a million-scale single-cell dataset demonstrate the scalability of CLEAR. This scalable method generates effective scRNA-seq data representation while eliminating technical noise, and it will serve as a general computational framework for single-cell data analysis.
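The "pull together similar cells, push apart distinct cells" objective can be sketched with a generic contrastive loss. The abstract does not specify the exact loss used by CLEAR, so the example below assumes the widely used InfoNCE form with cosine similarity; the function name `info_nce_loss`, the temperature value, and the toy embeddings are illustrative assumptions.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.2):
    """InfoNCE loss (assumed form): row i of `positives` is the augmented
    view of row i of `anchors`; all other rows act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching pair sits on the diagonal of the similarity matrix.
    return -np.mean(np.diag(log_prob))

# Toy embeddings of three cells and two candidate sets of "views".
cells = np.eye(3)
aligned_loss = info_nce_loss(cells, cells)                    # views match
shuffled_loss = info_nce_loss(cells, np.roll(cells, 1, axis=0))  # views mismatched
```

The loss is small when each cell's representation is closest to its own augmented view and large otherwise, which is what lets a model learn without manual labels.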

Gaussian Process Graph-Based Discriminant Analysis for Hyperspectral Images Classification

Remote Sensing ◽

10.3390/rs11192288 ◽

2019 ◽

Vol 11 (19) ◽

pp. 2288 ◽

Cited By ~ 1

Author(s):

Xin Song ◽

Xinwei Jiang ◽

Junbin Gao ◽

Zhihua Cai

Keyword(s):

Discriminant Analysis ◽

Gaussian Process ◽

State Of The Art ◽

Hyperspectral Images ◽

Superior Performance ◽

Model Parameters ◽

Optimal Model ◽

Training Samples ◽

Classification Tasks ◽

Gp Model

Dimensionality Reduction (DR) models are highly useful for tackling Hyperspectral Images (HSIs) classification tasks. They mainly address two issues: the curse of dimensionality with respect to spectral features, and the limited number of labeled training samples. Among these DR techniques, the Graph-Embedding Discriminant Analysis (GEDA) framework has demonstrated its effectiveness for HSIs feature extraction. However, most of the existing GEDA-based DR methods rely largely on manual parameter tuning to obtain the optimal model, which proves to be troublesome and inefficient. Motivated by the nonparametric Gaussian Process (GP) model, we propose a novel supervised DR algorithm, namely Gaussian Process Graph-based Discriminant Analysis (GPGDA). Our algorithm takes full advantage of the covariance matrix in GP to construct the graph similarity matrix in the GEDA framework. In this way, superior performance can be achieved with the model parameters tuned automatically. Experiments on three real HSIs datasets demonstrate that the proposed GPGDA outperforms some classic and state-of-the-art DR methods.

Revisiting the Analysis of the Isochronous Mass Measurements of Uranium Fission Fragments at the ESR

EPJ Web of Conferences ◽

10.1051/epjconf/202022702012 ◽

2020 ◽

Vol 227 ◽

pp. 02012

Author(s):

R. S. Sidhu ◽

R. J. Chen ◽

Yu. A Litvinov ◽

Y. H. Zhang ◽

Keyword(s):

Experimental Data ◽

Data Analysis ◽

State Of The Art ◽

Fission Products ◽

Mass Measurements ◽

Fission Fragments ◽

Uranium Fission

The re-analysis of experimental data on mass measurements of uranium fission products obtained at the ESR in 2002 is discussed. State-of-the-art data analysis procedures developed for such measurements are employed.

Utilization of Eco-Friendly Waste Generated Nanomaterials in Water-Based Drilling Fluids; State of the Art Review

Materials ◽

10.3390/ma14154171 ◽

2021 ◽

Vol 14 (15) ◽

pp. 4171

Author(s):

Rabia Ikram ◽

Badrul Mohamed Jan ◽

Akhmal Sidek ◽

George Kenanakis

Keyword(s):

State Of The Art ◽

Superior Performance ◽

Future Research ◽

Drilling Fluids ◽

Drill Cuttings ◽

Environmental Friendly ◽

Water Based ◽

Drilling Operations ◽

Filtration Properties

An important aspect of hydrocarbon drilling is the usage of drilling fluids, which remove drill cuttings and stabilize the wellbore to provide better filtration. To stabilize these properties, several additives are used in drilling fluids that provide satisfactory rheological and filtration properties. However, commonly used additives are environmentally hazardous; when drilling fluids are disposed of after drilling operations, they are discarded with the drill cuttings and additives into water sources and cause unwanted pollution. Therefore, these additives should be substituted with additives that are environmentally friendly and provide superior performance. In this regard, biodegradable additives are required for future research. This review investigates the role of various bio-wastes as potential additives to be used in water-based drilling fluids. Furthermore, utilization of these waste-derived nanomaterials is summarized for rheology and lubricity tests. Finally, sufficient rheological and filtration examinations were carried out on water-based drilling fluids to evaluate the effect of wastes as additives on the performance of drilling fluids.

xCELLanalyzer: A Framework for the Analysis of Cellular Impedance Measurements for Mode of Action Discovery

SLAS DISCOVERY Advancing Life Sciences ◽

10.1177/2472555218819459 ◽

2019 ◽

Vol 24 (3) ◽

pp. 213-223 ◽

Cited By ~ 1

Author(s):

Raimo Franke ◽

Bettina Hinkelmann ◽

Verena Fetz ◽

Theresia Stradal ◽

Florenz Sasse ◽

...

Keyword(s):

Data Analysis ◽

Bioactive Compounds ◽

Mode Of Action ◽

Mammalian Cells ◽

Cellular Response ◽

Label Free ◽

Synthesis Inhibitor ◽

Analysis Pipeline ◽

Bioactive Natural Products ◽

Data Analysis Pipeline

Mode of action (MoA) identification of bioactive compounds is very often a challenging and time-consuming task. We used a label-free kinetic profiling method based on an impedance readout to monitor the time-dependent cellular response profiles for the interaction of bioactive natural products and other small molecules with mammalian cells. Such approaches have been rarely used so far due to the lack of data mining tools to properly capture the characteristics of the impedance curves. We developed a data analysis pipeline for the xCELLigence Real-Time Cell Analysis detection platform to process the data, assess and score their reproducibility, and provide rank-based MoA predictions for a reference set of 60 bioactive compounds. The method can reveal additional, previously unknown targets, as exemplified by the identification of tubulin-destabilizing activities of the RNA synthesis inhibitor actinomycin D and the effects on DNA replication of vioprolide A. The data analysis pipeline is based on the statistical programming language R and is available to the scientific community through a GitHub repository.

Clustering with position-specific constraints on variance: Applying redescending M-estimators to label-free LC-MS data analysis

BMC Bioinformatics ◽

10.1186/1471-2105-12-358 ◽

2011 ◽

Vol 12 (1) ◽

pp. 358 ◽

Cited By ~ 4

Author(s):

Rudolf Frühwirth ◽

D R Mani ◽

Saumyadipta Pyne

Keyword(s):

Data Analysis ◽

Label Free

Machine Learning-Assisted Sampling of SERS Substrates Improves Data Collection Efficiency

Applied Spectroscopy ◽

10.1177/00037028211034543 ◽

2021 ◽

pp. 000370282110345

Author(s):

Tatu Rojalin ◽

Dexter Antonio ◽

Ambarish Kulkarni ◽

Randy P. Carney

Keyword(s):

Machine Learning ◽

Data Collection ◽

Domain Knowledge ◽

Collection Efficiency ◽

Point Of Care ◽

Automated Analysis ◽

Downstream Processing ◽

Machine Learning Algorithms ◽

Label Free ◽

Expert User

Surface-enhanced Raman scattering (SERS) is a powerful technique for sensitive label-free analysis of chemical and biological samples. While much recent work has established sophisticated automation routines using machine learning and related artificial intelligence methods, these efforts have largely focused on downstream processing (e.g., classification tasks) of previously collected data. While fully automated analysis pipelines are desirable, current progress is limited by cumbersome and manually intensive sample preparation and data collection steps. Specifically, a typical lab-scale SERS experiment requires the user to evaluate the quality and reliability of the measurement (i.e., the spectra) as the data are being collected. This need for expert user intuition is a major bottleneck that limits applicability of SERS-based diagnostics for point-of-care clinical applications, where trained spectroscopists are likely unavailable. While application-agnostic numerical approaches (e.g., signal-to-noise thresholding) are useful, there is an urgent need to develop algorithms that leverage expert user intuition and domain knowledge to simplify and accelerate data collection steps. To address this challenge, in this work, we introduce a machine learning-assisted method at the acquisition stage. We tested six common algorithms to measure best performance in the context of spectral quality judgment. For adoption into future automation platforms, we developed an open-source Python package tailored for rapid expert user annotation to train machine learning algorithms. We expect that this new approach of using machine learning to assist in data acquisition can serve as a useful building block for point-of-care SERS diagnostic platforms.

VENDOR-NEUTRAL STOCHASTIC INVERSION OF LWD DEEP AZIMUTHAL RESISTIVITY DATA AS A STEP TOWARD EFFICIENCY STANDARDIZATION OF GEOSTEERING SERVICES

10.30632/spwla-2021-0103 ◽

2021 ◽

Author(s):

Mikhail Sviridov ◽

Anton Mosin ◽

Sergey Lebedev ◽

Ron Thompson ◽

...

Keyword(s):

Markov Chains ◽

Quality Indicators ◽

Parallel Execution ◽

Oil Field ◽

Model Complexity ◽

Computational Time ◽

Model Parameters ◽

Formation Model ◽

Inversion Algorithm ◽

Azimuthal Resistivity

During proactive geosteering, special inversion algorithms are used to process the readings of logging-while-drilling resistivity tools in real time and provide oil field operators with formation models to make informed steering decisions. Currently, there is no industry standard for inversion deliverables and corresponding quality indicators because major tool vendors develop their own device-specific algorithms and use them internally. This paper presents the first implementation of a vendor-neutral inversion approach applicable to any induction resistivity tool, enabling operators to standardize the efficiency of various geosteering services. The need for such a universal inversion approach was inspired by the activity of the LWD Deep Azimuthal Resistivity Services Standardization Workgroup initiated by the SPWLA Resistivity Special Interest Group in 2016. The proposed inversion algorithm utilizes a 1D layer-cake formation model and is performed interval by interval. The following model parameters can be determined: horizontal and vertical resistivities of each layer, positions of layer boundaries, and formation dip. The inversion can support an arbitrary deep azimuthal induction resistivity tool with coaxial, tilted, or orthogonal transmitting and receiving antennas. The inversion is purely data-driven; it works in automatic mode and provides fully unbiased results obtained from tool readings only. The algorithm is based on a statistical reversible-jump Markov chain Monte Carlo method that does not require any predefined assumptions about the formation structure and enables searching for models that explain the data even if the number of layers in the model is unknown. To globalize the search, the algorithm runs several Markov chains capable of exchanging their states with one another to move from the vicinity of a local minimum to a more promising domain of the model parameter space.
During execution, the inversion keeps all models it is dealing with to estimate the resolution accuracy of formation parameters and generate several quality indicators. Eventually, these indicators are delivered together with the recovered resistivity models to help operators evaluate the reliability of the inversion results. To ensure high performance of the inversion, a fast and accurate semi-analytical forward solver is employed to compute the required responses of a tool with specific geometry and their derivatives with respect to any parameter of a multi-layered model. Moreover, the reliance on the simultaneous evolution of multiple Markov chains makes the algorithm suitable for parallel execution, which significantly decreases the computational time. Application of the proposed inversion is shown on a series of synthetic examples and field case studies such as navigating the well along the reservoir roof or near the oil-water contact in oil sands. Inversion results for all scenarios confirm that the proposed algorithm can successfully evaluate formation model complexity, recover model parameters, and quantify their uncertainty within a reasonable computational time. The presented vendor-neutral stochastic approach to data processing leads to the standardization of the inversion output, including the resistivity model and its quality indicators, which helps operators to better understand the capabilities of tools from different vendors and eventually make more confident geosteering decisions.
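The state exchange between Markov chains can be sketched with the standard parallel-tempering (replica exchange) swap rule. This is an assumption: the abstract says only that chains exchange states, not which acceptance rule is used. In the sketch, each chain has a hypothetical inverse temperature `beta` and a data misfit playing the role of an energy; the function name `swap_acceptance` is illustrative.

```python
import math

def swap_acceptance(beta_i, beta_j, misfit_i, misfit_j):
    """Probability of swapping the states of chains i and j under the
    standard replica-exchange rule (assumed, not taken from the paper):
    min(1, exp((beta_i - beta_j) * (misfit_i - misfit_j)))."""
    log_ratio = (beta_i - beta_j) * (misfit_i - misfit_j)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)

# Cold chain (beta = 1.0) stuck at misfit 5, hot chain (beta = 0.1) found misfit 2:
p_good_swap = swap_acceptance(1.0, 0.1, 5.0, 2.0)
# Reverse situation: the cold chain already holds the better model.
p_bad_swap = swap_acceptance(1.0, 0.1, 2.0, 5.0)
```

Under this rule a swap that hands the better-fitting model to the colder chain is always accepted, while the reverse swap is accepted only occasionally, which is how a chain trapped near a local minimum can inherit a state from a more exploratory chain.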