scholarly journals Robust summarization and inference in proteome-wide label-free quantification

2019 ◽  
Author(s):  
Adriaan Sticker ◽  
Ludger Goeminne ◽  
Lennart Martens ◽  
Lieven Clement

AbstractLabel-Free Quantitative mass spectrometry based workflows for differential expression (DE) analysis of proteins impose important challenges on the data analysis due to peptide-specific effects and context dependent missingness of peptide intensities. Peptide-based workflows, like MSqRob, test for DE directly from peptide intensities and outper-form summarization methods which first aggregate MS1 peptide intensities to protein intensities before DE analysis. However, these methods are computationally expensive, often hard to understand for the non-specialised end-user, and do not provide protein summaries, which are important for visualisation or downstream processing. In this work, we therefore evaluate state-of-the-art summarization strategies using a benchmark spike-in dataset and discuss why and when these fail compared to the state-of-the-art peptide based model, MSqRob. Based on this evaluation, we propose a novel summarization strategy, MSqRob-Sum, which estimates MSqRob’s model parameters in a two-stage procedure circumventing the drawbacks of peptide-based workflows. MSqRobSum maintains MSqRob’s superior performance, while providing useful protein expression summaries for plotting and downstream analysis. Summarising peptide to protein intensities considerably reduces the computational complexity, the memory footprint and the model complexity, and makes it easier to disseminate DE inferred on protein summaries. Moreover, MSqRobSum provides a highly modular analysis framework, which provides researchers with full flexibility to develop data analysis workflows tailored towards their specific applications.

2020 ◽  
Vol 19 (7) ◽  
pp. 1209-1219
Author(s):  
Adriaan Sticker ◽  
Ludger Goeminne ◽  
Lennart Martens ◽  
Lieven Clement

Label-Free Quantitative mass spectrometry based workflows for differential expression (DE) analysis of proteins impose important challenges on the data analysis because of peptide-specific effects and context dependent missingness of peptide intensities. Peptide-based workflows, like MSqRob, test for DE directly from peptide intensities and outperform summarization methods which first aggregate MS1 peptide intensities to protein intensities before DE analysis. However, these methods are computationally expensive, often hard to understand for the non-specialized end-user, and do not provide protein summaries, which are important for visualization or downstream processing. In this work, we therefore evaluate state-of-the-art summarization strategies using a benchmark spike-in dataset and discuss why and when these fail compared with the state-of-the-art peptide based model, MSqRob. Based on this evaluation, we propose a novel summarization strategy, MSqRobSum, which estimates MSqRob's model parameters in a two-stage procedure circumventing the drawbacks of peptide-based workflows. MSqRobSum maintains MSqRob's superior performance, while providing useful protein expression summaries for plotting and downstream analysis. Summarizing peptide to protein intensities considerably reduces the computational complexity, the memory footprint and the model complexity, and makes it easier to disseminate DE inferred on protein summaries. Moreover, MSqRobSum provides a highly modular analysis framework, which provides researchers with full flexibility to develop data analysis workflows tailored toward their specific applications.


2020 ◽  
Vol 34 (04) ◽  
pp. 3641-3648 ◽  
Author(s):  
Eli Chien ◽  
Antonia Tulino ◽  
Jaime Llorca

The geometric block model is a recently proposed generative model for random graphs that is able to capture the inherent geometric properties of many community detection problems, providing more accurate characterizations of practical community structures compared with the popular stochastic block model. Galhotra et al. recently proposed a motif-counting algorithm for unsupervised community detection in the geometric block model that is proved to be near-optimal. They also characterized the regimes of the model parameters for which the proposed algorithm can achieve exact recovery. In this work, we initiate the study of active learning in the geometric block model. That is, we are interested in the problem of exactly recovering the community structure of random graphs following the geometric block model under arbitrary model parameters, by possibly querying the labels of a limited number of chosen nodes. We propose two active learning algorithms that combine the use of motif-counting with two different label query policies. Our main contribution is to show that sampling the labels of a vanishingly small fraction of nodes (sub-linear in the total number of nodes) is sufficient to achieve exact recovery in the regimes under which the state-of-the-art unsupervised method fails. We validate the superior performance of our algorithms via numerical simulations on both real and synthetic datasets.


2021 ◽  
Author(s):  
Wenkai Han ◽  
Yuqi Cheng ◽  
Jiayang Chen ◽  
Huawen Zhong ◽  
Zhihang Hu ◽  
...  

Single-cell RNA-sequencing (scRNA-seq) has become a powerful tool to reveal the complex biological diversity and heterogeneity among cell populations. However, the technical noise and bias of the technology still have negative impacts on the downstream analysis. Here, we present a self-supervised Contrastive LEArning framework for scRNA-seq (CLEAR) profile representation and the downstream analysis. CLEAR overcomes the heterogeneity of the experimental data with a specifically designed representation learning task and thus can handle batch effects and dropout events. In the task, the deep learning model learns to pull together the representations of similar cells while pushing apart distinct cells, without manual labeling. It achieves superior performance on a broad range of fundamental tasks, including clustering, visualization, dropout correction, batch effect removal, and pseudo-time inference. The proposed method successfully identifies and illustrates inflammatory-related mechanisms in a COVID-19 disease study with 43,695 single cells from peripheral blood mononuclear cells. Further experiments to process a million-scale single-cell dataset demonstrate the scalability of CLEAR. This scalable method generates effective scRNA-seq data representation while eliminating technical noise, and it will serve as a general computational framework for single-cell data analysis.


2019 ◽  
Vol 11 (19) ◽  
pp. 2288 ◽  
Author(s):  
Xin Song ◽  
Xinwei Jiang ◽  
Junbin Gao ◽  
Zhihua Cai

Dimensionality Reduction (DR) models are highly useful for tackling Hyperspectral Images (HSIs) classification tasks. They mainly address two issues: the curse of dimensionality with respect to spectral features, and the limited number of labeled training samples. Among these DR techniques, the Graph-Embedding Discriminant Analysis (GEDA) framework has demonstrated its effectiveness for HSIs feature extraction. However, most of the existing GEDA-based DR methods largely rely on manually tuning the parameters so as to obtain the optimal model, which proves to be troublesome and inefficient. Motivated by the nonparametric Gaussian Process (GP) model, we propose a novel supervised DR algorithm, namely Gaussian Process Graph-based Discriminate Analysis (GPGDA). Our algorithm takes full advantage of the covariance matrix in GP to constructing the graph similarity matrix in GEDA framework. In this way, more superior performance can be provided with the model parameters tuned automatically. Experiments on three real HSIs datasets demonstrate that the proposed GPGDA outperforms some classic and state-of-the-art DR methods.


2020 ◽  
Vol 227 ◽  
pp. 02012
Author(s):  
R. S. Sidhu ◽  
R. J. Chen ◽  
Yu. A Litvinov ◽  
Y. H. Zhang ◽  

The re-analysis of experimental data on mass measurements of ura- nium fission products obtained at the ESR in 2002 is discussed. State-of-the-art data analysis procedures developed for such measurements are employed.


Materials ◽  
2021 ◽  
Vol 14 (15) ◽  
pp. 4171
Author(s):  
Rabia Ikram ◽  
Badrul Mohamed Jan ◽  
Akhmal Sidek ◽  
George Kenanakis

An important aspect of hydrocarbon drilling is the usage of drilling fluids, which remove drill cuttings and stabilize the wellbore to provide better filtration. To stabilize these properties, several additives are used in drilling fluids that provide satisfactory rheological and filtration properties. However, commonly used additives are environmentally hazardous; when drilling fluids are disposed after drilling operations, they are discarded with the drill cuttings and additives into water sources and causes unwanted pollution. Therefore, these additives should be substituted with additives that are environmental friendly and provide superior performance. In this regard, biodegradable additives are required for future research. This review investigates the role of various bio-wastes as potential additives to be used in water-based drilling fluids. Furthermore, utilization of these waste-derived nanomaterials is summarized for rheology and lubricity tests. Finally, sufficient rheological and filtration examinations were carried out on water-based drilling fluids to evaluate the effect of wastes as additives on the performance of drilling fluids.


2019 ◽  
Vol 24 (3) ◽  
pp. 213-223 ◽  
Author(s):  
Raimo Franke ◽  
Bettina Hinkelmann ◽  
Verena Fetz ◽  
Theresia Stradal ◽  
Florenz Sasse ◽  
...  

Mode of action (MoA) identification of bioactive compounds is very often a challenging and time-consuming task. We used a label-free kinetic profiling method based on an impedance readout to monitor the time-dependent cellular response profiles for the interaction of bioactive natural products and other small molecules with mammalian cells. Such approaches have been rarely used so far due to the lack of data mining tools to properly capture the characteristics of the impedance curves. We developed a data analysis pipeline for the xCELLigence Real-Time Cell Analysis detection platform to process the data, assess and score their reproducibility, and provide rank-based MoA predictions for a reference set of 60 bioactive compounds. The method can reveal additional, previously unknown targets, as exemplified by the identification of tubulin-destabilizing activities of the RNA synthesis inhibitor actinomycin D and the effects on DNA replication of vioprolide A. The data analysis pipeline is based on the statistical programming language R and is available to the scientific community through a GitHub repository.


2021 ◽  
pp. 000370282110345
Author(s):  
Tatu Rojalin ◽  
Dexter Antonio ◽  
Ambarish Kulkarni ◽  
Randy P. Carney

Surface-enhanced Raman scattering (SERS) is a powerful technique for sensitive label-free analysis of chemical and biological samples. While much recent work has established sophisticated automation routines using machine learning and related artificial intelligence methods, these efforts have largely focused on downstream processing (e.g., classification tasks) of previously collected data. While fully automated analysis pipelines are desirable, current progress is limited by cumbersome and manually intensive sample preparation and data collection steps. Specifically, a typical lab-scale SERS experiment requires the user to evaluate the quality and reliability of the measurement (i.e., the spectra) as the data are being collected. This need for expert user-intuition is a major bottleneck that limits applicability of SERS-based diagnostics for point-of-care clinical applications, where trained spectroscopists are likely unavailable. While application-agnostic numerical approaches (e.g., signal-to-noise thresholding) are useful, there is an urgent need to develop algorithms that leverage expert user intuition and domain knowledge to simplify and accelerate data collection steps. To address this challenge, in this work, we introduce a machine learning-assisted method at the acquisition stage. We tested six common algorithms to measure best performance in the context of spectral quality judgment. For adoption into future automation platforms, we developed an open-source python package tailored for rapid expert user annotation to train machine learning algorithms. We expect that this new approach to use machine learning to assist in data acquisition can serve as a useful building block for point-of-care SERS diagnostic platforms.


2021 ◽  
Author(s):  
Mikhail Sviridov ◽  
◽  
Anton Mosin ◽  
Sergey Lebedev ◽  
Ron Thompson ◽  
...  

While proactive geosteering, special inversion algorithms are used to process the readings of logging-while-drilling resistivity tools in real-time and provide oil field operators with formation models to make informed steering decisions. Currently, there is no industry standard for inversion deliverables and corresponding quality indicators because major tool vendors develop their own device-specific algorithms and use them internally. This paper presents the first implementation of vendor-neutral inversion approach applicable for any induction resistivity tool and enabling operators to standardize the efficiency of various geosteering services. The necessity of such universal inversion approach was inspired by the activity of LWD Deep Azimuthal Resistivity Services Standardization Workgroup initiated by SPWLA Resistivity Special Interest Group in 2016. Proposed inversion algorithm utilizes a 1D layer-cake formation model and is performed interval-by-interval. The following model parameters can be determined: horizontal and vertical resistivities of each layer, positions of layer boundaries, and formation dip. The inversion can support arbitrary deep azimuthal induction resistivity tool with coaxial, tilted, or orthogonal transmitting and receiving antennas. The inversion is purely data-driven; it works in automatic mode and provides fully unbiased results obtained from tool readings only. The algorithm is based on statistical reversible-jump Markov chain Monte Carlo method that does not require any predefined assumptions about the formation structure and enables searching of models explaining the data even if the number of layers in the model is unknown. To globalize search, the algorithm runs several Markov chains capable of exchanging their states between one another to move from the vicinity of local minimum to more perspective domain of model parameter space. While execution, the inversion keeps all models it is dealing with to estimate the resolution accuracy of formation parameters and generate several quality indicators. Eventually, these indicators are delivered together with recovered resistivity models to help operators with the evaluation of inversion results reliability. To ensure high performance of the inversion, a fast and accurate semi-analytical forward solver is employed to compute required responses of a tool with specific geometry and their derivatives with respect to any parameter of multi-layered model. Moreover, the reliance on the simultaneous evolution of multiple Markov chains makes the algorithm suitable for parallel execution that significantly decreases the computational time. Application of the proposed inversion is shown on a series of synthetic examples and field case studies such as navigating the well along the reservoir roof or near the oil-water-contact in oil sands. Inversion results for all scenarios confirm that the proposed algorithm can successfully evaluate formation model complexity, recover model parameters, and quantify their uncertainty within a reasonable computational time. Presented vendor-neutral stochastic approach to data processing leads to the standardization of the inversion output including the resistivity model and its quality indicators that helps operators to better understand capabilities of tools from different vendors and eventually make more confident geosteering decisions.


Sign in / Sign up

Export Citation Format

Share Document