AutoTuner: High fidelity, robust, and rapid parameter selection for metabolomics data processing

2019 ◽  
Author(s):  
Craig McLean ◽  
Elizabeth B. Kujawinski

Abstract Untargeted metabolomics experiments provide a snapshot of cellular metabolism, but they remain challenging to interpret due to the computational complexity of data processing and analysis. Prior to any interpretation, raw data must be processed to remove noise and to align mass-spectral peaks across samples. This step requires selection of dataset-specific parameters, as erroneous parameters can inflate noise. While several algorithms exist to automate parameter selection, each depends on gradient-descent optimization. In contrast, our new parameter-optimization algorithm, AutoTuner, obtains parameter estimates from raw data in a single step rather than over many iterations. Here, we tested the accuracy and run time of AutoTuner against isotopologue parameter optimization (IPO), the most commonly used parameter-selection tool, and compared the resulting parameters’ influence on the quality of feature tables after processing. We performed a Monte Carlo experiment to test the robustness of AutoTuner parameter selection, and found that AutoTuner generated similar parameter estimates from random subsets of samples. We conclude that AutoTuner is a desirable alternative to existing tools because it is scalable, highly robust, and very fast (∼100–1000× faster than other algorithms, reducing run times from days to minutes). AutoTuner is freely available as an R package through Bioconductor.
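The single-pass idea can be illustrated with a small, self-contained sketch: estimate a peak-width parameter directly from a raw intensity trace, with no iterative optimization loop. This is a hypothetical simplification for illustration, not AutoTuner's actual statistical procedure.

```python
import numpy as np

def estimate_peak_width(intensity, threshold_frac=0.1):
    """Estimate a chromatographic peak-width parameter in one pass over
    a raw intensity trace (illustrative simplification, not AutoTuner's
    actual algorithm). Contiguous runs of points above a fraction of the
    maximum intensity are treated as candidate peaks, and the median run
    length is returned as the width estimate."""
    intensity = np.asarray(intensity, dtype=float)
    threshold = threshold_frac * intensity.max()
    above = intensity > threshold
    # Locate the rising and falling edges of each above-threshold run.
    edges = np.diff(above.astype(int))
    starts = np.where(edges == 1)[0] + 1
    ends = np.where(edges == -1)[0] + 1
    if above[0]:
        starts = np.insert(starts, 0, 0)
    if above[-1]:
        ends = np.append(ends, len(intensity))
    widths = ends - starts
    return float(np.median(widths))
```

Because the estimate comes from summary statistics of the trace itself, there is no objective function to minimize and hence no repeated reprocessing of the data, which is the source of the speedup the abstract reports.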

2021 ◽  
Author(s):  
David A Hughes ◽  
Kurt A Taylor ◽  
Nancy McBride ◽  
Matthew A Lee ◽  
Dan Mason ◽  
...  

Motivation: Metabolomics is an increasingly common part of health research, and there is a need for pre-analytical data processing. Researchers typically need to characterize the data and to exclude errors within the context of the intended analysis. While some pre-processing steps are common, there is currently a lack of standardization and reporting transparency for these procedures.
Results: Here we introduce metaboprep, a standardized data-processing workflow to extract and characterize high-quality metabolomics data sets. The package extracts data from pre-formed worksheets, provides summary statistics, and enables the user to select samples and metabolites for their analysis based on a set of quality metrics. A report summarizing the quality metrics and the influence of available batch variables on the data is generated for the purpose of open disclosure. Where possible, we give users flexibility in defining their own selection thresholds.
Availability and implementation: metaboprep is an open-source R package available at https://github.com/MRCIEU/metaboprep
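The kind of threshold-based sample and metabolite selection described above can be sketched in a few lines. This is illustrative only: metaboprep is an R package, and its actual metrics, defaults, and ordering of steps differ.

```python
import numpy as np

def qc_filter(data, sample_thresh=0.2, feature_thresh=0.2):
    """Drop metabolites (columns) and then samples (rows) whose fraction
    of missing values exceeds user-chosen thresholds. A minimal sketch
    of threshold-based quality filtering; the thresholds and the order
    of operations here are illustrative, not metaboprep's defaults."""
    data = np.asarray(data, dtype=float)
    # Remove features with too many missing values across samples.
    keep_features = np.isnan(data).mean(axis=0) <= feature_thresh
    data = data[:, keep_features]
    # Then remove samples with too many missing values across the
    # remaining features.
    keep_samples = np.isnan(data).mean(axis=1) <= sample_thresh
    return data[keep_samples]
```

Exposing the thresholds as arguments mirrors the flexibility the abstract describes: the package reports the metrics, and the user decides where to cut.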


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mihir Mongia ◽  
Hosein Mohimani

Abstract Various studies have shown associations between molecular features and phenotypes of biological samples. These studies, however, focus on a single phenotype per study and do not scale to repository-scale metabolomics data. Here we report MetSummarizer, a method for predicting (i) the biological phenotypes of environmental and host-oriented samples, and (ii) the raw-ingredient composition of complex mixtures. We show that aggregating various metabolomic datasets can improve the accuracy of predictions. Because these datasets were collected using different standards at various laboratories, it is crucial to detect and discard standard-specific features during the classification step in order to obtain unbiased results. We further report high accuracy in predicting the raw-ingredient composition of complex foods from the Global Foodomics Project.
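One simple way to flag standard-specific features before classification is to compare each feature's detection rate across the source datasets and discard features that appear in some standards but not others. The gap criterion below is a hypothetical illustration, not MetSummarizer's actual detection method.

```python
import numpy as np

def drop_batch_specific(X, batches, max_gap=0.5):
    """Drop features whose detection rate (fraction of non-missing
    values) differs sharply between source datasets, so a downstream
    classifier learns biology rather than acquisition standards.
    The detection-rate-gap criterion and its threshold are hypothetical,
    for illustration only."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    keep = []
    for j in range(X.shape[1]):
        rates = [np.mean(~np.isnan(X[batches == b, j]))
                 for b in np.unique(batches)]
        # Keep the feature only if it is detected at similar rates
        # in every dataset.
        keep.append(max(rates) - min(rates) <= max_gap)
    return X[:, np.array(keep)]
```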


1969 ◽  
Vol 6 (01) ◽  
pp. 48-57
Author(s):  
Edward S. Karlson ◽  
John J. Davis

An operational system for providing processed maintenance and repair information for vessels is described. The paper describes a detailed coding system for reducing raw data to composite code numbers suitable for automatic data processing. Objectives of the system and the constraints on it are discussed. The Marad data system has been operational for four years. The scope of the data processed and its utilization are presented. Seven current studies, concerning both vessels as a whole and specific shipboard equipment, are included.


2017 ◽  
Vol 13 (S336) ◽  
pp. 443-444
Author(s):  
I. D. Litovchenko ◽  
S. F. Likhachev ◽  
V. I. Kostenko ◽  
I. A. Girin ◽  
V. A. Ladygin ◽  
...  

Abstract We discuss specific aspects of space-ground VLBI (SVLBI) data processing for spectral-line experiments (H2O and OH masers) in the Radioastron project. To meet the technical requirements of the Radioastron mission, a new software FX correlator (ASCFX) and a unique data archive, which stores the raw data from all VLBI stations for all experiments of the project, were developed at the Astro Space Center. To date, all maser observations conducted in the Radioastron project have been correlated with the ASCFX correlator. Positive detections on the space-ground baselines were found in 38 sessions out of 144 (a detection rate of about 27%). Finally, we present upper limits on the angular size of the most compact spots observed in two galactic H2O masers, W3OH(H2O) and OH043.8-0.1.


Author(s):  
Zachary R. McCaw ◽  
Hanna Julienne ◽  
Hugues Aschard

Abstract Although missing data are prevalent in applications, existing implementations of Gaussian mixture models (GMMs) require complete data. Standard practice is to perform complete-case analysis or imputation prior to model fitting. Both approaches have serious drawbacks, potentially resulting in biased and unstable parameter estimates. Here we present MGMM, an R package for fitting GMMs in the presence of missing data. Using three case studies on real and simulated data sets, we demonstrate that, when the underlying distribution is close to a GMM, MGMM is more effective at recovering the true cluster assignments than state-of-the-art imputation followed by a standard GMM. Moreover, MGMM provides an accurate assessment of cluster-assignment uncertainty even when the generative distribution is not a GMM. This assessment may be used to identify unassignable observations. MGMM is available as an R package on CRAN: https://CRAN.R-project.org/package=MGMM.
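The core idea of fitting a GMM without imputation is that, under ignorable missingness, each point's likelihood can be evaluated over its observed coordinates only. The sketch below implements this for diagonal covariances; it is a simplified illustration of the idea behind MGMM, not the package's implementation (MGMM supports full covariances and is written in R).

```python
import numpy as np

def em_gmm_missing(X, k=2, n_iter=50):
    """EM for a Gaussian mixture with missing entries (NaN), assuming
    diagonal covariances and ignorable missingness. Missing coordinates
    simply drop out of each point's log-likelihood, so no imputation or
    complete-case deletion is needed. Simplified sketch, not MGMM."""
    n, d = X.shape
    obs = ~np.isnan(X)
    Xf = np.nan_to_num(X)                    # zeros at missing slots (masked below)
    mu = Xf[np.linspace(0, n - 1, k).astype(int)].copy()
    var = np.ones((k, d))
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log p(x_obs | component j) + log prior, per point.
        logp = np.zeros((n, k))
        for j in range(k):
            ll = -0.5 * (np.log(2 * np.pi * var[j]) + (Xf - mu[j]) ** 2 / var[j])
            logp[:, j] = np.where(obs, ll, 0.0).sum(axis=1) + np.log(pi[j])
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted updates using observed entries only.
        for j in range(k):
            w = resp[:, j][:, None] * obs    # per-entry effective weights
            denom = w.sum(axis=0) + 1e-12
            mu[j] = (w * Xf).sum(axis=0) / denom
            var[j] = (w * (Xf - mu[j]) ** 2).sum(axis=0) / denom + 1e-6
        pi = resp.mean(axis=0)
    return mu, var, pi, resp
```

The responsibilities returned in `resp` give exactly the cluster-assignment uncertainty the abstract mentions: a point whose maximum responsibility is well below 1 is a candidate "unassignable" observation.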


Author(s):  
J. W. Li ◽  
X. Q. Han ◽  
J. W. Jiang ◽  
Y. Hu ◽  
L. Liu

Abstract. Establishing effective methods for analyzing large geographic spatio-temporal data sets, and quickly and accurately finding the value hidden in geographic information, has become a current research focus. Clustering methods from the data-mining field can mine the knowledge and information hidden in complex and massive spatio-temporal data, and density-based clustering is one of the most important of these methods. However, the traditional DBSCAN clustering algorithm has drawbacks in parameter selection that are difficult to overcome: its two key parameters, the Eps neighborhood radius and the MinPts density threshold, must be set manually. Suitable parameters cannot be chosen from the conventional guiding principles for DBSCAN parameter setting, so unreasonable choices lead to misclassification and artificially sparse densities, and the algorithm cannot produce accurate clustering results. In this paper, an efficient density-clustering method based on DBSCAN with improved parameter optimization is proposed. An evaluation-index function (Optimal Distance) is obtained by cycling through k-clustering in turn, and the optimal solution is selected; the optimal k value from k-clustering is then used to cluster the samples. Through mathematical and physical analysis, appropriate values of Eps and MinPts can be determined, and the final clustering results are obtained with DBSCAN. Experiments show that this method selects reasonable parameters for DBSCAN clustering, demonstrating the superiority of the approach described in this paper.
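A widely used data-driven alternative to hand-setting Eps and MinPts is the k-distance heuristic, sketched below. This is a generic heuristic for illustration, not the Optimal Distance evaluation index proposed in the paper; the quantile used to approximate the knee of the k-distance curve is an assumption.

```python
import numpy as np

def estimate_dbscan_params(X, k=4):
    """Estimate DBSCAN's Eps and MinPts from the data rather than
    setting them manually: compute each point's distance to its k-th
    nearest neighbour and take a high quantile of those distances as
    Eps, with MinPts = k + 1. Generic k-distance heuristic; the 0.9
    quantile stands in for the knee of the sorted k-distance curve."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances (fine for small n; O(n^2) memory).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sorted row: column 0 is the point itself, so column k is the
    # k-th nearest neighbour distance.
    kth = np.sort(dist, axis=1)[:, k]
    eps = float(np.quantile(kth, 0.9))
    return eps, k + 1
```

Points inside a genuine cluster have small k-th-neighbour distances, while noise points have large ones, so a quantile of the k-distance distribution separates the two density regimes that Eps is meant to distinguish.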

