AutoTuner: High fidelity, robust, and rapid parameter selection for metabolomics data processing

2019 ◽  
Author(s):  
Craig McLean ◽  
Elizabeth B. Kujawinski

Abstract Untargeted metabolomics experiments provide a snapshot of cellular metabolism, but they remain challenging to interpret due to the computational complexity of data processing and analysis. Prior to any interpretation, raw data must be processed to remove noise and to align mass-spectral peaks across samples. This step requires selection of dataset-specific parameters, as erroneous parameters can inflate noise. While several algorithms exist to automate parameter selection, each depends on gradient-descent optimization. In contrast, our new parameter-optimization algorithm, AutoTuner, obtains parameter estimates from raw data in a single step rather than over many iterations. Here, we tested the accuracy and run time of AutoTuner against isotopologue parameter optimization (IPO), the most commonly used parameter-selection tool, and compared the resulting parameters’ influence on the quality of feature tables after processing. We performed a Monte Carlo experiment to test the robustness of AutoTuner parameter selection, and found that AutoTuner generated similar parameter estimates from random subsets of samples. We conclude that AutoTuner is a desirable alternative to existing tools because it is scalable, highly robust, and very fast (∼100–1000× faster than other algorithms, reducing run times from days to minutes). AutoTuner is freely available as an R package through Bioconductor.
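The single-pass idea can be illustrated with a small, self-contained sketch: estimate a peak-width parameter directly from a raw intensity trace, with no iterative optimization loop. This is a hypothetical simplification for illustration, not AutoTuner's actual statistical procedure.

```python
import numpy as np

def estimate_peak_width(intensity, threshold_frac=0.1):
    """Estimate a chromatographic peak-width parameter in one pass over
    a raw intensity trace (illustrative simplification, not AutoTuner's
    actual algorithm). Contiguous runs of points above a fraction of the
    maximum intensity are treated as candidate peaks, and the median run
    length is returned as the width estimate."""
    intensity = np.asarray(intensity, dtype=float)
    threshold = threshold_frac * intensity.max()
    above = intensity > threshold
    # Locate the rising and falling edges of each above-threshold run.
    edges = np.diff(above.astype(int))
    starts = np.where(edges == 1)[0] + 1
    ends = np.where(edges == -1)[0] + 1
    if above[0]:
        starts = np.insert(starts, 0, 0)
    if above[-1]:
        ends = np.append(ends, len(intensity))
    widths = ends - starts
    return float(np.median(widths))
```

Because the estimate comes from summary statistics of the trace itself, there is no objective function to minimize and hence no repeated reprocessing of the data, which is the source of the speedup the abstract reports.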

2021 ◽  
Author(s):  
David A Hughes ◽  
Kurt A Taylor ◽  
Nancy McBride ◽  
Matthew A Lee ◽  
Dan Mason ◽  
...  

Motivation: Metabolomics is an increasingly common part of health research, and there is a need for pre-analytical data processing. Researchers typically need to characterize the data and to exclude errors within the context of the intended analysis. While some pre-processing steps are common, there is currently a lack of standardization and reporting transparency for these procedures.
Results: Here we introduce metaboprep, a standardized data-processing workflow to extract and characterize high-quality metabolomics data sets. The package extracts data from pre-formed worksheets, provides summary statistics, and enables the user to select samples and metabolites for their analysis based on a set of quality metrics. A report summarizing the quality metrics and the influence of available batch variables on the data is generated for the purpose of open disclosure. Where possible, we give users flexibility in defining their own selection thresholds.
Availability and implementation: metaboprep is an open-source R package available at https://github.com/MRCIEU/metaboprep
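The kind of threshold-based sample and metabolite selection described above can be sketched in a few lines. This is illustrative only: metaboprep is an R package, and its actual metrics, defaults, and ordering of steps differ.

```python
import numpy as np

def qc_filter(data, sample_thresh=0.2, feature_thresh=0.2):
    """Drop metabolites (columns) and then samples (rows) whose fraction
    of missing values exceeds user-chosen thresholds. A minimal sketch
    of threshold-based quality filtering; the thresholds and the order
    of operations here are illustrative, not metaboprep's defaults."""
    data = np.asarray(data, dtype=float)
    # Remove features with too many missing values across samples.
    keep_features = np.isnan(data).mean(axis=0) <= feature_thresh
    data = data[:, keep_features]
    # Then remove samples with too many missing values across the
    # remaining features.
    keep_samples = np.isnan(data).mean(axis=1) <= sample_thresh
    return data[keep_samples]
```

Exposing the thresholds as arguments mirrors the flexibility the abstract describes: the package reports the metrics, and the user decides where to cut.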


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mihir Mongia ◽  
Hosein Mohimani

Abstract Various studies have shown associations between molecular features and phenotypes of biological samples. These studies, however, focus on a single phenotype per study and do not scale to repository-scale metabolomics data. Here we report MetSummarizer, a method for predicting (i) the biological phenotypes of environmental and host-oriented samples, and (ii) the raw-ingredient composition of complex mixtures. We show that aggregating various metabolomic datasets can improve the accuracy of predictions. Because these datasets were collected using different standards at various laboratories, it is crucial to detect and discard standard-specific features during the classification step in order to obtain unbiased results. We further report high accuracy in predicting the raw-ingredient composition of complex foods from the Global Foodomics Project.
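One simple way to flag standard-specific features before classification is to compare each feature's detection rate across the source datasets and discard features that appear in some standards but not others. The gap criterion below is a hypothetical illustration, not MetSummarizer's actual detection method.

```python
import numpy as np

def drop_batch_specific(X, batches, max_gap=0.5):
    """Drop features whose detection rate (fraction of non-missing
    values) differs sharply between source datasets, so a downstream
    classifier learns biology rather than acquisition standards.
    The detection-rate-gap criterion and its threshold are hypothetical,
    for illustration only."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    keep = []
    for j in range(X.shape[1]):
        rates = [np.mean(~np.isnan(X[batches == b, j]))
                 for b in np.unique(batches)]
        # Keep the feature only if it is detected at similar rates
        # in every dataset.
        keep.append(max(rates) - min(rates) <= max_gap)
    return X[:, np.array(keep)]
```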


1969 ◽  
Vol 6 (01) ◽  
pp. 48-57
Author(s):  
Edward S. Karlson ◽  
John J. Davis

An operational system for providing processed maintenance and repair information for vessels is described. The paper describes a detailed coding system for reducing raw data to composite code numbers suitable for automatic data processing. Objectives of the system and the constraints on it are discussed. The Marad data system has been operational for four years. The scope of the data processed and its utilization are presented. Seven current studies, concerning both vessels as a whole and specific shipboard equipment, are included.


2017 ◽  
Vol 13 (S336) ◽  
pp. 443-444
Author(s):  
I. D. Litovchenko ◽  
S. F. Likhachev ◽  
V. I. Kostenko ◽  
I. A. Girin ◽  
V. A. Ladygin ◽  
...  

Abstract We discuss specific aspects of space-ground VLBI (SVLBI) data processing for spectral-line experiments (H2O and OH masers) in the Radioastron project. To meet the technical requirements of the Radioastron mission, a new software FX correlator (ASCFX) and a unique data archive, which stores the raw data from all VLBI stations for all experiments of the project, were developed at the Astro Space Center. To date, all maser observations conducted in the Radioastron project have been correlated with the ASCFX correlator. Positive detections on the space-ground baselines were found in 38 sessions out of 144 (a detection rate of about 27%). Finally, we present upper limits on the angular size of the most compact spots observed in two galactic H2O masers, W3OH(H2O) and OH043.8-0.1.


Author(s):  
Zachary R. McCaw ◽  
Hanna Julienne ◽  
Hugues Aschard

Abstract Although missing data are prevalent in applications, existing implementations of Gaussian mixture models (GMMs) require complete data. Standard practice is to perform complete-case analysis or imputation prior to model fitting. Both approaches have serious drawbacks, potentially resulting in biased and unstable parameter estimates. Here we present MGMM, an R package for fitting GMMs in the presence of missing data. Using three case studies on real and simulated data sets, we demonstrate that, when the underlying distribution is close to a GMM, MGMM is more effective at recovering the true cluster assignments than state-of-the-art imputation followed by a standard GMM. Moreover, MGMM provides an accurate assessment of cluster-assignment uncertainty even when the generative distribution is not a GMM. This assessment may be used to identify unassignable observations. MGMM is available as an R package on CRAN: https://CRAN.R-project.org/package=MGMM.
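The core idea of fitting a GMM without imputation is that, under ignorable missingness, each point's likelihood can be evaluated over its observed coordinates only. The sketch below implements this for diagonal covariances; it is a simplified illustration of the idea behind MGMM, not the package's implementation (MGMM supports full covariances and is written in R).

```python
import numpy as np

def em_gmm_missing(X, k=2, n_iter=50):
    """EM for a Gaussian mixture with missing entries (NaN), assuming
    diagonal covariances and ignorable missingness. Missing coordinates
    simply drop out of each point's log-likelihood, so no imputation or
    complete-case deletion is needed. Simplified sketch, not MGMM."""
    n, d = X.shape
    obs = ~np.isnan(X)
    Xf = np.nan_to_num(X)                    # zeros at missing slots (masked below)
    mu = Xf[np.linspace(0, n - 1, k).astype(int)].copy()
    var = np.ones((k, d))
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log p(x_obs | component j) + log prior, per point.
        logp = np.zeros((n, k))
        for j in range(k):
            ll = -0.5 * (np.log(2 * np.pi * var[j]) + (Xf - mu[j]) ** 2 / var[j])
            logp[:, j] = np.where(obs, ll, 0.0).sum(axis=1) + np.log(pi[j])
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted updates using observed entries only.
        for j in range(k):
            w = resp[:, j][:, None] * obs    # per-entry effective weights
            denom = w.sum(axis=0) + 1e-12
            mu[j] = (w * Xf).sum(axis=0) / denom
            var[j] = (w * (Xf - mu[j]) ** 2).sum(axis=0) / denom + 1e-6
        pi = resp.mean(axis=0)
    return mu, var, pi, resp
```

The responsibilities returned in `resp` give exactly the cluster-assignment uncertainty the abstract mentions: a point whose maximum responsibility is well below 1 is a candidate "unassignable" observation.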


Author(s):  
J. W. Li ◽  
X. Q. Han ◽  
J. W. Jiang ◽  
Y. Hu ◽  
L. Liu

Abstract. Establishing effective methods for analyzing large geographic spatio-temporal data sets, and quickly and accurately finding the value hidden in geographic information, has become a current research focus. Clustering methods from the data-mining field can mine the knowledge and information hidden in complex and massive spatio-temporal data, and density-based clustering is one of the most important of these methods. However, the traditional DBSCAN clustering algorithm has drawbacks in parameter selection that are difficult to overcome: its two key parameters, the Eps neighborhood radius and the MinPts density threshold, must be set manually. Suitable parameters cannot be chosen from the conventional guiding principles for DBSCAN parameter setting, so unreasonable choices lead to misclassification and artificially sparse densities, and the algorithm cannot produce accurate clustering results. In this paper, an efficient density-clustering method based on DBSCAN with improved parameter optimization is proposed. An evaluation-index function (Optimal Distance) is obtained by cycling through k-clustering in turn, and the optimal solution is selected; the optimal k value from k-clustering is then used to cluster the samples. Through mathematical and physical analysis, appropriate values of Eps and MinPts can be determined, and the final clustering results are obtained with DBSCAN. Experiments show that this method selects reasonable parameters for DBSCAN clustering, demonstrating the superiority of the approach described in this paper.
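A widely used data-driven alternative to hand-setting Eps and MinPts is the k-distance heuristic, sketched below. This is a generic heuristic for illustration, not the Optimal Distance evaluation index proposed in the paper; the quantile used to approximate the knee of the k-distance curve is an assumption.

```python
import numpy as np

def estimate_dbscan_params(X, k=4):
    """Estimate DBSCAN's Eps and MinPts from the data rather than
    setting them manually: compute each point's distance to its k-th
    nearest neighbour and take a high quantile of those distances as
    Eps, with MinPts = k + 1. Generic k-distance heuristic; the 0.9
    quantile stands in for the knee of the sorted k-distance curve."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances (fine for small n; O(n^2) memory).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sorted row: column 0 is the point itself, so column k is the
    # k-th nearest neighbour distance.
    kth = np.sort(dist, axis=1)[:, k]
    eps = float(np.quantile(kth, 0.9))
    return eps, k + 1
```

Points inside a genuine cluster have small k-th-neighbour distances, while noise points have large ones, so a quantile of the k-distance distribution separates the two density regimes that Eps is meant to distinguish.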

