Part–Whole Relations: New Insights about the Dynamics of Complex Geochemical Riverine Systems

Minerals ◽

10.3390/min10060501 ◽

2020 ◽

Vol 10 (6) ◽

pp. 501

Author(s):

Caterina Gozzi ◽

Roberta Sauro Graziano ◽

Antonella Buccianti

Keyword(s):

Compositional Data ◽

Chemical Elements ◽

Chemical Constituents ◽

Principal Component ◽

Cumulative Distribution ◽

Compositional Data Analysis ◽

Scaling Properties ◽

Global Biogeochemical Cycles ◽

Riverine Systems ◽

Log Ratio

Nature is often characterized by systems that are far from thermodynamic equilibrium, and rivers are not an exception for the Earth’s critical zone. When the chemical composition of stream waters is investigated, it emerges that riverine systems behave as complex systems. This means that the compositions have properties that depend on the integrity of the whole (i.e., the composition with all the chemical constituents), properties that arise thanks to the innumerable nonlinear interactions between the elements of the composition. The presence of interconnections indicates that the properties of the whole cannot be fully understood by examining the parts of the system in isolation. In this work, we propose investigating the complexity of riverine chemistry by using the CoDA (Compositional Data Analysis) methodology and the performance of the perturbation operator in the simplex geometry. With riverine bicarbonate considered as a key component of regional and global biogeochemical cycles and Ca2+ considered as mostly related to the weathering of carbonatic rocks, perturbations were calculated for subsequent couples of compositions after ranking the data for increasing values of the log-ratio ln(Ca2+/HCO3−). Numerical values were analyzed by using robust principal component analysis and non-parametric correlations between compositional parts (heat map) associated with distributional and multifractal methods. The results indicate that HCO3−, Ca2+, Mg2+ and Sr2+ are more resilient, thus contributing to compositional changes for all the values of ln(Ca2+/HCO3−) to a lesser degree with respect to the other chemical elements/components. Moreover, the complementary cumulative distribution function of all the sequences tracing the compositional change and the nonlinear relationship between the Q-th moment versus the scaling exponents for each of them indicate the presence of multifractal variability, thus revealing scaling properties of the fluctuations.
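
The perturbation step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the sample values and the part order (Ca, Mg, Na, HCO3) are hypothetical, and the helper names `closure`, `perturb` and `perturbation_difference` are introduced here. Samples are ranked by ln(Ca/HCO3), and the perturbation that carries each composition to the next one in the ranking is computed.

```python
import math

def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

def perturb(x, p):
    """Perturbation in the simplex: closed component-wise product."""
    return closure([xi * pi for xi, pi in zip(x, p)])

def perturbation_difference(x, y):
    """Perturbation p such that perturbing x by p gives y (closed y/x)."""
    return closure([yi / xi for xi, yi in zip(x, y)])

# Hypothetical samples, parts ordered as (Ca, Mg, Na, HCO3), arbitrary units.
samples = [
    (40.0, 10.0, 15.0, 150.0),
    (60.0, 12.0, 30.0, 160.0),
    (25.0, 8.0, 20.0, 140.0),
]

# Rank the samples by increasing ln(Ca/HCO3).
ranked = sorted(samples, key=lambda s: math.log(s[0] / s[3]))

# One perturbation per pair of successive compositions in the ranking.
steps = [perturbation_difference(a, b) for a, b in zip(ranked, ranked[1:])]
```

Each entry of `steps` is itself a composition; parts whose share of the perturbation stays close to 1/D along the whole sequence change little between successive samples.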

Statistical Analysis of Chemical Element Compositions in Food Science: Problems and Possibilities

Molecules ◽

10.3390/molecules26195752 ◽

2021 ◽

Vol 26 (19) ◽

pp. 5752

Author(s):

Matthias Templ ◽

Barbara Templ

Keyword(s):

Statistical Analysis ◽

Missing Values ◽

Compositional Data ◽

Chemical Elements ◽

Chemical Components ◽

Compositional Data Analysis ◽

Research Fields ◽

Log Ratio ◽

Processing Steps

In recent years, many analyses have been carried out to investigate the chemical components of food data. However, studies rarely consider the compositional pitfalls of such analyses. This is problematic as it may lead to arbitrary results when non-compositional statistical analysis is applied to compositional datasets. In this study, compositional data analysis (CoDa), which is widely used in other research fields, is compared with classical statistical analysis to demonstrate how the results vary depending on the approach and to show the best possible statistical analysis. For example, honey and saffron are highly susceptible to adulteration and imitation, so the determination of their chemical elements requires the best possible statistical analysis. Our study demonstrated how principal component analysis (PCA) and classification results are influenced by the pre-processing steps conducted on the raw data, and the replacement strategies for missing values and non-detects. Furthermore, it demonstrated the differences in results when compositional and non-compositional methods were applied. Our results suggested that the outcome of the log-ratio analysis provided better separation between the pure and adulterated data and allowed for easier interpretability of the results and a higher accuracy of classification. Similarly, it showed that classification with artificial neural networks (ANNs) works poorly if the CoDa pre-processing steps are left out. From these results, we advise the application of CoDa methods for analyses of the chemical elements of food and for the characterization and authentication of food products.

Variable selection in microbiome compositional data analysis

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa029 ◽

2020 ◽

Vol 2 (2) ◽

Cited By ~ 2

Author(s):

Antoni Susin ◽

Yiwen Wang ◽

Kim-Anh Lê Cao ◽

M Luz Calle

Keyword(s):

Data Analysis ◽

Variable Selection ◽

Compositional Data ◽

Penalized Regression ◽

Compositional Data Analysis ◽

Forward Selection ◽

Computationally Efficient ◽

Parsimonious Model ◽

Microbiome Data ◽

Log Ratio

Abstract Though variable selection is one of the most relevant tasks in microbiome analysis, e.g. for the identification of microbial signatures, many studies still rely on methods that ignore the compositional nature of microbiome data. The applicability of compositional data analysis methods has been hampered by the availability of software and the difficulty in interpreting their results. This work is focused on three methods for variable selection that acknowledge the compositional structure of microbiome data: selbal, a forward selection approach for the identification of compositional balances, and clr-lasso and coda-lasso, two penalized regression models for compositional data analysis. This study highlights the link between these methods and brings out some limitations of the centered log-ratio transformation for variable selection. In particular, the fact that it is not subcompositionally consistent makes the microbial signatures obtained from clr-lasso not readily transferable. Coda-lasso is computationally efficient and suitable when the focus is the identification of the most associated microbial taxa. Selbal stands out when the goal is to obtain a parsimonious model with optimal prediction performance, but it is computationally greedy. We provide a reproducible vignette for the application of these methods that will enable researchers to fully leverage their potential in microbiome studies.
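
The remark that the centered log-ratio (clr) transformation is not subcompositionally consistent can be illustrated with a small sketch. The abundances below are hypothetical and the function name `clr` is introduced here for illustration. Dropping one taxon changes the clr value of every remaining taxon, while the log-ratio between any two retained taxa is unchanged.

```python
import math

def clr(x):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

# Hypothetical relative abundances of four taxa.
full = [0.5, 0.3, 0.15, 0.05]
# Subcomposition without the last taxon (clr is scale-invariant,
# so re-closing to 1 would not change the result).
sub = full[:3]

clr_full = clr(full)
clr_sub = clr(sub)

# The clr value of the first taxon depends on which taxa are included ...
shift_first_taxon = clr_full[0] - clr_sub[0]
# ... but the pairwise log-ratio between the first two taxa does not.
pair_full = clr_full[0] - clr_full[1]
pair_sub = clr_sub[0] - clr_sub[1]
```

Because each clr coefficient refers to the geometric mean of all parts present, a signature expressed in clr variables is tied to the particular set of taxa used to compute it.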

Counts: an outstanding challenge for log-ratio analysis of compositional data in the molecular biosciences

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa040 ◽

2020 ◽

Vol 2 (2) ◽

Cited By ~ 1

Author(s):

David R Lovell ◽

Xin-Yi Chua ◽

Annette McGrath

Keyword(s):

Count Data ◽

Compositional Data ◽

Compositional Data Analysis ◽

Ratio Analysis ◽

Sequencing Technology ◽

Scale Invariant ◽

Measurement And Analysis ◽

Discrete Nature ◽

The Impact ◽

Log Ratio

Abstract Thanks to sequencing technology, modern molecular bioscience datasets are often compositions of counts, e.g. counts of amplicons, mRNAs, etc. While there is growing appreciation that compositional data need special analysis and interpretation, less well understood is the discrete nature of these count compositions (or, as we call them, lattice compositions) and the impact this has on statistical analysis, particularly log-ratio analysis (LRA) of pairwise association. While LRA methods are scale-invariant, count compositional data are not; consequently, the conclusions we draw from LRA of lattice compositions depend on the scale of counts involved. We know that additive variation affects the relative abundance of small counts more than large counts; here we show that additive (quantization) variation comes from the discrete nature of count data itself, as well as (biological) variation in the system under study and (technical) variation from measurement and analysis processes. Variation due to quantization is inevitable, but its impact on conclusions depends on the underlying scale and distribution of counts. We illustrate the different distributions of real molecular bioscience data from different experimental settings to show why it is vital to understand the distributional characteristics of count data before applying and drawing conclusions from compositional data analysis methods.
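
The statement that additive variation affects small counts more than large counts can be made concrete with a short sketch. The counts are hypothetical and the helper `log_ratio_shift` is a name introduced here. The same pairwise ratio (1:10) is observed at two count scales, and a single extra count is added to the numerator.

```python
import math

def log_ratio_shift(a, b, delta=1):
    """Absolute change in ln(a/b) when `delta` counts are added to a."""
    return abs(math.log((a + delta) / b) - math.log(a / b))

# Same underlying ratio at two different count scales.
shift_small = log_ratio_shift(2, 20)
shift_large = log_ratio_shift(2000, 20000)
```

The shift equals ln(1 + delta/a), so it depends only on the size of the count being disturbed: here ln(3/2) at the small scale against ln(2001/2000) at the large one.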

Performance Assessment in Water Polo Using Compositional Data Analysis

Journal of Human Kinetics ◽

10.1515/hukin-2016-0043 ◽

2016 ◽

Vol 54 (1) ◽

pp. 143-151 ◽

Cited By ~ 4

Author(s):

Enrique García Ordóñez ◽

María del Carmen Iglesias Pérez ◽

Carlos Touriño González

Keyword(s):

Performance Indicators ◽

Cross Validation ◽

Compositional Data ◽

Compositional Data Analysis ◽

Original Sample ◽

Water Polo ◽

Discriminant Analyses ◽

Multivariate Discriminant ◽

Log Ratio ◽

Match Score

Abstract The aim of the present study was to identify groups of offensive performance indicators which best discriminated between match scores (favourable, balanced or unfavourable) in water polo. The sample comprised 88 regular season games (2011-2014) from the Spanish Professional Water Polo League. The offensive performance indicators were clustered in five groups: Attacks in relation to the different playing situations; Shots in relation to the different playing situations; Attacks outcome; Origin of shots; Technical execution of shots. The variables of each group had a constant sum which equalled 100%. Because the data were compositional, the variables were transformed by means of the additive log-ratio (alr) transformation. Multivariate discriminant analyses to compare the match scores were calculated using the transformed variables. With regard to the percentage of correct classification, the results showed that the group that discriminated the most between the match scores was “Attacks outcome” (60.4% for the original sample and 52.2% for cross-validation). The performance indicators that discriminated the most between the match scores in games with penalties were goals (structure coefficient (SC) = .761), counterattack shots (SC = .541) and counterattacks (SC = .481). In matches without penalties, goals were the primary discriminating factor (SC = .576). This approach provides a new tool to compare the importance of the offensive performance groups and their effect on the match score discrimination.
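
The additive log-ratio (alr) transformation mentioned in this abstract can be sketched as follows. The percentages are hypothetical, the function names are introduced here, and the choice of the last part as reference is an assumption; the abstract does not say which part the authors used.

```python
import math

def alr(parts, ref=-1):
    """Additive log-ratio: log of each part over the reference part."""
    ref = ref % len(parts)
    return [math.log(p / parts[ref]) for i, p in enumerate(parts) if i != ref]

def alr_inverse(coords, total=100.0):
    """Back-transform alr coordinates, assuming the reference part was last."""
    expanded = [math.exp(c) for c in coords] + [1.0]
    s = sum(expanded)
    return [total * e / s for e in expanded]

# Hypothetical group of three indicators summing to 100%.
indicator_group = [50.0, 30.0, 20.0]

# Two unconstrained coordinates replace three constrained percentages.
coords = alr(indicator_group)
recovered = alr_inverse(coords)
```

A group of D percentages yields D - 1 alr coordinates, which removes the constant-sum constraint before discriminant analysis is applied.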

Assessing Global Covid-19 Cases Data through Compositional Data Analysis (CoDa)

10.1101/2020.12.17.20248424 ◽

2020 ◽

Author(s):

Luis P.V. Braga ◽

Dina Feigenbaum

Keyword(s):

Data Analysis ◽

Compositional Data ◽

Compositional Data Analysis ◽

Discrete Groups ◽

Data Sets ◽

Cumulative Number ◽

Governmental Agencies ◽

Global Pandemic ◽

Number Of Patients ◽

Log Ratio

Abstract Background Covid-19 cases data pose an enormous challenge to any analysis. The evaluation of such a global pandemic requires matching reports that follow different procedures and even overcoming some countries’ censorship that restricts publications. Methods This work proposes a methodology that could assist future studies. Compositional Data Analysis (CoDa) is proposed as the proper approach as Covid-19 cases data is compositional in nature. Under this methodology, for each country three attributes were selected: cumulative number of deaths (D); cumulative number of recovered patients (R); present number of patients (A). Results After the operation called closure, with c=1, a ternary diagram and Log-Ratio plots, as well as compositional statistics, are presented. Cluster analysis is then applied, splitting the countries into discrete groups. Conclusions This methodology can also be applied to other data sets such as countries, cities, provinces or districts in order to help authorities and governmental agencies to improve their actions to fight against a pandemic.
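
The closure operation with c=1 named in the Results can be sketched as below. The counts for the single country are hypothetical, and the function name `closure` is introduced here for illustration.

```python
def closure(parts, c=1.0):
    """Rescale non-negative parts so that they sum to the constant c."""
    total = sum(parts)
    if total <= 0:
        raise ValueError("parts must have a positive sum")
    return [c * p / total for p in parts]

# Hypothetical counts for one country: deaths (D), recovered (R), active (A).
country = {"D": 50, "R": 800, "A": 150}

# Closed composition: the three shares sum to 1 and can be placed
# as one point in a ternary diagram.
composition = dict(zip(country, closure(list(country.values()))))
```

After closure only the relative shares of D, R and A remain, so countries with very different case totals become directly comparable.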

A field guide for the compositional analysis of any-omics data

GigaScience ◽

10.1093/gigascience/giz107 ◽

2019 ◽

Vol 8 (9) ◽

Cited By ~ 22

Author(s):

Thomas P Quinn ◽

Ionas Erb ◽

Greg Gloor ◽

Cedric Notredame ◽

Mark F Richardson ◽

...

Keyword(s):

Data Analysis ◽

General Solution ◽

Compositional Data ◽

Compositional Analysis ◽

Compositional Data Analysis ◽

Nucleotide Synthesis ◽

Library Size ◽

Next Generation Sequencing Ngs ◽

Concise Guide ◽

Log Ratio

Abstract Background Next-generation sequencing (NGS) has made it possible to determine the sequence and relative abundance of all nucleotides in a biological or environmental sample. A cornerstone of NGS is the quantification of RNA or DNA presence as counts. However, these counts are not counts per se: their magnitude is determined arbitrarily by the sequencing depth, not by the input material. Consequently, counts must undergo normalization prior to use. Conventional normalization methods require a set of assumptions: they assume that the majority of features are unchanged and that all environments under study have the same carrying capacity for nucleotide synthesis. These assumptions are often untestable and may not hold when heterogeneous samples are compared. Results Methods developed within the field of compositional data analysis offer a general solution that is assumption-free and valid for all data. Herein, we synthesize the extant literature to provide a concise guide on how to apply compositional data analysis to NGS count data. Conclusions In highlighting the limitations of total library size, effective library size, and spike-in normalizations, we propose the log-ratio transformation as a general solution to answer the question, “Relative to some important activity of the cell, what is changing?”

On establishing ceramic chemical groups: exploring the influence of data analysis methods and the role of the elements chosen in analysis

Open Journal of Archaeometry ◽

10.4081/arc.2013.e1 ◽

2013 ◽

Vol 1 (1) ◽

pp. 1 ◽

Cited By ~ 5

Author(s):

Kostalena Michelaki ◽

Michael J. Hughes ◽

Ronald G.V. Hancock

Keyword(s):

Principal Component Analysis ◽

Data Analysis ◽

Compositional Data ◽

Principal Component ◽

Component Analysis ◽

Data Exploration ◽

Short Paper ◽

Bivariate Data ◽

Chemical Groups ◽

Log Ratio

Since the 1970s, archaeologists have increasingly depended on archaeometric rather than strictly stylistic data to explore questions of ceramic provenance and technology, and, by extension, trade, exchange, social networks and even identity. It is accepted as obvious by some archaeometrists and statisticians that the results of the analyses of compositional data may be dependent on the format of the data used, on the data exploration method employed and, in the case of multivariate analyses, even on the number of elements considered. However, this is rarely articulated clearly in publications, making it less obvious to archaeologists. In this short paper, we re-examine compositional data from a collection of bricks, tiles and ceramics from Hill Hall, near Epping in Essex, England, as a case study to show how the method of data exploration used and the number of elements considered in multivariate analyses of compositional data can affect the sorting of ceramic samples into chemical groups. We compare bivariate data splitting (BDS) with principal component analysis (PCA) and centered log ratio-principal component analysis (CLR-PCA) of different unstandardized data formats [original concentration data and logarithmically transformed (i.e. log10 data)], using different numbers of elements. We confirm that PCA, in its various forms, is quite sensitive to the numbers and types of elements used in data analysis.

The use of methods of compositional data analysis for the separation of geochemical signals in fluvial sediments

10.5194/egusphere-egu2020-21224 ◽

2020 ◽

Author(s):

Kamila Fačevicová ◽

Tomáš Matys Grygar ◽

Karel Hron ◽

Jitka Elznicová

Keyword(s):

Data Analysis ◽

Compositional Data ◽

Principal Component ◽

Ordinary Least Squares ◽

Compositional Data Analysis ◽

Fluvial Sediments ◽

Least Squares Regression ◽

Target Element ◽

Local Enrichment ◽

Regression Correlation

Fluvial sediment datasets, like other types of concentration-based data, are relative in nature and therefore need preprocessing or normalization prior to the main statistical analysis. In geochemical practice, several normalization methods are used, such as simple normalization of the target element concentration by the concentration of a reference (conservative, lithogenic) element, double normalization, or conversion of concentrations to a local enrichment factor. As an alternative to these methods, an approach based on the principles of compositional data analysis (CoDA) can be considered. Instead of applying standard statistical methods, such as ordinary least squares regression, correlation or principal component analysis (PCA), to the raw or target-element-normalized concentrations, CoDA methods consider the relative structure of the whole dataset. CoDA, together with robust statistical methods that down-weight the influence of outlying observations, has the potential to provide more accurate results. This property is demonstrated and discussed using a dataset from mapping of sediments from the Skalka Reservoir on the Ohře River, Czech Republic, and its tributaries. The performance of robust versions of regression, correlation and principal component analysis that respect CoDA principles will be presented, and the way to arrive at them will be explained.

An application of compositional data analysis to multiomic time-series data

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa079 ◽

2020 ◽

Vol 2 (4) ◽

Cited By ~ 1

Author(s):

Laura Sisk-Hackworth ◽

Scott T Kelley

Keyword(s):

Data Analysis ◽

Time Series Data ◽

Compositional Data ◽

Series Data ◽

Compositional Data Analysis ◽

Metabolomics Data ◽

Normalization Methods ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Log Ratio

Abstract Compositional data analysis (CoDA) methods have increased in popularity as a new framework for analyzing next-generation sequencing (NGS) data. CoDA methods, such as the centered log-ratio (clr) transformation, adjust for the compositional nature of NGS counts, which is not addressed by traditional normalization methods. CoDA has only been sparsely applied to NGS data generated from microbial communities or to multiple ‘omics’ datasets. In this study, we applied CoDA methods to analyze NGS and untargeted metabolomic datasets obtained from bacterial and fungal communities. Specifically, we used clr transformation to reanalyze NGS amplicon and metabolomics data from a study investigating the effects of building material type, moisture and time on microbial and metabolomic diversity. Compared to analysis of untransformed data, analysis of clr-transformed data revealed novel relationships and stronger associations between sample conditions and microbial and metabolic community profiles.

A Theoretical Concept of Compositional Nutrient Diagnosis

Journal of the American Society for Horticultural Science ◽

10.21273/jashs.117.2.239 ◽

1992 ◽

Vol 117 (2) ◽

pp. 239-242 ◽

Cited By ~ 103

Author(s):

L.E. Parent ◽

M. Dafir

Keyword(s):

Compositional Data ◽

Nutrient Balance ◽

Principal Component ◽

Theoretical Concept ◽

Integrated System ◽

Compositional Data Analysis ◽

Diagnostic Systems ◽

Nutrient Interactions ◽

Geometric Means ◽

Curvature Problem

The premises underlying univariate (CVA = critical value approach) and bivariate (DRIS = diagnosis and recommendation integrated system) diagnostic systems were reexamined with regard to compositional data analysis (CDA). CDA recognizes a structure of dependence among plant nutrients, the bounded sum constraint to one (the whole composition equals 100% or 1), and removes the curvature problem carried by crude components and by dual ratios or logratios when treated in isolation. Linearization by “rowcentered logrationing” of nutrient fractions shows great potential for carrying multivariate diagnosis and principal component analysis on nutrient data. Compositional nutrient diagnosis (CND) is supported by the theory of CDA. CND is the multivariate expansion of CVA and DRIS and is fully compatible with PCA. CND takes all possible nutrient interactions into account. CND nutrient indices are composed of two separate functions, one considering differences between nutrient levels, another examining differences between nutrient balances (as defined by nutrient geometric means), of individual and target specimens. These functions indicate that nutrient insufficiency can be corrected by either adding a single nutrient or taking advantage of multiple nutrient interactions to improve nutrient balance as a whole. A theoretical interpretative table is presented for CND.
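
The “rowcentered logrationing” named in this abstract can be written out as a short formula. The notation is assumed here rather than taken from the abstract: x_1, ..., x_D are the nutrient fractions of one specimen (summing to 1), g(x) is their geometric mean, and V_i is the row-centered log ratio of nutrient i.

```latex
\[
g(\mathbf{x}) = \Bigl(\prod_{j=1}^{D} x_j\Bigr)^{1/D}, \qquad
V_i = \ln\frac{x_i}{g(\mathbf{x})}, \qquad
\sum_{i=1}^{D} V_i = 0 .
\]
```

Because the V_i sum to zero, a rise in one row-centered log ratio is necessarily offset among the others, which is one way to read the abstract's point that nutrient levels and nutrient balance are assessed together.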
