The GMD-biplot and its application to microbiome data

Mapping Intimacies, 2019. DOI: 10.1101/814269

Author(s): Yue Wang, Timothy W Randolph, Ali Shojaie, Jing Ma

Keyword(s): Human Microbiome, Simulated Data, Matrix Decomposition, Data Sets, Original Matrix, Arbitrary Matrix, Generalized Matrix, Eigen Decomposition, Value Decomposition, Microbiome Data

Abstract Exploratory analysis of human microbiome data is often based on dimension-reduced graphical displays derived from similarities based on non-Euclidean distances, such as UniFrac or Bray-Curtis. However, a display of this type, often referred to as the principal coordinate analysis (PCoA) plot, does not reveal which taxa are related to the observed clustering because the configuration of samples is not based on a coordinate system in which both the samples and variables can be represented. The reason is that the PCoA plot is based on the eigen-decomposition of a similarity matrix and not the singular value decomposition (SVD) of the sample-by-abundance matrix. We propose a novel biplot that is based on an extension of the SVD, called the generalized matrix decomposition (GMD), which involves an arbitrary matrix of similarities and the original matrix of variable measures, such as taxon abundances. As in a traditional biplot, points represent the samples and arrows represent the variables. The proposed GMD-biplot is illustrated by analyzing multiple real and simulated data sets which demonstrate that the GMD-biplot provides improved clustering capability and a more meaningful relationship between the arrows and the points.


The Generalized Matrix Decomposition Biplot and Its Application to Microbiome Data

mSystems, 2019, Vol 4 (6). DOI: 10.1128/msystems.00504-19

Author(s): Yue Wang, Timothy W. Randolph, Ali Shojaie, Jing Ma

Keyword(s): Coordinate System, Human Microbiome, Matrix Decomposition, Exploratory Analysis, Original Matrix, Arbitrary Matrix, Graphical Displays, Generalized Matrix, Euclidean Distances, Microbiome Data

ABSTRACT Exploratory analysis of human microbiome data is often based on dimension-reduced graphical displays derived from similarities based on non-Euclidean distances, such as UniFrac or Bray-Curtis. However, a display of this type, often referred to as the principal-coordinate analysis (PCoA) plot, does not reveal which taxa are related to the observed clustering because the configuration of samples is not based on a coordinate system in which both the samples and variables can be represented. The reason is that the PCoA plot is based on the eigen-decomposition of a similarity matrix and not the singular value decomposition (SVD) of the sample-by-abundance matrix. We propose a novel biplot that is based on an extension of the SVD, called the generalized matrix decomposition biplot (GMD-biplot), which involves an arbitrary matrix of similarities and the original matrix of variable measures, such as taxon abundances. As in a traditional biplot, points represent the samples, and arrows represent the variables. The proposed GMD-biplot is illustrated by analyzing multiple real and simulated data sets which demonstrate that the GMD-biplot provides improved clustering capability and a more meaningful relationship between the arrows and points. IMPORTANCE Biplots that simultaneously display the sample clustering and the important taxa have gained popularity in the exploratory analysis of human microbiome data. Traditional biplots, assuming Euclidean distances between samples, are not appropriate for microbiome data, when non-Euclidean distances are used to characterize dissimilarities among microbial communities. Thus, incorporating information from non-Euclidean distances into a biplot becomes useful for graphical displays of microbiome data. The proposed GMD-biplot accounts for any arbitrary non-Euclidean distances and provides a robust and computationally efficient approach for graphical visualization of microbiome data. 
In addition, the proposed GMD-biplot displays both the samples and taxa with respect to the same coordinate system, which further allows the configuration of future samples.
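The contrast drawn in the abstract, between the eigen-decomposition behind a PCoA plot and the SVD behind a traditional biplot, can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' GMD implementation: the function names `pcoa_coords` and `svd_biplot` are hypothetical, and the distance used is plain Euclidean so that the two displays can be compared directly.

```python
import numpy as np

def pcoa_coords(D, k=2):
    """Classical PCoA: eigen-decompose the double-centered squared distances.

    Only sample coordinates come out; the variables never enter the computation.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # similarity (Gram-like) matrix
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]           # keep the k largest
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.clip(vals, 0.0, None))

def svd_biplot(X, k=2):
    """Traditional biplot: SVD of the column-centered sample-by-variable matrix.

    Returns sample points and variable arrows in the same coordinate system.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    samples = U[:, :k] * s[:k]                   # points (one row per sample)
    arrows = Vt[:k].T                            # arrows (one row per variable)
    return samples, arrows

# Toy data: 6 samples, 4 variables (assumed values, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
points_pcoa = pcoa_coords(D)
points_svd, arrows = svd_biplot(X)
```

With Euclidean distances the two sample configurations agree up to a sign flip per axis, but only the SVD route yields variable arrows. With a non-Euclidean distance such as Bray-Curtis, `pcoa_coords` still returns sample points but has no matching variable coordinates, which is the gap the GMD-biplot is described as filling.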


A Bayesian zero-inflated negative binomial regression model for the integrative analysis of microbiome data

Biostatistics, 2019. DOI: 10.1093/biostatistics/kxz050

Author(s): Shuang Jiang, Guanghua Xiao, Andrew Y Koh, Jiwoong Kim, Qiwei Li, ...

Keyword(s): Regression Model, Negative Binomial, Human Microbiome, Simulated Data, Negative Binomial Regression, Bayesian Regression, Negative Binomial Regression Model, Disease States, Binomial Regression, Microbiome Data

Summary Microbiome omics approaches can reveal intriguing relationships between the human microbiome and certain disease states. Along with identification of specific bacteria taxa associated with diseases, recent scientific advancements provide mounting evidence that metabolism, genetics, and environmental factors can all modulate these microbial effects. However, the current methods for integrating microbiome data and other covariates are severely lacking. Hence, we present an integrative Bayesian zero-inflated negative binomial regression model that can both distinguish differentially abundant taxa with distinct phenotypes and quantify covariate-taxa effects. Our model demonstrates good performance using simulated data. Furthermore, we successfully integrated microbiome taxonomies and metabolomics in two real microbiome datasets to provide biologically interpretable findings. In all, we proposed a novel integrative Bayesian regression model that features bacterial differential abundance analysis and microbiome-covariate effects quantifications, which makes it suitable for general microbiome studies.
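For readers unfamiliar with the term, a zero-inflated negative binomial distribution in its generic textbook form can be written as below. The symbols are assumptions introduced here for illustration (mixing weight \pi for structural zeros, mean \mu, dispersion \phi); the parametrization and priors used in the paper itself may differ.

```latex
\[
P(Y = y \mid \pi, \mu, \phi) =
\begin{cases}
\pi + (1-\pi)\left(\dfrac{\phi}{\phi+\mu}\right)^{\phi}, & y = 0,\\[1.5ex]
(1-\pi)\,\dfrac{\Gamma(y+\phi)}{y!\,\Gamma(\phi)}
\left(\dfrac{\phi}{\phi+\mu}\right)^{\phi}
\left(\dfrac{\mu}{\phi+\mu}\right)^{y}, & y = 1, 2, \dots
\end{cases}
\]
```

The first case shows why the model suits sparse count tables: a zero can arise either from the point mass (weight \pi) or from the negative binomial component itself.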


Scalable learning of interpretable rules for the dynamic microbiome domain

2020. DOI: 10.1101/2020.06.25.172270

Author(s): Venkata Suhas Maringanti, Vanni Bucci, Georg K. Gerber

Keyword(s): Time Series Data, Human Microbiome, Predictive Performance, Series Data, Data Sets, Human Host, Scalable Learning, The Status, Biological Insight, Microbiome Data

Abstract The microbiome, which is inherently dynamic, plays essential roles in human physiology and its disruption has been implicated in numerous human diseases. Linking dynamic changes in the microbiome to the status of the human host is an important problem, which is complicated by limitations and complexities of the data. Model interpretability is key in the microbiome field, as practitioners seek to derive testable biological hypotheses from data or develop diagnostic tests that can be understood by clinicians. Interpretable structure must take into account domain-specific information key to biologists and clinicians including evolutionary relationships (phylogeny) and dynamic behavior of the microbiome. A Bayesian model was previously developed in the field, which uses Markov Chain Monte Carlo inference to learn human-interpretable rules for classifying the status of the human host based on microbiome time-series data, but that approach is not scalable to increasingly large microbiome datasets being produced. We present a new fully-differentiable model that also learns human-interpretable rules for the same classification task, but in an end-to-end gradient-descent based framework. We validate the performance of our model on human microbiome data sets and demonstrate our approach has similar predictive performance to the fully Bayesian method, while running orders-of-magnitude faster and moreover learning a larger set of rules, thus providing additional biological insight into the effects of diet and environment on the microbiome.


Pathway-Based Integrative Analysis of Metabolome and Microbiome Data from Hepatocellular Carcinoma and Liver Cirrhosis Patients

Cancers, 2020, Vol 12 (9), pp. 2705. DOI: 10.3390/cancers12092705

Author(s): Boram Kim, Eun Ju Cho, Jung-Hwan Yoon, Soon Sun Kim, Jae Youn Cheong, ...

Keyword(s): Hepatocellular Carcinoma, Liver Cirrhosis, Human Microbiome, Metabolic Reprogramming, Data Sets, Biological Interactions, Functional Potential, Metabolomic Data, Microbiome Data, Insight Into

Aberrations of the human microbiome are associated with diverse liver diseases, including hepatocellular carcinoma (HCC). Even if we can associate specific microbes with particular diseases, it is difficult to know mechanistically how the microbe contributes to the pathophysiology. Here, we sought to reveal the functional potential of the HCC-associated microbiome with the human metabolome which is known to play a role in connecting host phenotype to microbiome function. To utilize both microbiome and metabolomic data sets, we propose an innovative, pathway-based analysis, Hierarchical structural Component Model for pathway analysis of Microbiome and Metabolome (HisCoM-MnM), for integrating microbiome and metabolomic data. In particular, we used pathway information to integrate these two omics data sets, thus providing insight into biological interactions between different biological layers, with regard to the host’s phenotype. The application of HisCoM-MnM to data sets from 103 and 97 patients with HCC and liver cirrhosis (LC), respectively, showed that this approach could identify HCC-related pathways related to cancer metabolic reprogramming, in addition to the significant metabolome and metagenome that make up those pathways.


MB-GAN: Microbiome Simulation via Generative Adversarial Network

GigaScience, 2021, Vol 10 (2). DOI: 10.1093/gigascience/giab005

Author(s): Ruichen Rong, Shuang Jiang, Lin Xu, Guanghua Xiao, Yang Xie, ...

Keyword(s): Learning Community, Association Studies, Human Microbiome, Simulated Data, Original Data, Generative Adversarial Network, Methodology Development, Adversarial Network, Microbiome Data, Analytical Tools

Abstract Background Trillions of microbes inhabit the human body and have a profound effect on human health. The recent development of metagenome-wide association studies and other quantitative analysis methods accelerate the discovery of the associations between human microbiome and diseases. To assess the strengths and limitations of these analytical tools, simulating realistic microbiome datasets is critically important. However, simulating the real microbiome data is challenging because it is difficult to model their correlation structure using explicit statistical models. Results To address the challenge of simulating realistic microbiome data, we designed a novel simulation framework termed MB-GAN, by using a generative adversarial network (GAN) and utilizing methodology advancements from the deep learning community. MB-GAN can automatically learn from given microbial abundances and compute simulated abundances that are indistinguishable from them. In practice, MB-GAN showed the following advantages. First, MB-GAN avoids explicit statistical modeling assumptions, and it only requires real datasets as inputs. Second, unlike the traditional GANs, MB-GAN is easily applicable and can converge efficiently. Conclusions By applying MB-GAN to a case-control gut microbiome study of 396 samples, we demonstrated that the simulated data and the original data had similar first-order and second-order properties, including sparsity, diversities, and taxa-taxa correlations. These advantages are suitable for further microbiome methodology development where high-fidelity microbiome data are needed.


A phylogenetic model for the recruitment of species into microbial communities and application to studies of the human microbiome

2019. DOI: 10.1101/685644

Author(s): John L. Darcy, Alex D. Washburne, Michael S. Robeson, Tiffany Prest, Steven K. Schmidt, ...

Keyword(s): New Species, Microbial Communities, Phylogenetic Relationships, Temporal Dynamics, Human Microbiome, Close Relative, Data Sets, Close Relatives, Phylogenetic Overdispersion, Microbiome Data

Abstract Understanding when and why new species are recruited into microbial communities is a formidable problem with implications for managing microbial systems, for instance by helping us better understand whether a probiotic or pathogen would be expected to colonize a human microbiome. Much theory in microbial temporal dynamics is focused on how phylogenetic relationships between microbes impact the order in which those microbes are recruited; for example species that are closely related may competitively exclude each other. However, several recent human microbiome studies have observed closely-related bacteria being recruited into microbial communities in short succession, suggesting that microbial community assembly is historically contingent, but competitive exclusion of close relatives may not be important. To address this, we developed a mathematical model that describes the order in which new species are detected in microbial communities over time within a phylogenetic framework. We use our model to test three hypothetical assembly modes: underdispersion (species recruitment is more likely if a close relative was previously detected), overdispersion (recruitment is more likely if a close relative has not been previously detected), and the neutral model (recruitment likelihood is not related to phylogenetic relationships among species). We applied our model to longitudinal human microbiome data, and found that for the individuals we analyzed, the human microbiome generally follows the underdispersion (i.e. nepotism) hypothesis. Exceptions were oral communities and the fecal communities of two infants that had undergone heavy antibiotic treatment. None of the data sets we analyzed showed statistically significant phylogenetic overdispersion.


Mean reversion in corporate leverage: evidence from India

Managerial Finance, 2019, Vol 45 (9), pp. 1183-1198. DOI: 10.1108/mf-09-2018-0425

Author(s): Gaurav S. Chauhan, Pradip Banerjee

Keyword(s): Capital Structure, Emerging Market, Simulated Data, Mean Reversion, Developed Countries, Data Sets, Debt Ratio, Testing Strategy, Content Type, Financing Behavior

Purpose Recent papers on target capital structure show that debt ratio seems to vary widely in space and time, implying that the functional specifications of target debt ratios are of little empirical use. Further, target behavior cannot be adjudged correctly using debt ratios, as they could revert due to mechanical reasons. The purpose of this paper is to develop an alternative testing strategy to test the target capital structure. Design/methodology/approach The authors make use of a major “shock” to the debt ratios as an event and think of a subsequent reversion as a movement toward a mean or target debt ratio. By doing this, the authors no longer need to identify target debt ratios as a function of firm-specific variables or any other rigid functional form. Findings Similar to the broad empirical evidence in developed economies, there is no perceptible and systematic mean reversion by Indian firms. However, unlike developed countries, proportionate usage of debt to finance firms’ marginal financing deficits is extensive; equity is used rather sparingly. Research limitations/implications The trade-off theory could be convincingly refuted at least for the emerging market of India. The paper here stimulated further research on finding reasons for specific financing behavior of emerging market firms. Practical implications The results show that the firms’ financing choices are not only depending on their own firm’s specific variables but also on the financial markets in which they operate. Originality/value This study attempts to assess mean reversion in debt ratios in a unique but reassuring manner. The results are confirmed by extensive calibration of the testing strategy using simulated data sets.


A Machine Learning-Based Seismic Data Compression and Interpretation Using a Novel Shifted-Matrix Decomposition Algorithm

Applied Sciences, 2021, Vol 11 (11), pp. 4874. DOI: 10.3390/app11114874

Author(s): Milan Brankovic, Eduardo Gildin, Richard L. Gibson, Mark E. Everett

Keyword(s): Real Time, Seismic Data, Matrix Decomposition, Original Data, Velocity Estimation, Singular Vectors, Well Stimulation, Marine Seismic, Value Decomposition, Seismic Data Compression

Seismic data provides integral information in geophysical exploration, for locating hydrocarbon rich areas as well as for fracture monitoring during well stimulation. Because of its high frequency acquisition rate and dense spatial sampling, distributed acoustic sensing (DAS) has seen increasing application in microseismic monitoring. Given large volumes of data to be analyzed in real-time and impractical memory and storage requirements, fast compression and accurate interpretation methods are necessary for real-time monitoring campaigns using DAS. In response to the developments in data acquisition, we have created shifted-matrix decomposition (SMD) to compress seismic data by storing it into pairs of singular vectors coupled with shift vectors. This is achieved by shifting the columns of a matrix of seismic data before applying singular value decomposition (SVD) to it to extract a pair of singular vectors. The purpose of SMD is data denoising as well as compression, as reconstructing seismic data from its compressed form creates a denoised version of the original data. By analyzing the data in its compressed form, we can also run signal detection and velocity estimation analysis. Therefore, the developed algorithm can simultaneously compress and denoise seismic data while also analyzing compressed data to estimate signal presence and wave velocities. To show its efficiency, we compare SMD to local SVD and structure-oriented SVD, which are similar SVD-based methods used only for denoising seismic data. While the development of SMD is motivated by the increasing use of DAS, SMD can be applied to any seismic data obtained from a large number of receivers. For example, here we present initial applications of SMD to readily available marine seismic data.
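The core step described in the abstract, shifting the columns of a data matrix and then extracting one pair of singular vectors, can be sketched as follows. This is a simplified illustration under stated assumptions and not the published SMD algorithm: the per-column shifts are taken as known integer sample lags (the abstract does not say how they are estimated), shifting is circular via `np.roll`, and the names `smd_rank1` and `smd_reconstruct` are hypothetical.

```python
import numpy as np

def smd_rank1(data, shifts):
    """Align each column by its shift, then keep the leading singular pair.

    data   : (n_samples, n_receivers) array
    shifts : integer lag per receiver (assumed known in this sketch)
    """
    aligned = np.empty_like(data, dtype=float)
    for j in range(data.shape[1]):
        aligned[:, j] = np.roll(data[:, j], -int(shifts[j]))  # undo the moveout
    U, s, Vt = np.linalg.svd(aligned, full_matrices=False)
    u = U[:, 0] * s[0]      # left vector scaled by the singular value (waveform)
    v = Vt[0]               # right vector (per-receiver amplitude)
    return u, v

def smd_reconstruct(u, v, shifts):
    """Rebuild the data from the singular pair and re-apply the shifts."""
    approx = np.outer(u, v)
    out = np.empty_like(approx)
    for j in range(approx.shape[1]):
        out[:, j] = np.roll(approx[:, j], int(shifts[j]))
    return out

# Synthetic example: one wavelet arriving later at each of four receivers.
t = np.arange(64)
wavelet = np.exp(-0.5 * ((t - 10) / 2.0) ** 2)
shifts = np.array([0, 3, 6, 9])
amps = np.array([1.0, 0.8, 0.6, 0.4])
data = np.stack([a * np.roll(wavelet, s) for a, s in zip(amps, shifts)], axis=1)

u, v = smd_rank1(data, shifts)
recon = smd_reconstruct(u, v, shifts)
```

Storing `u`, `v`, and `shifts` takes 64 + 4 + 4 numbers instead of the 256 in `data`, which illustrates the compression idea. On noisy data the rank-1 reconstruction would act as a denoised approximation rather than an exact copy.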


Spectral Convolution Feature-Based SPD Matrix Representation for Signal Detection Using a Deep Neural Network

Entropy, 2020, Vol 22 (9), pp. 949. DOI: 10.3390/e22090949

Author(s): Jiangyi Wang, Min Liu, Xinwu Zeng, Xiaoqiang Hua

Keyword(s): Neural Network, Signal Detection, Convolutional Neural Network, Deep Neural Network, Detection Method, Learning Algorithm, Simulated Data, Data Sets, Feature Maps, Simulated Data Sets

Convolutional neural networks perform strongly in many visual tasks because of their hierarchical structures and powerful feature extraction capabilities. The SPD (symmetric positive definite) matrix has attracted attention in visual classification because of its ability to learn proper statistical representations and distinguish samples carrying different information. In this paper, a deep neural network signal detection method based on spectral convolution features is proposed. In this method, local features extracted from a convolutional neural network are used to construct the SPD matrix, and a deep learning algorithm for the SPD matrix is used to detect target signals. Feature maps extracted by two kinds of convolutional neural network models are applied in this study. With this method, signal detection becomes a binary classification problem over the signals in samples. To demonstrate the effectiveness and superiority of this method, simulated and semi-physical simulated data sets are used. The results show that, under low SCR (signal-to-clutter ratio), compared with the spectral signal detection method based on the deep neural network, this method can obtain a gain of 0.5–2 dB on simulated data sets and semi-physical simulated data sets.
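One common way to turn convolutional feature maps into an SPD matrix is a regularized channel covariance, sketched below. The abstract does not state how the authors build their SPD matrix, so this is an assumed construction shown only to make the idea concrete; the function name `spd_from_feature_maps` and the ridge term `eps` are hypothetical.

```python
import numpy as np

def spd_from_feature_maps(fmap, eps=1e-6):
    """Build an SPD matrix from feature maps of shape (channels, height, width).

    Each spatial location is treated as one observation of a channel vector;
    the channel covariance plus a small ridge is symmetric positive definite.
    """
    c = fmap.shape[0]
    F = fmap.reshape(c, -1)                       # (channels, locations)
    F = F - F.mean(axis=1, keepdims=True)         # center each channel
    cov = F @ F.T / (F.shape[1] - 1)              # sample covariance (PSD)
    return cov + eps * np.eye(c)                  # ridge makes it strictly PD

# Stand-in for CNN output: 8 channels on a 5x5 grid (random, for illustration).
rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 5, 5))
S = spd_from_feature_maps(fmap)
```

The ridge term matters when there are fewer spatial locations than channels, because the plain covariance is then rank-deficient and only positive semidefinite.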


Two-Step Root-MUSIC for Direction of Arrival Estimation without EVD/SVD Computation

International Journal of Antennas and Propagation, 2018, Vol 2018, pp. 1-8. DOI: 10.1155/2018/9695326. Cited By ~ 2

Author(s): Feng-Gang Yan, Shuai Liu, Jun Wang, Ming Jin

Keyword(s): Super Resolution, Direction Of Arrival, Doa Estimation, Low Complexity, Music Algorithm, Correlation Matrices, Noise Subspace, Using Data, Eigen Decomposition, Value Decomposition

Most popular techniques for super-resolution direction of arrival (DOA) estimation rely on an eigen-decomposition (EVD) or a singular value decomposition (SVD) computation to determine the signal/noise subspace, which is computationally expensive for real-time applications. A two-step root multiple signal classification (TS-root-MUSIC) algorithm is proposed to avoid the complex EVD/SVD computation using a uniform linear array (ULA) based on a mild assumption that the number of signals is less than half the number of sensors. The ULA is divided into two subarrays, and three noise-free cross-correlation matrices are constructed using data collected by the two subarrays. A low-complexity linear operation is derived to obtain a rough noise subspace for a first-step DOA estimate. The performance is further enhanced in the second step by using the first-step result to renew the previously estimated noise subspace with a slightly increased complexity. The new technique can provide root mean square error (RMSE) performance close to that of root-MUSIC with reduced computational burden, as verified by numerical simulations.
