belg: A Tool for Calculating Boltzmann Entropy of Landscape Gradients

Entropy ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. 937
Author(s):  
Jakub Nowosad ◽  
Peichao Gao

Entropy is a fundamental concept in thermodynamics that is important in many fields, including image processing, neurobiology, urban planning, and sustainability. Until recently, the application of Boltzmann entropy to landscape patterns was mostly limited to conceptual discussion. In the last several years, however, a number of methods for calculating Boltzmann entropy for landscape mosaics and gradients have been proposed. We developed the R package belg as an open-source tool for calculating the Boltzmann entropy of landscape gradients. The package contains functions to calculate relative and absolute Boltzmann entropy using the hierarchy-based and the aggregation-based methods. It also supports input rasters with missing (NA) values, allowing for calculations on real data. In this study, we explain the ideas behind the implemented methods, describe the core functionality of the software, and present three examples of its use. The examples show the basic functions of the package, how to adjust Boltzmann entropy values for data with missing values, and how to use the belg package in larger workflows. We expect that the belg package will be a useful tool in the discussion of using entropy to describe landscape patterns and will facilitate a thermodynamic understanding of landscape dynamics.
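The Boltzmann relation underlying these methods, S = ln W with W the number of microstates consistent with a macrostate, can be illustrated with a minimal sketch. The toy function below takes the macrostate to be the histogram of cell values and counts distinct spatial arrangements; it is a conceptual illustration only, not the hierarchy- or aggregation-based algorithms that belg implements, and its name is made up.

```python
import math
from collections import Counter

def boltzmann_entropy(cells):
    """Boltzmann entropy ln W of a toy macrostate, where W is the number
    of distinct arrangements of the cell values consistent with their
    histogram: W = n! / prod(n_i!). Computed stably via lgamma."""
    n = len(cells)
    counts = Counter(cells).values()
    ln_w = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return ln_w

# A constant surface admits a single arrangement, hence zero entropy;
# four distinct values admit 4! = 24 arrangements.
print(boltzmann_entropy([1, 1, 1, 1]))   # 0.0
print(boltzmann_entropy([1, 2, 3, 4]))   # ln 24, about 3.178
```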

2020 ◽  
Vol 2 (3) ◽  
Author(s):  
Stijn Hawinkel ◽  
Luc Bijnens ◽  
Kim-Anh Lê Cao ◽  
Olivier Thas

Abstract The integration of multiple omics datasets measured on the same samples is a challenging task: data come from heterogeneous sources and vary in signal quality. In addition, some omics data are inherently compositional, e.g. sequence count data. Most integrative methods are limited in their ability to handle covariates, missing values, compositional structure and heteroscedasticity. In this article we introduce COMBI, a flexible model-based approach to data integration that addresses these limitations. We combine concepts such as compositional biplots and log-ratio link functions with latent variable models, and propose an attractive visualization through multiplots to improve interpretation. Using real data examples and simulations, we illustrate and compare our method with other data integration techniques. Our algorithm is available in the R package combi.
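Compositional structure is typically handled by moving the data onto an unconstrained scale with a log-ratio transform. Below is a generic sketch of the centred log-ratio (CLR) transform commonly used for this; it is shown as background for the abstract's log-ratio link functions, not as combi's internal implementation.

```python
import math

def clr(composition):
    """Centred log-ratio transform of one compositional sample
    (strictly positive parts, e.g. proportions): subtract the mean
    log-abundance so the transformed components sum to zero."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

sample = [0.5, 0.3, 0.2]       # proportions summing to 1
transformed = clr(sample)
print(transformed)             # components sum to (numerically) zero
```

In practice zero counts must be replaced or modelled before taking logs, which is one reason model-based approaches with log-ratio links are attractive for sequence count data.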


Entropy ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. 1616
Author(s):  
Samuel A. Cushman

Several methods based on Boltzmann entropy have recently been proposed to calculate configurational entropy. Some of these methods appear to be fully thermodynamically consistent in their application to landscape patch mosaics, but none have been shown to be fully generalizable to all kinds of landscape patterns, such as point patterns, surfaces, and patch mosaics. The goal of this paper is to evaluate whether the direct application of the Boltzmann relation is fully generalizable to surfaces, point patterns, and landscape mosaics. I simulated surfaces and point patterns with a fractal neutral model to control their degree of aggregation. I used spatial permutation analysis to produce distributions of microstates and fit functions to predict the distributions of microstates and the shape of the entropy function. The results confirmed that the direct application of the Boltzmann relation is generalizable across surfaces, point patterns, and landscape mosaics, providing a useful general approach to calculating landscape entropy.
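Spatial permutation analysis can be sketched in miniature: shuffle the cell values of a grid many times and record a configuration statistic for each shuffle, approximating the distribution of microstates for that macrostate (the multiset of cell values). This is an illustrative reconstruction under assumed names, not code from the paper.

```python
import random

def permuted_statistic_distribution(grid, stat, n_perm=1000, seed=1):
    """Distribution of a configuration statistic over random spatial
    permutations of a 2-D grid's cell values."""
    rng = random.Random(seed)
    flat = [v for row in grid for v in row]
    rows, cols = len(grid), len(grid[0])
    out = []
    for _ in range(n_perm):
        rng.shuffle(flat)
        shuffled = [flat[r * cols:(r + 1) * cols] for r in range(rows)]
        out.append(stat(shuffled))
    return out

def total_vertical_contrast(g):
    # Sum of absolute differences between vertically adjacent cells:
    # one simple measure of spatial aggregation.
    return sum(abs(g[r][c] - g[r + 1][c])
               for r in range(len(g) - 1) for c in range(len(g[0])))

dist = permuted_statistic_distribution([[1, 2], [3, 4]], total_vertical_contrast)
```

Fitting a function to such a permutation distribution, as the paper describes, then yields the number of microstates at each value of the statistic and hence the shape of the entropy function.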


2019 ◽  
Vol 36 (7) ◽  
pp. 2017-2024
Author(s):  
Weiwei Zhang ◽  
Ziyi Li ◽  
Nana Wei ◽  
Hua-Jun Wu ◽  
Xiaoqi Zheng

Abstract Motivation Inference of differentially methylated (DM) CpG sites between two groups of tumor samples with different genotypes or phenotypes is a critical step in uncovering the epigenetic mechanisms of tumorigenesis and identifying biomarkers for cancer subtyping. However, as a major confounding factor, uneven distributions of tumor purity between two groups of tumor samples will lead to biased discovery of DM sites if not properly accounted for. Results We propose InfiniumDM, a generalized least squares model that adjusts for the effect of tumor purity in differential methylation analysis. Our method is applicable to a variety of experimental designs, including designs with or without normal controls and with different sources of normal tissue contamination. We compared our method with conventional methods, including minfi, limma and limma corrected by tumor purity, using simulated datasets. Our method shows significantly better performance across different levels of differential methylation thresholds, sample sizes, mean purity deviations and so on. We also applied the proposed method to breast cancer samples from the TCGA database to further evaluate its performance. Overall, both simulation and real data analyses demonstrate favorable performance over existing methods serving a similar purpose. Availability and implementation InfiniumDM is part of the R package InfiniumPurify, which is freely available from GitHub (https://github.com/Xiaoqizheng/InfiniumPurify). Supplementary information Supplementary data are available at Bioinformatics online.
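The role of purity as a covariate can be shown in miniature with ordinary least squares via the Frisch-Waugh device: residualize both the methylation values and the group indicator on purity, then regress residual on residual to get the purity-adjusted group effect. This is a deliberately simplified stand-in, not InfiniumDM's generalized least squares model, and all names are illustrative.

```python
def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def purity_adjusted_effect(meth, group, purity):
    """Group effect on methylation after adjusting for tumor purity
    (Frisch-Waugh): partial out purity from outcome and predictor."""
    def resid(y, x):
        b = slope(x, y)
        my, mx = sum(y) / len(y), sum(x) / len(x)
        return [yi - (my + b * (xi - mx)) for yi, xi in zip(y, x)]
    return slope(resid(group, purity), resid(meth, purity))
```

If purity differs systematically between groups, the naive group-mean difference absorbs the purity effect, whereas the adjusted slope does not; that is the bias the abstract warns about.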


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Bing Song ◽  
August E. Woerner ◽  
John Planz

Abstract Background Multi-locus genotype data are widely used in population genetics and disease studies. In evaluating the utility of multi-locus data, the independence of markers is commonly considered in many genomic assessments. Generally, pairwise non-random associations are tested by linkage disequilibrium; however, dependence within a panel may also involve triplets, quartets, or higher-order combinations. Therefore, a compatible and user-friendly software package is needed for testing and assessing global linkage disequilibrium in mixed genetic data. Results This study describes a software package for testing the mutual independence of mixed genetic datasets. Mutual independence is defined as the absence of non-random associations among all subsets of the tested panel. The new R package “mixIndependR” calculates basic genetic parameters such as allele frequency, genotype frequency, heterozygosity, Hardy–Weinberg equilibrium, and linkage disequilibrium (LD) from population data, regardless of the type of markers, such as single nucleotide polymorphisms, short tandem repeats, insertions and deletions, and any other genetic markers. A novel method of assessing the dependence of mixed genetic panels is developed in this study and implemented in the software package. Overall independence is tested by comparing the observed distributions of two common summary statistics (the number of heterozygous loci [K] and the number of shared alleles [X]) with their expected distributions under the assumption of mutual independence. Conclusion The package “mixIndependR” is compatible with all categories of genetic markers and detects overall non-random associations. Compared to pairwise disequilibrium tests, the approach described herein tends to have higher power, especially when the number of markers is large. With this package, more multi-functional and more powerful genetic panels can be developed, such as mixed panels with different kinds of markers.
In population genetics, the package “mixIndependR”, as a more powerful method of detecting LD, makes it possible to discover more about population admixture, natural selection, genetic drift, and population demographics. Moreover, this new approach can optimize variant selection in disease studies and contribute to panel combinations for treatments in multimorbidity. Application of this approach to real data is expected in the future and might bring a leap forward in the field of genetic technology. Availability The R package mixIndependR is available on the Comprehensive R Archive Network (CRAN) at: https://cran.r-project.org/web/packages/mixIndependR/index.html.
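The expected distribution of K, the number of heterozygous loci per individual, under mutual independence is a Poisson-binomial built from the per-locus heterozygosity probabilities. The sketch below computes it by convolution; it is an illustrative reimplementation of the idea, not mixIndependR's code.

```python
def k_distribution(het_probs):
    """P(K = 0), ..., P(K = m) for K = number of heterozygous loci,
    assuming the m loci are mutually independent with per-locus
    heterozygosity probabilities het_probs (a Poisson-binomial,
    built one locus at a time by convolution)."""
    dist = [1.0]                        # with zero loci, K = 0 surely
    for p in het_probs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)    # this locus homozygous
            new[k + 1] += prob * p      # this locus heterozygous
        dist = new
    return dist

print(k_distribution([0.5, 0.5]))       # [0.25, 0.5, 0.25]
```

Comparing this expected distribution with the observed distribution of K across individuals (e.g. by a goodness-of-fit test) is the kind of global check the package performs.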


2021 ◽  
Author(s):  
Rosa F Ropero ◽  
M Julia Flores ◽  
Rafael Rumí

Environmental data often present missing values or a lack of information that makes modelling tasks difficult. Under the framework of the SAICMA Research Project, a flood risk management system is modelled for an Andalusian Mediterranean catchment using information from the Andalusian Hydrological System. Hourly data were collected from October 2011 to September 2020 and present two issues:
- In the Guadarranque River, the dam level variable has no data from May to August 2020, probably because of sensor damage.
- No information about river level is collected in the lower part of the Guadiaro River, which makes it difficult to estimate flood risk in the coastal area.
To avoid removing the dam variable from the entire model (or dropping the missing months), or even rejecting the modelling of one river system, this abstract provides modelling solutions based on Bayesian networks (BNs) that overcome these limitations.
Guadarranque River. Missing values. The dataset contains 75687 observations of 6 continuous variables. BN regression models based on fixed structures (Naïve Bayes, NB, and Tree-Augmented Naïve Bayes, TAN) were learnt from the complete dataset (until September 2019) with the aim of predicting the dam level variable as accurately as possible. A scenario was built with data from October 2019 to March 2020, and the predictions for the target variable were compared with the real data. Results show that both NB (rmse: 6.29) and TAN (rmse: 5.74) are able to predict the behaviour of the target variable. In addition, a BN based on an expert-provided structure was learnt from the real data and from the two datasets with values imputed by NB and TAN. Models learnt from imputed data (NB: 3.33; TAN: 3.07) improve on the error of the model learnt from the real data alone (4.26).
Guadiaro River. Lack of information. The dataset contains 73636 observations of 14 continuous variables. Since the rainfall variables present a high percentage of zero values (over 94%), they were discretised by the Equal Frequency method with 4 intervals. The aim is to predict flooding risk in the coastal area, but no data are collected from this area. Thus, an unsupervised classification based on hybrid BNs was performed, in which the target variable assigns all observations to a set of homogeneous groups and gives, for each observation, the probability of belonging to each group. Results show a total of 3 groups:
- Group 0, “Normal situation”: rainfall values equal to 0 and a very low mean river level.
- Group 1, “Storm situation”: mean rainfall values over 0.3 mm and all river level variables at double the means of Group 0.
- Group 2, “Extreme situation”: both rainfall and river level means take the highest values, far from those of the two previous groups.
Although validation shows that this methodology is able to identify extreme events, further work is needed. Data from the autumn-winter season (October 2020 to March 2021) will be used; with this new information it will be possible to check whether the latest extreme events (the flooding event in December and storm Filomena in January) are identified.
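Equal-frequency discretisation cuts a variable at its empirical quantiles so each interval holds roughly the same number of observations. The sketch below is a generic version, independent of the authors' toolchain; note that with heavily zero-inflated variables several cut points can coincide and would need merging in practice.

```python
def equal_frequency_bins(values, k=4):
    """Interior cut points for equal-frequency discretisation into k
    intervals: the i/k empirical quantiles for i = 1, ..., k-1."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(1, k)]

print(equal_frequency_bins([0, 0, 0, 1, 2, 3, 4, 5], k=4))   # [0, 2, 4]
```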


Author(s):  
Jiawei Wang ◽  
Yong-Yi Wang ◽  
William A. Bruce ◽  
Steve Rapp ◽  
Russell Scoles

Abstract Construction of a cross-country pipeline involves lifting the pipeline off the skids and lowering it into the trench (lifting and lowering-in). This can introduce the highest stress magnitude that the pipe may experience over its service life. If not managed properly, overly high stresses may cause integrity issues during construction and/or service. If the girth welds are qualified and accepted using alternative flaw acceptance criteria, such as those in API 1104 Annex A and CSA Z662 Annex K, these stresses must be kept below a preset level during lifting and lowering-in to satisfy the requirements of those standards. This paper covers the development and usage of a stress analysis tool for the continuous lifting and lowering-in of pipe strings without a concrete coating or river weights. The outcome of the stress analysis can be used to develop lifting and lowering-in plans for construction crews. The core functionality of the application tool is to calculate the stresses from bending in the vertical and horizontal planes. The stresses from vertical bending are derived from an extensive analysis of continuous lifting and lowering-in processes. The stresses from horizontal bending are calculated using closed-form analytical solutions. The tool provides a graphical interface that interprets the background stress analysis results and displays information necessary for the development of lifting and lowering-in plans. The tool can be used to evaluate what-if scenarios for various tentative lifting and lowering-in scenarios. The process of using the tool to develop lifting and lowering-in plans is demonstrated in this paper through an example problem. The number of sidebooms and other lifting and lowering-in parameters such as sideboom spacing and lifting height range are changed to make the lifting and lowering-in plan easy to use for the laying contractors. 
Such tradeoffs can be addressed proactively with construction contractors to ensure that a mutually acceptable approach to lifting and lowering-in is taken.
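A closed-form relation of the kind used in such bending-stress checks is the elastic formula sigma = E c / R, with c the outer-fiber distance (half the outer diameter) and R the radius of curvature of the bend. The sketch below applies it with illustrative parameter values; it is a generic textbook relation, not the paper's stress analysis tool.

```python
def bending_stress_psi(outer_diameter_in, bend_radius_in, e_psi=29.5e6):
    """Elastic bending stress (psi) for a pipe bent to radius R:
    sigma = E * c / R, c = OD / 2. Default modulus is a typical
    value for line pipe steel (an illustrative assumption)."""
    c = outer_diameter_in / 2.0
    return e_psi * c / bend_radius_in

# A 24 in pipe bent to a 1000 ft (12000 in) radius:
print(bending_stress_psi(24.0, 12000.0))   # 29500.0 psi
```

Comparing such a stress against the preset limit required by the alternative flaw acceptance criteria is the kind of check a lifting and lowering-in plan must satisfy.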


Biometrika ◽  
2016 ◽  
Vol 103 (1) ◽  
pp. 175-187 ◽  
Author(s):  
Jun Shao ◽  
Lei Wang

Abstract To estimate unknown population parameters based on data having nonignorable missing values with a semiparametric exponential tilting propensity, Kim & Yu (2011) assumed that the tilting parameter is known or can be estimated from external data, in order to avoid the identifiability issue. To remove this serious limitation on the methodology, we use an instrument, i.e., a covariate related to the study variable but unrelated to the missing data propensity, to construct some estimating equations. Because these estimating equations are semiparametric, we profile the nonparametric component using a kernel-type estimator and then estimate the tilting parameter based on the profiled estimating equations and the generalized method of moments. Once the tilting parameter is estimated, so is the propensity, and then other population parameters can be estimated using the inverse propensity weighting approach. Consistency and asymptotic normality of the proposed estimators are established. The finite-sample performance of the estimators is studied through simulation, and a real-data example is also presented.
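The final estimation step, inverse propensity weighting once the tilting parameter (and hence the propensity) is estimated, can be sketched generically. The function below is a textbook Hajek-type weighted mean under assumed, already-estimated propensities, not the authors' estimating-equation machinery.

```python
def ipw_mean(values, observed, propensities):
    """Inverse-propensity-weighted (Hajek) estimate of a population
    mean: observed values are weighted by the inverse of their
    response propensity, and the weights are normalized."""
    num = sum(y / p for y, obs, p in zip(values, observed, propensities) if obs)
    den = sum(1 / p for _, obs, p in zip(values, observed, propensities) if obs)
    return num / den

# Units with low response propensity get up-weighted to compensate
# for similar units that went unobserved:
est = ipw_mean([1.0, 3.0, 100.0], [True, True, False], [0.5, 0.5, 0.9])
print(est)   # 2.0
```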


Mathematics ◽  
2021 ◽  
Vol 9 (23) ◽  
pp. 3074
Author(s):  
Cristian Preda ◽  
Quentin Grimonprez ◽  
Vincent Vandewalle

Categorical functional data represented by paths of a stochastic jump process with continuous time and a finite set of states are considered. As an extension of the multiple correspondence analysis to an infinite set of variables, optimal encodings of states over time are approximated using an arbitrary finite basis of functions. This allows dimension reduction, optimal representation, and visualisation of data in lower dimensional spaces. The methodology is implemented in the cfda R package and is illustrated using a real data set in the clustering framework.
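A categorical functional datum can be represented as indicator processes, one per state, evaluated on a time grid; projecting such indicators onto a basis of functions is the first step toward the optimal encodings described above. The data layout below is a hypothetical sketch, not cfda's API.

```python
def state_indicators(path, times, states):
    """One-hot indicator trajectories for a categorical path given as a
    list of (jump_time, state) pairs sorted by time: at each requested
    time point, the last jump at or before it determines the occupied
    state. Assumes every time point falls at or after the first jump."""
    out = []
    for t in times:
        current = next(s for (t0, s) in reversed(path) if t0 <= t)
        out.append([1 if s == current else 0 for s in states])
    return out

path = [(0.0, "A"), (1.5, "B"), (3.0, "A")]
print(state_indicators(path, [0.5, 2.0, 3.5], ["A", "B"]))
# [[1, 0], [0, 1], [1, 0]]
```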


2021 ◽  
Author(s):  
Yuta Katsumi ◽  
Karen Quigley ◽  
Lisa Feldman Barrett

It is now well known that brain evolution, development, and structure do not respect Western folk categories of mind – that is, the boundaries of those folk categories have never been identified in nature, despite decades of searching. Categories for cognitions, emotions, perceptions, and so on may be useful for describing the mental phenomena that constitute a human mind, but they make a poor starting point for understanding the interplay of mechanisms that create those mental events in the first place. In this paper, we integrate evolutionary, developmental, anatomical, and functional evidence and propose that predictive regulation of the body’s internal systems (allostasis) and modeling the sensory consequences of this regulation (interoception) may be basic functions of the brain that are embedded in coordinated structural and functional gradients. Our approach offers the basis for a coherent, neurobiologically inspired research program that attempts to explain how a variety of psychological and physical phenomena may emerge from the same biological mechanisms, providing an opportunity to unify them under a common explanatory framework that can be used to develop a shared vocabulary for theory building and knowledge accumulation.


2021 ◽  
pp. 096228022110370
Author(s):  
Brice Ozenne ◽  
Esben Budtz-Jørgensen ◽  
Julien Péron

The benefit–risk balance is critical information when evaluating a new treatment. The Net Benefit has been proposed as a metric for benefit–risk assessment and has been applied in oncology to simultaneously consider gains in survival and possible side effects of chemotherapies. With complete data, one can construct a U-statistic estimator for the Net Benefit and obtain its asymptotic distribution using standard results of U-statistic theory. However, real data are often subject to right-censoring, e.g. patient drop-out in clinical trials. It is then possible to estimate the Net Benefit using a modified U-statistic, which involves the survival time. The latter can be seen as a nuisance parameter affecting the asymptotic distribution of the Net Benefit estimator. We present here how existing asymptotic results on U-statistics can be applied to estimate the distribution of the Net Benefit estimator, and we assess their validity in finite samples. The methodology generalizes to other statistics obtained using generalized pairwise comparisons, such as the win ratio. It is implemented in the R package BuyseTest (version 2.3.0 and later), available on the Comprehensive R Archive Network.
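With complete data, the Net Benefit from generalized pairwise comparisons reduces to the proportion of treatment-control pairs favourable to treatment minus the proportion unfavourable. The sketch below covers a single uncensored outcome where larger is better; it is a minimal complete-data illustration, not BuyseTest, which additionally handles right-censoring and multiple prioritized outcomes.

```python
def net_benefit(treatment, control, margin=0.0):
    """Complete-data Net Benefit: over all treatment-control pairs,
    (favourable pairs - unfavourable pairs) / total pairs. A pair is
    favourable when the treatment outcome exceeds the control outcome
    by more than the clinically relevant margin; pairs within the
    margin are neutral."""
    wins = losses = 0
    for t in treatment:
        for c in control:
            if t - c > margin:
                wins += 1
            elif c - t > margin:
                losses += 1
    n_pairs = len(treatment) * len(control)
    return (wins - losses) / n_pairs
```

The win ratio mentioned above uses the same pairwise counts, taking wins / losses instead of their normalized difference.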

