Calculating the Wasserstein Metric-Based Boltzmann Entropy of a Landscape Mosaic

Entropy ◽  
2020 ◽  
Vol 22 (4) ◽  
pp. 381 ◽  
Author(s):  
Hong Zhang ◽  
Zhiwei Wu ◽  
Tian Lan ◽  
Yanyu Chen ◽  
Peichao Gao

Shannon entropy is currently the most popular method for quantifying the disorder or information of a spatial data set such as a landscape pattern or a cartographic map. However, its drawbacks when applied to spatial data are well documented: it is incapable of capturing configurational disorder, and it has recently been criticized as thermodynamically irrelevant. Therefore, Boltzmann entropy was revisited, and methods have been developed for calculating it from landscape patterns. The latest method is based on the Wasserstein metric. It incorporates spatial repetitiveness, leading to a Wasserstein metric-based Boltzmann entropy that is capable of capturing the configurational disorder of a landscape mosaic. However, the numerical work required to calculate this entropy is beyond what can practically be achieved by hand. This study developed a new software tool for conveniently calculating the Wasserstein metric-based Boltzmann entropy. The tool provides a user-friendly human–computer interface and a range of functions, including multi-format data import, calculation, and data clearing and copying. This study outlines several essential technical implementations of the tool and reports an evaluation of the software together with a case study. Experimental results demonstrate that the software tool is both efficient and convenient.
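To make the distinction between compositional and configurational disorder concrete, the toy sketch below compares two mosaics with identical class frequencies using the Wasserstein metric over a crude local-configuration statistic. It is an illustration of the metric itself, assuming SciPy's `wasserstein_distance`; it is not the authors' algorithm, whose aggregation scheme is considerably more involved.

```python
# Toy illustration (not the authors' algorithm): compare two mosaics
# with identical class composition via the Wasserstein distance between
# distributions of a crude local-configuration statistic.
import numpy as np
from scipy.stats import wasserstein_distance

observed = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [2, 2, 1, 1],
                     [2, 2, 0, 0]])
# Shuffling preserves class frequencies but destroys configuration.
shuffled = np.random.default_rng(0).permutation(observed.ravel()).reshape(4, 4)

def block_diversity(mosaic: np.ndarray) -> np.ndarray:
    """Number of distinct classes in each non-overlapping 2x2 block,
    a crude local-configuration statistic."""
    h, w = mosaic.shape
    blocks = (mosaic.reshape(h // 2, 2, w // 2, 2)
                    .swapaxes(1, 2)
                    .reshape(-1, 4))
    return np.array([np.unique(b).size for b in blocks])

# Composition is identical after shuffling, so a purely statistical
# measure such as Shannon entropy cannot tell the mosaics apart; the
# local-configuration distributions, however, can differ.
d = wasserstein_distance(block_diversity(observed), block_diversity(shuffled))
print(f"Wasserstein distance between configuration statistics: {d:.4f}")
```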

2003 ◽  
Vol 2 (2) ◽  
pp. 126-139 ◽  
Author(s):  
Sanjay Rana ◽  
Jason Dykes

Animated sequences of raster images representing continuously varying surfaces, such as a temporal series of an evolving landform or an attribute series of socio-economic variation, are often used to seek insight from ordered sequences of raster spatial data. Despite their aesthetic appeal and condensed nature, such representations are limited in their suitability for prompting ideas and offering insight because of their poor information delivery and their lack of the interactivity required to support visualization. Cartographic techniques aim to assist users of geographic information through processes of abstraction: selecting, simplifying, smoothing and exaggerating when representing an underlying spatial data set graphically. Here we suggest a number of transformations and abstractions that take advantage of these techniques in a specific context, that of addressing the limitations of animated raster surfaces for visualization, and propose them within a framework that can be used to inform practice. The five techniques proposed are spatial and attribute smoothing; temporal interpolation; transformation of the surfaces into a network of morphometric features; the use of graphic lag or fading; and conditional interactivity appropriate for visualization. These efforts allow us to generate graphical environments that support visualization of animated sequences of images representing continuous surfaces and that are analogous to traditional cartographic techniques, namely smoothing and exaggeration, simplification, enhancement and the various issues of design. By developing a framework for considering cartography in support of visualization from this particular type of data and phenomenon, we aim to highlight the utility of a generically cartographic approach to information visualization. A number of techniques originating from computer science and conventional cartography are used in an application of the framework. A suitably interactive software tool is offered for evaluation, to establish the results of applying the framework and to demonstrate ways in which we may augment the visualization of dynamic raster surfaces through animation and, more generally, offer opportunity for insight through cartographic design.
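Two of the proposed techniques, spatial smoothing and temporal interpolation, are generic raster operations and can be sketched directly. The snippet below is a minimal illustration using NumPy and SciPy (assumed libraries, not the authors' tool): a uniform filter smooths each frame spatially, and linear interpolation inserts in-between frames to slow and smooth the animation.

```python
# Minimal sketch (assumed NumPy/SciPy stack, not the authors' software):
# spatial smoothing and temporal interpolation of an animated raster series.
import numpy as np
from scipy.ndimage import uniform_filter

def smooth_frames(frames: np.ndarray, size: int = 3) -> np.ndarray:
    """Spatially smooth each frame with a size x size moving average."""
    return np.stack([uniform_filter(f, size=size) for f in frames])

def interpolate_frames(frames: np.ndarray, n_between: int = 3) -> np.ndarray:
    """Linearly interpolate n_between frames between each pair of frames."""
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        for t in np.linspace(0.0, 1.0, n_between + 1, endpoint=False):
            out.append((1.0 - t) * a + t * b)
    out.append(frames[-1])
    return np.stack(out)

# A hypothetical 5-frame, 64x64 evolving surface.
rng = np.random.default_rng(42)
series = rng.random((5, 64, 64))
smoothed = smooth_frames(series, size=5)
animated = interpolate_frames(smoothed, n_between=4)
print(animated.shape)  # (21, 64, 64): 4 in-between frames per original pair
```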


2019 ◽  
Vol 1 ◽  
pp. 1-1 ◽  
Author(s):  
Peichao Gao ◽  
Hong Zhang ◽  
Zhilin Li

Abstract. Entropy is an important concept that originated in thermodynamics. It is the subject of the famous Second Law of Thermodynamics, which states that “the entropy of a closed system increases continuously and irrevocably toward a maximum” (Huettner 1976, 102) or “the disorder in the universe always increases” (Framer and Cook 2013, 21). Accordingly, it has been widely regarded as an ideal measure of disorder. Its computation can theoretically be performed according to the Boltzmann equation, proposed by the Austrian physicist Ludwig Boltzmann in 1872. In practice, however, the Boltzmann equation involves two problems that are difficult to solve, that is, the definition of the macrostate of a system and the determination of the number of possible microstates in that macrostate. As noted by the American sociologist Kenneth Bailey, “when the notion of entropy is extended beyond physics, researchers may not be certain how to specify and measure the macrostate/microstate relations” (Bailey 2009, 151). As a result, this entropy (also referred to as Boltzmann entropy and thermodynamic entropy) has remained largely at a conceptual level.

In practice, the widely used entropy is actually that proposed by the American mathematician, electrical engineer, and cryptographer Claude Elwood Shannon in 1948, hence the term Shannon entropy. Shannon entropy was proposed to quantify the statistical disorder of telegraph messages in the area of communications. The quantification result was interpreted as the information content of a telegraph message, hence also the term information entropy. This entropy has served as the cornerstone of information theory and was introduced to various fields including chemistry, biology, and geography. It has been widely utilized to quantify the information content of geographic data (or spatial data) in either a vector format (i.e., vector data) or a raster format (i.e., raster data). However, only the statistical information of spatial data can be quantified by Shannon entropy; the spatial information is ignored. For example, a grey image and its corresponding error image share the same Shannon entropy.

Therefore, considerable efforts have been made to improve the suitability of Shannon entropy for spatial data, and a number of improved Shannon entropies have been put forward. Rather than further improving Shannon entropy, this study introduces a novel strategy, namely shifting back from Shannon entropy to Boltzmann entropy. There are two advantages to employing Boltzmann entropy. First, as previously mentioned, Boltzmann entropy is the ideal, standard measure of disorder or information. It is theoretically capable of quantifying not only the statistical information but also the spatial information of a data set. Second, Boltzmann entropy can serve as the bridge between spatial patterns and thermodynamic interpretations. In this sense, the Boltzmann entropy of spatial data may have wider applications. In this study, Boltzmann entropy is employed to quantify the spatial information of raster data, such as images, raster maps, digital elevation models, landscape mosaics, and landscape gradients. To this end, the macrostate of raster data is defined, and the number of all possible microstates in the macrostate is determined. To demonstrate the usefulness of Boltzmann entropy, it is applied to satellite remote sensing image processing, and a comparison is made between its performance and that of Shannon entropy.
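For reference, the two entropies contrasted in this abstract have standard forms: the Boltzmann entropy of a macrostate with W equally probable microstates, and the Shannon entropy of a discrete source with symbol probabilities p_i.

```latex
% Boltzmann entropy: k_B is Boltzmann's constant, W the number of
% microstates consistent with the macrostate.
S = k_B \ln W

% Shannon entropy of a discrete source with symbol probabilities p_i
% (in bits when the logarithm is base 2).
H = -\sum_{i=1}^{n} p_i \log_2 p_i
```

With equiprobable microstates (p_i = 1/W), H reduces to log2 W, so the two measures agree up to a constant factor; this is the formal correspondence that makes Boltzmann entropy the more general measure.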


2020 ◽  
Vol 36 (9) ◽  
pp. 2881-2887
Author(s):  
Joy Roy ◽  
Eric Cheung ◽  
Junaid Bhatti ◽  
Abraar Muneem ◽  
Daniel Lobo

Abstract. Motivation: Morphological and genetic spatial data from functional experiments based on genetic, surgical and pharmacological perturbations are being produced at an extraordinary pace in developmental and regenerative biology. However, our ability to extract knowledge from these large datasets is hindered by the lack of formalization methods and tools able to unambiguously describe, centralize and interpret them. Formalizing spatial phenotypes and gene expression patterns is especially challenging in organisms with highly variable morphologies such as planarian worms, which, due to their extraordinary regenerative capability, can experimentally result in phenotypes with almost any combination of body regions or parts. Results: Here, we present a computational methodology and mathematical formalism to encode and curate the morphological outcomes and gene expression patterns in planaria. Worm morphologies are encoded with mathematical graphs based on anatomical ontology terms to automatically generate reference morphologies. Gene expression patterns are registered to these standard reference morphologies, which can then be annotated automatically with anatomical ontology terms by analyzing the spatial expression patterns and their textual descriptions. This methodology enables the curation and annotation of complex experimental morphologies together with their gene expression patterns in a centralized, standardized dataset, paving the way for the extraction of knowledge and the reverse-engineering of the much sought-after mechanistic models in planaria and other regenerative organisms. Availability and implementation: We implemented this methodology in a user-friendly graphical software tool, PlanGexQ, freely available together with the data in the manuscript at https://lobolab.umbc.edu/plangexq. Supplementary information: Supplementary data are available at Bioinformatics online.
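Encoding a morphology as a graph over anatomical ontology terms can be illustrated with a toy example. The sketch below uses NetworkX (an assumed library, not necessarily what PlanGexQ uses) with made-up ontology labels; it shows only the idea of nodes as body regions and edges as adjacency.

```python
# Toy sketch of a morphology encoded as a graph of anatomical terms.
# NetworkX is an assumed library; the node labels and ontology IDs are
# hypothetical, not PlanGexQ's actual vocabulary.
import networkx as nx

worm = nx.Graph()
# Nodes: body regions, annotated with hypothetical ontology IDs.
worm.add_node("head", ontology_id="PLANA:0000010")
worm.add_node("trunk", ontology_id="PLANA:0000020")
worm.add_node("tail", ontology_id="PLANA:0000030")
# Edges: physical adjacency along the anterior-posterior axis.
worm.add_edge("head", "trunk")
worm.add_edge("trunk", "tail")

# Register a (hypothetical) gene expression pattern onto regions.
expression = {"head": ["ndk"], "trunk": [], "tail": ["wnt1"]}
for region, genes in expression.items():
    worm.nodes[region]["expressed_genes"] = genes

# A two-headed regenerative phenotype has the same topology but
# different node annotations; attribute-aware matching is what
# distinguishes the two morphologies.
two_headed = nx.Graph([("head_a", "trunk"), ("trunk", "head_b")])
print(nx.is_isomorphic(worm, two_headed))  # True topologically here
```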


2012 ◽  
Vol 37 (4) ◽  
pp. 172-176
Author(s):  
Lina Kuklienė ◽  
Dainora Jankauskienė ◽  
Indrius Kuklys

The purpose of this thesis is to analyze the main geodetic databases of Lithuania and to create a geodetic database of cultural heritage objects in Klaipėda using the ArcGIS 9.3 program. The problem is that a geodetic database storing graphical and attributive information about the cultural heritage of Klaipėda city has not yet been created. Thus, in order to incorporate GIS technologies into the management of cultural heritage, starting the creation of such a database is a relevant task. Once fully completed and regularly updated, the geodetic database can be used for cultural heritage management, planning, design, road construction, etc. Therefore, the following objectives have been set: 1) describing geo-data collection and input devices; 2) creating the geodetic database that stores information about buildings, building complexes, cemeteries, and locations of archaeological and cultural heritage; 3) giving a detailed description of the database creation process; 4) analyzing the need for a geodetic database of cultural heritage objects in Klaipėda. Summary (translated from Lithuanian): In Lithuania, many georeferenced and thematic spatial data sets have been created for various purposes on the basis of GIS. One area in which these sets are used is the accumulation of data held in state registers. The Cultural Property Register database was created on this principle, and its principal data were used in creating the spatial data set of cultural heritage objects of Klaipėda city. In order to incorporate GIS technologies into the management of cultural heritage objects as quickly as possible, it is relevant to begin creating a spatial data set of cultural heritage objects in Klaipėda. A regularly updated spatial data set will facilitate the administration of cultural heritage objects, territorial planning, design, road construction and other work performed by specialists in various fields. Summary (translated from Russian): In Lithuania, many geo-referenced and thematic spatial data sets have been created on a GIS basis for various purposes. One area of their use is the collection of data held in state registers. The register database of cultural properties was created on this principle, and its core data were used to create the spatial data set of cultural heritage objects of the city of Klaipėda. In order to promptly support the management of cultural heritage objects with GIS technologies, the creation of a spatial data set of cultural heritage objects in Klaipėda should be started. A fully populated and constantly updated spatial data set will facilitate the work of specialists in various fields: administration of cultural heritage objects, territorial planning, design, road construction and others.
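As a rough illustration of the kind of layer such a database holds, the sketch below builds a small cultural-heritage point layer and writes it to a GeoPackage. It uses the open-source GeoPandas/Shapely stack rather than ArcGIS 9.3, and all feature names and attributes are made up.

```python
# Illustrative sketch only: a tiny cultural-heritage point layer written
# to a GeoPackage with the open-source GeoPandas/Shapely stack (the
# thesis itself used ArcGIS 9.3). All records below are hypothetical.
import geopandas as gpd
from shapely.geometry import Point

records = {
    "name": ["Example Building", "Example Cemetery"],
    "heritage_type": ["building", "cemetery"],
    "register_id": ["KV-0001", "KV-0002"],  # hypothetical register codes
    "geometry": [Point(21.1443, 55.7033), Point(21.1600, 55.7100)],
}
# EPSG:4326 (WGS 84) for simplicity; a Lithuanian database would likely
# use the national LKS-94 grid (EPSG:3346) instead.
layer = gpd.GeoDataFrame(records, crs="EPSG:4326")
layer.to_file("heritage_klaipeda.gpkg", layer="heritage_points", driver="GPKG")
print(layer.head())
```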


2021 ◽  
Vol 10 (7) ◽  
pp. 436
Author(s):  
Amerah Alghanim ◽  
Musfira Jilani ◽  
Michela Bertolotto ◽  
Gavin McArdle

Volunteered Geographic Information (VGI) is often collected by non-expert users, which raises concerns about the quality and veracity of such data. There has been much effort to understand and quantify the quality of VGI. Extrinsic measures, which compare VGI to authoritative data sources such as those of National Mapping Agencies, are common, but the cost and slow update frequency of such data hinder the task. On the other hand, intrinsic measures, which compare the data to heuristics or models built from the VGI data itself, are becoming increasingly popular. Supervised machine learning techniques are particularly suitable for intrinsic measures of quality because they can infer and predict the properties of spatial data. In this article we are interested in assessing the quality of semantic information, such as the road type, associated with data in OpenStreetMap (OSM). We have developed a machine learning approach which utilises new intrinsic input features collected from the VGI dataset. Using our proposed approach we obtained an average classification accuracy of 84.12%, outperforming existing techniques on the same semantic inference task. The trustworthiness of the data used for developing and training machine learning models is also important. To address this we developed a new trustworthiness measure using direct and indirect characteristics of OSM data, such as its edit history, along with an assessment of the users who contributed the data. An evaluation of the impact of data determined to be trustworthy shows that training on trusted data improves the prediction accuracy of our machine learning technique: the classification accuracy of our model is 87.75% when applied to a trusted dataset and 57.98% when applied to an untrusted dataset. Consequently, such results can be used to assess the quality of OSM and suggest improvements to the data set.
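The supervised setup described here, predicting a road's semantic class from intrinsic features, can be sketched generically. The snippet below is a schematic using scikit-learn with invented feature names and random toy data; the paper's actual feature set and model are not reproduced.

```python
# Schematic sketch of intrinsic semantic-class prediction for OSM roads
# using scikit-learn. Features and labels are random toy data (so the
# accuracy will be near chance); this is not the paper's model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical intrinsic features: segment length (m), number of
# connected segments, speed-limit tag present (0/1), edit count.
X = np.column_stack([
    rng.gamma(2.0, 200.0, n),        # length_m
    rng.integers(1, 6, n),           # degree
    rng.integers(0, 2, n),           # has_maxspeed
    rng.poisson(3.0, n),             # n_edits
])
y = rng.integers(0, 4, n)            # four hypothetical road classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(f"accuracy: {accuracy_score(y_test, clf.predict(X_test)):.3f}")
```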


2012 ◽  
Vol 7 (2) ◽  
pp. 236-257 ◽  
Author(s):  
Jaap Spreeuw ◽  
Iqbal Owadally

Abstract. We analyze the mortality of couples by fitting a multiple-state model to a large insurance data set. We find evidence that mortality rates increase after the death of a partner and that this effect diminishes over time. This is popularly known as the “broken-heart” effect, and we find that it affects widowers more than widows. The remaining lifetimes of joint lives therefore exhibit short-term dependence. We carry out numerical work involving the pricing and valuation of typical contingent assurance contracts and of a joint life and survivor annuity. If insurers ignore dependence, or mis-specify it as long-term dependence, significant mis-pricing and inappropriate provisioning can result. Detailed numerical results are presented.
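A minimal version of such a multiple-state model can be sketched as a Markov chain over the states {both alive, widow, widower, both dead}, where the broken-heart effect enters as a temporary multiplier on the survivor's mortality. The rates and multiplier below are invented for illustration and carry no actuarial authority.

```python
# Toy multiple-state mortality sketch (invented rates, illustration only).
# States: 0 = both alive, 1 = wife alive (widow), 2 = husband alive
# (widower), 3 = both dead. A short-term "broken-heart" multiplier
# raises the survivor's mortality for a few years after bereavement.
import numpy as np

q_m, q_f = 0.020, 0.015          # invented baseline annual death probs
broken_heart = 1.5               # temporary mortality multiplier
bh_years = 3                     # years of elevated survivor mortality

def simulate(years: int, n: int = 100_000, seed: int = 1) -> float:
    rng = np.random.default_rng(seed)
    state = np.zeros(n, dtype=int)   # all couples start in state 0
    since = np.zeros(n, dtype=int)   # years since bereavement
    for _ in range(years):
        # 1) Widowed lives face temporarily elevated mortality.
        for s, q in ((1, q_f), (2, q_m)):
            idx = np.flatnonzero(state == s)
            mult = np.where(since[idx] < bh_years, broken_heart, 1.0)
            dies = rng.random(idx.size) < q * mult
            state[idx[dies]] = 3
            since[idx[~dies]] += 1
        # 2) Couples: each partner faces baseline mortality this year.
        both = np.flatnonzero(state == 0)
        hd = rng.random(both.size) < q_m
        wd = rng.random(both.size) < q_f
        state[both[hd & wd]] = 3
        state[both[hd & ~wd]] = 1    # wife newly widowed
        state[both[~hd & wd]] = 2    # husband newly widowered
    return float(np.mean(state == 3))

print(f"P(both dead within 20 years) ~ {simulate(20):.4f}")
```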


mSystems ◽  
2018 ◽  
Vol 3 (3) ◽  
Author(s):  
Gabriel A. Al-Ghalith ◽  
Benjamin Hillmann ◽  
Kaiwei Ang ◽  
Robin Shields-Cutler ◽  
Dan Knights

ABSTRACT Next-generation sequencing technology is of great importance for many biological disciplines; however, due to technical and biological limitations, the short DNA sequences produced by modern sequencers require numerous quality control (QC) measures to reduce errors, remove technical contaminants, or merge paired-end reads together into longer or higher-quality contigs. Many tools exist for each step, but choosing the appropriate methods and usage parameters can be challenging because the parameterization of each step depends on the particularities of the sequencing technology used, the type of samples being analyzed, and the stochasticity of the instrumentation and sample preparation. Furthermore, end users may not know all of the relevant information about how their data were generated, such as the expected overlap for paired-end sequences or the type of adaptors used, and therefore cannot make informed choices. This increasing complexity and nuance demand a pipeline that combines existing steps in a user-friendly way and, when possible, learns reasonable quality parameters from the data automatically. We propose a user-friendly quality control pipeline called SHI7 (canonically pronounced “shizen”), which aims to simplify quality control of short-read data for the end user by predicting the presence and/or type of common sequencing adaptors, what quality scores to trim, whether the data set is shotgun or amplicon sequencing, whether reads are paired end or single end, and whether pairs are stitchable, including the expected amount of pair overlap. We hope that SHI7 will make it easier for all researchers, expert and novice alike, to follow reasonable practices for short-read data quality control. IMPORTANCE Quality control of high-throughput DNA sequencing data is an important but sometimes laborious task requiring background knowledge of the sequencing protocol used (such as adaptor type, sequencing technology, insert size/stitchability, paired-endedness, etc.). Quality control protocols typically require applying this background knowledge to select and execute numerous quality control steps with the appropriate parameters, which is especially difficult when working with public data or data from collaborators who use different protocols. We have created a streamlined quality control pipeline intended to substantially simplify the process of DNA quality control from raw machine output files to actionable sequence data. In contrast to other methods, our proposed pipeline is easy to install and use and attempts to learn the necessary parameters from the data automatically with a single command.
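One of the QC steps such a pipeline automates, quality-score trimming, is simple to illustrate in isolation. Below is a generic sliding-window trimmer over FASTQ-style Phred scores; it is a textbook technique, not SHI7's actual algorithm or parameters.

```python
# Generic sliding-window quality trimming of a read (illustration only;
# this is a textbook technique, not SHI7's actual algorithm).

def trim_read(seq: str, quals: str, window: int = 4,
              min_mean_q: float = 20.0, phred_offset: int = 33) -> str:
    """Cut the read at the first window whose mean Phred score drops
    below min_mean_q (quals given as an ASCII-encoded string)."""
    scores = [ord(c) - phred_offset for c in quals]
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window < min_mean_q:
            return seq[:i]
    return seq

# Hypothetical read whose quality degrades toward the 3' end.
seq = "ACGTACGTACGTACGT"
quals = "IIIIIIIIII(((((("  # 'I' = Q40, '(' = Q7 at offset 33
print(trim_read(seq, quals))  # -> "ACGTACGTA"
```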


2021 ◽  
Vol 12 (5-2021) ◽  
pp. 50-56
Author(s):  
Boris M. Pileckiy

This paper describes one possible implementation option for recognizing spatial data in natural language texts. The proposed option is based on lexico-syntactic analysis of the texts, which requires special grammars and dictionaries. Spatial data are recognized for subsequent geocoding and visualization. The practical implementation of spatial data recognition uses a freely distributed software tool. The paper also considers some applications of spatial data and gives preliminary results of the spatial data recognition.
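The recognize-then-geocode flow can be illustrated with a deliberately naive sketch: a pattern pulls candidate place names out of text, and a geocoder resolves them to coordinates. The regex is a crude stand-in for the paper's lexico-syntactic grammars, and the geopy/Nominatim call is an assumed, commonly used geocoding route, not necessarily the tool the paper used.

```python
# Naive recognize-then-geocode sketch. The regex is a crude stand-in for
# the paper's lexico-syntactic grammars; geopy/Nominatim is an assumed
# geocoding backend, not necessarily the tool the paper used.
import re
from geopy.geocoders import Nominatim

text = "The expedition started in Murmansk and continued toward Apatity."

# Toy pattern: capitalized words following a spatial preposition.
candidates = re.findall(r"\b(?:in|toward|near|from)\s+([A-Z][a-z]+)", text)
print(candidates)  # ['Murmansk', 'Apatity']

geocoder = Nominatim(user_agent="spatial-recognition-sketch")
for name in candidates:
    location = geocoder.geocode(name)
    if location is not None:
        print(f"{name}: ({location.latitude:.4f}, {location.longitude:.4f})")
```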


2020 ◽  
Author(s):  
Chen Chen ◽  
Wanyu Xu ◽  
Ningning Gou ◽  
Lasu Bai ◽  
Lin Wang ◽  
...  

Abstract. Background: Bud dormancy in deciduous fruit trees enables plants to survive cold weather; the buds adopt a dormant state and resume growth after their chilling requirements have been satisfied. Chilling requirements play a key role in flowering time. So far, several chilling models have been developed, including the ≤7.2 °C model, the 0–7.2 °C model, the Utah model, and the Dynamic Model; however, determining chilling requirements with any of these models is still time-consuming, which calls for efficient data-analysis tools. Results: In this study, we developed novel software, Chilling and Heat Requirement (CHR), that flexibly integrates data conversion, model selection, calculation, statistical analysis, and plotting. Conclusion: CHR is a simple, easy-to-use, and user-friendly tool for estimating chilling requirements that will be very useful to researchers.
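The two simplest models named here are easy to compute directly: the ≤7.2 °C model counts every hour at or below 7.2 °C as one chill hour, while the 0–7.2 °C model counts only hours between 0 and 7.2 °C. A minimal sketch of these standard definitions follows (not CHR's implementation):

```python
# Minimal chill-hour calculators for the two simplest models named in
# the abstract (standard definitions; not CHR's implementation).
from typing import Iterable

def chill_hours_leq_7_2(hourly_temps_c: Iterable[float]) -> int:
    """<=7.2 degC model: every hour at or below 7.2 degC counts."""
    return sum(1 for t in hourly_temps_c if t <= 7.2)

def chill_hours_0_to_7_2(hourly_temps_c: Iterable[float]) -> int:
    """0-7.2 degC model: only hours between 0 and 7.2 degC count."""
    return sum(1 for t in hourly_temps_c if 0.0 <= t <= 7.2)

# One hypothetical winter day of hourly temperatures (degC).
day = [-2, -1, 0, 1, 2, 3, 5, 6, 7, 8, 9, 10,
       11, 10, 8, 7, 6, 4, 3, 2, 1, 0, -1, -2]
print(chill_hours_leq_7_2(day))   # 18: includes sub-zero hours
print(chill_hours_0_to_7_2(day))  # 14: sub-zero hours excluded
```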

