A Multiple Subspaces-Based Model: Interpreting Urban Functional Regions with Big Geospatial Data

2021 ◽  
Vol 10 (2) ◽  
pp. 66
Author(s):  
Jiawei Zhu ◽  
Chao Tao ◽  
Xin Lin ◽  
Jian Peng ◽  
Haozhe Huang ◽  
...  

Analyzing the urban spatial structure of a city is a core topic within urban geographical information science that can assist urban planning, site selection, location recommendation, etc. Among previous studies, comprehending the functionality of places is a central topic and corresponds to understanding how people use places. With the help of big geospatial data, which contain rich information about human mobility and activity, we propose a novel multiple subspaces-based model to interpret urban functional regions. The model is based on the assumption that the temporal activity patterns of places lie in a high-dimensional space and can be represented by a union of low-dimensional subspaces. These subspaces are obtained by finding sparse representations using sparse subspace clustering (SSC). The paper details how to use this method in the context of detecting functional regions. With these subspaces, we can detect the functionality of urban regions in a designated study area and further explore the characteristics of functional regions. We conducted experiments using real data from Shanghai. The experimental results, and the outperformance of our model relative to a single subspace-based method, demonstrate the efficacy and feasibility of our model.

This article reviews how Geographical Information Systems (GIS) have been applied to spatial decision making, from simple to complex geospatial problems. GIS usually refers to a computer system used to store, manage, analyze, manipulate, and visualize geospatial data, producing meaningful information for understanding and solving geographic/spatial problems. With advances in technology, hardware, and software, GIS continues to develop at a rapid pace, even though it began with the simple and straightforward question of locating geographic features/events. This rapid development has made GIS and spatial data a critical commodity today. However, without basic GIS knowledge and understanding, the actual capabilities of GIS, such as understanding geographical concepts, managing geographic phenomena, and solving geographical problems, remain limited. Worse still, GIS is often seen merely as a tool for map display and simple spatial analysis. Furthermore, professional training on the market emphasizes simple GIS components such as hardware, software, geospatial data mapping, extracting geographical data from tables (tabular data), simple queries and display, and spatial data editing, mastered using GIS manuals. Thus, this article highlights the impact of implementing GIS without sufficient fundamental knowledge, which results in complicated spatial decision planning issues.


2021 ◽  
Vol 11 (20) ◽  
pp. 9424
Author(s):  
Guanwei Zhao ◽  
Zhitao Li ◽  
Muzhuang Yang

The spatial decomposition of demographic data at a fine resolution is a classic and crucial problem in the field of geographical information science. The main objective of this study was to compare twelve well-known machine learning regression algorithms for the spatial decomposition of demographic data with multisource geospatial data. Grid search and cross-validation methods were used to ensure that the optimal model parameters were obtained. The results showed that all the global regression algorithms used in the study exhibited acceptable results, except the ordinary least squares (OLS) algorithm. In addition, both the regularization method and the subsetting method were useful for alleviating overfitting in the OLS model, and the former was better than the latter. The stronger performance of the nonlinear regression algorithms over the linear ones implies that the relationship between population density and its influencing factors is likely non-linear. Among the global regression algorithms used in the study, the best results were achieved by the k-nearest neighbors (KNN) regression algorithm. It was also found that multisource geospatial data can significantly improve the accuracy of spatial decomposition results, and thus the proposed method can be applied to spatial decomposition studies in other areas.
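The grid search and cross-validation procedure for the best-performing KNN regressor can be sketched as follows (a generic illustration with synthetic covariates, not the study's actual geospatial features or parameter grid):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Stand-in covariates: in the study these would be multisource
# geospatial features per grid cell; here synthetic regression data.
X, y = make_regression(n_samples=300, n_features=8, noise=5.0,
                       random_state=0)

# Candidate hyperparameters explored by exhaustive grid search,
# scored by 5-fold cross-validated RMSE.
param_grid = {"n_neighbors": [3, 5, 7, 9, 11],
              "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsRegressor(), param_grid,
                      cv=5, scoring="neg_root_mean_squared_error")
search.fit(X, y)

best = search.best_estimator_   # refit on the full training data
preds = best.predict(X[:5])
```

The same `GridSearchCV` wrapper applies unchanged to the other eleven regressors, which is what makes the comparison across algorithms consistent.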


2019 ◽  
Vol 14 (2) ◽  
pp. 148-156
Author(s):  
Nighat Noureen ◽  
Sahar Fazal ◽  
Muhammad Abdul Qadir ◽  
Muhammad Tanvir Afzal

Background: Specific combinations of Histone Modifications (HMs), contributing to the histone code hypothesis, lead to various biological functions. HM combinations have been utilized by various studies to divide the genome into different regions, which have been classified as chromatin states. Mostly Hidden Markov Model (HMM)-based techniques have been utilized for this purpose. In chromatin studies, data from Next Generation Sequencing (NGS) platforms are used. Chromatin states based on histone modification combinatorics are annotated by mapping them to functional regions of the genome. To date, the number of states predicted by HMM tools has been justified only biologically. Objective: The present study aimed to provide a computational scheme to identify the underlying hidden states in the data under consideration. Methods: We propose a computational scheme, HCVS, based on a hierarchical clustering and visualization strategy, to achieve this objective. Results: We tested the proposed scheme on a real dataset of nine cell types comprising nine chromatin marks. The approach successfully identified the state numbers for various possibilities, and the results showed good correlation with one of the existing models. Conclusion: The HCVS model not only helps in deciding the optimal number of states for particular data but also justifies the results biologically, thereby correlating the computational and biological aspects.
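Choosing a state number from a hierarchical clustering can be illustrated with a minimal sketch (a generic merge-height gap heuristic, not the HCVS algorithm itself; the function name, linkage choice, and toy data are illustrative assumptions): cut the dendrogram where successive merge heights jump the most.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def estimate_state_number(X, max_states=10):
    """Pick a cluster count from the largest gap in dendrogram merge heights."""
    Z = linkage(X, method="ward")
    heights = Z[:, 2]                       # merge distances, increasing
    # gaps between the last `max_states` merges, near the top of the tree
    gaps = np.diff(heights[-max_states:])
    # a big gap after the j-th of these merges means cutting there
    # leaves (max_states - j) clusters
    k = max_states - int(np.argmax(gaps))
    labels = fcluster(Z, t=k, criterion="maxclust")
    return k, labels

# Toy data: three well-separated groups standing in for hidden states
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(40, 5))
               for c in (0.0, 5.0, 10.0)])
k, labels = estimate_state_number(X)
```

In the chromatin setting, each row of `X` would be a genomic bin's histone-mark signature, and `k` would be the candidate chromatin-state count to inspect visually.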


2021 ◽  
Vol 10 (4) ◽  
pp. 246
Author(s):  
Vagan Terziyan ◽  
Anton Nikulin

Operating with ignorance is an important concern of geographical information science when the objective is to discover knowledge from imperfect spatial data. Data mining (driven by knowledge discovery tools) is about processing available (observed, known, and understood) samples of data to build a model (e.g., a classifier) that can handle data samples that are not yet observed, known, or understood. These tools traditionally take semantically labeled samples of the available data (known facts) as input for learning. We want to challenge the indispensability of this approach and suggest considering things the other way around. What if the task were as follows: how to build a model based on the semantics of our ignorance, i.e., by processing the shape of the “voids” within the available data space? Can we improve traditional classification by also modeling the ignorance? In this paper, we provide algorithms for the discovery and visualization of ignorance zones in two-dimensional data spaces and design two ignorance-aware smart prototype selection techniques (incremental and adversarial) to improve the performance of nearest neighbor classifiers. We present experiments with artificial and real datasets to test the usefulness of ignorance semantics discovery.
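A minimal sketch of discovering "voids" in a two-dimensional data space (not the paper's algorithm; the function name, grid resolution, and distance threshold are illustrative assumptions): grid cells far from every observed sample are flagged as ignorance zones.

```python
import numpy as np
from scipy.spatial import cKDTree

def ignorance_map(X, grid_res=50, radius=1.0):
    """Flag grid cells whose nearest observed sample is farther than
    `radius` -- such cells form the 'ignorance zones' of the space."""
    tree = cKDTree(X)
    xs = np.linspace(X[:, 0].min(), X[:, 0].max(), grid_res)
    ys = np.linspace(X[:, 1].min(), X[:, 1].max(), grid_res)
    gx, gy = np.meshgrid(xs, ys)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    dist, _ = tree.query(grid)            # distance to nearest sample
    return grid, dist > radius

# Two clusters with an empty band between them: the band is a void
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 0.5, size=(100, 2)),
               rng.normal(+3.0, 0.5, size=(100, 2))])
grid, ignorant = ignorance_map(X)
```

A prototype selection scheme could then, for instance, place or keep prototypes near the boundary of the flagged cells rather than only where data are dense.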


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Joshua T. Vogelstein ◽  
Eric W. Bridgeford ◽  
Minh Tang ◽  
Da Zheng ◽  
Christopher Douville ◽  
...  

To solve key biomedical problems, experimentalists now routinely measure millions or billions of features (dimensions) per sample, with the hope that data science techniques will be able to build accurate data-driven inferences. Because sample sizes are typically orders of magnitude smaller than the dimensionality of these data, valid inferences require finding a low-dimensional representation that preserves the discriminating information (e.g., whether the individual suffers from a particular disease). There is a lack of interpretable supervised dimensionality reduction methods that scale to millions of dimensions with strong statistical theoretical guarantees. We introduce an approach to extending principal components analysis by incorporating class-conditional moment estimates into the low-dimensional projection. The simplest version, Linear Optimal Low-Rank Projection, incorporates the class-conditional means. We prove, and substantiate with both synthetic and real data benchmarks, that Linear Optimal Low-Rank Projection and its generalizations lead to improved data representations for subsequent classification, while maintaining computational efficiency and scalability. Using multiple brain imaging datasets consisting of more than 150 million features, and several genomics datasets with more than 500,000 features, Linear Optimal Low-Rank Projection outperforms other scalable linear dimensionality reduction techniques in terms of accuracy, while only requiring a few minutes on a standard desktop computer.
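The idea of augmenting principal directions with class-conditional means can be sketched as follows (a simplified illustration of the approach, not the authors' reference implementation; function name and toy data are assumptions): stack the class mean-difference directions with the top principal directions, then orthonormalize.

```python
import numpy as np

def lol_projection(X, y, d):
    """Sketch of a Linear Optimal Low-Rank Projection-style embedding:
    class-conditional mean differences first, principal directions after."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    delta = (means - means.mean(axis=0))[:-1]   # mean-difference directions
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # principal directions
    A = np.vstack([delta, Vt[: d - delta.shape[0]]])
    Q, _ = np.linalg.qr(A.T)                    # orthonormalize the stack
    return Q[:, :d]                             # columns span the projection

# Two Gaussian classes separated along the first axis in R^20
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(100, 20)); X0[:, 0] += 4.0
X1 = rng.normal(0.0, 1.0, size=(100, 20))
X = np.vstack([X0, X1]); y = np.repeat([0, 1], 100)

W = lol_projection(X, y, d=3)
Z = X @ W    # low-dimensional representation for a downstream classifier
```

Because the class-mean direction is included explicitly, the discriminating axis survives the reduction even when it carries little of the total variance, which is where plain PCA can fail.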


Sensors ◽  
2018 ◽  
Vol 18 (7) ◽  
pp. 2325 ◽  
Author(s):  
Yong Lv ◽  
Houzhuang Zhang ◽  
Cancan Yi

As a data-driven multichannel signal processing method, multivariate empirical mode decomposition (MEMD) has attracted much attention due to its potential for self-adaptive, multi-scale decomposition of multivariate data. Commonly, a uniform projection scheme on a hypersphere is used to estimate the local mean. However, the unbalanced data distribution in high-dimensional space often conflicts with uniform sampling, and the method's performance is sensitive to noise components. Considering that, in structural health monitoring of key equipment, the vibration signal is commonly generated by three sensors located at different measuring positions, this paper proposes a novel trivariate empirical mode decomposition via convex optimization for rolling bearing condition identification. For the trivariate data matrix, low-rank matrix approximation via convex optimization is first conducted to achieve denoising. It is worth noting that a non-convex penalty function is introduced as a regularization term to enhance performance. Moreover, a non-uniform sampling scheme is determined by applying singular value decomposition (SVD) to the obtained low-rank trivariate data, and the approach used in the conventional MEMD algorithm is then employed to estimate the local mean. Numerical examples with synthetic data defined by the fault model and real data generated by a faulty rolling bearing on an experimental bench demonstrate the fruitful applications of the proposed method.
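The low-rank denoising step can be illustrated with the standard convex baseline, singular-value soft-thresholding (the proximal operator of the nuclear norm); the paper's non-convex penalty would replace the thresholding rule. The threshold value and toy signal below are illustrative assumptions.

```python
import numpy as np

def lowrank_denoise(Y, lam):
    """Singular-value soft-thresholding: the proximal operator of the
    nuclear norm, a standard convex surrogate for low-rank recovery."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_thr = np.maximum(s - lam, 0.0)      # shrink every singular value
    return U @ np.diag(s_thr) @ Vt

# Rank-1 trivariate signal (three sensor channels) corrupted by noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
clean = np.outer(np.sin(2 * np.pi * 10 * t), [1.0, 0.8, 0.6])  # 500 x 3
noisy = clean + 0.3 * rng.normal(size=clean.shape)

denoised = lowrank_denoise(noisy, lam=5.0)
err_noisy = np.linalg.norm(noisy - clean)
err_den = np.linalg.norm(denoised - clean)
```

Soft-thresholding biases the large (signal) singular values downward, which is exactly the shortcoming a non-convex penalty, as used in the paper, is meant to reduce.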


Cartography ◽  
2020 ◽  
pp. 1-21
Author(s):  
Menno-Jan Kraak ◽  
Ferjan Ormeling
