Geographic information. Data quality measures

2015 ◽ 2012 ◽ Author(s): Nurul A. Emran, Noraswaliza Abdullah, Nuzaimah Mustafa

2021 ◽ Vol 11 (2) ◽ pp. 472 ◽ Author(s): Hyeongmin Cho, Sangkyun Lee

Machine learning has been proven effective in various application areas, such as object and speech recognition on mobile systems. Since a key factor in machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer's or manager's point of view, measuring data quality is an important first step in the learning process: we need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially for large-scale, high-dimensional data such as images and videos. This paper proposes two data quality measures that compute class separability and in-class variability, two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures, based on random projections and bootstrapping, with statistical benefits on large-scale, high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale, high-dimensional datasets.
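The abstract does not spell out the measures themselves; as a rough sketch of the general idea, the following estimates class separability and in-class variability on 1-D random projections with bootstrap resampling. The function name and the exact score definitions are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def separability_and_variability(X, y, n_proj=50, n_boot=20, seed=0):
    """Illustrative estimates of class separability and in-class
    variability from 1-D random projections plus bootstrapping."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    sep_scores, var_scores = [], []
    for _ in range(n_proj):
        w = rng.normal(size=X.shape[1])
        w /= np.linalg.norm(w)          # random unit direction
        z = X @ w                       # 1-D random projection
        for _ in range(n_boot):
            idx = rng.integers(0, len(z), size=len(z))  # bootstrap resample
            zs, ys = z[idx], y[idx]
            means = np.array([zs[ys == c].mean() for c in classes])
            stds = np.array([zs[ys == c].std() for c in classes])
            # separability: spread of class means relative to in-class spread
            sep_scores.append(means.std() / (stds.mean() + 1e-12))
            # in-class variability: average within-class spread
            var_scores.append(stds.mean())
    return float(np.mean(sep_scores)), float(np.mean(var_scores))
```

Projecting to one dimension before scoring is what keeps the cost manageable on high-dimensional data: each projection is a single matrix-vector product, and the bootstrap then operates on scalars rather than full feature vectors.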


2021 ◽ Vol 560 ◽ pp. 51-67 ◽ Author(s): Marilyn Bello, Gonzalo Nápoles, Koen Vanhoof, Rafael Bello

2019 ◽ Vol 8 (5) ◽ pp. 232 ◽ Author(s): Jennings Anderson, Dipto Sarkar, Leysia Palen

OpenStreetMap (OSM), the largest Volunteered Geographic Information project in the world, is characterized both by its map and by the active community of the millions of mappers who produce it. The discourse about participation in the OSM community largely focuses on members' motivations for contributing map data and on the resulting data quality. Recently, large corporations including Apple, Microsoft, and Facebook have been hiring editors to contribute to the OSM database. In this article, we explore the influence these corporate editors are having on the map, first by considering the history of corporate involvement in the community and then by analyzing historical quarterly-snapshot OSM-QA-Tiles to show where and what these corporate editors are mapping. Cumulatively, millions of corporate edits have a global footprint, but corporations vary in geographic reach, edit types, and quantity. While corporations currently have a major impact on road networks, non-corporate mappers edit more buildings and points-of-interest, representing the majority of all edits on average. Since corporate editing represents the latest stage in the evolution of corporate involvement, we raise questions about how the OSM community, and researchers, might proceed as corporate editing grows and evolves as a mechanism for expanding the map for multiple uses.


2017 ◽ Vol 77 ◽ pp. 31-39 ◽ Author(s): Maarit K. Leinonen, Joonas Miettinen, Sanna Heikkinen, Janne Pitkäniemi, Nea Malila

2018 ◽ Vol 60 (1) ◽ pp. 32-49 ◽ Author(s): Mingnan Liu, Laura Wronski

This study examines the use of trap questions as indicators of data quality in online surveys. Trap questions are intended to identify respondents who are not paying close attention to survey questions, meaning they are providing sub-optimal responses not only to the trap question itself but to other questions in the survey. We conducted three experiments using an online non-probability panel. In the first experiment, we examine whether responses to surveys with one trap question differ from responses to surveys with two trap questions. In the second, we examine responses to surveys with trap questions of varying difficulty. In the third, we test the level of difficulty, the placement of the trap question, and other forms of attention checks. In all studies, we correlate the responses to the trap question(s) with other data quality checks, most of which were derived from the literature on satisficing. We also compare responses to several substantive questions by response to the trap questions, which tells us whether participants who failed the trap questions gave consistently different answers from those who passed. We find that the rate of passing various trap questions varies widely, from 27% to 87% among the types we tested. We also find evidence that some types of trap questions are more strongly correlated with other data quality measures.
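The correlational analysis described above can be sketched in a few lines. This is a hedged illustration, not the study's actual analysis code: the function names are invented, and the inputs stand in for a binary trap-question result and a binary satisficing indicator such as straight-lining.

```python
import numpy as np

def trap_quality_correlation(passed_trap, satisficing_flag):
    """Pearson correlation between two binary indicators: passing a
    trap question (1 = passed) and a satisficing flag (1 = flagged).
    On binary data this equals the point-biserial/phi coefficient."""
    passed = np.asarray(passed_trap, dtype=float)
    flag = np.asarray(satisficing_flag, dtype=float)
    return float(np.corrcoef(passed, flag)[0, 1])

def pass_rate(passed_trap):
    """Share of respondents who passed the trap question."""
    return float(np.asarray(passed_trap, dtype=float).mean())
```

A negative correlation here would mean respondents who pass the trap question are less likely to be flagged for satisficing, which is the pattern a useful trap question should show.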


2020 ◽ Vol 9 (9) ◽ pp. 497 ◽ Author(s): Haydn Lawrence, Colin Robertson, Rob Feick, Trisalyn Nelson

Social media and other forms of volunteered geographic information (VGI) are frequently used as a source of fine-grained big data for research. While employing geographically referenced social media data for a wide array of purposes has become commonplace, the relevant scales at which these data apply are typically unknown. For researchers to use VGI appropriately (e.g., aggregated to areal units such as neighbourhoods to elicit key trends or demographic information), general methods for assessing its quality are required, particularly methods that explicitly link data quality to relevant spatial scales, as there are no accepted standards or sampling controls. We present a data quality metric, the Spatial-comprehensiveness Index (S-COM), which can delineate feasible study areas or spatial extents based on the quality of uneven and dynamic geographically referenced VGI. This scale-sensitive approach to analyzing VGI is demonstrated at different grains with data from two citizen science initiatives. The S-COM index can be used both to assess feasible study extents based on coverage, user heterogeneity, and density, and to find feasible sub-study areas within a larger, indefinite area. The results identified sub-study areas of VGI for focused analysis, supporting broader adoption of a similar methodology in multi-scale analyses of VGI.
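The abstract names coverage, user heterogeneity, and density as the ingredients of S-COM without defining the index here. The sketch below is an illustrative stand-in, not the published S-COM formula: it grids points into cells, measures relative density and user heterogeneity (normalized Shannon entropy of per-user contribution shares), and averages them into a per-cell score.

```python
import math
from collections import defaultdict

def cell_scores(points, cell_size=1.0):
    """points: iterable of (x, y, user_id) contributions.
    Returns {(col, row): score in [0, 1]} per grid cell."""
    cells = defaultdict(list)
    for x, y, user in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(user)
    max_n = max(len(users) for users in cells.values())
    scores = {}
    for key, users in cells.items():
        density = len(users) / max_n            # density relative to busiest cell
        counts = defaultdict(int)
        for u in users:
            counts[u] += 1
        n = len(users)
        # user heterogeneity: normalized Shannon entropy of contribution shares
        entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
        max_ent = math.log(len(counts)) if len(counts) > 1 else 1.0
        heterogeneity = entropy / max_ent if len(counts) > 1 else 0.0
        scores[key] = (density + heterogeneity) / 2
    return scores
```

Under this toy scoring, a cell with many contributions from many distinct users scores high, while a cell dominated by a single prolific contributor scores lower even at equal density, which is the intuition behind combining density with user heterogeneity; thresholding the scores would then delineate a feasible sub-study area.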

