Geographic information. Data quality measures

2015 ◽ 2012 ◽ Author(s): Nurul A. Emran, Noraswaliza Abdullah, Nuzaimah Mustafa

2021 ◽ Vol 11 (2) ◽ pp. 472 ◽ Author(s): Hyeongmin Cho, Sangkyun Lee

Machine learning has been proven effective in various application areas, such as object and speech recognition on mobile systems. Since a key factor in machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer's or manager's point of view, measuring data quality is an important first step in the learning process: we need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially for large-scale, high-dimensional data such as images and videos. This paper proposes two data quality measures that compute class separability and in-class variability, two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures, based on random projections and bootstrapping, with statistical benefits on large-scale, high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale, high-dimensional datasets.
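The abstract does not spell out the measures themselves; as a rough sketch of the general idea, the following estimates class separability and in-class variability on 1-D random projections with bootstrap resampling. The function name and the exact score definitions are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def separability_and_variability(X, y, n_proj=50, n_boot=20, seed=0):
    """Illustrative estimates of class separability and in-class
    variability from 1-D random projections plus bootstrapping."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    sep_scores, var_scores = [], []
    for _ in range(n_proj):
        w = rng.normal(size=X.shape[1])
        w /= np.linalg.norm(w)          # random unit direction
        z = X @ w                       # 1-D random projection
        for _ in range(n_boot):
            idx = rng.integers(0, len(z), size=len(z))  # bootstrap resample
            zs, ys = z[idx], y[idx]
            means = np.array([zs[ys == c].mean() for c in classes])
            stds = np.array([zs[ys == c].std() for c in classes])
            # separability: spread of class means relative to in-class spread
            sep_scores.append(means.std() / (stds.mean() + 1e-12))
            # in-class variability: average within-class spread
            var_scores.append(stds.mean())
    return float(np.mean(sep_scores)), float(np.mean(var_scores))
```

Projecting to one dimension before scoring is what keeps the cost manageable on high-dimensional data: each projection is a single matrix-vector product, and the bootstrap then operates on scalars rather than full feature vectors.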


2021 ◽ Vol 560 ◽ pp. 51-67 ◽ Author(s): Marilyn Bello, Gonzalo Nápoles, Koen Vanhoof, Rafael Bello

2019 ◽ Vol 8 (5) ◽ pp. 232 ◽ Author(s): Jennings Anderson, Dipto Sarkar, Leysia Palen

OpenStreetMap (OSM), the largest Volunteered Geographic Information project in the world, is characterized both by its map and by the active community of the millions of mappers who produce it. The discourse about participation in the OSM community largely focuses on members' motivations for contributing map data and on the resulting data quality. Recently, large corporations including Apple, Microsoft, and Facebook have been hiring editors to contribute to the OSM database. In this article, we explore the influence these corporate editors are having on the map, first by considering the history of corporate involvement in the community and then by analyzing historical quarterly-snapshot OSM-QA-Tiles to show where and what these corporate editors are mapping. Cumulatively, millions of corporate edits have a global footprint, but corporations vary in geographic reach, edit types, and quantity. While corporations currently have a major impact on road networks, non-corporate mappers edit more buildings and points-of-interest, representing the majority of all edits on average. Since corporate editing represents the latest stage in the evolution of corporate involvement, we raise questions about how the OSM community, and researchers, might proceed as corporate editing grows and evolves as a mechanism for expanding the map for multiple uses.


2017 ◽ Vol 77 ◽ pp. 31-39 ◽ Author(s): Maarit K. Leinonen, Joonas Miettinen, Sanna Heikkinen, Janne Pitkäniemi, Nea Malila

2018 ◽ Vol 60 (1) ◽ pp. 32-49 ◽ Author(s): Mingnan Liu, Laura Wronski

This study examines the use of trap questions as indicators of data quality in online surveys. Trap questions are intended to identify respondents who are not paying close attention to survey questions, meaning they are providing sub-optimal responses not only to the trap question itself but to other questions in the survey. We conducted three experiments using an online non-probability panel. In the first experiment, we examine whether responses to surveys with one trap question differ from responses to surveys with two trap questions. In the second, we examine responses to surveys with trap questions of varying difficulty. In the third, we test the level of difficulty, the placement of the trap question, and other forms of attention checks. In all studies, we correlate the responses to the trap question(s) with other data quality checks, most of which were derived from the literature on satisficing. We also compare responses to several substantive questions by response to the trap questions, which tells us whether participants who failed the trap questions gave consistently different answers from those who passed. We find that the rate of passing various trap questions varies widely, from 27% to 87% among the types we tested. We also find evidence that some types of trap questions are more strongly correlated with other data quality measures.
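The correlational analysis described above can be sketched in a few lines. This is a hedged illustration, not the study's actual analysis code: the function names are invented, and the inputs stand in for a binary trap-question result and a binary satisficing indicator such as straight-lining.

```python
import numpy as np

def trap_quality_correlation(passed_trap, satisficing_flag):
    """Pearson correlation between two binary indicators: passing a
    trap question (1 = passed) and a satisficing flag (1 = flagged).
    On binary data this equals the point-biserial/phi coefficient."""
    passed = np.asarray(passed_trap, dtype=float)
    flag = np.asarray(satisficing_flag, dtype=float)
    return float(np.corrcoef(passed, flag)[0, 1])

def pass_rate(passed_trap):
    """Share of respondents who passed the trap question."""
    return float(np.asarray(passed_trap, dtype=float).mean())
```

A negative correlation here would mean respondents who pass the trap question are less likely to be flagged for satisficing, which is the pattern a useful trap question should show.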


2020 ◽ Vol 9 (9) ◽ pp. 497 ◽ Author(s): Haydn Lawrence, Colin Robertson, Rob Feick, Trisalyn Nelson

Social media and other forms of volunteered geographic information (VGI) are frequently used as a source of fine-grained big data for research. While employing geographically referenced social media data for a wide array of purposes has become commonplace, the relevant scales at which these data apply are typically unknown. For researchers to use VGI appropriately (e.g., aggregated to areal units such as neighbourhoods to elicit key trends or demographic information), general methods for assessing its quality are required, particularly methods that explicitly link data quality to relevant spatial scales, as there are no accepted standards or sampling controls. We present a data quality metric, the Spatial-comprehensiveness Index (S-COM), which can delineate feasible study areas or spatial extents based on the quality of uneven and dynamic geographically referenced VGI. This scale-sensitive approach to analyzing VGI is demonstrated at different grains with data from two citizen science initiatives. The S-COM index can be used both to assess feasible study extents based on coverage, user heterogeneity, and density, and to find feasible sub-study areas within a larger, indefinite area. The results identified sub-study areas of VGI for focused analysis, supporting broader adoption of a similar methodology in multi-scale analyses of VGI.
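The abstract names coverage, user heterogeneity, and density as the ingredients of S-COM without defining the index here. The sketch below is an illustrative stand-in, not the published S-COM formula: it grids points into cells, measures relative density and user heterogeneity (normalized Shannon entropy of per-user contribution shares), and averages them into a per-cell score.

```python
import math
from collections import defaultdict

def cell_scores(points, cell_size=1.0):
    """points: iterable of (x, y, user_id) contributions.
    Returns {(col, row): score in [0, 1]} per grid cell."""
    cells = defaultdict(list)
    for x, y, user in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(user)
    max_n = max(len(users) for users in cells.values())
    scores = {}
    for key, users in cells.items():
        density = len(users) / max_n            # density relative to busiest cell
        counts = defaultdict(int)
        for u in users:
            counts[u] += 1
        n = len(users)
        # user heterogeneity: normalized Shannon entropy of contribution shares
        entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
        max_ent = math.log(len(counts)) if len(counts) > 1 else 1.0
        heterogeneity = entropy / max_ent if len(counts) > 1 else 0.0
        scores[key] = (density + heterogeneity) / 2
    return scores
```

Under this toy scoring, a cell with many contributions from many distinct users scores high, while a cell dominated by a single prolific contributor scores lower even at equal density, which is the intuition behind combining density with user heterogeneity; thresholding the scores would then delineate a feasible sub-study area.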

