Leveraging Road Characteristics and Contributor Behaviour for Assessing Road Type Quality in OSM

Amerah Alghanim; Musfira Jilani; Michela Bertolotto; Gavin McArdle

doi:10.3390/ijgi10070436

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10070436 ◽

2021 ◽

Vol 10 (7) ◽

pp. 436

Author(s):

Amerah Alghanim ◽

Musfira Jilani ◽

Michela Bertolotto ◽

Gavin McArdle

Keyword(s):

Machine Learning ◽

Spatial Data ◽

Classification Accuracy ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Data Set ◽

Semantic Inference ◽

Road Type ◽

The Impact

Volunteered Geographic Information (VGI) is often collected by non-expert users, which raises concerns about the quality and veracity of such data. There has been much effort to understand and quantify the quality of VGI. Extrinsic measures, which compare VGI to authoritative data sources such as National Mapping Agencies, are common, but the cost and slow update frequency of such data hinder the task. Intrinsic measures, which compare the data to heuristics or models built from the VGI data itself, are becoming increasingly popular. Supervised machine learning techniques are particularly suitable for intrinsic measures of quality, where they can infer and predict the properties of spatial data. In this article we are interested in assessing the quality of semantic information, such as the road type, associated with data in OpenStreetMap (OSM). We have developed a machine learning approach which utilises new intrinsic input features collected from the VGI dataset. Using this approach we obtained an average classification accuracy of 84.12%, which outperforms existing techniques on the same semantic inference task. The trustworthiness of the data used for developing and training machine learning models is important. To address this issue, we have also developed a new trustworthiness measure that uses direct and indirect characteristics of OSM data, such as its edit history, along with an assessment of the users who contributed the data. An evaluation of the impact of data determined to be trustworthy shows that the trusted data collected with the new approach improves the prediction accuracy of our machine learning technique. Specifically, the classification accuracy of our model is 87.75% when applied to a trusted dataset and 57.98% when applied to an untrusted dataset. Consequently, such results can be used to assess the quality of OSM and suggest improvements to the data set.

Insider Threat Detection Using Supervised Machine Learning Algorithms on an Extremely Imbalanced Dataset

International Journal of Cyber Warfare and Terrorism ◽

10.4018/ijcwt.2020040101 ◽

2020 ◽

Vol 10 (2) ◽

pp. 1-26

Author(s):

Naghmeh Moradpoor Sheykhkanloo ◽

Adam Hall

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Machine Learning Algorithms ◽

Third Party ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Insider Threat ◽

Threat Detection ◽

Imbalanced Dataset ◽

The Impact

An insider threat can take many forms and fall under different categories, including the malicious insider, the careless, unaware, uneducated, or naïve employee, and the third-party contractor. Machine learning techniques have been studied in the published literature as a promising solution for such threats. However, they can be biased or inaccurate when the associated dataset is hugely imbalanced. This article therefore addresses insider threat detection on an extremely imbalanced dataset, which includes employing a popular balancing technique known as spread subsample. The results show that although balancing the dataset with this technique did not improve performance metrics, it did reduce the time taken to build the model and the time taken to test it. Additionally, the authors found that running the chosen classifiers with parameters other than the default ones has an impact in both the balanced and imbalanced scenarios, but the impact is significantly stronger when using the imbalanced dataset.
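The abstract names spread subsampling but does not describe its settings. As a rough illustration only, the sketch below shows the general idea as it is commonly understood: randomly undersample the larger classes so that no class exceeds a chosen multiple (the "spread") of the rarest class. The function name `spread_subsample`, the `max_spread` parameter, and the toy data are assumptions made for this example; this is not the authors' configuration or any particular toolkit's implementation.

```python
import random
from collections import defaultdict


def spread_subsample(rows, labels, max_spread=1.0, seed=0):
    """Randomly undersample so that no class has more than
    max_spread times as many instances as the rarest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group instances by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # The rarest class sets the cap for every other class.
    rarest = min(len(members) for members in by_class.values())
    cap = int(rarest * max_spread)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Classes at or below the cap are kept whole; larger ones are sampled.
        kept = members if len(members) <= cap else rng.sample(members, cap)
        out_rows.extend(kept)
        out_labels.extend([label] * len(kept))
    return out_rows, out_labels


# Hypothetical toy data: 1000 "normal" events and 10 "insider" events.
rows = list(range(1010))
labels = ["normal"] * 1000 + ["insider"] * 10
balanced_rows, balanced_labels = spread_subsample(rows, labels, max_spread=1.0)
```

With `max_spread=1.0` every class is reduced to the size of the rarest one; a larger value keeps more majority-class instances at the price of a less balanced training set.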

Prediction of Misclassification Data using Cognitive Bayes Computation Techniques (COBACO)

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c7975.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 928-932

Keyword(s):

Machine Learning ◽

Missing Data ◽

Large Data ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Accuracy Rate ◽

Data Set ◽

Predictive Values ◽

Time Operation

Missing data raise major issues for quantitative analysis in large databases. Because of these issues, inference from the computational process produces biased results, data are further damaged, the error rate can increase, and imputation becomes more difficult to accomplish. Predicting disguised missing data in large data sets is another major problem in real-time operation. Machine learning (ML) techniques are combined with the classification of measurements to improve the accuracy rate of predictive values, and these techniques overcome various challenges posed by data loss. Recent work on misclassification prediction uses a supervised ML approach to predict an output for an unseen input with a limited number of parameters in a data set; when the number of parameters increases, the accuracy rate drops. This article presents a new approach, COBACO, an effective supervised machine learning technique. Several strategies for classifying predictive techniques for missing data analysis with efficient supervised machine learning are described. The proposed predictive technique, COBACO, generated more precise and accurate results than the other predictive approaches. Experimental results obtained using both real and synthetic data sets show that the proposed approach offers valuable and promising insight into the problem of predicting missing information.

Traditional vs. Machine-Learning Techniques for OSM Quality Assessment

Geospatial Intelligence ◽

10.4018/978-1-5225-8054-6.ch022 ◽

2019 ◽

pp. 469-487

Author(s):

Musfira Jilani ◽

Michela Bertolotto ◽

Padraig Corcoran ◽

Amerah Alghanim

Keyword(s):

Machine Learning ◽

Data Quality ◽

Quality Assessment ◽

Spatial Data ◽

Current Data ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Data Quality Assessment ◽

Expensive Process

Nowadays an ever-increasing number of applications require complete and up-to-date spatial data, in particular maps. However, mapping is an expensive process and the vastness and dynamics of our world usually render centralized and authoritative maps outdated and incomplete. In this context crowd-sourced maps have the potential to provide a complete, up-to-date, and free representation of our world. However, the proliferation of such maps largely remains limited due to concerns about their data quality. While most of the current data quality assessment mechanisms for such maps require referencing to authoritative maps, we argue that such referencing of a crowd-sourced spatial database is ineffective. Instead we focus on the use of machine learning techniques that we believe have the potential to not only allow the assessment but also to recommend the improvement of the quality of crowd-sourced maps without referencing to external databases. This chapter gives an overview of these approaches.

A supervised technique for drill-core mineral mapping using Hyperspectral data

10.5194/egusphere-egu2020-13526 ◽

2020 ◽

Author(s):

Cecilia Contreras ◽

Mahdi Khodadadzadeh ◽

Laura Tusa ◽

Richard Gloaguen

Keyword(s):

Machine Learning ◽

Near Infrared ◽

Hyperspectral Data ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Drill Core ◽

Data Set ◽

Mineral Mapping ◽

Core Logging ◽

Learning Techniques

Drilling is a key task in exploration campaigns to characterize mineral deposits at depth. Drill-cores are first logged in the field by a geologist with regard to, e.g., mineral assemblages, alteration patterns, and structural features. The core-logging information is then used to locate and target the important ore accumulations and to select representative samples that are further analyzed by laboratory measurements (e.g., Scanning Electron Microscopy (SEM), X-ray diffraction (XRD), X-ray Fluorescence (XRF)). However, core-logging is a laborious task and subject to the expertise of the geologist. Hyperspectral imaging is a non-invasive and non-destructive technique that is increasingly being used to support the geologist in the analysis of drill-core samples. Nonetheless, the benefit and impact of using hyperspectral data depend on the applied methods. With this in mind, machine learning techniques, which have been applied in different research fields, provide useful tools for a more advanced and more automatic analysis of the data. Lately, machine learning frameworks are also being implemented for mapping minerals in drill-core hyperspectral data. In this context, this work follows an approach to map minerals on drill-core hyperspectral data using supervised machine learning techniques, in which SEM data, integrated with the mineral liberation analysis (MLA) software, are used in training a classifier. More specifically, the high-resolution mineralogical data obtained by SEM-MLA analysis are resampled and co-registered to the hyperspectral data to generate a training set. Due to the large difference in spatial resolution between the SEM-MLA and hyperspectral images, a pre-labeling strategy is required to link these two images at the spatial resolution of the hyperspectral data. In this study, we use the SEM-MLA image to compute the abundances of minerals for each hyperspectral pixel in the corresponding SEM-MLA region. We then use the abundances as features in a clustering procedure to generate the training labels. In the final step, the generated training set is fed into a supervised classification technique for mineral mapping over a large area of a drill-core. The experiments are carried out on a visible to near-infrared (VNIR) and shortwave infrared (SWIR) hyperspectral data set and, based on preliminary tests, the mineral mapping task improves significantly.
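The abundance step described above can be illustrated with a small sketch. It assumes, purely for the example, that the two images are already co-registered and that each hyperspectral pixel covers exactly a `block` by `block` square of SEM-MLA pixels; the function name `block_abundances` and the mineral names are hypothetical, and the subsequent clustering and classification steps are not shown.

```python
from collections import Counter


def block_abundances(label_map, block):
    """Compute per-mineral abundance fractions for each coarse pixel.

    label_map: 2D list of mineral labels at the fine (SEM-MLA-like) resolution.
    block:     number of fine pixels along one side of a coarse pixel
               (an assumption of this sketch, not a value from the abstract).
    """
    n_rows = len(label_map) // block
    n_cols = len(label_map[0]) // block
    total = block * block

    result = []
    for r in range(n_rows):
        row_out = []
        for c in range(n_cols):
            # Count the mineral labels falling inside this coarse pixel.
            counts = Counter(
                label_map[i][j]
                for i in range(r * block, (r + 1) * block)
                for j in range(c * block, (c + 1) * block)
            )
            # Convert counts to fractions that sum to 1 per coarse pixel.
            row_out.append({mineral: n / total for mineral, n in counts.items()})
        result.append(row_out)
    return result


# Hypothetical 2x4 fine-resolution label map, reduced to 1x2 coarse pixels.
sem_mla = [
    ["quartz", "quartz", "pyrite", "pyrite"],
    ["quartz", "chlorite", "pyrite", "pyrite"],
]
abundances = block_abundances(sem_mla, block=2)
```

Each coarse pixel ends up with a small abundance vector, which is the kind of feature the abstract says is passed to a clustering procedure to produce training labels.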

Preliminary Cardiac Disease Risk Prediction Based on Medical and Behavioural Data Set Using Supervised Machine Learning Techniques

Indian Journal of Science and Technology ◽

10.17485/ijst/2016/v9i31/96740 ◽

2016 ◽

Vol 9 (31) ◽

Cited By ~ 3

Author(s):

Thendral Puyalnithi ◽

V. Madhu Viswanatham

Keyword(s):

Machine Learning ◽

Risk Prediction ◽

Cardiac Disease ◽

Disease Risk ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Data Set ◽

Learning Techniques

Exploring the Efficiency of Various Supervised Machine Learning Techniques to Predict the Heart Disease using Risk Factors

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a1063.1191s19 ◽

2019 ◽

Vol 9 (1S) ◽

pp. 309-312

Keyword(s):

Machine Learning ◽

Health Care ◽

Heart Disease ◽

Major Part ◽

Data Science ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Data Set

Data science in healthcare is an innovative and capable field for an industry implementing data science applications. Data analytics is a recent science used to explore medical data sets and discover disease. This work is an initial attempt to identify disease with the help of a large amount of medical data. The data science methodology lets users identify their disease without the help of health care centres. Healthcare and data science are often linked through finances, as the industry attempts to reduce its expenses with the help of large amounts of data. Data science and medicine are rapidly developing, and it is important that they advance together. Health care information is very useful to society. Heart disease is increasing day by day. Different factors in the human body are monitored in order to analyse and prevent heart disease. Classifying these factors with machine learning algorithms and predicting the disease is the major part of the work, which involves supervised machine learning algorithms such as SVM, Naive Bayes, Decision Trees, and Random Forest.

Learning and Explaining the Impact of Enterprises’ Organizational Quality on their Economic Results

Intelligent Data Analysis for Real-Life Applications ◽

10.4018/978-1-4666-1806-0.ch012 ◽

2012 ◽

pp. 228-248 ◽

Cited By ~ 3

Author(s):

Marko Pregeljc ◽

Erik Štrumbelj ◽

Miran Mihelcic ◽

Igor Kononenko

Keyword(s):

Machine Learning ◽

Learning Approaches ◽

Data Set ◽

Economic Interpretation ◽

Organizational Quality ◽

Social Units ◽

The Impact ◽

Insight Into ◽

General Explanation

In this chapter, the authors employ traditional and novel machine learning approaches to improve insight into the connections between the organizational quality of enterprises, as a type of formal social unit, and the results of the enterprises’ performance. The analyzed data set contains 72 Slovenian enterprises’ economic results across four years and indicators of their organizational quality. The authors hypothesize that a causal relationship exists between the latter and the former. In the first part of a two-part process, they use several classification algorithms to study these relationships and to evaluate how accurately they predict the target economic results. However, the most successful models were often very complex and difficult to interpret, especially for non-technical users. Therefore, in the second part, the authors take advantage of a novel general explanation method that can be used to explain the influence of individual features on the model’s prediction. Results show that traditional machine-learning approaches are successful at modeling the dependency relationship. Furthermore, the explanation of the influence of the input features on the predicted economic results provides insights that have a meaningful economic interpretation.

Traditional vs. Machine-Learning Techniques for OSM Quality Assessment

Advances in Geospatial Technologies - Volunteered Geographic Information and the Future of Geospatial Data ◽

10.4018/978-1-5225-2446-5.ch003 ◽

2017 ◽

pp. 47-64

Author(s):

Musfira Jilani ◽

Michela Bertolotto ◽

Padraig Corcoran ◽

Amerah Alghanim

Keyword(s):

Machine Learning ◽

Data Quality ◽

Quality Assessment ◽

Spatial Data ◽

Current Data ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Data Quality Assessment ◽

Expensive Process

Supervised Machine Learning Techniques for Quality of Transmission Assessment in Optical Networks

2018 20th International Conference on Transparent Optical Networks (ICTON) ◽

10.1109/icton.2018.8473819 ◽

2018 ◽

Cited By ~ 5

Author(s):

Javier Mata ◽

Ignacio de Miguel ◽

Ramon J. Duran ◽

Juan Carlos Aguado ◽

Noemi Merayo ◽

...

Keyword(s):

Machine Learning ◽

Optical Networks ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Quality Of Transmission

The impact of using large training data set KDD99 on classification accuracy

10.7287/peerj.preprints.2838v1 ◽

2017 ◽

Author(s):

Atilla Özgür ◽

Hamit Erdem

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Training Data ◽

Supervised Machine Learning ◽

Support Vector ◽

Data Set ◽

Test Dataset ◽

Negative Rate ◽

Positive Rate ◽

The Impact

This study investigates the effects of using a large data set on supervised machine learning classifiers in the domain of Intrusion Detection Systems (IDS). To investigate this effect, 12 machine learning algorithms have been applied: (1) Adaboost, (2) Bayesian Nets, (3) Decision Tables, (4) Decision Trees (J48), (5) Logistic Regression, (6) Multi-Layer Perceptron, (7) Naive Bayes, (8) OneRule, (9) Random Forests, (10) Radial Basis Function Neural Networks, (11) Support Vector Machines (two different training algorithms), and (12) ZeroR. A well-known IDS benchmark dataset, KDD99, has been used to train and test the classifiers. The full training data set of KDD99 contains 4.9 million instances, while the full test dataset contains 311,000 instances. In contrast to similar previous studies, which used 0.08%–10% for training and 1.2%–100% for testing, this study uses the full training dataset and the full test dataset. The Weka Machine Learning Toolbox has been used for modeling and simulation. The performance of the classifiers has been evaluated using standard binary performance metrics: Detection Rate, True Positive Rate, True Negative Rate, False Positive Rate, False Negative Rate, Precision, and F1-Rate. To show the effects of dataset size, the performance of the classifiers has also been evaluated using the following hardware metrics: Training Time, Working Memory, and Model Size. Test results show improvements in the classifiers on standard performance metrics compared to previous studies.
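For readers unfamiliar with the binary metrics listed above, the following sketch computes most of them from confusion-matrix counts using their standard textbook definitions. The function name `binary_metrics` and the example counts are hypothetical and are not results from the study; "Detection Rate" is left out because the abstract does not define it separately.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives.
    """
    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    tpr = ratio(tp, tp + fn)          # True Positive Rate (recall)
    tnr = ratio(tn, tn + fp)          # True Negative Rate (specificity)
    fpr = ratio(fp, fp + tn)          # False Positive Rate
    fnr = ratio(fn, fn + tp)          # False Negative Rate
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * tpr, precision + tpr)

    return {
        "tpr": tpr,
        "tnr": tnr,
        "fpr": fpr,
        "fnr": fnr,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, chosen only to exercise the function.
metrics = binary_metrics(tp=90, fp=20, tn=880, fn=10)
```

Note that TPR and FNR sum to one, as do TNR and FPR, so reporting all four is partly redundant but makes comparisons across studies easier.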