A Comparative Analysis of Machine/Deep Learning Models for Parking Space Availability Prediction

Sensors, 2020, Vol 20 (1), pp. 322
Author(s):  
Faraz Malik Awan ◽  
Yasir Saleem ◽  
Roberto Minerva ◽  
Noel Crespi

Machine/Deep Learning (ML/DL) techniques have been applied to large data sets in order to extract relevant information and to make predictions. The performance and outcomes of different ML/DL algorithms may vary depending on the data sets being used, as well as on the suitability of the algorithms to the data and the application domain under consideration. Hence, determining which ML/DL algorithm is most suitable for a specific application domain and its related data sets would be a key advantage. To respond to this need, a comparative analysis of well-known ML/DL techniques, including Multilayer Perceptron, K-Nearest Neighbors (KNN), Decision Tree, Random Forest, and the Voting Classifier (an ensemble learning approach), has been conducted for the prediction of parking space availability. This comparison used Santander's parking data set, collected in the course of the H2020 WISE-IoT project, to evaluate the considered algorithms and to determine the one offering the best predictions. The results show that, regardless of the data set size, less complex algorithms such as Decision Tree, Random Forest, and KNN outperform more complex ones such as the Multilayer Perceptron in prediction accuracy, while providing comparable information for the prediction of parking space availability. In addition, this paper provides Top-K parking space recommendations based on the distance between a vehicle's current position and free parking spots.
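The Top-K recommendation step described above can be sketched as a plain nearest-spot ranking. The coordinates, spot ids, and the `top_k_spots` helper below are illustrative assumptions, not the paper's implementation:

```python
import math

def top_k_spots(vehicle, free_spots, k=3):
    """Rank free parking spots by straight-line distance to the vehicle.

    vehicle: (x, y) position; free_spots: {spot_id: (x, y)}.
    Returns the ids of the k closest free spots (hypothetical data layout).
    """
    ranked = sorted(free_spots.items(),
                    key=lambda item: math.dist(vehicle, item[1]))
    return [spot_id for spot_id, _ in ranked[:k]]

# Toy example with made-up coordinates
spots = {"A": (0.0, 1.0), "B": (5.0, 5.0), "C": (1.0, 1.0), "D": (0.5, 0.2)}
print(top_k_spots((0.0, 0.0), spots, k=2))  # -> ['D', 'A']
```

In a real deployment the Euclidean distance would be replaced by road-network distance or travel time, but the ranking structure stays the same.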

2020, Vol 39 (2), pp. 1639-1648
Author(s):  
Peng Wang ◽  
Ningchao Zhang

To overcome the poor accuracy and high complexity of current classification algorithms for non-equilibrium (imbalanced) data sets, this paper proposes a decision tree classification algorithm for non-equilibrium data sets based on random forest. Wavelet packet decomposition is used to denoise the non-equilibrium data, and the SNM algorithm is combined with RFID to remove redundant data from the data sets. Based on the results of this data processing, the non-equilibrium data sets are classified by the random forest method: following a Bootstrap resampling scheme with certain constraints, the majority and minority samples of each sample subset are sampled, CART is used to train on each resampled data set, and a decision tree is constructed. The final classification result is obtained by majority voting over the CART decision trees. Experimental results show that the proposed algorithm achieves high classification accuracy with low complexity, making it a feasible classification algorithm for non-equilibrium data sets.
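The balanced-bootstrap-plus-voting construction can be sketched as follows. The synthetic two-class data and the specific constraint chosen here (both classes drawn at the minority-class size) are assumptions for illustration, using scikit-learn's CART implementation rather than the paper's code:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Toy imbalanced data set: 90 majority (class 0), 10 minority (class 1),
# separable synthetic Gaussians, for illustration only.
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(4, 1, (10, 2))])
y = np.array([0] * 90 + [1] * 10)

def balanced_bootstrap_forest(X, y, n_trees=25, seed=0):
    """Train CART trees on balanced bootstrap samples: each tree sees
    an equal number of majority and minority examples."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    n = len(minority)  # constrain both classes to the minority size
    trees = []
    for _ in range(n_trees):
        idx = np.concatenate([rng.choice(majority, n), rng.choice(minority, n)])
        trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return trees

def vote(trees, X):
    """Majority vote over the CART trees."""
    votes = np.stack([t.predict(X) for t in trees])
    return (votes.mean(axis=0) > 0.5).astype(int)

trees = balanced_bootstrap_forest(X, y)
print(vote(trees, np.array([[4.0, 4.0], [0.0, 0.0]])))
```

Because every tree is trained on a class-balanced resample, the minority class is not drowned out the way it would be in a standard bootstrap.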


2018, Vol 228, pp. 01020
Author(s):  
Qingqing Liu

This paper proposes a parallel load forecasting method based on the random forest algorithm. By analysing historical load, temperature, wind speed and other data, the algorithm can shorten the load forecasting time and improve the capability to process large data. The paper also designs and implements a parallel load forecasting prototype system based on Hadoop for large-scale power-consumer data, including cluster management, data management, a library of prediction and classification algorithms, and other functions. The experimental results show that the accuracy of the parallel random forest algorithm is clearly higher than that of the decision tree across the different data sets, and that it can better analyse and process large data.


Author(s):  
Shaoping Zhu ◽  
Yongliang Xiao ◽  
Weimin Ma

To improve the accuracy of human action recognition in video and the computational efficiency on large data sets, an action recognition algorithm based on multiple features and a modified deep learning model is proposed. First, a deep-network pre-training process is used to learn and optimize the RBM parameters, and a deep belief network (DBN) model is constructed through deep learning. The DBN model then automatically extracts 13 human joint points and critical points of the optical flow, and these more abstract and more effective motion features are combined to represent human actions. Finally, the entire DBN network is fine-tuned with a support vector machine (SVM) to classify the actions. We demonstrate that the 13 human joint points and the critical points of the optical flow are two very effective characterizations of human actions; the proposed approach greatly reduces the number of required samples, shortens the training time, can efficiently process large data sets, and can effectively recognize novel actions. Experiments on the KTH, Weizmann, ballet, and UCF101 data sets show an average recognition accuracy of over 98%, validating the method's effectiveness, and show that the results are stable, reliable, and significantly better than those of two state-of-the-art approaches on all four data sets. This lays a good theoretical foundation for practical applications.


2019
Author(s):  
Hui Kwon Kim ◽  
Younggwang Kim ◽  
Sungtae Lee ◽  
Seonwoo Min ◽  
Jung Yoon Bae ◽  
...  

Abstract: We evaluated SpCas9 activities at 12,832 target sequences using a high-throughput approach based on a human cell library containing sgRNA-encoding and target sequence pairs. Deep learning-based training on this large data set of SpCas9-induced indel frequencies led to the development of a SpCas9-activity prediction model named DeepSpCas9. When tested against independently generated data sets (our own and those published by other groups), DeepSpCas9 showed unprecedentedly high generalization performance. DeepSpCas9 is available at http://deepcrispr.info/DeepCas9.


Data mining is a technique used to retrieve information for the analysis and discovery of hidden trends in large data sets. It extends to numerous areas such as education, banking, marketing, retail, communications and agriculture. Agriculture is the backbone of the country's economy and an important source of livelihood; it depends primarily on the weather, geology, soil and biology. Agricultural mining is a technology that can contribute information for the growth of agriculture. The current study presents various data mining techniques and their role in soil fertility and nutrient analysis. The decision tree is a well-known data mining classification approach, and C4.5 and ID3 are two widely used decision tree algorithms for classification. The C4.5, ID3 and the proposed classifier have been trained on the soil sample data set, taking into account the optimal soil parameters pH (potential of hydrogen), EC (electrical conductivity) and ESP (exchangeable sodium percentage). The model is evaluated using a collection of soil sample test results. Soil classification is the division of soil into classes or groups, each having similar characteristics and likely similar behavior; it allows the farmer to know the type of soil and to plant crops suited to that soil type.
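A minimal sketch of training a decision tree classifier on the three soil parameters named above, using scikit-learn's CART implementation as a stand-in for the C4.5/ID3 classifiers; the sample values, class labels and implied thresholds are invented for illustration and do not reflect real agronomic data:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical soil samples: [pH, EC (dS/m), ESP (%)] with made-up
# soil-class labels purely for illustration.
X = [
    [6.8, 0.4, 3.0],   # near-neutral, low salinity   -> "fertile"
    [7.0, 0.6, 4.0],   #                              -> "fertile"
    [8.9, 4.5, 18.0],  # alkaline, saline, high ESP   -> "sodic"
    [9.1, 5.0, 20.0],  #                              -> "sodic"
    [7.8, 4.2, 6.0],   # saline but low ESP           -> "saline"
    [7.6, 4.8, 5.0],   #                              -> "saline"
]
y = ["fertile", "fertile", "sodic", "sodic", "saline", "saline"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Classify two unseen (equally hypothetical) samples
preds = clf.predict([[6.9, 0.5, 3.5], [9.0, 4.8, 19.0]])
print(list(preds))
```

The learned tree splits first on the salinity-related features and then on ESP, mirroring how such soil classes are commonly separated.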


2020
Author(s):  
Lillian Oluoch ◽  
László Stachó ◽  
László Viharos ◽  
Andor Viharos ◽  
Edit Mikó

Abstract: To overcome well-known difficulties in establishing reliable models based on large data sets, the Random Forest Regression (RFR) method is applied to study economical breeding and milk production of dairy cows. Positive experience with RFR in various areas of application supports the view that it can deliver reliable model predictions for industrial production of any product, providing a useful basis for decisions. In this study, a data set covering a period of ten years and about eighty thousand cows was analysed by means of RFR. A ranking of the production control parameters is obtained: the most important explanatory variables are found by computing the variances of the target variable on the sets created during the training phases of the RFR. Predictions are made for milk production and calf conception with high accuracy on the given data, and simulations are used to investigate prediction accuracy. This paper is primarily concerned with the mathematical aspects of a forthcoming work focused on the agricultural viewpoints. As for future mathematical research plans, the results will be compared with models based on factor analysis and linear regression.
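The variable-ranking idea can be illustrated with scikit-learn's impurity-based feature importances on synthetic data; the variable names and the way the target depends on them are assumptions for this sketch, not the study's actual data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Synthetic stand-in for production-control data: the target depends
# strongly on "feed", weakly on "age", and not at all on "noise".
n = 500
feed = rng.normal(size=n)
age = rng.normal(size=n)
noise = rng.normal(size=n)
target = 5.0 * feed + 0.5 * age + rng.normal(scale=0.1, size=n)

X = np.column_stack([feed, age, noise])
rfr = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, target)

# Rank explanatory variables by impurity-based importance
names = ["feed", "age", "noise"]
ranking = sorted(zip(names, rfr.feature_importances_),
                 key=lambda t: t[1], reverse=True)
print([name for name, _ in ranking])  # "feed" should rank first
```

The importances are derived from the variance reduction each variable achieves across the trees' training splits, which is the mechanism the abstract describes for identifying the most important explanatory variables.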


Author(s):  
Kyungkoo Jun

Background & Objective: This paper proposes a Fourier-transform-inspired method to classify human activities from time series sensor data. Methods: Our method begins by decomposing the 1D input signal into 2D patterns, motivated by the Fourier conversion. The decomposition is aided by a Long Short-Term Memory (LSTM) network, which captures the temporal dependency in the signal and produces encoded sequences. These sequences, once arranged into a 2D array, can represent fingerprints of the signals. The benefit of such a transformation is that we can exploit recent advances in deep learning models for image classification, such as the Convolutional Neural Network (CNN). Results: The proposed model is therefore a combination of LSTM and CNN. We evaluate the model on two data sets. On the first data set, which is more standardized than the other, our model outperforms previous works or at least matches them. For the second data set, we devise schemes to generate training and testing data by changing the window size, the sliding size, and the labeling scheme. Conclusion: The evaluation results show that the accuracy is over 95% in some cases. We also analyze the effect of these parameters on the performance.


2021, Vol 8 (1)
Author(s):  
Yahya Albalawi ◽  
Jim Buckley ◽  
Nikola S. Nikolov

Abstract: This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processing techniques applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the traditional machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another for testing the generality of our models only. Our results indicate that only four of the 26 pre-processing techniques improve classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier, with an F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model, with an F1 score of 75.2% and accuracy of 90.7%, compared to an F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with a lower accuracy of 70.89%. Our results also show that the performance of the best of the traditional classifiers we trained is comparable to that of the deep learning methods on the first data set, but significantly worse on the second data set.
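A traditional-classifier baseline of the kind evaluated here can be sketched as a TF-IDF plus logistic regression pipeline. The English placeholder tweets below stand in for the Arabic data, and the pipeline is a generic sketch, not the paper's exact configuration or pre-processing:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in tweets (English placeholders for the Arabic data set)
tweets = [
    "new flu vaccine available at the clinic",
    "tips to manage diabetes and blood sugar",
    "hospital opens cancer screening program",
    "great match last night what a goal",
    "new phone launch event announced today",
    "traffic jam on the highway this morning",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = health-related, 0 = not

# TF-IDF features feeding a logistic regression classifier
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(tweets, labels)
print(clf.predict(["free diabetes screening at the hospital"]))
```

In the deep learning variants discussed in the abstract, the TF-IDF layer would be replaced by pre-trained word embeddings (e.g. Mazajak CBOW or Skip-Gram) feeding a BLSTM or CNN.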


Author(s):  
Lior Shamir

Abstract: Several recent observations using large data sets of galaxies showed a non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of ~8.7×10³ spiral galaxies imaged by the Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey. The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. Both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to a cosine dependence shows a dipole axis with probabilities of ~2.8σ and ~7.38σ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at (α = 78°, δ = 47°), well within the 1σ error range of the most likely dipole axis in the SDSS galaxies with z > 0.15, identified at (α = 71°, δ = 61°).
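The cosine (dipole) fit can be sketched as a grid scan over candidate axes, choosing the axis whose cosine dependence best fits the observed spin asymmetry in the least-squares sense. The synthetic galaxy sample, noise level, and grid resolution below are assumptions for illustration, not the paper's procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def unit_vec(ra_deg, dec_deg):
    """Unit vector for equatorial coordinates (degrees)."""
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.array([np.cos(dec) * np.cos(ra),
                     np.cos(dec) * np.sin(ra),
                     np.sin(dec)])

# Synthetic galaxies: spin asymmetry follows cos(angle to a known axis)
true_axis = unit_vec(78.0, 47.0)
ra = rng.uniform(0, 360, 2000)
dec = np.degrees(np.arcsin(rng.uniform(-1, 1, 2000)))  # uniform on sphere
pos = np.stack([unit_vec(r, d) for r, d in zip(ra, dec)])
asym = pos @ true_axis + rng.normal(scale=0.2, size=2000)

# Grid-scan candidate axes; the best fit minimises squared residuals
best, best_err = None, np.inf
for a in range(0, 360, 6):
    for d in range(-90, 91, 6):
        axis = unit_vec(a, d)
        cosang = pos @ axis
        amp = (asym @ cosang) / (cosang @ cosang)  # least-squares amplitude
        err = np.sum((asym - amp * cosang) ** 2)
        if err < best_err:
            best, best_err = (a, d), err
print(best)  # should land on the grid point nearest the true axis
```

In the actual analyses, the fit probability at each candidate axis (rather than the raw residual) is what yields the quoted σ values for the dipole.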


2020, Vol 6
Author(s):  
Jaime de Miguel Rodríguez ◽  
Maria Eugenia Villafañe ◽  
Luka Piškorec ◽  
Fernando Sancho Caparrini

Abstract: This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a 'connectivity map' that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through 'parametric augmentation', a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features of a given building type. In the experiments described in this paper, more than 150,000 input samples belonging to two building types were processed during the training of a VAE model. The main contribution of this paper is to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task; despite the difficulty of the endeavour, promising advances are presented.

