Regression Models for Symbolic Interval-Valued Variables

Jose Emmanuel Chacón; Oldemar Rodríguez

doi:10.3390/e23040429

Entropy ◽

10.3390/e23040429 ◽

2021 ◽

Vol 23 (4) ◽

pp. 429

Author(s):

Jose Emmanuel Chacón ◽

Oldemar Rodríguez

Keyword(s):

Regression Models ◽

Mean Squared Error ◽

Nearest Neighbors ◽

Support Vector ◽

K Nearest Neighbors ◽

R Language ◽

Squared Error ◽

Vector Machines ◽

Synthetic Datasets ◽

Interval Valued

This paper presents new approaches to fitting regression models for symbolic interval-valued variables, which are shown to improve and extend the center method suggested by Billard and Diday and the center and range method proposed by Lima-Neto and De Carvalho. Like these earlier methods, the proposed regression models use the midpoints and half-lengths of the intervals as additional variables. Several methods were considered for fitting the regression models, including tree-based models, K-nearest neighbors, support vector machines, and neural networks. The proposed approaches were applied to a real dataset and to synthetic datasets generated with linear and nonlinear relations. The methods were evaluated using the root-mean-squared error and the correlation coefficient. They are available in the RSDA package, written in the R language, which can be installed from CRAN.
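As an illustration of the center and range idea mentioned in this abstract, the following is a minimal sketch in Python. It is not the RSDA package's API and not the authors' code: one ordinary least-squares model is fit on interval midpoints, a second on half-ranges, and the predicted interval is rebuilt from both. All function names and the toy intervals are assumptions made for this example.

```python
import numpy as np

def to_center_range(lower, upper):
    """Convert interval bounds into midpoints and half-ranges."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return (lower + upper) / 2.0, (upper - lower) / 2.0

def fit_ols(x, y):
    """Ordinary least squares with an intercept; returns [intercept, slope]."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, x):
    return coef[0] + coef[1] * x

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical interval data: one predictor X and one response Y.
x_lo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x_hi = np.array([2.0, 4.0, 4.0, 7.0, 7.0])
y_lo = np.array([2.5, 4.5, 6.5, 8.5, 10.5])
y_hi = np.array([4.5, 8.5, 8.5, 14.5, 14.5])

x_c, x_r = to_center_range(x_lo, x_hi)
y_c, y_r = to_center_range(y_lo, y_hi)

# One model for the midpoints, one for the half-ranges.
center_coef = fit_ols(x_c, y_c)
range_coef = fit_ols(x_r, y_r)

c_hat = predict_ols(center_coef, x_c)
# Guard used in this sketch only: a half-range cannot be negative.
r_hat = np.maximum(predict_ols(range_coef, x_r), 0.0)

lower_hat, upper_hat = c_hat - r_hat, c_hat + r_hat
rmse_lower = rmse(y_lo, lower_hat)
rmse_upper = rmse(y_hi, upper_hat)
```

The least-squares step could be swapped for any other regressor (trees, K-nearest neighbors, and so on) without changing the midpoint and half-range bookkeeping around it.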

Persian Handwritten Number Recognition Using Adapted Framing Feature and Support Vector Machines

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026816500048 ◽

2016 ◽

Vol 15 (01) ◽

pp. 1650004 ◽

Cited By ~ 3

Author(s):

Hedieh Sajedi ◽

Mehran Bahador

Keyword(s):

Support Vector Machines ◽

Recognition Rate ◽

Nearest Neighbors ◽

Polynomial Kernel ◽

Support Vector ◽

K Nearest Neighbors ◽

New Approach ◽

Number Recognition ◽

Vector Machines

In this paper, a new approach for the segmentation and recognition of Persian handwritten numbers is presented. The method combines the framing feature technique with the outer profile feature, a combination we call the adapted framing feature. In the proposed approach, numbers are segmented into digits automatically. In the classification stage, Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN) are used. Experiments are conducted on the IFHCDB database, consisting of 17,740 numeral images, and the HODA database, consisting of 102,352 numeral images. At the isolated-digit level, SVM with a polynomial kernel achieves a recognition rate of 99.27% on IFHCDB and 99.07% on HODA. The experiments show that the proposed method yields higher accuracy than previous work.
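For readers unfamiliar with the polynomial kernel named in this abstract, here is a small sketch of how its Gram matrix is computed. The abstract does not report the degree or the other kernel parameters, so `degree`, `gamma`, and `coef0` below are assumed values, and the three sample vectors are made up.

```python
import numpy as np

def polynomial_kernel(X, Y, degree=3, gamma=1.0, coef0=1.0):
    """Gram matrix with K[i, j] = (gamma * <X[i], Y[j]> + coef0) ** degree."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return (gamma * (X @ Y.T) + coef0) ** degree

# Three hypothetical 2-D feature vectors.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Degree 2 is an arbitrary choice for this example.
K = polynomial_kernel(X, X, degree=2)
```

An SVM trained with this kernel works only with the entries of `K`, so it never needs the expanded polynomial feature space explicitly.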

Assessment of Interventions in Fuel Management Zones Using Remote Sensing

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9090533 ◽

2020 ◽

Vol 9 (9) ◽

pp. 533 ◽

Cited By ~ 2

Author(s):

Ricardo Afonso ◽

André Neves ◽

Carlos Viegas Damásio ◽

João Moura Pires ◽

Fernando Birra ◽

...

Keyword(s):

Satellite Images ◽

Vegetation Indices ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Support Vector ◽

Fuel Management ◽

K Nearest Neighbors ◽

Management Zones ◽

Vector Machines ◽

Sentinel 2

Every year, wildfires strike the Portuguese territory and are a concern for public entities and the population. To prevent wildfire progression and minimize its impact, Fuel Management Zones (FMZs) have been stipulated by law around buildings and settlements and along national roads and other infrastructure. FMZs require monitoring of the vegetation condition so that maintenance and cleaning of these zones can proceed promptly. To improve FMZ monitoring, this paper proposes using satellite images, such as those from Sentinel-1 and Sentinel-2, together with vegetation indices and extracted temporal characteristics (max, min, mean and standard deviation) of the vegetation inside and outside the FMZs, to determine whether the zones were treated. These characteristics feed machine-learning algorithms such as XGBoost, Support Vector Machines, K-nearest neighbors and Random Forest. The results show that it is possible to detect an intervention in an FMZ with high accuracy, namely with an F1-score ranging from 90% to 94% and a Kappa ranging from 0.80 to 0.89.
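The temporal characteristics listed in this abstract (max, min, mean and standard deviation) can be sketched as follows. The abstract does not say which vegetation indices were used; NDVI is chosen here only as a familiar example, and the reflectance values are invented.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index (assumed index for this sketch)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

def temporal_features(series):
    """Map a (zones, dates) index series to (zones, 4): max, min, mean, std."""
    series = np.asarray(series, dtype=float)
    return np.column_stack([
        series.max(axis=1),
        series.min(axis=1),
        series.mean(axis=1),
        series.std(axis=1),
    ])

# Hypothetical reflectances for two zones over four acquisition dates.
nir = np.array([[0.60, 0.60, 0.60, 0.60],    # vegetation unchanged
                [0.60, 0.50, 0.30, 0.30]])   # vegetation removed over time
red = np.array([[0.20, 0.20, 0.20, 0.20],
                [0.20, 0.25, 0.30, 0.30]])

features = temporal_features(ndvi(nir, red))
```

Each row of `features` is the kind of per-zone vector that could then be passed to a classifier; a drop in the minimum and a rise in the standard deviation is what separates the second zone from the first in this toy case.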

Visualization & Prediction of COVID-19 Future Outbreak by Using Machine Learning

International Journal of Information Technology and Computer Science ◽

10.5815/ijitcs.2021.03.02 ◽

2021 ◽

Vol 13 (3) ◽

pp. 16-32

Author(s):

Ahmed Hassan Mohammed Hassan ◽

Arfan Ali Mohammed Qasem ◽

Walaa Faisal Mohammed Abdalla ◽

Omer H. Elhassan

Keyword(s):

Machine Learning ◽

Polynomial Regression ◽

Mean Squared Error ◽

Absolute Error ◽

Future Perspective ◽

Support Vector ◽

Squared Error ◽

Vector Machines ◽

The World ◽

Negative Factors

The cumulative incidence of COVID-19 is increasing rapidly day by day. After the spread of the coronavirus epidemic and the death of more than a million people around the world, scientists and researchers have turned to modern technologies such as machine learning to help the world overcome the COVID-19 epidemic. Machine Learning (ML) can be deployed very effectively to track and predict the disease. ML techniques have been applied in areas that need to identify dangerous negative factors and prioritize them. The purpose of the proposed system is to predict the number of people infected with COVID-19 using ML. Four standard models are used for COVID-19 prediction: Neural Network (NN), Support Vector Machines (SVM), Bayesian Network (BN) and Polynomial Regression (PR). The data used to test these models consist of the number of deaths, newly infected cases, and recoveries over the next 20 days. Five measures were used to evaluate the performance of each model: root mean squared error (RMSE), mean squared error (MSE), mean absolute error (MAE), explained variance score and R2 score (R2). The value of the proposed system lies in applying these models to forecast the current scenario of the COVID-19 epidemic. The results showed that NN outperformed the other models, while SVM performed poorly in all predictions on the available dataset. The results suggest that infections will increase slightly in the coming days. The results also give rise to hope because of the low death rate. In future work, case explanation and data amalgamation must be maintained continuously.
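The five evaluation measures named in this abstract follow standard definitions, sketched below with NumPy. The function name and the four data points are assumptions for illustration; they are not taken from the paper.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, R2 and explained variance for two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(err))),
        "R2": float(1.0 - ss_res / ss_tot),
        # Unlike R2, explained variance ignores a constant bias in the errors.
        "explained_variance": float(1.0 - np.var(err) / np.var(y_true)),
    }

# Hypothetical observed and predicted case counts (arbitrary units).
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 11.0])
```

In this toy case the predictions are biased on average, which is why the explained variance comes out higher than R2.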

Description of Human Movement Based on the Frenet-Serret Frame and MOCAP-Type Data

Revista Politécnica ◽

10.33571/rpolitec.v17n34a11 ◽

2021 ◽

Vol 17 (34) ◽

pp. 170-180

Author(s):

Juan Camilo Hernandez-Gomez ◽

Alejandro Restrepo-Martínez ◽

Juliana Valencia-Aguirre

Keyword(s):

Motion Capture ◽

Nearest Neighbors ◽

Human Movement ◽

The Body ◽

Microsoft Kinect ◽

Support Vector ◽

Motion Capture Data ◽

K Nearest Neighbors ◽

Vector Machines ◽

Optical Markers

Classifying human movement has become a technological necessity. Defining the position of a subject requires identifying the trajectory of the limbs and trunk of the body and being able to differentiate this position from those of other subjects or movements, which creates the need for data and algorithms that facilitate classification. This work therefore evaluates the discriminant capacity of motion capture data in physical rehabilitation, where the position of the subjects is acquired with the Microsoft Kinect and optical markers and movement attributes are generated with the Frenet-Serret frame. Their discriminant capacity is evaluated with support vector machines, neural networks, and k-nearest neighbors algorithms. The results show an accuracy of 93.5% in classification with data obtained from the Kinect, and 100% for movements captured with optical markers.
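One attribute that the Frenet-Serret frame yields for a sampled trajectory is its curvature. The sketch below estimates it by finite differences. The abstract does not say which Frenet-Serret attributes were used or how they were computed, so the choice of curvature, the function name, and the circular test path are all assumptions.

```python
import numpy as np

def curvature(points, t):
    """Curvature |r' x r''| / |r'|**3 of a 3-D trajectory sampled at times t."""
    points = np.asarray(points, dtype=float)
    d1 = np.gradient(points, t, axis=0)   # velocity r'(t)
    d2 = np.gradient(d1, t, axis=0)       # acceleration r''(t)
    cross = np.cross(d1, d2)
    return np.linalg.norm(cross, axis=1) / np.linalg.norm(d1, axis=1) ** 3

# Hypothetical marker path: a circle of radius 2 in the xy-plane.
t = np.linspace(0.0, 2.0 * np.pi, 400)
path = np.column_stack([2.0 * np.cos(t), 2.0 * np.sin(t), np.zeros_like(t)])

kappa = curvature(path, t)
# The first and last few samples use one-sided differences and are less accurate.
kappa_interior = kappa[5:-5]
```

For a circle the curvature is the reciprocal of the radius, which gives a quick sanity check before the same function is applied to real joint trajectories.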

Cardiovascular Disease Prediction Using Machine Learning

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit20659 ◽

2020 ◽

pp. 46-54

Author(s):

Digvijay Kumar ◽

Bavithra

Keyword(s):

Nearest Neighbors ◽

Support Vector ◽

Disease Prediction ◽

Proper Treatment ◽

Huge Number ◽

K Nearest Neighbors ◽

Vector Machines ◽

The World ◽

Supervised Learning Algorithms ◽

Feasible System

Heart-related diseases, or Cardiovascular Diseases (CVDs), are the leading cause of a huge number of deaths, not only in India but throughout the world. There is therefore a need for a reliable, accurate, and feasible system to diagnose such diseases in time for proper treatment. This paper presents various models based on supervised learning algorithms, namely Logistic Regression, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Naive Bayes, Random Forest, and ensemble models, and analyzes their performance. The models use features that are important for predicting CVDs (that is, whether or not a person has a CVD), which are discussed further in the paper.
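Of the supervised algorithms listed in this abstract, K-Nearest Neighbors is the simplest to write out. The following is a bare-bones sketch with majority voting over Euclidean distances. The two-feature training set is invented and does not represent the paper's patient data, and `k=3` is an arbitrary choice.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Predict integer class labels by majority vote among the k nearest points."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train, dtype=int)
    # Pairwise Euclidean distances, shape (n_query, n_train).
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])

# Hypothetical two-feature records labelled 0 (no disease) or 1 (disease).
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]

predictions = knn_predict(X_train, y_train, [[0.5, 0.5], [5.5, 5.5]], k=3)
```

With real clinical features, scaling each column before computing distances usually matters, since otherwise the feature with the largest numeric range dominates the vote.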

Detection of Loss Zones while Drilling Using Different Machine Learning Techniques

Journal of Energy Resources Technology ◽

10.1115/1.4051553 ◽

2021 ◽

pp. 1-29

Author(s):

Ahmed Alsaihati ◽

Mahmoud Abughaban ◽

Salaheldin Elkatatny ◽

Abdulazeez Abdulraheem

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Random Forests ◽

Nearest Neighbors ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbors ◽

Learning Techniques ◽

Vector Machines ◽

Testing Set

Fluid loss into formations is a common operational issue that is frequently encountered when drilling across naturally fractured or induced-fracture formations. It can pose significant operational risks, such as well-control problems, stuck pipe, and wellbore instability, which, in turn, increase well time and cost. This research uses and evaluates different machine learning techniques, namely support vector machines, random forests, and K-nearest neighbors, for detecting loss circulation occurrences while drilling using solely drilling surface parameters. Actual field data from seven wells that had suffered partial or severe loss circulation were used to build the predictive models, while Well-8 was used to compare the performance of the developed models. Recall, precision, and F1-score were used to evaluate the ability of the developed models to detect loss circulation occurrences. The results showed that the K-nearest neighbors classifier achieved a high F1-score of 0.912 in detecting loss circulation occurrences in the testing set, while random forests was the second-best classifier with a nearly identical F1-score of 0.910. The support vector machines achieved an F1-score of 0.83 on the testing set. K-nearest neighbors outperformed the other models in detecting loss circulation occurrences in Well-8, with an F1-score of 0.80. The main contribution of this research compared with previous studies is that it identifies loss events based on real-time measurements of the active pit volume.
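The recall, precision, and F1-score measures reported in this abstract can be computed from true and predicted labels as sketched below. The ten labels are made up for illustration and are unrelated to the wells in the study.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class (here: a loss event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical labels: 1 = loss circulation occurred, 0 = normal drilling.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

F1 is the harmonic mean of precision and recall, so it stays low when either missed events or false alarms are frequent, which suits a setting where loss events are rarer than normal drilling.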

Predictive ability of Random Forests, Boosting, Support Vector Machines and Genomic Best Linear Unbiased Prediction in different scenarios of genomic evaluation

Animal Production Science ◽

10.1071/an15538 ◽

2017 ◽

Vol 57 (2) ◽

pp. 229 ◽

Cited By ~ 11

Author(s):

Farhad Ghafouri-Kesbi ◽

Ghodratollah Rahimi-Mianji ◽

Mahmood Honarvar ◽

Ardeshir Nejati-Javaremi

Keyword(s):

Mean Squared Error ◽

Predictive Accuracy ◽

Computational Time ◽

Support Vector ◽

Genomic Evaluation ◽

Linear Unbiased Prediction ◽

Squared Error ◽

Vector Machines ◽

Best Linear Unbiased ◽

Qtl Effects

Three machine learning algorithms, Random Forests (RF), Boosting and Support Vector Machines (SVM), as well as Genomic Best Linear Unbiased Prediction (GBLUP), were used to predict genomic breeding values (GBV), and their predictive performance was compared in different combinations of heritability (0.1, 0.3, and 0.5), number of quantitative trait loci (QTL) (100, 1000) and distribution of QTL effects (normal, uniform and gamma). To this end, a genome comprising five chromosomes, one Morgan each, was simulated, on which 10,000 bi-allelic single nucleotide polymorphisms were distributed. Pearson’s correlation between the true and predicted GBV and the Mean Squared Error of GBV prediction were used, respectively, as measures of the predictive accuracy and the overall fit achieved with each method. For all methods, prediction accuracy increased as heritability increased and as the number of QTL decreased. GBLUP had better predictive accuracy than the machine learning methods, in particular in the scenarios with a higher number of QTL and with normal and uniform distributions of QTL effects, though in most cases the differences were non-significant. In the scenarios with a small number of QTL and a gamma distribution of QTL effects, Boosting outperformed the other methods. Regarding the Mean Squared Error of GBV prediction, Boosting outperformed the other methods in most cases, although the estimates were close to those of GBLUP. Among the methods studied, SVM with 0.6 gigabytes (GIG) was the most efficient user of memory, followed by RF, GBLUP and Boosting with 1.2-GIG, 1.3-GIG and 2.3-GIG memory requirements, respectively. Regarding computational time, GBLUP, SVM, RF and Boosting ranked first, second, third and last with 10 min, 15 min, 75 min and 600 min, respectively. It was concluded that although stochastic gradient Boosting can predict GBV with high accuracy, its significantly longer computational time and larger memory requirement can be a serious limitation for this algorithm. Therefore, using other variants of Boosting, such as Random Boosting, was recommended for genomic evaluation.
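To make the simulation factors in this abstract concrete (heritability, number of QTL, distribution of QTL effects), here is a heavily simplified sketch. It assumes independent loci with allele frequency 0.5, ignores chromosomes and linkage entirely, uses far fewer markers than the study, and picks arbitrary parameters for the uniform and gamma effect distributions, so it is not the authors' simulation. Pearson's correlation is computed with `np.corrcoef`.

```python
import numpy as np

def simulate_gbv(rng, n_animals, n_snps, n_qtl, effect_dist, h2):
    """Return genotypes, true breeding values and phenotypes for one scenario."""
    # Genotypes coded 0/1/2; independent loci are a simplifying assumption.
    genotypes = rng.binomial(2, 0.5, size=(n_animals, n_snps))
    qtl = rng.choice(n_snps, size=n_qtl, replace=False)
    if effect_dist == "normal":
        effects = rng.normal(size=n_qtl)
    elif effect_dist == "uniform":
        effects = rng.uniform(-1.0, 1.0, size=n_qtl)   # assumed range
    elif effect_dist == "gamma":
        effects = rng.gamma(shape=0.4, scale=1.0, size=n_qtl)  # assumed shape
    else:
        raise ValueError(f"unknown effect distribution: {effect_dist}")
    gbv = genotypes[:, qtl] @ effects
    # Choose the residual variance so that var(gbv) / var(phenotype) is about h2.
    resid_var = gbv.var() * (1.0 - h2) / h2
    phenotype = gbv + rng.normal(scale=np.sqrt(resid_var), size=n_animals)
    return genotypes, gbv, phenotype

rng = np.random.default_rng(0)
genotypes, gbv, phenotype = simulate_gbv(
    rng, n_animals=2000, n_snps=500, n_qtl=100, effect_dist="normal", h2=0.3
)

realized_h2 = gbv.var() / phenotype.var()
# Pearson's correlation between true breeding values and phenotypes.
accuracy = np.corrcoef(gbv, phenotype)[0, 1]
```

Using the raw phenotype as the "prediction" gives a correlation near the square root of the heritability, which is a useful baseline against which any fitted predictor can be compared.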

Using Linear Discriminant Analysis for Dimensionality Reduction for Predicting Anomalies of BGP data

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k2159.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 1989-1995

Keyword(s):

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Nearest Neighbors ◽

Support Vector ◽

Multi Layer Perceptron ◽

High Dimensions ◽

K Nearest Neighbors ◽

Linear Discriminant ◽

Data Packets ◽

Vector Machines

Border Gateway Protocol (BGP) is a vital internet protocol for transferring data packets among Autonomous Systems (ASes). Security is a major concern in the transmission of BGP packets, which are often attacked by worms or hijacked by attackers, resulting in requests entering black holes or loss of connection to particular sites. BGP anomalies can be reduced by analyzing BGP datasets. Because ASes communicate through messages, anomalies can be reduced by identifying corrupted BGP messages in the dataset. In this paper, BGP anomalies are classified by applying Machine Learning (ML) algorithms. The dataset contains information about the sending and receiving times between ASes. Because the dataset had high dimensionality, its dimensions were reduced using Linear Discriminant Analysis (LDA), and then Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Linear Regression, Logistic Regression and Multi-Layer Perceptron (MLP) were used to classify the anomalies.
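The dimensionality-reduction step described in this abstract can be illustrated with two-class Fisher LDA, which projects every sample onto a single discriminant direction. The sketch below runs on synthetic Gaussian data, since the abstract does not list the BGP features, and it classifies with a simple midpoint threshold in place of the classifiers named in the paper.

```python
import numpy as np

def fit_lda_direction(X, y):
    """Fisher discriminant direction w proportional to Sw^{-1} (mu1 - mu0)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

# Synthetic 5-D data: class 1 ("anomaly") is shifted along the first feature.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(size=(200, 5)),
    rng.normal(size=(200, 5)) + np.array([4.0, 0.0, 0.0, 0.0, 0.0]),
])
y = np.repeat([0, 1], 200)

w = fit_lda_direction(X, y)
z = X @ w   # 5 dimensions reduced to 1

# Stand-in classifier: threshold halfway between the projected class means.
threshold = (z[y == 0].mean() + z[y == 1].mean()) / 2.0
predicted = (z > threshold).astype(int)
train_accuracy = float(np.mean(predicted == y))
```

With more than two classes LDA yields up to one fewer dimensions than the number of classes, and the reduced features would then be passed to a classifier such as SVM or KNN.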

APTITUDE Framework for Learning Data Classification Based on Machine Learning

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2020.14.51 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Machine Learning ◽

Data Classification ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbors ◽

Course Content ◽

Applied Model ◽

Vector Machines ◽

Learning Data

Learning analytics refers to the use of machine learning to provide predictions of learner success and prescriptions to learners and teachers. The main goal of this paper is to propose the APTITUDE framework for learning data classification, in order to adapt and recommend course content or the flow of course activities. The framework applies a model for student learning prediction based on machine learning. Five machine learning algorithms are used for learning data classification: random forest, Naïve Bayes, k-nearest neighbors, logistic regression and support vector machines.

A Method for Greenhouse Temperature Prediction Based on XGBoost Algorithm and Linear Residual Model

CONVERTER ◽

10.17762/converter.271 ◽

2021 ◽

pp. 108-121

Author(s):

Huijin Han, Et al.

Keyword(s):

Mean Squared Error ◽

Prediction Method ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Small Scale ◽

Temperature Prediction ◽

Precise Control ◽

Squared Error ◽

Vector Machines ◽

Better Than

Temperature prediction is important for precise control of the greenhouse environment. Traditional machine learning methods usually rely on a large amount of data, so it is difficult to make a stable and accurate prediction from a small amount of data. This paper proposes a temperature prediction method for greenhouses. With the prediction target transformed to the logarithmic difference of the temperature inside and outside the greenhouse, the method first uses the XGBoost algorithm to make a preliminary prediction. Second, a linear model is used to predict the residuals of the predicted target. The predicted temperature is obtained by combining the preliminary prediction and the residuals. Based on 20 days of greenhouse data, the results show that the target transformation applied in our method is better than the others presented in the paper. The MSE (Mean Squared Error) of our method is 0.0844, which is respectively 20.7%, 76.0%, 10.2%, and 95.3% of the MSE of LR (Logistic Regression), SGD (Stochastic Gradient Descent), SVM (Support Vector Machines), and the XGBoost algorithm. The results indicate that our method significantly improves prediction accuracy on small-scale data.
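The two-stage scheme in this abstract (a preliminary prediction of a transformed target, followed by a linear model on the residuals) can be sketched as follows. Several things here are assumptions: the "logarithmic difference" is read as log(T_inside) - log(T_outside) with temperatures in kelvin, a per-bin mean predictor stands in for XGBoost, the data are synthetic, and errors are measured in-sample only.

```python
import numpy as np

def log_difference(t_inside, t_outside):
    """Assumed target transform: log(T_in) - log(T_out), temperatures in kelvin."""
    return np.log(t_inside) - np.log(t_outside)

def fit_binned_mean(x, y, n_bins=6):
    """Stand-in for the XGBoost stage: mean of y within quantile bins of x."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]
    idx = np.digitize(x, edges)
    overall = y.mean()
    means = np.array([y[idx == b].mean() if np.any(idx == b) else overall
                      for b in range(n_bins)])
    return edges, means

def predict_binned_mean(model, x):
    edges, means = model
    return means[np.digitize(x, edges)]

def fit_linear(X, r):
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, r, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

# Synthetic hourly data (hypothetical), temperatures in kelvin.
rng = np.random.default_rng(1)
hour = rng.uniform(0.0, 24.0, size=300)
cycle = np.sin((hour - 9.0) / 24.0 * 2.0 * np.pi)
t_out = 283.0 + 8.0 * cycle + rng.normal(scale=0.5, size=300)
t_in = t_out + 5.0 + 3.0 * cycle + rng.normal(scale=0.3, size=300)

target = log_difference(t_in, t_out)

# Stage 1: preliminary prediction of the transformed target.
stage1 = fit_binned_mean(hour, target)
prelim = predict_binned_mean(stage1, hour)

# Stage 2: linear model fitted to the stage-1 residuals.
X_lin = np.column_stack([hour, t_out])
coef = fit_linear(X_lin, target - prelim)
final = prelim + predict_linear(coef, X_lin)

# Invert the transform to recover the predicted inside temperature.
t_in_pred = t_out * np.exp(final)

mse_prelim = float(np.mean((target - prelim) ** 2))
mse_final = float(np.mean((target - final) ** 2))
```

On the training data the residual stage cannot increase the squared error, because a least-squares fit with all-zero coefficients would simply reproduce the stage-1 prediction. Whether the gain carries over to held-out data is what an experiment such as the one in the paper has to establish.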
