A New K-Nearest Neighbors Classifier for Big Data Based on Efficient Data Pruning

Hamid Saadatfar; Samiyeh Khosravi; Javad Hassannataj Joloudari; Amir Mosavi; Shahaboddin Shamshirband

doi:10.3390/math8020286

A New K-Nearest Neighbors Classifier for Big Data Based on Efficient Data Pruning

Mathematics ◽

10.3390/math8020286 ◽

2020 ◽

Vol 8 (2) ◽

pp. 286 ◽

Cited By ~ 8

Author(s):

Hamid Saadatfar ◽

Samiyeh Khosravi ◽

Javad Hassannataj Joloudari ◽

Amir Mosavi ◽

Shahaboddin Shamshirband

Keyword(s):

Big Data ◽

Classification Accuracy ◽

Learning Algorithm ◽

Computational Cost ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Parametric Classification ◽

Efficient Data ◽

Data Pruning ◽

Selection Of

The K-nearest neighbors (KNN) machine learning algorithm is a well-known non-parametric classification method. However, like other traditional data mining methods, applying it to big data comes with computational challenges. KNN determines the class of a new sample from the classes of its nearest neighbors, but identifying those neighbors in a large dataset is so computationally expensive that it is no longer feasible on a single machine. One technique proposed to make classification methods applicable to large datasets is pruning. LC-KNN is an improved KNN method that first clusters the data into smaller partitions using K-means and then, for each new sample, applies KNN to the partition whose center is nearest. However, because the clusters have different shapes and densities, selecting the appropriate cluster is a challenge. This paper proposes an approach that improves the pruning phase of LC-KNN by taking these factors into account. The proposed approach helps to choose a more appropriate cluster in which to search for neighbors, thereby increasing classification accuracy. Its performance is evaluated on different real datasets. The experimental results show that the approach is effective and achieves higher classification accuracy and lower time cost than other recent relevant methods.
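The baseline LC-KNN procedure summarized in this abstract (cluster with K-means, then run KNN only inside the partition whose center is nearest to the query) might be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names, the toy data, and the fallback for an empty partition are assumptions made here, and the paper's improved cluster selection based on shape and density is not modeled.

```python
import math
import random
from collections import Counter


def nearest_center(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))


def kmeans(points, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, assignments)."""
    rng = random.Random(seed)
    centers = rng.sample(points, n_clusters)
    for _ in range(n_iter):
        assign = [nearest_center(p, centers) for p in points]
        for c in range(n_clusters):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Final assignment so that it matches the final centers.
    assign = [nearest_center(p, centers) for p in points]
    return centers, assign


def lc_knn_predict(query, points, labels, centers, assign, k=3):
    """Classify `query` using KNN restricted to its nearest partition."""
    # Pruning step: keep only the partition whose center is nearest.
    target = nearest_center(query, centers)
    pool = [(p, y) for p, y, a in zip(points, labels, assign) if a == target]
    if not pool:  # assumption: fall back to the full dataset
        pool = list(zip(points, labels))
    # Ordinary KNN majority vote inside the pruned pool.
    pool.sort(key=lambda py: math.dist(query, py[0]))
    votes = Counter(y for _, y in pool[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["a"] * 4 + ["b"] * 4
centers, assign = kmeans(points, n_clusters=2)
print(lc_knn_predict((0.5, 0.5), points, labels, centers, assign))
```

The pruning step means each query is compared against one partition rather than the whole dataset, which is where the time saving comes from. It is also where accuracy can be lost when the nearest center does not belong to the most suitable cluster, the case the paper targets.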


A Reformed K-Nearest Neighbors Algorithm for Big Data Sets

Journal of Computer Science ◽

10.3844/jcssp.2018.1213.1225 ◽

2018 ◽

Vol 14 (9) ◽

pp. 1213-1225 ◽

Cited By ~ 2

Author(s):

Vo Ngoc Phu ◽

Vo Thi Ngoc Tran

Keyword(s):

Big Data ◽

Nearest Neighbors ◽

Data Sets ◽

K Nearest Neighbors


Cost-Efficient Data Collection Approach Using K-Nearest Neighbors in a 3D Sensor Network

2010 Eleventh International Conference on Mobile Data Management ◽

10.1109/mdm.2010.59 ◽

2010 ◽

Cited By ~ 3

Author(s):

Prem Prakash Jayaraman ◽

Arkady Zaslavsky ◽

Jerker Delsing

Keyword(s):

Data Collection ◽

Sensor Network ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Efficient Data ◽

Cost Efficient ◽

Data Collection Approach ◽

Efficient Data Collection


Bio-Inspired Optimization Algorithms for Arabic Handwritten Characters

Handbook of Research on Machine Learning Innovations and Trends - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-2229-4.ch039 ◽

2017 ◽

pp. 897-914 ◽

Cited By ~ 3

Author(s):

Ahmed T. Sahlol ◽

Aboul Ella Hassanien

Keyword(s):

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Random Forests ◽

Classification Accuracy ◽

Processing Time ◽

Optimization Algorithms ◽

Nearest Neighbors ◽

Benchmark Dataset ◽

K Nearest Neighbors ◽

Linear Discriminant

Many obstacles remain to achieving high recognition accuracy in Arabic handwritten optical character recognition systems: each character has a different shape, and many characters resemble one another. This chapter presents several feature selection-based bio-inspired optimization algorithms, including the Bat Algorithm, Grey Wolf Optimization, the Whale Optimization Algorithm, Particle Swarm Optimization, and the Genetic Algorithm, and applies them to Arabic handwritten character recognition to assess their ability and accuracy in recognizing Arabic characters. The experiments were performed on a benchmark dataset, CENPARMI, using k-nearest neighbors, linear discriminant analysis, and random forests. The features selected by the optimization algorithms outperform the whole feature set in terms of both classification accuracy and processing time.


Selection of Best K of K-Nearest Neighbors Classifier for Enhancement of Performance for the Prediction of Diabetes

Advances in Intelligent Systems and Computing - Progress in Advanced Computing and Intelligent Engineering ◽

10.1007/978-981-33-4299-6_11 ◽

2021 ◽

pp. 135-142

Author(s):

Subhash Chandra Gupta ◽

Noopur Goel

Keyword(s):

Nearest Neighbors ◽

K Nearest Neighbors ◽

Prediction Of Diabetes ◽

Selection Of


A preliminary study on Hybrid Spill-Tree Fuzzy k-Nearest Neighbors for big data classification

2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) ◽

10.1109/fuzz-ieee.2018.8491595 ◽

2018 ◽

Cited By ~ 3

Author(s):

Jesus Maillo ◽

Julian Luengo ◽

Salvador Garcia ◽

Francisco Herrera ◽

Isaac Triguero

Keyword(s):

Big Data ◽

Data Classification ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Preliminary Study ◽

Big Data Classification


Fast and Scalable Approaches to Accelerate the Fuzzy k-Nearest Neighbors Classifier for Big Data

IEEE Transactions on Fuzzy Systems ◽

10.1109/tfuzz.2019.2936356 ◽

2020 ◽

Vol 28 (5) ◽

pp. 874-886 ◽

Cited By ~ 1

Author(s):

Jesus Maillo ◽

Salvador Garcia ◽

Julian Luengo ◽

Francisco Herrera ◽

Isaac Triguero

Keyword(s):

Big Data ◽

Nearest Neighbors ◽

K Nearest Neighbors


System Model for Prediction Analytics Using K-Nearest Neighbors Algorithm

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2019.8536 ◽

2019 ◽

Vol 16 (10) ◽

pp. 4425-4430 ◽

Cited By ~ 1

Author(s):

Devendra Prasad ◽

Sandip Kumar Goyal ◽

Avinash Sharma ◽

Amit Bindal ◽

Virendra Singh Kushwah

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Research Work ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

System Model ◽

K Nearest Neighbors ◽

Prediction Analysis ◽

Pros And Cons

Machine learning is a growing area of computer science. This article focuses on prediction analysis using the K-Nearest Neighbors (KNN) machine learning algorithm. The data in the dataset are processed, analyzed, and predicted using this algorithm. Various machine learning algorithms are introduced, and their pros and cons are discussed. The KNN algorithm is studied in detail and implemented on the specified data with certain parameters. The research work elucidates prediction analysis and explains the prediction of restaurant quality.


Application Machine Learning in Construction Management

TEM Journal ◽

10.18421/tem103-48 ◽

2021 ◽

pp. 1385-1389

Author(s):

Phong Thanh Nguyen

Keyword(s):

Machine Learning ◽

Prediction Model ◽

Price Index ◽

Learning Algorithm ◽

Consumer Price Index ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Economic Variables ◽

K Nearest Neighbors ◽

Socio Economic Variables

Machine learning is a subfield of artificial intelligence (AI). One of the most widely used machine learning algorithms is the K-Nearest Neighbors (KNN) approach, a supervised learning algorithm. This paper applies the KNN algorithm to predict the construction price index from Vietnam's socio-economic variables. The prediction model was built on data from 2016 to 2019, using seven socio-economic variables that affect the construction price index (i.e., industrial production, construction investment capital, Vietnam's stock price index, consumer price index, foreign exchange rate, total exports, and imports). The results show that the construction price index prediction model based on KNN regression has smaller errors than the traditional method.
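KNN regression, as mentioned in this abstract, predicts a numeric target by averaging the targets of the k nearest training samples. A minimal sketch follows, assuming Euclidean distance and an unweighted mean. The function name and the one-dimensional toy data are hypothetical and are not the paper's socio-economic dataset.

```python
import math


def knn_regress(query, points, targets, k=3):
    """Predict a numeric value as the mean target of the k nearest points."""
    ranked = sorted(zip(points, targets), key=lambda pt: math.dist(query, pt[0]))
    nearest = [t for _, t in ranked[:k]]
    return sum(nearest) / len(nearest)


# Hypothetical toy data: one feature per sample, one numeric target each.
X = [(1.0,), (2.0,), (3.0,), (10.0,)]
y = [10.0, 20.0, 30.0, 100.0]
print(knn_regress((2.1,), X, y, k=3))
```

With several features on different scales, as in the seven variables listed above, the features would normally be standardized first so that no single variable dominates the distance.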


Improving an AI-Based Algorithm to Automatically Generate Concept Maps

Computer and Information Science ◽

10.5539/cis.v12n4p72 ◽

2019 ◽

Vol 12 (4) ◽

pp. 72

Author(s):

Sara Alomari ◽

Salha Abdullah

Keyword(s):

Text Analysis ◽

Classification Accuracy ◽

Concept Maps ◽

Concept Map ◽

Nearest Neighbors ◽

Learning Method ◽

K Nearest Neighbors ◽

Intelligent Algorithms ◽

Effective Learning

Concept maps have been used as an effective learning method that helps learners identify relationships between pieces of information, especially when teaching materials cover many topics or concepts. However, building a concept map manually is a long and tedious task: it is time-consuming and demands intensive effort in reading the full content and reasoning about the relationships among concepts. Because of this inefficiency, many studies have been carried out to develop intelligent algorithms using several data mining techniques. In this research, the authors aim to improve the Text Analysis-Association Rules Mining (TA-ARM) algorithm by using the weighted K-nearest neighbors (KNN) algorithm instead of the traditional KNN. The weighted KNN is expected to improve classification accuracy, which will, in turn, enhance the quality of the generated concept map.
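The difference between traditional and weighted KNN can be illustrated with a short sketch. The abstract does not state which weighting scheme is used, so inverse-distance weighting is assumed here, and the function name and toy data are hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_predict(query, points, labels, k=3, eps=1e-9):
    """Classify `query` with votes weighted by inverse distance (an assumption)."""
    ranked = sorted(zip(points, labels), key=lambda pl: math.dist(query, pl[0]))
    scores = defaultdict(float)
    for p, label in ranked[:k]:
        # Closer neighbors contribute more; eps avoids division by zero.
        scores[label] += 1.0 / (math.dist(query, p) + eps)
    return max(scores, key=scores.get)


# Hypothetical toy data: one very close "x" sample, two farther "y" samples.
points = [(0.0, 0.0), (3.0, 0.0), (3.1, 0.0)]
labels = ["x", "y", "y"]
print(weighted_knn_predict((0.5, 0.0), points, labels, k=3))
```

With k = 3, an unweighted majority vote on this data would pick "y" (two votes to one), while the weighted vote favors the single much closer "x" sample.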


The Application of the K-Nearest Neighbors Method as A Recommendation for The Selection of Departments in Higher Education Based on The Results of Multiple Intelligence Tests

Journal of Physics Conference Series ◽

10.1088/1742-6596/1464/1/012024 ◽

2020 ◽

Vol 1464 ◽

pp. 012024

Author(s):

N Nuswantari ◽

Y F Rachman ◽

P W D Setiawan ◽

W D Prakoso

Keyword(s):

Higher Education ◽

Nearest Neighbors ◽

Multiple Intelligence ◽

Intelligence Tests ◽

K Nearest Neighbors ◽

Selection Of
