Improved KNN Algorithm Based on Preprocessing of Center in Smart Cities

Complexity, 2021, Vol 2021, pp. 1-10
Author(s): Haiyan Wang, Peidi Xu, Jinghua Zhao

The KNN algorithm is one of the best-known algorithms in machine learning and data mining. It does not preprocess the data before classification, which leads to longer classification times and more errors. To address these problems, this paper first proposes a PK-means++ algorithm, which better ensures the stability of a random experiment. Then, based on PK-means++ and spherical region division, an improved algorithm, KNNPK+, is proposed. The algorithm selects the center of the spherical region appropriately and then constructs an initial classifier for the training set to improve classification accuracy and speed.
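The general idea of preprocessing a training set around centers can be illustrated with a minimal sketch. This is not the paper's PK-means++ or KNNPK+ algorithm, whose details the abstract does not give. It assumes standard k-means++ seeding to pick centers, assigns each training point to its nearest center, and then runs plain KNN only inside the region closest to the query. The function names (`kmeanspp_seeds`, `build_regions`, `classify`) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def kmeanspp_seeds(points, n_centers, rng):
    """Standard k-means++ seeding (an assumption, not the paper's PK-means++)."""
    centers = [rng.choice(points)]
    while len(centers) < n_centers:
        # Squared distance from every point to its closest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:  # every point already coincides with a center
            break
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def build_regions(train, n_centers, seed=0):
    """Preprocessing step: group (point, label) pairs by nearest center."""
    rng = random.Random(seed)
    centers = kmeanspp_seeds([p for p, _ in train], n_centers, rng)
    regions = [[] for _ in centers]
    for point, label in train:
        nearest = min(range(len(centers)),
                      key=lambda i: math.dist(point, centers[i]))
        regions[nearest].append((point, label))
    return centers, regions


def classify(query, centers, regions, k=3):
    """Plain KNN vote, restricted to the region whose center is closest."""
    nearest = min(range(len(centers)),
                  key=lambda i: math.dist(query, centers[i]))
    neighbours = sorted(regions[nearest],
                        key=lambda pl: math.dist(query, pl[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups on a line.
train = [((float(x), 0.0), "a") for x in range(5)]
train += [((float(x), 0.0), "b") for x in range(100, 105)]
centers, regions = build_regions(train, n_centers=2)
print(classify((2.0, 0.0), centers, regions))
print(classify((102.0, 0.0), centers, regions))
```

Restricting the search to one region trades some accuracy near region boundaries for fewer distance computations per query, which is the kind of time saving the abstract describes.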

Author(s): Sook-Ling Chua, Stephen Marsland, Hans W. Guesgen

The problem of behaviour recognition based on data from sensors is essentially an inverse problem: given a set of sensor observations, identify the sequence of behaviours that gave rise to them. In a smart home, the behaviours are likely to be the standard human behaviours of living, and the observations will depend upon the sensors that the house is equipped with. There are two main approaches to identifying behaviours from the sensor stream. One is to use a symbolic approach, which explicitly models the recognition process. Another is to use a sub-symbolic approach to behaviour recognition, which is the focus in this chapter, using data mining and machine learning methods. While there have been many machine learning methods of identifying behaviours from the sensor stream, they have generally relied upon a labelled dataset, where a person has manually identified their behaviour at each time. This is particularly tedious to do, resulting in relatively small datasets, and is also prone to significant errors as people do not pinpoint the end of one behaviour and commencement of the next correctly. In this chapter, the authors consider methods to deal with unlabelled sensor data for behaviour recognition, and investigate their use. They then consider whether they are best used in isolation, or should be used as preprocessing to provide a training set for a supervised method.


Author(s): Divya Chaudhary, Er. Richa Vasuja

Today, every one of us generates data, so handling this data has become vital. New technologies such as machine learning and data mining are being developed to do so. This paper presents a study of machine learning (ML). Machine learning algorithms repeatedly produce precise approximations. A machine learning system effectively "learns" how to make predictions from a training set of completed jobs. The main purpose of the review is to give a rough overview of the most commonly used machine learning algorithms.


Author(s): Shailesh D. Kamble, Pawan Patel, Punit Fulzele, Yash Bangde, Hitesh Musale, ...

The efficient use of data mining in virtual sectors such as e-commerce and commerce has led to its adoption in other industries. The medical environment is rich in data but still weak in technical analysis. Medical systems generate a great deal of information. Powerful analytics tools can be used to identify hidden relationships and trends in this data. Disease is a term that covers a large number of conditions connected to health care. These medical conditions are unexpected health conditions that directly affect the organs of the body. Medical data mining methods such as corporate management mines, classification, and integration are used to analyze various types of common physical problems. Classification is an important problem in data mining. Many popular classifiers build decision trees to produce classification models. Data classification is based on the ID3 decision tree algorithm, which leads to good accuracy; the data are evaluated using entropy-based verification methods based on cross-sectional and segmentation approaches, and the results are compared. The dataset used for machine learning is divided into three parts: training, testing, and validation. This approach uses a training set to train a model and define its appropriate parameters. A test set is required to evaluate the trained model and its performance. It is estimated that 70% of people in India can catch common illnesses such as viral infections, flu, coughs, and colds every two months. Because most people do not realize that common allergies can be symptoms of something very serious, 25% of people die suddenly after ignoring the first ordinary symptoms. Therefore, identifying or predicting disease early using machine learning (ML) is very important to avoid unwanted harm.
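The entropy criterion that ID3 uses to choose a split can be sketched as follows. This is a minimal illustration of the textbook information-gain computation, not the authors' implementation. The feature names (`fever`, `cough`), the labels, and the four records are hypothetical toy data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels, features):
    """ID3 picks the feature with the largest information gain at each node."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical records: each row maps a symptom to "yes" or "no".
rows = [
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "no"},
    {"fever": "no", "cough": "yes"},
    {"fever": "no", "cough": "no"},
]
labels = ["flu", "flu", "cold", "cold"]
print(best_split(rows, labels, ["fever", "cough"]))
```

In a full ID3 tree the same selection is applied recursively to each subset produced by the chosen split until the subsets are pure or no features remain.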


Author(s): Nikulin Vladimir

Imbalanced data represent a significant problem because the corresponding classifier tends to ignore patterns that have smaller representation in the training set. We propose to consider a large number of balanced training subsets in which representatives of the larger pattern are selected randomly. As an outcome, the system produces a matrix of linear regression coefficients in which rows represent random subsets and columns represent features. Based on this matrix, we assess the stability of the influence of particular features. It is proposed to keep in the model only features with stable influence. The final model is an average of the single models, which are not necessarily linear regressions. This model proved to be efficient and competitive during the PAKDD-2007 Data Mining Competition.
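The coefficient matrix described above can be sketched in a few lines. The abstract does not say how stability is measured, so this sketch assumes a simple score, the absolute mean of each column divided by its standard deviation, and a hypothetical threshold. It also assumes the minority class is labelled 1 and the majority class 0. All function names and the toy data are illustrative.

```python
import random
import statistics


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(X, y):
    """Ordinary least squares via the normal equations; returns feature weights."""
    A = [[1.0] + list(row) for row in X]  # leading 1.0 is the intercept column
    d = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(d)] for i in range(d)]
    aty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(d)]
    return solve(ata, aty)[1:]  # drop the intercept


def coefficient_matrix(X, y, n_subsets=50, seed=0):
    """One row per balanced random subset, one column per feature."""
    rng = random.Random(seed)
    minority = [i for i, t in enumerate(y) if t == 1]
    majority = [i for i, t in enumerate(y) if t == 0]
    rows = []
    for _ in range(n_subsets):
        idx = minority + rng.sample(majority, len(minority))
        rows.append(fit_linear([X[i] for i in idx], [y[i] for i in idx]))
    return rows


def stability(rows):
    """Assumed stability score: |mean| / standard deviation of each column."""
    scores = []
    for column in zip(*rows):
        spread = statistics.pstdev(column)
        mean = abs(statistics.mean(column))
        scores.append(mean / spread if spread > 0 else float("inf"))
    return scores


def select_stable(scores, threshold=3.0):
    """Keep feature indices whose score exceeds a hypothetical threshold."""
    return [j for j, s in enumerate(scores) if s > threshold]


# Hypothetical data: feature 0 tracks the label, feature 1 does not.
X = [[0.1 * ((i % 5) - 2), ((i * 7) % 11) / 10 - 0.5] for i in range(40)]
X += [[1 + 0.1 * ((j % 5) - 2), ((j * 3) % 11) / 10 - 0.5] for j in range(8)]
y = [0] * 40 + [1] * 8
scores = stability(coefficient_matrix(X, y))
print(scores, select_stable(scores))
```

Averaging the per-subset models restricted to the selected features would then give a final model of the kind the abstract describes.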


Electronics, 2021, Vol 10 (18), pp. 2228
Author(s): Khalid Haseeb, Irshad Ahmad, Israr Iqbal Awan, Jaime Lloret, Ignacio Bosch

In recent times, health applications using the Internet of Medical Things (IoMT) have been gaining rapid popularity in smart cities. Many real-time solutions benefit both patients and professionals through remote data accessibility and suitable actions. However, timely medical decisions and efficient management of big data using IoT-based resources remain pressing research challenges. Additionally, the distributed nature of data processing in many proposed solutions increases the threat of information leakage and damages network integrity. Such solutions impose overhead on medical sensors and decrease the stability of real-time transmission systems. Therefore, this paper presents a machine-learning model with SDN-enabled security to predict the consumption of network resources and improve the delivery of sensor data. Additionally, it offers a centralized software-defined network (SDN) architecture to overcome network threats among deployed sensors with nominal management cost. Firstly, it offers an unsupervised machine learning technique that decreases the communication overhead for IoT networks. Secondly, it predicts link status using dynamic metrics and refines its strategies using the SDN architecture. Finally, the SDN controller uses a security algorithm that efficiently manages the consumption of the IoT nodes and protects them from unidentified occurrences. The proposed model is verified using simulations and improves system performance in terms of network throughput by 13%, data drop ratio by 39%, data delay by 11%, and faulty packets by 46% compared to the HUNA and CMMA schemes.


2020, Vol 60, pp. 102177
Author(s): Muhammad Shafiq, Zhihong Tian, Ali Kashif Bashir, Alireza Jolfaei, Xiangzhan Yu

2019, Vol 11 (4), pp. 1077
Author(s): Jovani Souza, Antonio Francisco, Cassiano Piekarski, Guilherme Prado

Smart cities (SC) promote economic development, improve the welfare of their citizens, and help people use technologies to build sustainable services. However, computational methods are necessary in creating smart cities because they are fundamental to the decision-making process, assist in policy making, and offer improved services to citizens. As such, the aim of this research is to present a systematic review of the data mining (DM) and machine learning (ML) approaches adopted in the promotion of smart cities. The Methodi Ordinatio was used to find relevant articles, and the VOSviewer software was used for a network analysis. Thirty-nine significant articles were identified for analysis from the Web of Science and Scopus databases, in which we analyzed the DM and ML techniques used, as well as the areas that are most engaged in promoting smart cities. Predictive analytics was the most common technique, and the studies focused primarily on the areas of smart mobility and smart environment. This study seeks to encourage approaches that governmental agencies and companies can use to develop smart cities, which is essential to supporting the Sustainable Development Goals.


2020
Author(s): Mohammed J. Zaki and Wagner Meira, Jr

2019
Author(s): Andrew Medford, Shengchun Yang, Fuzhu Liu

Understanding the interaction of multiple types of adsorbate molecules on solid surfaces is crucial to establishing the stability of catalysts under various chemical environments. Computational studies on the high coverage and mixed coverages of reaction intermediates are still challenging, especially for transition-metal compounds. In this work, we present a framework to predict differential adsorption energies and identify low-energy structures under high- and mixed-adsorbate coverages on oxide materials. The approach uses Gaussian process machine-learning models with quantified uncertainty in conjunction with an iterative training algorithm to actively identify the training set. The framework is demonstrated for the mixed adsorption of CHx, NHx and OHx species on the oxygen vacancy and pristine rutile TiO2(110) surface sites. The results indicate that the proposed algorithm is highly efficient at identifying the most valuable training data, and is able to predict differential adsorption energies with a mean absolute error of ~0.3 eV based on <25% of the total DFT data. The algorithm is also used to identify 76% of the low-energy structures based on <30% of the total DFT data, enabling construction of surface phase diagrams that account for high and mixed coverage as a function of the chemical potential of C, H, O, and N. Furthermore, the computational scaling indicates the algorithm scales nearly linearly (N^1.12) as the number of adsorbates increases. This framework can be directly extended to metals, metal oxides, and other materials, providing a practical route toward the investigation of the behavior of catalysts under high-coverage conditions.
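The combination of a Gaussian process with an iterative, uncertainty-driven choice of training points can be illustrated with a small sketch. It is not the authors' framework. It assumes a one-dimensional input as a stand-in for an adsorbate configuration, an RBF kernel with unit length scale, a fixed noise term, and a cheap `oracle` function standing in for a DFT calculation. At each step the candidate with the largest posterior variance is labelled and added to the training set. All names are hypothetical.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel (an assumed choice of covariance)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def posterior(train_x, train_y, x, noise=1e-6):
    """Gaussian process posterior mean and variance at a single point x."""
    k = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(train_x)]
         for i, a in enumerate(train_x)]
    k_star = [rbf(a, x) for a in train_x]
    alpha = solve(k, train_y)   # K^-1 y
    beta = solve(k, k_star)     # K^-1 k*
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    var = rbf(x, x) - sum(ks * be for ks, be in zip(k_star, beta))
    return mean, max(var, 0.0)


def active_learning(pool, oracle, initial, budget):
    """Repeatedly label the pool candidate with the largest posterior variance."""
    train_x = list(initial)
    train_y = [oracle(x) for x in train_x]
    remaining = [x for x in pool if x not in train_x]
    for _ in range(budget):
        pick = max(remaining,
                   key=lambda x: posterior(train_x, train_y, x)[1])
        remaining.remove(pick)
        train_x.append(pick)
        train_y.append(oracle(pick))  # the expensive calculation in practice
    return train_x, train_y


# Hypothetical usage: math.sin stands in for an expensive energy evaluation.
pool = [i * 0.5 for i in range(21)]
xs, ys = active_learning(pool, math.sin, initial=[0.0, 10.0], budget=5)
print(xs)
```

Selecting by maximum variance is only one possible acquisition rule; a search for low-energy structures might instead weigh the predicted mean together with the uncertainty.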

