Chicago Crime Analysis using R Programming

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1952173 ◽

2019 ◽

pp. 937-944

Author(s):

Monish N

Keyword(s):

Law Enforcement ◽

Data Science ◽

Nearest Neighbor ◽

Crime Rates ◽

K Nearest Neighbor ◽

Proposed Model ◽

The Us ◽

Steep Decline ◽

R Programming ◽

And Behavior

In recent years, law enforcement has improved through better strategies, computer-aided technology, and more efficient use of resources. As a result, the crime rate in the United States (US) has declined steeply over the past couple of years. Law enforcement agencies have turned to data science for insights (ranging from reports to corrective analysis and behavior modelling). Crime rates in Chicago have dropped overall in recent years and are at their lowest compared with previous decades. This paper uses the crime dataset found at “data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2” to describe historical trends and insights in Chicago from 1965 to 2018, not to assign any causal interpretation to the factors behind crime rates during this period. K-Nearest Neighbor (KNN) classification is used for training and crime prediction. Directions for future investigation are also discussed. The proposed model has an accuracy of 83.2%.
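As an illustration of the KNN classification mentioned in this abstract, here is a minimal Python sketch of majority-vote KNN with Euclidean distance. It is not the paper's implementation (the paper uses R). The function name `knn_predict`, the toy coordinates, and the category labels are all hypothetical.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 2-D feature vectors with made-up category labels.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels = ["theft", "theft", "theft", "assault", "assault", "assault"]

print(knn_predict(points, labels, (0.3, 0.3)))  # theft
print(knn_predict(points, labels, (4.9, 5.0)))  # assault
```

In practice the features would usually be scaled first, since Euclidean distance is sensitive to the units of each column.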

Computational Intelligence-Based Model for Mortality Rate Prediction in COVID-19 Patients

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18126429 ◽

2021 ◽

Vol 18 (12) ◽

pp. 6429

Author(s):

Irfan Ullah Khan ◽

Nida Aslam ◽

Malak Aljabri ◽

Sumayh S. Aljameel ◽

Mariam Moataz Aly Kamaleldin ◽

...

Keyword(s):

Mortality Rate ◽

Computational Intelligence ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Detection And Identification ◽

Proposed Model ◽

Extreme Gradient Boosting ◽

The World ◽

Detection And Diagnosis

The COVID-19 outbreak is currently one of the biggest challenges facing countries around the world. Millions of people have lost their lives due to COVID-19. Therefore, the accurate early detection and identification of severe COVID-19 cases can reduce the mortality rate and the likelihood of further complications. Machine Learning (ML) and Deep Learning (DL) models have been shown to be effective in the detection and diagnosis of several diseases, including COVID-19. This study used ML algorithms, namely Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and K-Nearest Neighbor (KNN), together with a DL model (containing six layers with ReLU and an output layer with sigmoid activation), to predict the mortality rate in COVID-19 cases. Models were trained using data on confirmed COVID-19 patients from 146 countries. Comparative analysis was performed among the ML and DL models using a reduced feature set. The best results were achieved using the proposed DL model, with an accuracy of 0.97. Experimental results reveal the significance of the proposed model over the baseline study in the literature with the reduced feature set.

Land Cover Classification Using the Proposed Texture Model and Fuzzy k-NN Classifier

Optimization Techniques for Problem Solving in Uncertainty - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-5091-4.ch009 ◽

2018 ◽

pp. 226-261

Author(s):

Jenicka S.

Keyword(s):

Nearest Neighbor ◽

Confusion Matrix ◽

Texture Feature ◽

Image Texture ◽

Kappa Statistics ◽

Grey Level ◽

Classification Problems ◽

K Nearest Neighbor ◽

Texture Model ◽

Proposed Model

Texture is a decisive factor in pattern classification problems because texture features are deduced not from the intensity of the current pixel but from the grey level intensity variations between the current pixel and its neighbors. In this chapter, a new texture model called multivariate binary threshold pattern (MBTP) is proposed with five discrete levels (-9, -1, 0, 1, and 9) that characterize the grey level intensity variations between the center pixel and its neighbors in the local neighborhood of each band in a multispectral image. Texture-based classification has been performed with the proposed model using the fuzzy k-nearest neighbor (fuzzy k-NN) algorithm on IRS-P6, LISS-IV data, and the results have been evaluated based on confusion matrix, classification accuracy, and Kappa statistics. The experiments show that the proposed model outperforms the other existing texture models chosen for comparison.
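The evaluation measures named in this abstract (confusion matrix, classification accuracy, Kappa statistics) can be related with a short Python sketch. It assumes the Kappa statistic meant here is Cohen's kappa, which the abstract does not state. The function name and the 2x2 matrix are made up for illustration and are not results from the chapter.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: the fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    # Undefined (division by zero) when expected agreement is exactly 1.
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class confusion matrix.
matrix = [[45, 5],
          [10, 40]]
acc, kappa = accuracy_and_kappa(matrix)
print(round(acc, 2), round(kappa, 2))  # 0.85 0.7
```

Kappa discounts the agreement that the marginal class frequencies alone would produce, which is why it is lower than the raw accuracy here.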

Crop Disease Detection Using Data Science Techniques

Advances in Wireless Technologies and Telecommunication - Evolution of Software-Defined Networking Foundations for IoT and 5G Mobile Networks ◽

10.4018/978-1-7998-4685-7.ch005 ◽

2021 ◽

pp. 80-97

Author(s):

Shakti Kumar

Keyword(s):

Data Science ◽

Nearest Neighbor ◽

Plant Diseases ◽

K Nearest Neighbor ◽

Classification Feature ◽

Crop Disease ◽

Using Data ◽

Agricultural Yields ◽

Day By Day

Plant disease is an impairment of the normal state of a plant that changes its essential quality and prevents the plant from performing to its actual potential. Due to drastic environmental changes, plant diseases are growing day by day, which results in higher losses in agricultural yields. To prevent crop yield loss, timely disease identification is necessary. Monitoring plant diseases without any digital means makes it difficult to identify a disease correctly and in time; it requires more work, more time, and considerable experience with plant diseases. An automatic approach that applies image processing and data science techniques to leaf images, covering acquisition, classification, feature extraction, pre-processing, and segmentation, is a good way to classify diseases correctly. This chapter briefly discusses the data science techniques used for image classification, such as SVM, k-nearest neighbor, decision tree, ANN, and convolutional neural network (CNN).

Behavior Analysis of Illegal Fishing in the Gulf of Mexico

Journal of Homeland Security and Emergency Management ◽

10.1515/jhsem-2016-0017 ◽

2018 ◽

Vol 15 (1) ◽

Author(s):

Ali Pala ◽

Jing Zhang ◽

Jun Zhuang ◽

Nathan Allen

Keyword(s):

Logistic Regression ◽

Gulf Of Mexico ◽

Wave Height ◽

Nearest Neighbor ◽

Moon Phase ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

Illegal Fishing ◽

The Us ◽

K Nearest Neighbor Algorithm

Illegal fishing activities in the Gulf of Mexico pose a threat to US national security and damage the economy. The US Coast Guard (USCG) estimates over 1100 incursions by Mexican fishermen into US-regulated waters in the Gulf of Mexico annually. Fishermen cross the water border to catch red snapper, one of the Gulf of Mexico’s signature and most valuable fish. A number of academic contributions have sought to improve the understanding of illegal fishing and to generate better solutions. In this study, we investigate the relationship between illegal fishing activities and environmental factors using one year of historical sight, weather, and moon phase data. Descriptive analysis provides some interesting insights, such as sight patterns depending on wave height, moon phase, and hour of the day. We also develop logistic regression models that show wave height is negatively correlated with sight occurrences for all sight types. In addition, we oversample the data, develop two prediction models using logistic regression and the k-nearest neighbor algorithm, and compare their prediction accuracies. The results show that the k-nearest neighbor algorithm performs better in most cases.
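The abstract does not say which oversampling method was used. As one common possibility, the Python sketch below shows random oversampling with replacement, which duplicates minority-class rows until every class is the same size. The function name, the seed, and the toy rows are hypothetical.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    rng = random.Random(seed)
    # Group the rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Every class is grown to the size of the largest one.
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical imbalanced data: 8 "no sight" rows and 2 "sight" rows.
rows = [(h,) for h in range(10)]
labels = ["no sight"] * 8 + ["sight"] * 2
new_rows, new_labels = random_oversample(rows, labels)
print(len(new_rows), new_labels.count("sight"))  # 16 8
```

Oversampling is normally applied to the training split only, so that duplicated rows do not also appear in the data used to measure accuracy.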

Heart disease prediction model with k-nearest neighbor algorithm

International Journal of Informatics and Communication Technology (IJ-ICT) ◽

10.11591/ijict.v10i3.pp225-230 ◽

2021 ◽

Vol 10 (3) ◽

pp. 225

Author(s):

Tsehay Admassu Assegie

Keyword(s):

Heart Disease ◽

Prediction Model ◽

Nearest Neighbor ◽

Predictive Performance ◽

Data Repository ◽

Disease Prediction ◽

K Nearest Neighbor ◽

Proposed Model ◽

K Nearest Neighbor Algorithm ◽

Learning Data

In this study, the author proposes a k-nearest neighbor (KNN) based heart disease prediction model. The author conducted an experiment to evaluate the performance of the proposed model and analyzed its predictive performance. To conduct the study, the author obtained heart disease data from the Kaggle machine learning data repository. The dataset consists of 1025 observations, of which 499 (48.68%) are heart disease negative and 526 (51.32%) are heart disease positive. Finally, the performance of the KNN algorithm is analyzed on the test set. The results on the Kaggle heart disease data show that the accuracy of KNN is 91.99%.
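The class proportions quoted in this abstract can be checked with a few lines of Python. The check only reproduces the arithmetic from the stated counts. The variable names are illustrative.

```python
total = 1025
negative = 499
positive = 526

# The two classes should account for every observation.
assert negative + positive == total

neg_share = round(100 * negative / total, 2)
pos_share = round(100 * positive / total, 2)
print(neg_share, pos_share)  # 48.68 51.32
```

Because the classes are nearly balanced, a classifier that always predicted the positive class would score about 51.32% on the full dataset, which gives some context for the reported 91.99% (the class balance of the test split itself is not given in the abstract).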

Prediction of Train Arrival Delay Using Hybrid ELM-PSO Approach

Journal of Advanced Transportation ◽

10.1155/2021/7763126 ◽

2021 ◽

Vol 2021 ◽

pp. 1-15

Author(s):

Xu Bao ◽

Yanqiu Li ◽

Jianmin Li ◽

Rui Shi ◽

Xin Ding

Keyword(s):

Nearest Neighbor ◽

Pso Algorithm ◽

Bayesian Optimization ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Delay Management ◽

Buffer Time ◽

Proposed Model ◽

Learning Machine ◽

Hidden Layer

In this study, a hybrid method combining extreme learning machine (ELM) and particle swarm optimization (PSO) is proposed to forecast train arrival delays, which can be used for later delay management and timetable optimization. First, nine characteristics (e.g., buffer time, the train number, and station code) associated with train arrival delays are chosen and analyzed using an extra trees classifier. Next, an ELM with one hidden layer is developed to predict train arrival delays, taking these characteristics as input features. Furthermore, the PSO algorithm is chosen to optimize the hyperparameter of the ELM, in comparison with Bayesian optimization and a genetic algorithm, which removes the burden of manual tuning. Finally, a case study confirms the advantage of the proposed model. Compared with four baseline models (k-nearest neighbor, categorical boosting, Lasso, and gradient boosting decision tree) across different metrics, the proposed model is shown to be proficient and to achieve the highest prediction accuracy. In addition, a detailed analysis of the prediction error shows that the model possesses good robustness and correctness.

A Research Travelogue on Classification Algorithms using R Programming

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d9014.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 9155-9158

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Statistical Tests ◽

Learning Task ◽

Data Sets ◽

K Nearest Neighbor ◽

Data Set ◽

Domain Experts ◽

R Programming ◽

Training Examples

Classification is a machine learning task that consists of predicting the class membership of unclassified examples, whose labels are not known, using a representation learned earlier from training examples whose labels were known. Classification tasks span a wide range of domains and real-world applications, in disciplines such as medical diagnosis, bioinformatics, financial engineering, and image recognition, among others, where domain experts can use the learned model to support their decisions. All the classification approaches proposed in this paper were evaluated in an appropriate experimental framework in the R programming language, with the major emphasis on the k-nearest neighbor method, support vector machines, and decision trees, over a large number of data sets with varied dimensionality and by comparing their performance against other state-of-the-art methods. The experimental results obtained have been verified by statistical tests, which support the better performance of the methods. In this paper we have surveyed various classification techniques of data mining and compared them using diverse datasets from the “University of California: Irvine (UCI) Machine Learning Repository” to obtain accurate results on the Iris data set.

Prediction of Mechanical Properties of Steel using Data Science Techniques

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c3952.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 235-241 ◽

Cited By ~ 1

Keyword(s):

Mechanical Properties ◽

Tensile Strength ◽

Stainless Steel ◽

Data Science ◽

Nearest Neighbor ◽

Ensemble Methods ◽

Processing Parameters ◽

Support Vector ◽

K Nearest Neighbor ◽

Engineering Applications

Stainless steel is the most extensively used material in engineering applications, household products, and construction, because it is environmentally friendly and can be recycled. The principal purpose of this paper is to implement different data science algorithms for predicting the mechanical properties of stainless steel. Integrating data science techniques into materials science and engineering helps manufacturers, designers, researchers, and students understand the selection, discovery, and development of materials used for various engineering applications. Data science algorithms help to estimate the properties of a material without performing any experiments. Data science techniques such as Random Forest, Neural Network, Linear Regression, K-Nearest Neighbor, Support Vector Machine, Decision Tree, and ensemble methods are used to predict tensile strength from processing parameters of stainless steel such as carbon content, sectional size, temperature, and manufacturing process. The research is developed as part of an AICTE grant sanctioned under the RPS scheme [19]; it aims to implement different data science algorithms for predicting the tensile strength of steel and to identify the algorithm with decent prediction accuracy.

Supervised Learning in Game Data Science

10.1093/oso/9780192897879.003.0007 ◽

2021 ◽

pp. 179-218

Author(s):

Magy Seif El-Nasr ◽

Truong Huy Nguyen Dinh ◽

Alessandro Canossa ◽

Anders Drachen

Keyword(s):

Data Science ◽

Nearest Neighbor ◽

Support Vector ◽

Classification Methods ◽

K Nearest Neighbor ◽

Linear Discriminant ◽

Regression Methods ◽

Vector Machines ◽

Pros And Cons ◽

Classification And Regression

This chapter discusses several classification and regression methods that can be used with game data. Specifically, we will discuss regression methods, including Linear Regression, and classification methods, including K-Nearest Neighbor, Naïve Bayes, Logistic Regression, Linear Discriminant Analysis, Support Vector Machines, Decision Trees, and Random Forests. We will discuss how you can set up the data to apply these algorithms, as well as how you can interpret the results and the pros and cons of each of the methods discussed. We will conclude the chapter with some remarks on the process of applying these methods to games and the expected outcomes. The chapter also includes practical labs to walk you through the process of applying these methods to real game data.

K-Nearest Neighbor Based URL Identification Model for Phishing Attack Detection

Indian Journal of Artificial Intelligence and Neural Networking ◽

10.35940/ijainn.b1019.041221 ◽

2021 ◽

Vol 1 (2) ◽

pp. 18-21

Author(s):

Tsehay Admassu Assegie

Keyword(s):

Nearest Neighbor ◽

Attack Detection ◽

Experimental Result ◽

Data Repository ◽

K Nearest Neighbor ◽

K Nearest Neighbors ◽

K Value ◽

Proposed Model ◽

Public Data ◽

Public Data Repository

Phishing causes many problems in the business sector. Electronic commerce and electronic banking, such as mobile banking, involve a number of online transactions. In such transactions, we have to identify features that discriminate between legitimate and phishing websites in order to ensure the security of the transaction. In this study, we collected data from the PhishTank public data repository and proposed a K-Nearest Neighbors (KNN) based model for phishing attack detection. The proposed model detects phishing attacks through URL classification. The performance of the proposed model is tested empirically and the results are analyzed. Experimental results on the test set reveal that the model is effective at phishing attack detection. Furthermore, the K value that gives the best accuracy is determined in order to improve detection performance. Overall, the average accuracy of the proposed model is 85.08%.
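The abstract mentions determining the K value that gives the best accuracy, without saying how. One possible procedure, sketched below in Python, is to score each candidate K by leave-one-out accuracy and keep the best. The function names, the two URL features (length and number of dots), and the toy rows are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math


def knn_vote(train, query, k):
    """`train` is a list of (features, label); return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each sample from all the others."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_vote(rest, features, k) == label
    return hits / len(data)


def best_k(data, candidates):
    """Return the candidate k with the highest leave-one-out accuracy (first wins ties)."""
    return max(candidates, key=lambda k: loo_accuracy(data, k))


# Hypothetical URL features: (length, number of dots).
data = [
    ((20, 1), "legitimate"), ((22, 1), "legitimate"), ((25, 2), "legitimate"),
    ((23, 1), "legitimate"), ((21, 2), "legitimate"),
    ((70, 5), "phishing"), ((75, 6), "phishing"), ((80, 5), "phishing"),
    ((72, 4), "phishing"), ((78, 6), "phishing"),
]

for k in (1, 3, 9):
    print(k, loo_accuracy(data, k))
print("best k:", best_k(data, (1, 3, 9)))
```

On this toy data, k = 9 uses every remaining sample as a neighbour, so the held-out point is always outvoted by the other class, which shows why an overly large K can hurt accuracy.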
