Chicago Crime Analysis using R Programming

Author(s):  
Monish N

In recent years, law enforcement agencies have improved through better strategies, computer-aided technology, and more efficient use of resources. As a result, there has been a steep decline in the crime rate in the United States (US) over the past couple of years. Law enforcement agencies have turned to data science for insights, ranging from reports to corrective analysis and behavior modelling. Crime rates in Chicago have dropped overall in recent years; in fact, they are at their lowest compared with previous decades. This paper uses the criminal dataset found at “data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2” to describe historical trends and insights in Chicago from 1965 to 2018, without assigning any causal interpretation to the drivers of crime rates during this period. K-Nearest Neighbor (KNN) classification is used for training and crime prediction. Directions for future investigation are also discussed. The proposed model has an accuracy of 83.2%.
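The abstract names KNN classification but does not give the features, the value of K, or the distance measure. As a rough illustration only (the paper itself works in R), the Python sketch below classifies a point by majority vote among its K nearest training points under Euclidean distance. The coordinates and crime labels are invented toy values, not records from the Chicago dataset.

```python
from collections import Counter
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # most_common keeps first-seen order for equal counts, so ties favour the closer neighbour.
    return votes.most_common(1)[0][0]

# Hypothetical toy data: latitude/longitude-like coordinates labelled with a crime category.
train_X = [(41.88, -87.63), (41.89, -87.62), (41.78, -87.60), (41.77, -87.61), (41.79, -87.59)]
train_y = ["THEFT", "THEFT", "BATTERY", "BATTERY", "BATTERY"]
print(knn_predict(train_X, train_y, (41.885, -87.625), k=3))  # THEFT
```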

Author(s):  
Irfan Ullah Khan
Nida Aslam
Malak Aljabri
Sumayh S. Aljameel
Mariam Moataz Aly Kamaleldin
...  

The COVID-19 outbreak is currently one of the biggest challenges facing countries around the world. Millions of people have lost their lives due to COVID-19. Accurate early detection and identification of severe COVID-19 cases can therefore reduce the mortality rate and the likelihood of further complications. Machine Learning (ML) and Deep Learning (DL) models have been shown to be effective in detecting and diagnosing several diseases, including COVID-19. This study used ML algorithms, such as Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and K-Nearest Neighbor (KNN), and a DL model (six layers with ReLU activation and an output layer with sigmoid activation) to predict mortality in COVID-19 cases. Models were trained on data from confirmed COVID-19 patients in 146 countries. A comparative analysis of the ML and DL models was performed using a reduced feature set. The best results were achieved by the proposed DL model, with an accuracy of 0.97. Experimental results show the improvement of the proposed model over the baseline study in the literature when using the reduced feature set.
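The abstract gives only the shape of the DL model. As a hedged illustration, the Python sketch below reads "six layers with ReLU" as six hidden ReLU layers followed by a one-unit sigmoid output layer. The layer widths, input features, and randomly initialised, untrained weights are all hypothetical. The printed value therefore means nothing clinically; it only shows how the sigmoid turns the final score into a probability between 0 and 1. Training is omitted.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(z):
    """Numerically stable logistic function mapping a score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dense(v, weights, bias):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def build_model(n_features, hidden_sizes, seed=0):
    """Random (untrained) weights for the hidden layers plus a single-unit output layer."""
    rng = random.Random(seed)
    sizes = [n_features] + list(hidden_sizes) + [1]
    return [([[rng.uniform(-0.3, 0.3) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]

def predict_proba(layers, features):
    """ReLU after every hidden layer, sigmoid on the output unit."""
    v = list(features)
    for weights, bias in layers[:-1]:
        v = relu(dense(v, weights, bias))
    weights, bias = layers[-1]
    return sigmoid(dense(v, weights, bias)[0])

# Hypothetical widths for six hidden ReLU layers; four made-up input features.
model = build_model(n_features=4, hidden_sizes=[32, 32, 16, 16, 8, 8])
p = predict_proba(model, [0.6, 1.0, 0.0, 0.3])
print(f"predicted mortality probability: {p:.3f}")
```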


Author(s):  
Jenicka S.

Texture features are a decisive factor in pattern classification problems because they are derived not from the intensity of the current pixel alone but from the grey-level intensity variations between the current pixel and its neighbors. In this chapter, a new texture model called the multivariate binary threshold pattern (MBTP) is proposed, with five discrete levels (-9, -1, 0, 1, and 9) characterizing the grey-level intensity variations between the center pixel and its neighbors in the local neighborhood of each band of a multispectral image. Texture-based classification is performed with the proposed model using the fuzzy k-nearest neighbor (fuzzy k-NN) algorithm on IRS-P6 LISS-IV data, and the results are evaluated using the confusion matrix, classification accuracy, and the Kappa statistic. The experiments show that the proposed model outperforms the other existing texture models chosen for comparison.
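The evaluation measures named here can be computed directly from a confusion matrix. The following minimal sketch assumes that rows are true classes and columns are predicted classes. Overall accuracy is the diagonal share p_o. Cohen's kappa corrects this value for chance agreement, κ = (p_o - p_e) / (1 - p_e), where p_e is the sum of the products of matching row and column totals divided by N². The 2 × 2 matrix is hypothetical, not a result from the chapter.

```python
def accuracy_and_kappa(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n  # overall accuracy p_o
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance if predictions were independent of the true class.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

acc, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
print(round(acc, 3), round(kappa, 3))  # 0.85 0.7
```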


Author(s):  
Shakti Kumar

Plant disease is an impairment of the normal state of a plant that alters its essential qualities and prevents it from performing to its full potential. Due to drastic environmental changes, plant diseases are increasing day by day, resulting in greater losses in agricultural yield. Timely disease identification is necessary to prevent these losses. Monitoring plant diseases without digital means makes it difficult to identify them correctly and in time; it requires considerable work, time, and expertise in plant diseases. An automatic approach that applies image processing and data science techniques to classify diseases correctly is therefore promising; it involves image acquisition, pre-processing, segmentation, feature extraction, and classification, all performed on leaf images. This chapter briefly discusses the data science techniques used for classifying such images, including SVM, k-nearest neighbor, decision tree, ANN, and convolutional neural network (CNN).


Author(s):  
Ali Pala
Jing Zhang
Jun Zhuang
Nathan Allen

Illegal fishing activities in the Gulf of Mexico pose a threat to US national security and damage the economy. The US Coast Guard (USCG) estimates over 1100 incursions by Mexican fishermen into US regulated waters in the Gulf of Mexico annually. Fishermen cross the maritime border to catch red snapper, one of the Gulf of Mexico’s signature and most valuable fish. A number of academic contributions have sought to improve the understanding of the illegal fishing problem and to generate better solutions. In this study, we investigate the relationship between illegal fishing activities and environmental factors using one year of historical sighting, weather, and moon phase data. Descriptive analysis provides interesting insights, such as sighting patterns depending on wave height, moon phase, and hour of the day. We also develop logistic regression models showing that wave height is negatively correlated with sighting occurrences for all sighting types. In addition, we oversample the data, develop two prediction models using logistic regression and the k-nearest neighbor algorithm, and compare their prediction accuracies. The results show that the k-nearest neighbor algorithm performs better in most cases.
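The abstract does not say how the data were oversampled. This sketch assumes one common option, random oversampling: randomly chosen minority-class records are duplicated (with replacement) until every class is as large as the biggest one. The feature names and values below are hypothetical.

```python
import random

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every class
    has as many samples as the largest one (random oversampling with replacement)."""
    rng = random.Random(seed)
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out

# Hypothetical records: (wave height in m, moon illumination fraction) -> sighting (1) or none (0).
X = [(0.5, 0.9), (0.6, 0.8), (2.1, 0.1), (1.8, 0.4), (2.5, 0.2), (1.9, 0.6)]
y = [1, 1, 0, 0, 0, 0]
X_bal, y_bal = random_oversample(X, y)
print(sorted(y_bal))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Oversampling is normally applied only to the training portion, after the train/test split. This prevents duplicated records from appearing in both sets and inflating the measured accuracy.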


Author(s):  
Tsehay Admassu Assegie

In this study, the author proposes a k-nearest neighbor (KNN)-based heart disease prediction model, conducts an experiment to evaluate its predictive performance, and analyzes the results. The heart disease data were obtained from the Kaggle machine learning data repository. The dataset consists of 1025 observations, of which 499 (48.68%) are heart disease negative and 526 (51.32%) are heart disease positive. The performance of the KNN algorithm is analyzed on the test set, and the results on the Kaggle heart disease data show that KNN achieves an accuracy of 91.99%.
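A few lines of arithmetic confirm the class proportions quoted above. The same numbers also give the majority-class baseline, which is the accuracy a model would reach by always predicting "positive". This baseline is a useful reference point for the reported 91.99%.

```python
negatives, positives = 499, 526
total = negatives + positives  # 1025 observations
print(f"negative share: {negatives / total:.2%}")  # 48.68%
print(f"positive share: {positives / total:.2%}")  # 51.32%

# Always predicting the larger class already scores this accuracy.
baseline = max(negatives, positives) / total
print(f"majority-class baseline: {baseline:.2%}")  # 51.32%
```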


2021
Vol 2021
pp. 1-15
Author(s):  
Xu Bao
Yanqiu Li
Jianmin Li
Rui Shi
Xin Ding

In this study, a hybrid method combining an extreme learning machine (ELM) and particle swarm optimization (PSO) is proposed to forecast train arrival delays, which can then be used for delay management and timetable optimization. First, nine characteristics associated with train arrival delays (e.g., buffer time, train number, and station code) are selected and analyzed using an extra trees classifier. Next, an ELM with one hidden layer is developed to predict train arrival delays, using these characteristics as input features. The PSO algorithm is then used to optimize the ELM's hyperparameters, avoiding laborious manual tuning, and is compared with Bayesian optimization and a genetic algorithm. Finally, a case study confirms the advantage of the proposed model. Compared with four baseline models (k-nearest neighbor, categorical boosting, Lasso, and gradient boosting decision tree) across different metrics, the proposed model achieves the highest prediction accuracy. In addition, a detailed analysis of the prediction error shows that the model has good robustness and correctness.
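For readers unfamiliar with ELMs, the sketch below illustrates the core idea under stated assumptions. The hidden-layer input weights and biases are drawn at random and then frozen. Only the output weights are fitted, here by ridge-regularised least squares solved in closed form. The sigmoid activation, `n_hidden`, and the small ridge term `reg` are illustrative choices, not the paper's settings. The PSO search that the authors use to tune the ELM's hyperparameters is omitted; values such as `n_hidden` and `reg` are the kind that such a search could tune. The toy target is synthetic, not train-delay data.

```python
import math
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class ELM:
    """Single-hidden-layer extreme learning machine for regression (illustrative sketch)."""

    def __init__(self, n_hidden=20, reg=1e-6, seed=0):
        self.n_hidden, self.reg, self.rng = n_hidden, reg, random.Random(seed)

    def _hidden(self, x):
        # Sigmoid activations of the fixed random projection of one input row.
        return [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + bias)
                for ws, bias in zip(self.W, self.biases)]

    def fit(self, X, y):
        d, L = len(X[0]), self.n_hidden
        # Input weights and biases are drawn once at random and never trained.
        self.W = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(L)]
        self.biases = [self.rng.uniform(-1, 1) for _ in range(L)]
        H = [self._hidden(x) for x in X]
        # Output weights: solve (H^T H + reg * I) beta = H^T y (ridge least squares).
        HtH = [[sum(h[i] * h[j] for h in H) + (self.reg if i == j else 0.0)
                for j in range(L)] for i in range(L)]
        Hty = [sum(h[i] * t for h, t in zip(H, y)) for i in range(L)]
        self.beta = solve(HtH, Hty)
        return self

    def predict(self, X):
        return [sum(b * h for b, h in zip(self.beta, self._hidden(x))) for x in X]

# Synthetic example: learn y = 2*x1 + x2 on a 6 x 6 grid in [0, 1]^2.
X = [(i / 5, j / 5) for i in range(6) for j in range(6)]
y = [2 * a + b for a, b in X]
model = ELM(n_hidden=20).fit(X, y)
mse = sum((p - t) ** 2 for p, t in zip(model.predict(X), y)) / len(y)
print(f"training MSE: {mse:.6f}")
```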


2019
Vol 8 (4)
pp. 9155-9158

Classification is a machine learning task that consists of predicting the class membership of unlabeled examples from their properties, using a model learned earlier from training examples whose labels were known. Classification tasks span a wide variety of domains and real-world applications, such as medical diagnosis, bioinformatics, financial engineering, and image recognition, among others, where domain experts can use the learned model to support their decisions. All the classification approaches presented in this paper were evaluated in an appropriate experimental framework in the R programming language, with the main emphasis on the k-nearest neighbor method, support vector machines, and decision trees, applied to a large number of datasets of varied dimensionality and compared against other state-of-the-art methods. The experimental results were verified by statistical tests, which support the better performance of these methods. This paper surveys various data mining classification techniques and compares them using diverse datasets from the University of California, Irvine (UCI) Machine Learning Repository, with accuracy calculations performed on the Iris dataset.
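The abstract does not state the evaluation protocol. The minimal pure-Python sketch below assumes k-fold cross-validation, a common way to compare classifiers such as KNN, SVM, and decision trees on identical splits (the paper itself uses R). `fit_predict` can be any function with the hypothetical signature `(train_X, train_y, test_X) -> labels`. `majority_class` is only a placeholder baseline, and the data are made up.

```python
import random
from collections import Counter

def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(fit_predict, X, y, k=5, seed=0):
    """Mean test-fold accuracy; each fold is held out once while the rest is used for training."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

def majority_class(train_X, train_y, test_X):
    """Placeholder classifier: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_X)

# Hypothetical data; a KNN, SVM, or decision-tree fit_predict would be passed the same way.
X = [[v] for v in range(12)]
y = ["setosa"] * 8 + ["versicolor"] * 4
print(cross_val_accuracy(majority_class, X, y, k=4))
```

Keeping the per-fold scores instead of only their mean would provide paired samples for the kind of statistical comparison the abstract mentions.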


Stainless steel is one of the most extensively used materials in engineering applications, household products, and construction because it is environmentally friendly and recyclable. The principal purpose of this paper is to implement different data science algorithms for predicting the mechanical properties of stainless steel. Integrating data science techniques into materials science and engineering helps manufacturers, designers, researchers, and students understand the selection, discovery, and development of materials used in various engineering applications. Data science algorithms help estimate the properties of a material without performing physical experiments. Data science techniques such as Random Forest, Neural Network, Linear Regression, K-Nearest Neighbor, Support Vector Machine, Decision Tree, and ensemble methods are used to predict tensile strength from processing parameters of stainless steel such as carbon content, section size, temperature, and manufacturing process. This research was carried out as part of an AICTE grant sanctioned under the RPS scheme [19] and aims to implement different data science algorithms for predicting the tensile strength of steel and to identify an algorithm with decent prediction accuracy.
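To illustrate one of the listed methods, K-nearest-neighbour regression predicts a numeric target such as tensile strength as the mean target of the K most similar training rows. Processing parameters are on very different scales (percent carbon, millimetres, degrees), so the sketch min-max scales each column first. All rows and strength values below are invented for illustration and are not measured steel data.

```python
import math

def min_max_scale(rows):
    """Rescale every column to [0, 1] so no single unit (mm vs. deg C) dominates the distance.
    Returns the scaled rows and a function that applies the same scaling to new rows."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # guard against constant columns
    scale = lambda r: [(v - lo) / s for v, lo, s in zip(r, lows, spans)]
    return [scale(r) for r in rows], scale

def knn_regress(train_X, train_y, query, k=3):
    """Predict a numeric target as the mean target of the k nearest training rows."""
    nearest = sorted((math.dist(x, query), t) for x, t in zip(train_X, train_y))[:k]
    return sum(t for _, t in nearest) / len(nearest)

# Hypothetical rows: (carbon %, section size mm, temperature deg C) -> tensile strength (MPa).
raw = [(0.03, 10, 1050), (0.03, 20, 1050), (0.08, 10, 1100), (0.08, 20, 1100), (0.15, 15, 1000)]
strength = [520, 510, 600, 590, 700]
X, scale = min_max_scale(raw)
print(knn_regress(X, strength, scale((0.03, 12, 1050)), k=2))  # 560.0
```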


2021
pp. 179-218
Author(s):  
Magy Seif El-Nasr
Truong Huy Nguyen Dinh
Alessandro Canossa
Anders Drachen

This chapter discusses several classification and regression methods that can be used with game data. Specifically, we discuss regression methods, including Linear Regression, and classification methods, including K-Nearest Neighbor, Naïve Bayes, Logistic Regression, Linear Discriminant Analysis, Support Vector Machines, Decision Trees, and Random Forests. We discuss how to set up the data to apply these algorithms, how to interpret the results, and the pros and cons of each method. We conclude the chapter with some remarks on the process of applying these methods to games and the expected outcomes. The chapter also includes practical labs that walk you through applying these methods to real game data.
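As a compact example of one method from this list, a Gaussian Naïve Bayes classifier treats each feature within each class as an independent normal distribution and picks the class with the largest log posterior. The telemetry features, labels, and values below are hypothetical. The chapter's own labs may use different tools and data.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Per class: prior probability, plus per-feature mean and variance."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A small epsilon keeps the variance positive when a feature is constant within a class.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9 for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log prior + sum of per-feature Gaussian log likelihoods."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)

# Hypothetical game telemetry: (sessions per week, average session minutes) -> churned?
X = [(1, 5), (2, 8), (1, 3), (9, 40), (8, 35), (10, 50)]
y = ["churn", "churn", "churn", "stay", "stay", "stay"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (2, 6)))   # churn
print(predict_gaussian_nb(model, (9, 45)))  # stay
```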


Author(s):  
Tsehay Admassu Assegie*

Phishing causes many problems for businesses. Electronic commerce and electronic banking, such as mobile banking, involve many online transactions. To secure these transactions, the features of legitimate websites must be distinguished from those of phishing websites. In this study, we collected data from the PhishTank public data repository and proposed a K-Nearest Neighbors (KNN)-based model for phishing attack detection. The proposed model detects phishing attacks through URL classification. The performance of the proposed model is tested empirically and the results are analyzed. Experimental results on the test set show that the model is effective at detecting phishing attacks. Furthermore, the K value that gives the best accuracy is determined in order to improve detection performance. Overall, the average accuracy of the proposed model is 85.08%.
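The abstract reports that K was tuned but does not describe the procedure. This minimal sketch assumes a held-out validation set: each candidate K is evaluated, and the one with the highest validation accuracy is kept. The URL features (length, number of dots, presence of '@') and all values are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to `query` (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def select_k(train_X, train_y, val_X, val_y, candidates=(1, 3, 5, 7)):
    """Return (best_k, accuracy) for the K with the highest validation accuracy.
    Ties keep the smaller K because candidates are scanned in increasing order."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        preds = [knn_predict(train_X, train_y, q, k) for q in val_X]
        acc = sum(p == t for p, t in zip(preds, val_y)) / len(val_y)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc

# Hypothetical URL features: (length, number of dots, contains '@' as 0/1).
train_X = [(75, 5, 1), (80, 6, 1), (90, 7, 1), (85, 4, 1),
           (20, 2, 0), (25, 1, 0), (30, 2, 0), (22, 3, 0)]
train_y = ["phishing"] * 4 + ["legitimate"] * 4
val_X, val_y = [(88, 6, 1), (24, 2, 0)], ["phishing", "legitimate"]
print(select_k(train_X, train_y, val_X, val_y))  # (1, 1.0)
```

In practice, the chosen K would then be evaluated once on a separate test set, so that the reported accuracy is not biased by the selection step.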

