Weather Forecast through Data Mining

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit217318 ◽

2021 ◽

pp. 90-95

Author(s):

Swati Pandey ◽

Shruti Sharma ◽

Shubham Kumar ◽

Kanchan Bhatt ◽

Dr. Rakesh Kumar Arora

Keyword(s):

Nearest Neighbor ◽

Weather Forecasting ◽

Regression Tree ◽

Weather Forecast ◽

Weather Conditions ◽

Classification And Regression Tree ◽

Support Vector ◽

K Nearest Neighbor ◽

Random Forest Classification ◽

Forest Classification

Weather forecasting is the attempt to predict weather conditions from parameters such as temperature, wind, humidity, and rainfall. These parameters are considered in the experimental analysis. The data used in this project were collected from the websites of various government institutions. The algorithms used to predict the weather include Neural Networks (NN), Random Forest, Classification and Regression Tree (C&RT), Support Vector Machine, and K-nearest neighbor. Correlation analysis of the parameters helps in predicting future values. The web-based application will have its own chatbot, through which users can submit weather forecast queries directly and experience two-way communication.
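The correlation analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes the Pearson coefficient is meant, which the abstract does not state; the function name `pearson` and the sample readings are hypothetical and invented for illustration only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical daily readings (not from the paper's data set).
humidity = [60.0, 65.0, 70.0, 80.0, 90.0]
rainfall = [0.0, 1.0, 2.5, 4.0, 6.0]
temperature = [34.0, 33.0, 31.0, 28.0, 25.0]

r_hum_rain = pearson(humidity, rainfall)      # strongly positive here
r_temp_rain = pearson(temperature, rainfall)  # strongly negative here
```

A coefficient near +1 or -1 suggests a parameter may be a useful predictor of another; values near 0 suggest little linear relationship.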

Indonesian Online News Topics Classification using Word2Vec and K-Nearest Neighbor

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v5i6.3547 ◽

2021 ◽

Vol 5 (6) ◽

pp. 1083-1089

Author(s):

Nur Ghaniaviyanto Ramadhan

Keyword(s):

Nearest Neighbor ◽

Online News ◽

Classification Model ◽

Support Vector ◽

The Internet ◽

K Nearest Neighbor ◽

K Value ◽

Random Forest Classification ◽

Forest Classification ◽

Survey Results

News is information disseminated by newspapers, radio, television, the internet, and other media. According to survey results, many news titles on various topics are spread across the internet, which makes it difficult for readers to find the news topic they want to read. This problem can be solved by grouping, also called classification, carried out by a computerized process. This study aims to classify several news topics in the Indonesian language using the KNN classification model, with word2vec, a word embedding-based model, converting words into vectors to facilitate the classification process. The study also determines the optimal K value for KNN. The word2vec and KNN models achieve an accuracy of 89.2% with K=7, and they outperform the support vector machine, logistic regression, and random forest classification models.
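The pipeline described in this abstract (words to vectors, then KNN) can be sketched roughly as follows. This is not the author's implementation: the tiny `EMBEDDINGS` table is a hypothetical stand-in for a trained word2vec model, averaging word vectors per title and using cosine similarity are assumptions, and the training titles are invented.

```python
from collections import Counter
from math import sqrt

# Hypothetical 2-dimensional "embeddings"; a trained word2vec model would
# supply real vectors with many more dimensions.
EMBEDDINGS = {
    "goal": (0.9, 0.1), "match": (0.8, 0.2), "league": (0.85, 0.05),
    "election": (0.1, 0.9), "minister": (0.2, 0.8), "vote": (0.05, 0.95),
}

def embed(title):
    """Average the vectors of the known words in a title; unknown words are skipped."""
    vecs = [EMBEDDINGS[w] for w in title.lower().split() if w in EMBEDDINGS]
    if not vecs:
        raise ValueError("no known words in title")
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def knn_predict(train, title, k):
    """Return the majority label among the k training titles most similar to `title`."""
    query = embed(title)
    ranked = sorted(train, key=lambda item: cosine(embed(item[0]), query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Invented training titles with topic labels.
TRAIN = [
    ("goal match", "sports"), ("league match", "sports"), ("league goal", "sports"),
    ("election vote", "politics"), ("minister vote", "politics"),
    ("minister election", "politics"),
]

prediction = knn_predict(TRAIN, "late goal decides league", k=3)
```

In practice the value of K would be chosen by comparing accuracy on held-out data for several candidates, which is one way to arrive at a setting such as the K=7 reported above.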

Use of data mining techniques to classify length of stay of emergency department patients

Bio-Algorithms and Med-Systems ◽

10.1515/bams-2018-0044 ◽

2019 ◽

Vol 15 (1) ◽

Cited By ~ 1

Author(s):

Görkem Sariyer ◽

Ceren Öcal Taşar ◽

Gizem Ersoy Cepe

Keyword(s):

Length Of Stay ◽

Regression Tree ◽

Secondary Data ◽

Classification And Regression Tree ◽

Urban Hospital ◽

Data Set ◽

Random Forest Classification ◽

Forest Classification ◽

High Level ◽

Sensitivity Specificity

Emergency departments (EDs) are the largest departments of hospitals and encounter a high variety of cases as well as high patient volumes. Thus, efficient classification of patients at the time of their registration is very important for operations planning and management. Using secondary data from the ED of an urban hospital, we examine the significance of factors in classifying patients according to their length of stay. Random Forest, Classification and Regression Tree, Logistic Regression (LR), and Multilayer Perceptron (MLP) were trained on the data set of July 2016 and tested on the data set of August 2016. Besides training and testing the algorithms on the whole data set, patients were grouped into 21 subgroups based on similarities in their diagnoses, and the algorithms were also applied within these subgroups. The performance of the classifiers was evaluated based on sensitivity, specificity, and accuracy. The sensitivity, specificity, and accuracy values of the classifiers were similar, with LR and MLP having somewhat higher values. In addition, for each classifier, the average performance when classifying patients within the subgroups exceeded the performance on the whole data set.
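The three evaluation measures named in this abstract follow directly from confusion-matrix counts. The sketch below assumes a binary labelling (for example, 1 for a long stay and 0 for a short stay), which the abstract does not specify; the function name and sample labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return sensitivity, specificity, and accuracy for binary labels."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

# Invented labels: 4 positives (3 found), 6 negatives (4 found).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
```

Reporting all three values matters when classes are imbalanced, since accuracy alone can hide poor performance on the rarer class.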

Linking LiDAR with streamwater biogeochemistry in coastal temperate rainforest watersheds

Canadian Journal of Fisheries and Aquatic Sciences ◽

10.1139/cjfas-2016-0130 ◽

2017 ◽

Vol 74 (6) ◽

pp. 801-811 ◽

Cited By ~ 1

Author(s):

Jason B. Fellman ◽

Brian Buma ◽

Eran Hood ◽

Richard T. Edwards ◽

David V. D’Amore

Keyword(s):

Forest Biomass ◽

Regression Tree ◽

Classification And Regression Tree ◽

Temperate Rainforest ◽

Organic Matter Quality ◽

Watershed Characteristics ◽

Random Forest Classification ◽

Forest Classification ◽

Watershed Slope ◽

Stream Biogeochemistry

The goal of this study was to use watershed characteristics derived from light detection and ranging (LiDAR) data to predict stream biogeochemistry in Perhumid Coastal Temperate Rainforest (PCTR) watersheds. Over a 2-day period, we sampled 37 streams for concentrations of dissolved C, N, P, major cations, and measures of dissolved organic matter quality (specific ultraviolet absorbance, SUVA254) and bioavailability. Random forest – classification and regression tree analysis showed that aboveground biomass and structure and watershed characteristics, inclusive of mean watershed slope and elevation, watershed size, and topographic wetness, explained more than 60% of the variation in concentration for most measured constituents. These results indicate this approach may be particularly useful for predicting stream biogeochemistry in small forested watersheds where fine resolution is needed to resolve subtle differences in forest biomass, structure, and topography. Overall, we suggest that the use of LiDAR in many of the small and remote watersheds across the Southeast Alaskan PCTR as well as other forested regions could help inform land management decisions that have the potential to alter ecosystem services related to watershed biogeochemical fluxes.

Prediction of Flash Flood using Rainfall by MLP Classifier

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f9880.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 425-429

Keyword(s):

Prediction Model ◽

Nearest Neighbor ◽

Weather Forecasting ◽

Flash Flood ◽

Human Life ◽

Cost Effective ◽

Rainfall Data ◽

Support Vector ◽

K Nearest Neighbor ◽

Flood Prediction

Floods are among the most harmful natural disasters. A flood can result in a huge loss of human lives and property. It can also affect agricultural land and destroy cultivated crops and trees. Floods can occur as a result of surface runoff from melting snow, prolonged rains, poor drainage of rainwater, or the collapse of dams. Today people have destroyed rivers and lakes and have turned natural water storage pools into building and construction land. Flash floods develop quickly, within a few hours, compared with regular floods. Research on flood prediction has advanced with the aim of reducing the loss of human life, property damage, and other flood-related problems. Machine learning methods are widely used to build efficient prediction models for weather forecasting, and such prediction systems provide cost-effective solutions and better performance. In this paper, a prediction model is constructed from rainfall data to predict the occurrence of floods due to rainfall. The model predicts whether “flood may happen or not” based on the rainfall range for particular locations. Indian district rainfall data is used to build the prediction model. Models are trained on the dataset with various algorithms, namely Linear Regression, K-Nearest Neighbor, Support Vector Machine, and Multilayer Perceptron. Among these, the MLP algorithm performed best, with the highest accuracy of 97.40%. The MLP flash flood prediction model can help climate scientists predict floods during heavy downpours with high accuracy.

Snow Detection using In-Vehicle Video Camera with Texture-Based Image Features Utilizing K-Nearest Neighbor, Support Vector Machine, and Random Forest

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/0361198119842105 ◽

2019 ◽

Vol 2673 (8) ◽

pp. 221-232 ◽

Cited By ~ 7

Author(s):

Md Nasim Khan ◽

Mohamed M. Ahmed

Keyword(s):

Real Time ◽

Prediction Accuracy ◽

Nearest Neighbor ◽

Detection System ◽

Weather Conditions ◽

Video Camera ◽

Image Features ◽

Support Vector ◽

K Nearest Neighbor ◽

Weather Information

Snowfall negatively affects pavement and visibility conditions, making it one of the major causes of motor vehicle crashes in winter weather. Therefore, providing drivers with real-time roadway weather information during adverse weather is crucial for safe driving. Although road weather stations can provide weather information, these stations are expensive and often do not represent real-time trajectory-level weather information. The main motivation of this study was to develop an affordable in-vehicle snow detection system which can provide trajectory-level weather information in real time. The system utilized SHRP2 Naturalistic Driving Study video data and was based on machine learning techniques. To train the snow detection models, two texture-based image features including gray level co-occurrence matrix (GLCM) and local binary pattern (LBP), and three classification algorithms: support vector machine (SVM), k-nearest neighbor (K-NN), and random forest (RF) were used. The analysis was done on an image dataset consisting of three weather conditions: clear, light snow, and heavy snow. While the highest overall prediction accuracy of the models based on the GLCM features was found to be around 86%, the models considering the LBP based features provided a much higher prediction accuracy of 96%. The snow detection system proposed in this study is cost effective, does not require a lot of technical support, and only needs a single video camera. With the advances in smartphone cameras, simple mobile apps with proper data connectivity can effectively be used to detect roadway weather conditions in real time with reasonable accuracy.
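To make the local binary pattern (LBP) feature mentioned in this abstract concrete, here is a minimal sketch of the basic 3x3, 8-neighbour variant on a grayscale image stored as a list of rows. The abstract does not say which LBP variant or parameters the authors used, so this is an assumption; the function names and the tiny test images are hypothetical.

```python
# Offsets of the 8 neighbours, clockwise from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(image, row, col):
    """8-bit LBP code of an interior pixel: bit i is set when neighbour i >= centre."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """Normalised 256-bin histogram of LBP codes over all interior pixels."""
    rows, cols = len(image), len(image[0])
    if rows < 3 or cols < 3:
        raise ValueError("image must be at least 3x3")
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(image, r, c)] += 1
    total = (rows - 2) * (cols - 2)
    return [count / total for count in hist]

# Invented images: a bright top edge, and a perfectly flat patch.
edge = [[9, 9, 9],
        [0, 5, 0],
        [0, 0, 0]]
flat = [[10] * 4 for _ in range(4)]

edge_code = lbp_code(edge, 1, 1)     # only the three top neighbours are >= 5
flat_hist = lbp_histogram(flat)      # every interior pixel gets code 255
```

The resulting histogram is the texture descriptor; it would then be passed as a feature vector to a classifier such as SVM, K-NN, or random forest.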

Analysis of Frequency Bands of Uterine Electromyography Signals for the Detection of Preterm Birth

Studies in Health Technology and Informatics - Public Health and Informatics ◽

10.3233/shti210165 ◽

2021 ◽

Author(s):

Vinothini Selvaraju ◽

P.A. Karthick ◽

Ramakrishnan Swaminathan

Keyword(s):

Preterm Birth ◽

Random Forest ◽

Nearest Neighbor ◽

Peak Frequency ◽

Classification Model ◽

K Nearest Neighbor ◽

Frequency Bands ◽

Random Forest Classification ◽

Forest Classification ◽

Uterine Electromyography

In this work, an attempt has been made to analyze the influence of the frequency bands in uterine electromyography (uEMG) signals on the detection of preterm birth. Signals recorded from the women’s abdomen during pregnancy are considered in this study. The signals are preprocessed using a digital bandpass Butterworth filter and decomposed into different frequency bands, namely 0.3-1.0 Hz (F1), 1.0-2.0 Hz (F2), and 2.0-3.0 Hz (F3). Spectral features, namely peak magnitude, peak frequency, mean frequency, and median frequency, are extracted from the power spectrum. Classification models, namely k-nearest neighbor, support vector machine, and random forest, are employed to distinguish the term and preterm conditions. The results show that the features extracted from these frequency bands are able to differentiate term and preterm conditions. In particular, the frequency band F3 performs better than the other frequency bands. The features associated with this band, together with the random forest classification model, achieve a maximum accuracy of 75.2%. Thus, these measures could be used to detect preterm birth well in advance.
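The four spectral features listed in this abstract can be computed from a power spectrum as sketched below. This is an illustration under assumptions, not the authors' procedure: a naive DFT is used to keep the example self-contained, the Butterworth filtering step is omitted, the median frequency is taken as the first bin at which cumulative band power reaches half the total, and the synthetic 2.5 Hz sine stands in for a real uEMG recording.

```python
import cmath
import math

def power_spectrum(signal, fs):
    """One-sided power spectrum via a naive DFT; returns (freqs, power)."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power

def spectral_features(freqs, power, low, high):
    """Peak magnitude, peak, mean, and median frequency within [low, high] Hz."""
    band = [(f, p) for f, p in zip(freqs, power) if low <= f <= high]
    total = sum(p for _, p in band)
    if total == 0:
        raise ValueError("no spectral power in the requested band")
    peak_freq, peak_mag = max(band, key=lambda fp: fp[1])
    mean_freq = sum(f * p for f, p in band) / total
    median_freq = band[-1][0]
    cumulative = 0.0
    for f, p in band:
        cumulative += p
        if cumulative >= total / 2:   # half of the band power reached
            median_freq = f
            break
    return {
        "peak_magnitude": peak_mag,
        "peak_frequency": peak_freq,
        "mean_frequency": mean_freq,
        "median_frequency": median_freq,
    }

# Synthetic test signal: 10 s of a 2.5 Hz sine sampled at 20 Hz (hypothetical values).
FS = 20.0
signal = [math.sin(2 * math.pi * 2.5 * i / FS) for i in range(200)]
freqs, power = power_spectrum(signal, FS)
features = spectral_features(freqs, power, 2.0, 3.0)  # band F3
```

For real recordings an FFT routine would replace the naive DFT, and the same function would be called once per band (F1, F2, F3) to build the feature vector.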

Fault Diagnosis of Permanent Magnet DC Motors Based on Multi-Segment Feature Extraction

Sensors ◽

10.3390/s21227505 ◽

2021 ◽

Vol 21 (22) ◽

pp. 7505

Author(s):

Lixin Lu ◽

Weihao Wang

Keyword(s):

Feature Extraction ◽

Fault Diagnosis ◽

Permanent Magnet ◽

Nearest Neighbor ◽

Classification And Regression Tree ◽

Support Vector ◽

K Nearest Neighbor ◽

Single Segment ◽

Dc Motors ◽

Permanent Magnet Dc Motors

For permanent magnet DC motors (PMDCMs), the amplitude of the current signals gradually decreases after the motor starts. Only using the signal features of current in a single segment is not conducive to fault diagnosis for PMDCMs. In this work, multi-segment feature extraction is presented for improving the effect of fault diagnosis of PMDCMs. Additionally, a support vector machine (SVM), a classification and regression tree (CART), and the k-nearest neighbor algorithm (k-NN) are utilized for the construction of fault diagnosis models. The time domain features extracted from several successive segments of current signals make up a feature vector, which is adopted for fault diagnosis of PMDCMs. Experimental results show that multi-segment features have a better diagnostic effect than single-segment features; the average accuracy of fault diagnosis improves by 19.88%. This paper lays the foundation of fault diagnosis for PMDCMs through multi-segment feature extraction and provides a novel method for feature extraction.
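The idea of building one feature vector from several successive segments of the current signal can be sketched as follows. The abstract does not list which time-domain features were used, so mean, RMS, and peak-to-peak are assumptions chosen for illustration; the function names and the sample signal are hypothetical.

```python
from math import sqrt

def segment_features(segment):
    """Time-domain features of one segment: mean, RMS, and peak-to-peak amplitude."""
    mean = sum(segment) / len(segment)
    rms = sqrt(sum(x * x for x in segment) / len(segment))
    peak_to_peak = max(segment) - min(segment)
    return [mean, rms, peak_to_peak]

def multi_segment_features(signal, n_segments):
    """Concatenate the features of n successive equal-length segments.

    Trailing samples that do not fill a whole segment are dropped.
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    seg_len = len(signal) // n_segments
    if seg_len == 0:
        raise ValueError("signal is shorter than the number of segments")
    features = []
    for i in range(n_segments):
        segment = signal[i * seg_len:(i + 1) * seg_len]
        features.extend(segment_features(segment))
    return features

# Invented start-up current whose amplitude decays over time.
current = [4.0, 4.0, 2.0, 2.0, 1.0, 1.0]
single = multi_segment_features(current, 1)   # one segment: decay is averaged out
multi = multi_segment_features(current, 3)    # three segments: decay is preserved
```

With a single segment the decaying amplitude collapses into one set of averages, while the multi-segment vector keeps the per-segment values and so retains the decay pattern for the classifier.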

Optimizing Error Rate in Intrusion Detection System Using Artificial Neural Network Algorithm

International Journal of Emerging Research in Management and Technology ◽

10.23956/ijermt.v6i9.102 ◽

2018 ◽

Vol 6 (9) ◽

pp. 152

Author(s):

S. Vijaya Rani ◽

G. N. K. Suresh Babu

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Intrusion Detection ◽

Error Rate ◽

Learning Process ◽

Nearest Neighbor ◽

Detection System ◽

Support Vector ◽

K Nearest Neighbor ◽

Artificial Neural

Illegal hackers penetrate the servers and networks of corporate and financial institutions to gain money and extract vital information. Hacking ranges from a single computing system to many systems. Hackers gain access by sending malicious packets into the network through viruses, worms, Trojan horses, etc. They scan a network with various tools and collect information about the network and its hosts. Hence it is essential to detect attacks as they enter a network. The methods available for intrusion detection include Naive Bayes, Decision Tree, Support Vector Machine, K-Nearest Neighbor, and Artificial Neural Networks. A neural network consists of processing units arranged in a complex manner and is able to store information and make it available for use. It acts like the human brain and acquires knowledge from the environment through a training and learning process. Many algorithms are available for the learning process. This work analyzes malicious packets and predicts the error rate in the detection of injured packets using artificial neural network algorithms.

Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs

Current Drug Targets ◽

10.2174/1389450119666180809122244 ◽

2019 ◽

Vol 20 (5) ◽

pp. 488-500 ◽

Cited By ~ 6

Author(s):

Yan Hu ◽

Yi Lu ◽

Shuo Wang ◽

Mengying Zhang ◽

Xiaosheng Qu ◽

...

Keyword(s):

Machine Learning ◽

Drug Design ◽

Anticancer Drugs ◽

Nearest Neighbor ◽

Cost Effective ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Activity Prediction ◽

Linear Discriminant

Background: Globally, the numbers of cancer patients and deaths continue to increase yearly, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: In this review, in order to study the application of machine learning in predicting anticancer drug activity, machine learning approaches such as Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB) were selected, and examples of their applications in anticancer drug design are listed. Results: Machine learning contributes a lot to anticancer drug design and helps researchers by saving time and cost. However, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many examples of success in identification and prediction in the area of anticancer drug activity prediction are discussed, and anticancer drug research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.

Efficient detection of hacker community based on twitter data using complex networks and machine learning algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210458 ◽

2021 ◽

pp. 1-17

Author(s):

Ahmed Al-Tarawneh ◽

Ja’afer Al-Saraireh

Keyword(s):

Machine Learning ◽

Complex Networks ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

Efficient Detection ◽

Suggested Keywords

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use such platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers’ tweets using machine learning techniques. Previous approaches for detecting infected tweets are based on human effort or text analysis, so they are limited in their ability to capture the hidden text between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection on the Twitter platform using the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects a list of users from a hackers’ community on Twitter, together with followers who share their posts and have similar interests. The list is built from a set of suggested keywords, which are terms commonly used by hackers in their tweets. After that, a complex network is generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is then used in a machine learning process that applies different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers’ community. Correctly classified instances were measured for accuracy using the average values of the K-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split, respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a future risk to institutions and individuals, providing early warning of possible attacks.
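The network measures named in this abstract can be illustrated on a toy graph. The sketch below covers degree centrality and closeness only (betweenness is omitted for brevity) and assumes an undirected, unweighted graph, which the abstract does not confirm; the account names and edges are invented.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest-path hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def closeness(graph, node):
    """Reachable nodes divided by the sum of their distances; 0.0 if isolated."""
    dist = bfs_distances(graph, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def degree_centrality(graph, node):
    """Fraction of the other nodes that are direct neighbours of `node`."""
    return len(graph[node]) / (len(graph) - 1)

# Hypothetical follower graph, treated as undirected.
graph = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"c"},
}

most_central = max(graph, key=lambda n: closeness(graph, n))
hub_closeness = closeness(graph, "hub")
hub_degree = degree_centrality(graph, "hub")
```

Ranking accounts by such scores is one way to shortlist the most influential users before their tweets are collected and classified.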
