scholarly journals Using Random Forests on Real-World City Data for Urban Planning in a Visual Semantic Decision Support System

Sensors ◽  
2019 ◽  
Vol 19 (10) ◽  
pp. 2266 ◽  
Author(s):  
Nikolaos Sideris ◽  
Georgios Bardis ◽  
Athanasios Voulodimos ◽  
Georgios Miaoulis ◽  
Djamchid Ghazanfarpour

The constantly increasing amount and availability of urban data derived from varying sources leads to an assortment of challenges that include, among others, the consolidation, visualization, and maximal exploitation prospects of the aforementioned data. A preeminent problem affecting urban planning is the appropriate choice of location to host a particular activity (either commercial or common welfare service) or the correct use of an existing building or empty space. In this paper, we propose an approach to address these challenges availed with machine learning techniques. The proposed system combines, fuses, and merges various types of data from different sources, encodes them using a novel semantic model that can capture and utilize both low-level geometric information and higher level semantic information and subsequently feeds them to the random forests classifier, as well as other supervised machine learning models for comparisons. Our experimental evaluation on multiple real-world data sets comparing the performance of several classifiers (including Feedforward Neural Networks, Support Vector Machines, Bag of Decision Trees, k-Nearest Neighbors and Naïve Bayes), indicated the superiority of Random Forests in terms of the examined performance metrics (Accuracy, Specificity, Precision, Recall, F-measure and G-mean).

2022 ◽  
Vol 2161 (1) ◽  
pp. 012013
Author(s):  
Chiradeep Gupta ◽  
Athina Saha ◽  
N V Subba Reddy ◽  
U Dinesh Acharya

Abstract Diagnosis of cardiac disease requires being more accurate, precise, and reliable. The number of death cases due to cardiac attacks is increasing exponentially day by day. Thus, practical approaches for earlier diagnosis of cardiac or heart disease are done to achieve prompt management of the disease. Various supervised machine learning techniques like K-Nearest Neighbour, Decision Tree, Logistic Regression, Naïve Bayes, and Support Vector Machine (SVM) model are used for predicting cardiac disease using a dataset that was collected from the repository of the University of California, Irvine (UCI). The results depict that Logistic Regression was better than all other supervised classifiers in terms of the performance metrics. The model is also less risky since the number of false negatives is low as compared to other models as per the confusion matrix of all the models. In addition, ensemble techniques can be approached for the accuracy improvement of the classifier. Jupyter notebook is the best tool, for the implementation of Python Programming having many types of libraries, header files, for accurate and precise work.


Author(s):  
P Sai Teja

Unsolicited e-mail also known as Spam has become a huge concern for each e-mail user. In recent times, it is very difficult to filter spam emails as these emails are produced or created or written in a very special manner so that anti-spam filters cannot detect such emails. This paper compares and reviews performance metrics of certain categories of supervised machine learning techniques such as SVM (Support Vector Machine), Random Forest, Decision Tree, CNN, (Convolutional Neural Network), KNN(K Nearest Neighbor), MLP(Multi-Layer Perceptron), Adaboost (Adaptive Boosting) Naïve Bayes algorithm to predict or classify into spam emails. The objective of this study is to consider the details or content of the emails, learn a finite dataset available and to develop a classification model that will be able to predict or classify whether an e-mail is spam or not.


2019 ◽  
Vol 23 (1) ◽  
pp. 12-21 ◽  
Author(s):  
Shikha N. Khera ◽  
Divya

Information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based loses to the companies. The aim of this research is to develop a model to predict employee attrition and provide the organizations opportunities to address any issue and improve retention. Predictive model was developed based on supervised machine learning algorithm, support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from Human Resource databases of three IT companies in India, including their employment status (response variable) at the time of collection. Accuracy results from the confusion matrix for the SVM model showed that the model has an accuracy of 85 per cent. Also, results show that the model performs better in predicting who will leave the firm as compared to predicting who will not leave the company.


Author(s):  
V Umarani ◽  
A Julian ◽  
J Deepa

Sentiment analysis has gained a lot of attention from researchers in the last year because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedbacks, or critiques. The sentiment analysis process can be automated using machine learning techniques, which analyses text patterns faster. The supervised machine learning technique is the most used mechanism for sentiment analysis. The proposed work discusses the flow of sentiment analysis process and investigates the common supervised machine learning techniques such as multinomial naive bayes, Bernoulli naive bayes, logistic regression, support vector machine, random forest, K-nearest neighbor, decision tree, and deep learning techniques such as Long Short-Term Memory and Convolution Neural Network. The work examines such learning methods using standard data set and the experimental results of sentiment analysis demonstrate the performance of various classifiers taken in terms of the precision, recall, F1-score, RoC-Curve, accuracy, running time and k fold cross validation and helps in appreciating the novelty of the several deep learning techniques and also giving the user an overview of choosing the right technique for their application.


2021 ◽  
Vol 297 ◽  
pp. 01073
Author(s):  
Sabyasachi Pramanik ◽  
K. Martin Sagayam ◽  
Om Prakash Jena

Cancer has been described as a diverse illness with several distinct subtypes that may occur simultaneously. As a result, early detection and forecast of cancer types have graced essentially in cancer fact-finding methods since they may help to improve the clinical treatment of cancer survivors. The significance of categorizing cancer suffers into higher or lower-threat categories has prompted numerous fact-finding associates from the bioscience and genomics field to investigate the utilization of machine learning (ML) algorithms in cancer diagnosis and treatment. Because of this, these methods have been used with the goal of simulating the development and treatment of malignant diseases in humans. Furthermore, the capacity of machine learning techniques to identify important characteristics from complicated datasets demonstrates the significance of these technologies. These technologies include Bayesian networks and artificial neural networks, along with a number of other approaches. Decision Trees and Support Vector Machines which have already been extensively used in cancer research for the creation of predictive models, also lead to accurate decision making. The application of machine learning techniques may undoubtedly enhance our knowledge of cancer development; nevertheless, a sufficient degree of validation is required before these approaches can be considered for use in daily clinical practice. An overview of current machine learning approaches utilized in the simulation of cancer development is presented in this paper. All of the supervised machine learning approaches described here, along with a variety of input characteristics and data samples, are used to build the prediction models. In light of the increasing trend towards the use of machine learning methods in biomedical research, we offer the most current papers that have used these approaches to predict risk of cancer or patient outcomes in order to better understand cancer.


Optimization algorithms are widely used for the identification of intrusion. This is attributable to the increasing number of audit data features and the decreasing performance of human-based smart Intrusion Detection Systems (IDS) regarding classification accuracy and training time. In this paper, an improved method for intrusion detection for binary classification was presented and discussed in detail. The proposed method combined the New Teaching-Learning-Based Optimization Algorithm (NTLBO), Support Vector Machine (SVM), Extreme Learning Machine (ELM), and Logistic Regression (LR) (feature selection and weighting) NTLBO algorithm with supervised machine learning techniques for Feature Subset Selection (FSS). The process of selecting the least number of features without any effect on the result accuracy in FSS was considered a multi-objective optimization problem. The NTLBO was proposed in this paper as an FSS mechanism; its algorithm-specific, parameter-less concept (which requires no parameter tuning during an optimization) was explored. The experiments were performed on the prominent intrusion machine-learning datasets (KDDCUP’99 and CICIDS 2017), where significant enhancements were observed with the suggested NTLBO algorithm as compared to the classical Teaching-Learning-Based Optimization algorithm (TLBO), NTLBO presented better results than TLBO and many existing works. The results showed that NTLBO reached 100% accuracy for KDDCUP’99 dataset and 97% for CICIDS dataset


2021 ◽  
pp. 1-15
Author(s):  
Savaridassan Pankajashan ◽  
G. Maragatham ◽  
T. Kirthiga Devi

Anomaly-based detection is coupled with recognizing the uncommon, to catch the unusual activity, and to find the strange action behind that activity. Anomaly-based detection has a wide scope of critical applications, from bank application security to regular sciences to medical systems to marketing apps. Anomaly-based detection adopted by various Machine Learning techniques is really a type of system that consists of artificial intelligence. With the ever-expanding volume and new sorts of information, for example, sensor information from an incontestably enormous amount of IoT devices and from network flow data from cloud computing, it is implicitly understood without surprise that there is a developing enthusiasm for having the option to deal with more conclusions automatically by means of AI and ML applications. But with respect to anomaly detection, many applications of the scheme are simply the passion for detection. In this paper, Machine Learning (ML) techniques, namely the SVM, Isolation forest classifiers experimented and with reference to Deep Learning (DL) techniques, the proposed DA-LSTM (Deep Auto-Encoder LSTM) model are adopted for preprocessing of log data and anomaly-based detection to get better performance measures of detection. An enhanced LSTM (long-short-term memory) model, optimizing for the suitable parameter using a genetic algorithm (GA), is utilized to recognize better the anomaly from the log data that is filtered, adopting a Deep Auto-Encoder (DA). The Deep Neural network models are utilized to change over unstructured log information to training ready features, which are reasonable for log classification in detecting anomalies. These models are assessed, utilizing two benchmark datasets, the Openstack logs, and CIDDS-001 intrusion detection OpenStack server dataset. The outcomes acquired show that the DA-LSTM model performs better than other notable ML techniques. We further investigated the performance metrics of the ML and DL models through the well-known indicator measurements, specifically, the F-measure, Accuracy, Recall, and Precision. The exploratory conclusion shows that the Isolation Forest, and Support vector machine classifiers perform roughly 81%and 79%accuracy with respect to the performance metrics measurement on the CIDDS-001 OpenStack server dataset while the proposed DA-LSTM classifier performs around 99.1%of improved accuracy than the familiar ML algorithms. Further, the DA-LSTM outcomes on the OpenStack log data-sets show better anomaly detection compared with other notable machine learning models.


2020 ◽  
Vol 10 (2) ◽  
pp. 1-26
Author(s):  
Naghmeh Moradpoor Sheykhkanloo ◽  
Adam Hall

An insider threat can take on many forms and fall under different categories. This includes malicious insider, careless/unaware/uneducated/naïve employee, and the third-party contractor. Machine learning techniques have been studied in published literature as a promising solution for such threats. However, they can be biased and/or inaccurate when the associated dataset is hugely imbalanced. Therefore, this article addresses the insider threat detection on an extremely imbalanced dataset which includes employing a popular balancing technique known as spread subsample. The results show that although balancing the dataset using this technique did not improve performance metrics, it did improve the time taken to build the model and the time taken to test the model. Additionally, the authors realised that running the chosen classifiers with parameters other than the default ones has an impact on both balanced and imbalanced scenarios, but the impact is significantly stronger when using the imbalanced dataset.


10.29007/tx1s ◽  
2018 ◽  
Author(s):  
Aditya Nori

Computing good specification and invariants is key to effectiveand efficient program verification. In this talk, I will describeour experiences in using machine learning techniques (Bayesianinference, SVMs) for computing specifications and invariantsuseful for program verification. The first project Merlin usesBayesian inference in order to automatically infer securityspecifications of programs. A novel feature of Merlin is that itcan infer specifications even when the code under analysis givesrise to conflicting constraints, a situation that typicallyoccurs when there are bugs. We have used Merlin to infer securityspecifications of 10 large business critical webapplications. Furthermore, we show that these specifications canbe used to detect new information flow security vulnerabilitiesin these applications.In the second project Interpol, we show how interpolants can beviewed as classifiers in supervised machine learning. This viewhas several advantages: First, we are able to use off-the-shelfclassification techniques, in particular support vectormachines (SVMs), for interpolation. Second, we show that SVMs canfind relevant predicates for a number of benchmarks. Sinceclassification algorithms are predictive, the interpolantscomputed via classification are likely to be relevant predicatesor invariants. Finally, the machine learning view also enables usto handle superficial non-linearities. Even if the underlyingproblem structure is linear, the symbolic constraints can give animpression that we are solving a non-linear problem. Sincelearning algorithms try to mine the underlying structuredirectly, we can discover the linear structure for suchproblems. We demonstrate the feasibility of Interpol viaexperiments over benchmarks from various papers on programverification.


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3644 ◽  
Author(s):  
Philippe Desjardins-Proulx ◽  
Idaline Laigle ◽  
Timothée Poisot ◽  
Dominique Gravel

Species interactions are a key component of ecosystems but we generally have an incomplete picture of who-eats-who in a given community. Different techniques have been devised to predict species interactions using theoretical models or abundances. Here, we explore the K nearest neighbour approach, with a special emphasis on recommendation, along with a supervised machine learning technique. Recommenders are algorithms developed for companies like Netflix to predict whether a customer will like a product given the preferences of similar customers. These machine learning techniques are well-suited to study binary ecological interactions since they focus on positive-only data. By removing a prey from a predator, we find that recommenders can guess the missing prey around 50% of the times on the first try, with up to 881 possibilities. Traits do not improve significantly the results for the K nearest neighbour, although a simple test with a supervised learning approach (random forests) show we can predict interactions with high accuracy using only three traits per species. This result shows that binary interactions can be predicted without regard to the ecological community given only three variables: body mass and two variables for the species’ phylogeny. These techniques are complementary, as recommenders can predict interactions in the absence of traits, using only information about other species’ interactions, while supervised learning algorithms such as random forests base their predictions on traits only but do not exploit other species’ interactions. Further work should focus on developing custom similarity measures specialized for ecology to improve the KNN algorithms and using richer data to capture indirect relationships between species.


Sign in / Sign up

Export Citation Format

Share Document