The Classification of Noise-Afflicted Remotely Sensed Data Using Three Machine-Learning Techniques: Effect of Different Levels and Types of Noise on Accuracy

2018 ◽  
Vol 7 (7) ◽  
pp. 274 ◽  
Author(s):  
Sornkitja Boonprong ◽  
Chunxiang Cao ◽  
Wei Chen ◽  
Xiliang Ni ◽  
Min Xu ◽  
...  

Remotely sensed data are often adversely affected by many types of noise, which influences the classification result. Supervised machine-learning (ML) classifiers such as random forest (RF), support vector machine (SVM), and back-propagation neural network (BPNN) are broadly reported to improve robustness against noise. However, only a few comparative studies that may help investigate this robustness have been reported. An important contribution, going beyond previous studies, is that we perform the analyses by employing the most well-known and broadly implemented packages of the three classifiers and control their settings to represent users’ actual applications. This facilitates an understanding of the extent to which the noise types and levels in remotely sensed data impact classification accuracy using ML classifiers. Using those implementations, we classified the land-cover data from a satellite image that was separately afflicted by seven levels of zero-mean Gaussian, salt–pepper, and speckle noise. The modeling data and features were strictly controlled. Finally, we discussed how each noise type affects the accuracy obtained from each classifier and the robustness of the classifiers to noise in the data. This may enhance our understanding of the relationship between noise, supervised ML classifiers, and remotely sensed data.
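The three corruption models named above (zero-mean Gaussian, salt–pepper, and speckle) can be sketched in a few lines of NumPy. The paper's exact parameterisation of its seven noise levels is not stated here, so the `level` argument below is an assumption, not the study's setting:

```python
import numpy as np

def add_noise(img, kind, level, rng=None):
    """Corrupt an image array with one of three noise models.

    kind: 'gaussian' (zero-mean additive), 'salt_pepper', or 'speckle'.
    level: std-dev for gaussian/speckle, corrupted-pixel fraction for
    salt_pepper. Illustrative only -- not the paper's parameterisation.
    """
    rng = rng or np.random.default_rng(0)
    img = img.astype(float)
    if kind == 'gaussian':        # zero-mean additive Gaussian noise
        return img + rng.normal(0.0, level, img.shape)
    if kind == 'salt_pepper':     # flip a fraction of pixels to min/max
        out = img.copy()
        mask = rng.random(img.shape) < level
        out[mask] = rng.choice([img.min(), img.max()], size=mask.sum())
        return out
    if kind == 'speckle':         # multiplicative noise: img * (1 + n)
        return img * (1.0 + rng.normal(0.0, level, img.shape))
    raise ValueError(kind)
```

Each noisy copy of the scene would then be classified separately, holding the training data and features fixed, as the abstract describes.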

2019 ◽  
Vol 23 (1) ◽  
pp. 12-21 ◽  
Author(s):  
Shikha N. Khera ◽  
Divya

The information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses to companies. The aim of this research is to develop a model to predict employee attrition and give organizations the opportunity to address issues and improve retention. The predictive model was developed based on a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the Human Resource databases of three IT companies in India, including each employee's employment status (the response variable) at the time of collection. Accuracy results from the confusion matrix showed that the SVM model has an accuracy of 85 per cent. The results also show that the model performs better at predicting who will leave the firm than at predicting who will stay.
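The study's HR data are not public, so the sketch below fits scikit-learn's `SVC` to synthetic stand-in data with 22 features and reads off the confusion matrix; the feature distribution, label rule, and kernel choice here are assumptions, not the study's configuration:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 22))          # 22 input features, as in the study
# Hypothetical attrition rule for illustration only
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=600) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = SVC(kernel='rbf').fit(X_tr, y_tr)
pred = model.predict(X_te)
print(confusion_matrix(y_te, pred))     # rows: true stay/leave, cols: predicted
print('accuracy:', round(accuracy_score(y_te, pred), 3))
```

The per-class asymmetry the abstract reports (better at predicting leavers than stayers) would show up as unequal row-wise error rates in the confusion matrix.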


2021 ◽  
Vol 13 (3) ◽  
pp. 368
Author(s):  
Christopher A. Ramezan ◽  
Timothy A. Warner ◽  
Aaron E. Maxwell ◽  
Bradley S. Price

The size of the training data set is a major determinant of classification accuracy. Nevertheless, the collection of a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample size (n = 10,000) to a very small sample size (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of the classification, offers the potential benefit of allowing the use of multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when training sample size decreased from 10,000 to 315 samples. GBM provided similar overall accuracy to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than SVM classifications for larger sample sizes, but lower than SVM for the smallest sample sizes. NEU, however, required a longer processing time.
The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically lower than those of the RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sample sets, minimal variation in overall accuracy between very large and small sample sets, and relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as the performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.
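The sample-size experiment can be imitated on a stand-in dataset. The sketch below shrinks the training set for two of the six algorithms (RF and SVM) on scikit-learn's bundled digits data; the GEOBIA imagery, the other four classifiers, and the exact sample sizes are not reproduced here:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Stand-in data: the paper used object-based features from HR imagery
X, y = load_digits(return_X_y=True)
X_pool, X_te, y_pool, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for n in (1000, 300, 40):               # shrinking training-set sizes
    for name, clf in (('RF', RandomForestClassifier(random_state=0)),
                      ('SVM', SVC())):
        clf.fit(X_pool[:n], y_pool[:n])
        results[(n, name)] = clf.score(X_te, y_te)
        print(n, name, round(results[(n, name)], 3))
```

On such a run one would compare how gracefully each classifier degrades as n falls, which is the paper's central question; the exact ordering on other data is of course not guaranteed to match the paper's findings.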


Author(s):  
V Umarani ◽  
A Julian ◽  
J Deepa

Sentiment analysis has gained a lot of attention from researchers in recent years because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed in comments, feedback, or critiques. The sentiment analysis process can be automated using machine learning techniques, which analyze text patterns faster. The supervised machine learning technique is the most used mechanism for sentiment analysis. The proposed work discusses the flow of the sentiment analysis process and investigates common supervised machine learning techniques such as multinomial naïve Bayes, Bernoulli naïve Bayes, logistic regression, support vector machine, random forest, k-nearest neighbor, and decision tree, as well as deep learning techniques such as long short-term memory (LSTM) and convolutional neural network (CNN). The work examines these learning methods using a standard data set. The experimental results demonstrate the performance of the various classifiers in terms of precision, recall, F1-score, ROC curve, accuracy, running time, and k-fold cross-validation; they help in appreciating the strengths of the several deep learning techniques and give the user an overview for choosing the right technique for their application.
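A minimal version of this pipeline, using two of the surveyed classical classifiers (multinomial naive Bayes and logistic regression) on a tiny made-up corpus; the paper's benchmark data and its deep learning models are not reproduced here:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative corpus only -- not the paper's standard data set
texts = ["great product, loved it", "terrible service, very slow",
         "excellent quality and fast", "awful, would not recommend",
         "happy with the purchase", "disappointed and angry"]
labels = [1, 0, 1, 0, 1, 0]             # 1 = positive, 0 = negative

for clf in (MultinomialNB(), LogisticRegression()):
    pipe = make_pipeline(CountVectorizer(), clf).fit(texts, labels)
    print(type(clf).__name__, pipe.predict(["loved the fast delivery"]))
```

On a real data set the same pipeline objects would be scored with precision, recall, F1, and k-fold cross-validation, the metrics the abstract lists.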


2021 ◽  
Vol 297 ◽  
pp. 01073
Author(s):  
Sabyasachi Pramanik ◽  
K. Martin Sagayam ◽  
Om Prakash Jena

Cancer has been described as a diverse illness with several distinct subtypes that may occur simultaneously. As a result, early detection and prognosis of cancer types have become essential in cancer research, since they may help to improve the clinical management of cancer survivors. The significance of categorizing cancer sufferers into higher- or lower-risk categories has prompted numerous research groups from the bioscience and genomics fields to investigate the utilization of machine learning (ML) algorithms in cancer diagnosis and treatment. These methods have been used with the goal of modeling the development and treatment of malignant diseases in humans. Furthermore, the capacity of machine learning techniques to identify important characteristics in complicated datasets demonstrates the significance of these technologies, which include Bayesian networks and artificial neural networks, along with a number of other approaches. Decision trees and support vector machines, which have already been extensively used in cancer research for the creation of predictive models, also support accurate decision making. The application of machine learning techniques may undoubtedly enhance our knowledge of cancer development; nevertheless, a sufficient degree of validation is required before these approaches can be considered for use in daily clinical practice. An overview of current machine learning approaches utilized in the modeling of cancer development is presented in this paper. All of the supervised machine learning approaches described here, along with a variety of input characteristics and data samples, are used to build the prediction models. In light of the increasing trend towards the use of machine learning methods in biomedical research, we review the most recent papers that have used these approaches to predict cancer risk or patient outcomes in order to better understand cancer.
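As a hedged illustration of two of the model families named above, the sketch below fits a decision tree and an SVM to the public Wisconsin breast-cancer data bundled with scikit-learn, with cross-validation standing in for the validation the abstract calls for; this is not the paper's own experiment:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)   # benign vs. malignant tumors
for clf in (DecisionTreeClassifier(random_state=0), SVC()):
    # 5-fold cross-validation: a basic form of the validation the text urges
    scores = cross_val_score(clf, X, y, cv=5)
    print(type(clf).__name__, round(scores.mean(), 3))
```

Reporting cross-validated rather than training accuracy is one small step toward the "sufficient degree of validation" the abstract says clinical use would require.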


Optimization algorithms are widely used for intrusion detection. This is attributable to the increasing number of audit data features and the declining performance of human-based smart Intrusion Detection Systems (IDS) with regard to classification accuracy and training time. In this paper, an improved method for intrusion detection for binary classification is presented and discussed in detail. The proposed method combines the New Teaching-Learning-Based Optimization algorithm (NTLBO) with supervised machine-learning techniques, namely Support Vector Machine (SVM), Extreme Learning Machine (ELM), and Logistic Regression (LR), for Feature Subset Selection (FSS) and feature weighting. The process of selecting the smallest number of features without affecting the result accuracy is treated as a multi-objective optimization problem. NTLBO is proposed as the FSS mechanism, and its algorithm-specific, parameter-less design (which requires no parameter tuning during optimization) is explored. Experiments were performed on prominent intrusion-detection machine-learning datasets (KDDCUP’99 and CICIDS 2017), where significant enhancements were observed with the suggested NTLBO algorithm compared to the classical Teaching-Learning-Based Optimization algorithm (TLBO) and many existing works. The results showed that NTLBO reached 100% accuracy on the KDDCUP’99 dataset and 97% on the CICIDS dataset.
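The NTLBO search itself is not reproduced here; the greedy forward wrapper below only illustrates the FSS objective the abstract describes (keep accuracy while selecting as few features as possible), using scikit-learn's wine data and logistic regression as stand-ins:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)   # stand-in for KDDCUP'99 / CICIDS features

def forward_select(X, y, max_feats=5):
    """Greedy wrapper FSS: add the feature that most improves CV accuracy,
    stop when no candidate improves it (illustrative, not NTLBO)."""
    chosen, best = [], 0.0
    clf = LogisticRegression(max_iter=5000)
    while len(chosen) < max_feats:
        cand = max((f for f in range(X.shape[1]) if f not in chosen),
                   key=lambda f: cross_val_score(clf, X[:, chosen + [f]],
                                                 y, cv=3).mean())
        score = cross_val_score(clf, X[:, chosen + [cand]], y, cv=3).mean()
        if score <= best:            # no improvement: stop adding features
            break
        chosen, best = chosen + [cand], score
    return chosen, best

feats, acc = forward_select(X, y)
print('selected features:', feats, 'CV accuracy:', round(acc, 3))
```

A population-based search such as TLBO/NTLBO explores feature subsets globally instead of this one greedy path, which is why it can escape the local optima a forward wrapper gets stuck in.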


Sensors ◽  
2019 ◽  
Vol 19 (10) ◽  
pp. 2266 ◽  
Author(s):  
Nikolaos Sideris ◽  
Georgios Bardis ◽  
Athanasios Voulodimos ◽  
Georgios Miaoulis ◽  
Djamchid Ghazanfarpour

The constantly increasing amount and availability of urban data derived from varying sources lead to an assortment of challenges that include, among others, the consolidation, visualization, and maximal exploitation prospects of the aforementioned data. A preeminent problem affecting urban planning is the appropriate choice of location to host a particular activity (either a commercial or common-welfare service) or the correct use of an existing building or empty space. In this paper, we propose an approach to address these challenges using machine learning techniques. The proposed system combines, fuses, and merges various types of data from different sources, encodes them using a novel semantic model that can capture and utilize both low-level geometric information and higher-level semantic information, and subsequently feeds them to a random forests classifier, as well as to other supervised machine learning models for comparison. Our experimental evaluation on multiple real-world data sets, comparing the performance of several classifiers (including Feedforward Neural Networks, Support Vector Machines, Bag of Decision Trees, k-Nearest Neighbors and Naïve Bayes), indicated the superiority of Random Forests in terms of the examined performance metrics (Accuracy, Specificity, Precision, Recall, F-measure and G-mean).
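Two of the metrics in that list, specificity and G-mean, are less common than the others, so a small worked computation may help; the binary labels below are made up for illustration, not the paper's data:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)                 # true-negative rate
recall = recall_score(y_true, y_pred)        # true-positive rate (sensitivity)
g_mean = (recall * specificity) ** 0.5       # geometric mean of the two rates
print('precision', precision_score(y_true, y_pred),
      'recall', recall, 'specificity', specificity,
      'F1', round(f1_score(y_true, y_pred), 3),
      'G-mean', round(g_mean, 3))
```

G-mean rewards classifiers that do well on both classes at once, which matters when the urban classes are imbalanced.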


10.29007/tx1s ◽  
2018 ◽  
Author(s):  
Aditya Nori

Computing good specifications and invariants is key to effective and efficient program verification. In this talk, I will describe our experiences in using machine learning techniques (Bayesian inference, SVMs) for computing specifications and invariants useful for program verification. The first project, Merlin, uses Bayesian inference to automatically infer security specifications of programs. A novel feature of Merlin is that it can infer specifications even when the code under analysis gives rise to conflicting constraints, a situation that typically occurs when there are bugs. We have used Merlin to infer security specifications of 10 large business-critical web applications. Furthermore, we show that these specifications can be used to detect new information flow security vulnerabilities in these applications.

In the second project, Interpol, we show how interpolants can be viewed as classifiers in supervised machine learning. This view has several advantages. First, we are able to use off-the-shelf classification techniques, in particular support vector machines (SVMs), for interpolation. Second, we show that SVMs can find relevant predicates for a number of benchmarks. Since classification algorithms are predictive, the interpolants computed via classification are likely to be relevant predicates or invariants. Finally, the machine learning view also enables us to handle superficial non-linearities. Even if the underlying problem structure is linear, the symbolic constraints can give an impression that we are solving a non-linear problem. Since learning algorithms try to mine the underlying structure directly, we can discover the linear structure for such problems. We demonstrate the feasibility of Interpol via experiments over benchmarks from various papers on program verification.
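The interpolants-as-classifiers view can be sketched concretely: treat samples from one formula's models as one class and the other formula's models as the second class, then read the linear SVM's separating hyperplane off as a candidate interpolant predicate. The point sets and scikit-learn machinery below are illustrative assumptions, not Interpol's implementation:

```python
import numpy as np
from sklearn.svm import LinearSVC

A = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])   # samples satisfying formula A
B = np.array([[4, 4], [5, 4], [4, 5], [5, 5]])   # samples satisfying formula B
X = np.vstack([A, B])
y = np.array([0] * len(A) + [1] * len(B))

svm = LinearSVC(C=1e3).fit(X, y)                 # large C: near-hard margin
w, b = svm.coef_[0], svm.intercept_[0]
# Candidate interpolant: w·x + b <= 0 holds on A-samples, fails on B-samples
print(f"predicate: {w[0]:.2f}*x + {w[1]:.2f}*y + {b:.2f} <= 0")
```

Because the learner mines the structure of the samples directly, a linear separator can emerge even when the symbolic constraints looked non-linear, which is the "superficial non-linearity" point made above.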


Mathematics ◽  
2021 ◽  
Vol 9 (18) ◽  
pp. 2215
Author(s):  
Jung-Kai Tsai ◽  
Chih-Hsing Hung

Because COVID-19 emerged in 2019, human behavior has changed, and this will influence the business models of enterprises. An enterprise cannot predict its development from past knowledge and experience, so it needs a new machine learning framework to predict enterprise performance. The goal of this research is to modify AdaBoost to reasonably predict enterprise performance. To justify the usefulness of the proposed model, enterprise data were collected, and the proposed model was used to predict enterprise performance after COVID-19. The test-data correct rate of the proposed model is compared with that of several traditional machine learning models. Compared with traditional AdaBoost, back-propagation neural network (BPNN), regression classifier, support vector machine (SVM) and support vector regression (SVR), the proposed method possesses better classification ability (its average correct rate is 88.04%) in handling two-class classification problems. Compared with traditional AdaBoost, one-against-all SVM, one-against-one SVM, one-against-all SVR and one-against-one SVR, the classification ability of the proposed method is also relatively better for coping with the multi-class classification problem. Finally, conclusions and future research are discussed at the end.
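The paper's modified AdaBoost and its enterprise data are not available, so the sketch below only shows the shape of such a comparison, pitting stock scikit-learn AdaBoost against an SVM on synthetic two-class data; every number here is a stand-in:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the post-COVID enterprise data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for clf in (AdaBoostClassifier(random_state=0), SVC()):
    clf.fit(X_tr, y_tr)
    scores[type(clf).__name__] = clf.score(X_te, y_te)
    print(type(clf).__name__, 'correct rate:',
          round(scores[type(clf).__name__], 3))
```

AdaBoost's appeal in this setting is that it re-weights the samples each weak learner got wrong, so a targeted modification of that re-weighting is a natural lever for improving the correct rate.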


An Intrusion Detection System (IDS) is a system that checks the network or data for abnormal actions and issues an alert when such activity is discovered. Numerous IDS techniques are in use these days, but one major problem with all of them is their performance. Various works have addressed this issue using support vector machines and multilayer perceptrons. Supervised learning models such as support vector machines, with their associated learning algorithms, are used to analyze data for both regression analysis and classification. An IDS used on big data must analyze huge volumes of traffic for suspicious activities, and do so successfully; hence, an efficient and fast classification algorithm is required. Machine learning techniques such as neural networks and extreme learning machines are used here; both are highly regarded and considered among the best techniques. Extreme learning machines are feed-forward neural networks with a single hidden layer, trained without back-propagation, used for classification. Once an intrusion is detected by the ELM-based IDS, the type of intrusion is then determined using the random forest technique (multi-class classification), efficiently and with a high rate of accuracy and precision. The well-known NSL-KDD dataset is used for both training and testing of these IDS algorithms. This work determines that, compared to an artificial neural network and logistic regression, extreme learning machines provide a much better rate of intrusion detection, 93.96%, and are also more efficient in terms of execution time, at 38 seconds.
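The defining trick of the extreme learning machine described above is that the hidden-layer weights are random and fixed, and only the output weights are solved in closed form by least squares, with no back-propagation. A minimal NumPy sketch, on synthetic data rather than NSL-KDD:

```python
import numpy as np

class ELM:
    """Single-hidden-layer ELM: random projection + tanh, then a
    least-squares solve for the output weights (no back-propagation)."""
    def __init__(self, n_hidden=100, rng=None):
        self.n_hidden = n_hidden
        self.rng = rng or np.random.default_rng(0)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)       # fixed random features

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        Y = np.eye(y.max() + 1)[y]                # one-hot targets
        self.beta = np.linalg.pinv(H) @ Y         # closed-form output weights
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta).argmax(axis=1)

# Synthetic two-class data with a simple nonlinear decision rule
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 8))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
elm = ELM(n_hidden=100).fit(X[:300], y[:300])
acc = (elm.predict(X[300:]) == y[300:]).mean()
print('ELM test accuracy:', round(acc, 3))
```

Because training reduces to one pseudo-inverse, ELMs train in a fraction of the time a back-propagated network needs, which is exactly the speed advantage the abstract reports.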


The advancement in cyber-attack technologies has ushered in various new attacks which are difficult to detect using traditional intrusion detection systems (IDS). Existing IDS are trained to detect known patterns, because of which newer attacks bypass the current IDS and go undetected. In this paper, a two-level framework is proposed which can be used to detect unknown new attacks using machine learning techniques. In the first level, the known classes of attacks are determined using supervised machine learning algorithms such as Support Vector Machine (SVM) and neural networks (NN). The second level uses unsupervised machine learning algorithms such as K-means. The experimentation is carried out with four models on the NSL-KDD dataset in an OpenStack cloud environment. The model with Support Vector Machine for supervised machine learning, Gradual Feature Reduction (GFR) for feature selection, and K-means as the unsupervised algorithm provided the optimum efficiency of 94.56%.
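A minimal sketch of the two-level idea on synthetic data (NSL-KDD and the GFR feature-selection step are not reproduced): a supervised SVM assigns known attack classes at level one, then K-means clusters the same traffic at level two so unfamiliar groupings can be surfaced for inspection:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
known = rng.normal(size=(200, 4))                 # labeled training traffic
labels = (known[:, 0] > 0).astype(int)            # two hypothetical known classes
svm = SVC().fit(known, labels)                    # level 1: supervised model

traffic = rng.normal(loc=3.0, size=(50, 4))       # new traffic, possibly novel
preds = svm.predict(traffic)                      # level 1: known-class labels
clusters = KMeans(n_clusters=3, n_init=10,        # level 2: unsupervised
                  random_state=0).fit_predict(traffic)
print('level-1 known-class labels:', np.bincount(preds, minlength=2))
print('level-2 cluster sizes:', np.bincount(clusters, minlength=3))
```

Clusters that do not align with any level-one class are the candidates for "unknown new attacks" in this framework; how that mismatch is scored is a design choice the abstract leaves open.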

