Specification Inference and Invariant Generation: A Machine Learning Perspective
Computing good specifications and invariants is key to effective and efficient program verification. In this talk, I will describe our experiences in using machine learning techniques (Bayesian inference, SVMs) to compute specifications and invariants useful for program verification.

The first project, Merlin, uses Bayesian inference to automatically infer security specifications of programs. A novel feature of Merlin is that it can infer specifications even when the code under analysis gives rise to conflicting constraints, a situation that typically occurs when there are bugs. We have used Merlin to infer security specifications of 10 large business-critical web applications. Furthermore, we show that these specifications can be used to detect new information-flow security vulnerabilities in these applications.

In the second project, Interpol, we show how interpolants can be viewed as classifiers in supervised machine learning. This view has several advantages. First, we are able to use off-the-shelf classification techniques, in particular support vector machines (SVMs), for interpolation. Second, we show that SVMs can find relevant predicates for a number of benchmarks: since classification algorithms are predictive, the interpolants computed via classification are likely to be relevant predicates or invariants. Finally, the machine learning view also enables us to handle superficial non-linearities: even if the underlying problem structure is linear, the symbolic constraints can give the impression that we are solving a non-linear problem. Since learning algorithms try to mine the underlying structure directly, we can discover the linear structure of such problems. We demonstrate the feasibility of Interpol via experiments over benchmarks from various papers on program verification.
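The interpolants-as-classifiers view can be illustrated with a small self-contained sketch. Interpol itself uses SVMs; here a plain perceptron (a simpler linear classifier) stands in so the example needs no external libraries. The toy program, the sampled states, and all names below are hypothetical, not taken from the talk's benchmarks: reachable states are positive samples, assertion-violating states are negative samples, and a learned linear separator over the program variables serves as a candidate interpolant/invariant.

```python
# Sketch: invariant generation as binary classification.
# Positive samples = program states reachable from initialization;
# negative samples = states violating the target assertion.
# A linear separator w.x + b = 0 between the two sets is a
# candidate linear invariant (the role an interpolant plays).

def learn_separator(positives, negatives, epochs=1000):
    """Perceptron: find (w, b) with w.x + b > 0 on positives
    and w.x + b < 0 on negatives (data assumed separable)."""
    dim = len(positives[0])
    w, b = [0.0] * dim, 0.0
    samples = [(x, 1) for x in positives] + [(x, -1) for x in negatives]
    for _ in range(epochs):
        converged = True
        for x, y in samples:
            # Misclassified (or on the boundary): nudge toward y's side.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                converged = False
        if converged:
            break
    return w, b

# Hypothetical loop over variables (x, y): reachable states happen
# to satisfy x <= y, error states have x > y. The learned half-plane
# recovers that linear structure from the samples alone.
good = [(0, 1), (1, 3), (2, 2), (3, 5)]   # sampled reachable states
bad = [(2, 1), (4, 0), (5, 3)]            # sampled error states
w, b = learn_separator(good, bad)
assert all(sum(wi * xi for wi, xi in zip(w, g)) + b > 0 for g in good)
assert all(sum(wi * xi for wi, xi in zip(w, s)) + b < 0 for s in bad)
```

The same pipeline explains why superficial non-linearities are harmless: the learner fits the sampled states directly, so if those states are in fact linearly separable, a linear predicate is found regardless of how non-linear the symbolic constraints look.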