Classification of Oncologic Data with Genetic Programming

A machine learning based method for classification of fractal features of forearm sEMG using Twin Support vector machines

2010 Annual International Conference of the IEEE Engineering in Medicine and Biology ◽

10.1109/iembs.2010.5627902 ◽

2010 ◽

Cited By ~ 12

Author(s):

S P Arjunan ◽

D K Kumar ◽

G R Naik

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Support Vector ◽

Twin Support Vector Machines ◽

Vector Machines

Using Remote Monitoring and Machine Learning to Classify Slam Events of Wave Piercing Catamarans

The International Journal of Maritime Engineering ◽

10.5750/ijme.v163ia3.797 ◽

2021 ◽

Vol 163 (A3) ◽

Author(s):

B Shabani ◽

J Ali-Lavroff ◽

D S Holloway ◽

S Penev ◽

D Dessi ◽

...

Keyword(s):

Machine Learning ◽

Monitoring System ◽

Probability Distributions ◽

Feature Space ◽

Support Vector ◽

Acceleration Threshold ◽

Vector Machines ◽

Structural Responses ◽

As Stress

An onboard monitoring system can measure features such as stress cycle counts and provide warnings due to slamming. Given current technology trends, there is an opportunity to incorporate machine learning methods into monitoring systems. A hull monitoring system has been developed and installed on a 111 m wave piercing catamaran (Hull 091) to remotely monitor the ship kinematics and hull structural responses. In parallel, an existing dataset from a similar vessel (Hull 061) was analysed using unsupervised and supervised learning models; these were found to be beneficial for the classification of bow entry events according to key kinematic parameters. A comparison of different algorithms, including linear support vector machines, naïve Bayes and decision trees, was conducted for the bow entry classification. In addition, using empirical probability distributions, the likelihood of wet-deck slamming was estimated given a vertical bow acceleration threshold of 1 in head seas, clustering the feature space with the approximate probabilities of 0.001, 0.030 and 0.25.

Enhanced Changeover Detection in Industry 4.0 Environments with Machine Learning

Sensors ◽

10.3390/s21175896 ◽

2021 ◽

Vol 21 (17) ◽

pp. 5896

Author(s):

Eddi Miller ◽

Vladyslav Borysenko ◽

Moritz Heusinger ◽

Niklas Niedner ◽

Bastian Engelmann ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Binary Classification ◽

Model Performance ◽

Support Vector ◽

Milling Machine ◽

Vector Machines ◽

Changeover Times ◽

Flow Power

Changeover times are an important element when evaluating the Overall Equipment Effectiveness (OEE) of a production machine. The article presents a machine learning (ML) approach that is based on an external sensor setup to automatically detect changeovers in a shopfloor environment. The door statuses, coolant flow, power consumption, and operator indoor GPS data of a milling machine were used in the ML approach. As ML methods, Decision Trees, Support Vector Machines, (Balanced) Random Forest algorithms, and Neural Networks were chosen, and their performance was compared. The best results were achieved with the Random Forest ML model (97% F1 score, 99.72% AUC score). It was also found that model performance is optimal when only a binary classification into a changeover phase and a production phase is considered and fewer subphases of the changeover process are applied.
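The F1 score quoted in the abstract above is the harmonic mean of precision and recall for the positive class. The following is only a minimal sketch of that metric, not the authors' evaluation code; the label vectors and the encoding (1 = changeover phase, 0 = production phase) are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = changeover phase, 0 = production phase.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
```

Because F1 ignores true negatives, it is less flattering than plain accuracy when one phase dominates the recording.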

Classification of Benign and Malignant Breast Masses on Mammograms for Large Datasets using Core Vector Machines

Current Medical Imaging (formerly Current Medical Imaging Reviews) ◽

10.2174/1573405615666190801121506 ◽

2020 ◽

Vol 16 (6) ◽

pp. 703-710 ◽

Cited By ~ 1

Author(s):

Jebasonia Jebamony ◽

Dheeba Jacob

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Cancer Detection ◽

Breast Cancer Detection ◽

Misclassification Rate ◽

Support Vector ◽

Mass Detection ◽

Training Process ◽

Vector Machines

Background: Breast cancer is one of the leading causes of cancer deaths among women. Early detection of cancer increases the survival rate of the affected women. Machine learning approaches that are used for classification of breast cancer usually take a lot of processing time during the training process. This paper attempts to propose a Machine Learning approach for breast cancer detection in mammograms, which does not depend on the number of training samples. Objective: The paper aims to develop a core vector machine-based diagnosis system for breast cancer detection using the data from MIAS. The main motivation behind using this system is to reduce the computational and memory requirements for large training data and to improve the classification accuracy. Methods: The proposed method has four stages: 1) pre-processing to extract the breast region using global thresholding and enhancement using histogram equalization; 2) identification of potential masses using Otsu thresholding; 3) feature extraction using Laws texture energy measures; and 4) mass detection using a Core Vector Machine (CVM) classifier. Results: Comparative analysis was done with different existing algorithms: Artificial Neural Network (ANN), Support Vector Machine (SVM), and Fuzzy Support Vector Machines (FSVM). The results illustrate that the proposed Core Vector Machine (CVM) classifier produced a promising result in terms of sensitivity (96.9%), misclassification rate (0.0443) and accuracy (95.89%). The time taken for the training process is 0.0443, which is less than that of the other machine learning algorithms. Conclusion: Performance analysis shows that the CVM classifier is superior to other classifiers such as ANN, SVM and FSVM. The computational time of the CVM classifier during the training process was also analysed and found to be better than that of the other discussed algorithms. The results achieved show that the CVM classifier is the best algorithm for breast mass detection in mammograms.
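Stage 2 of the method above relies on Otsu thresholding, which picks the grey level that maximises the between-class variance of the two resulting pixel groups. The following is a minimal pure-Python sketch of that idea, not the authors' implementation; the flat list of 8-bit pixel values is a hypothetical stand-in for a mammogram region.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises between-class variance,
    where class 0 is pixels <= t and class 1 is pixels > t."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight0 = 0      # number of pixels in class 0 so far
    sum0 = 0         # sum of grey levels in class 0 so far
    best_t, best_between = 0, -1.0
    for t in range(levels):
        weight0 += hist[t]
        sum0 += t * hist[t]
        weight1 = total - weight0
        if weight0 == 0:
            continue   # class 0 still empty
        if weight1 == 0:
            break      # class 1 empty from here on
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        between = weight0 * weight1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


# Hypothetical bimodal "image": dark background and a bright candidate region.
pixels = [10] * 50 + [12] * 50 + [200] * 50 + [202] * 50
threshold = otsu_threshold(pixels)
candidate_mask = [value > threshold for value in pixels]
```

Ties are resolved in favour of the lowest grey level, so on a cleanly bimodal histogram the threshold sits at the top of the dark mode.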

Evaluation of a machine learning classifier for metamodels

Software & Systems Modeling ◽

10.1007/s10270-021-00913-x ◽

2021 ◽

Author(s):

Phuong T. Nguyen ◽

Juri Di Rocco ◽

Ludovico Iovino ◽

Davide Di Ruscio ◽

Alfonso Pierantonio

Keyword(s):

Machine Learning ◽

Practical Knowledge ◽

Support Vector ◽

Learning Classifier ◽

Vector Machines ◽

Boosted Decision Tree ◽

High Degree ◽

Manual Classification ◽

Use Of Models

Modeling is a ubiquitous activity in the process of software development. In recent years, such an activity has reached a high degree of intricacy, guided by the heterogeneity of the components, data sources, and tasks. The democratized use of models has led to the necessity for suitable machinery for mining modeling repositories. Among others, the classification of metamodels into independent categories facilitates personalized searches by boosting the visibility of metamodels. Nevertheless, the manual classification of metamodels is not only tedious but also error-prone. According to our observation, misclassification is the norm, which leads to a reduction in the reachability as well as the reusability of metamodels. Handling such complexity requires suitable tooling to turn raw data into practical knowledge that can help modelers with their daily tasks. In our previous work, we proposed AURORA as a machine learning classifier for metamodel repositories. In this paper, we present a thorough evaluation of the system by taking into consideration different settings as well as evaluation metrics. More importantly, we improve the original AURORA tool by changing its internal design. Experimental results demonstrate that the proposed amendment is beneficial to the classification of metamodels. We also compared our approach with two baseline algorithms, namely gradient boosted decision tree and support vector machines. Eventually, we see that AURORA outperforms the baselines with respect to various quality metrics.

Website-Based Application for Classification of Diabetes Using Logistic Regression Method

Jurnal Ilmiah Merpati (Menara Penelitian Akademika Teknologi Informasi) ◽

10.24843/jim.2021.v09.i01.p03 ◽

2021 ◽

pp. 23

Author(s):

Muhamad Soleh ◽

Naufal Ammar ◽

Indrati Sukmadi

Keyword(s):

Machine Learning ◽

Regression Method ◽

Training Data ◽

Support Vector ◽

Application Development ◽

Science Field ◽

Logistics Regression ◽

Vector Machines ◽

Logistic Regression Method

Machine learning is a field of computer science that studies how computers are able to learn from data to improve their intelligence. Machine learning includes many classification methods, such as Neural Networks, Support Vector Machines, Logistic Regression, and others. In this study, a classification process was carried out using the Logistic Regression method for cases of diabetes. Diabetes is an increase in glucose in the bloodstream due to a lack of insulin, which is responsible for the transfer of glucose from the blood to tissues or cells. This study was conducted with the aim of improving on a previous paper. The data used in this study are the same as in the previous studies, namely the Pima Indian Diabetes Dataset. The study comprises several stages: pre-processing, processing, evaluation, and website-based application development. The data were divided into two parts, 75% for training and 25% for testing. The evaluation yields an accuracy of 80%, which is better than the 75.97% reported in the previous paper.
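The workflow described above (a 75%/25% train/test split, a logistic regression fit, and an accuracy figure) can be sketched in a few lines. This is a hedged illustration only: the single-feature toy data, the deterministic every-fourth-sample split, the learning rate and the epoch count are all assumptions made for reproducibility, and nothing here uses the Pima Indian Diabetes Dataset or the authors' code.

```python
import math
from statistics import mean, pstdev


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit p(y=1|x) = sigmoid(w*x + b) by full-batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical one-feature data: low values -> class 0, high values -> class 1.
values = list(range(0, 20)) + list(range(30, 50))
labels = [0] * 20 + [1] * 20
samples = list(zip(values, labels))

# Deterministic 75/25 split (assumption): every fourth sample goes to the test set.
train = [s for i, s in enumerate(samples) if i % 4 != 3]
test = [s for i, s in enumerate(samples) if i % 4 == 3]

# Standardise with statistics from the training set only.
train_x = [x for x, _ in train]
mu, sigma = mean(train_x), pstdev(train_x)
w, b = fit_logistic([(x - mu) / sigma for x in train_x], [y for _, y in train])

predictions = [1 if sigmoid(w * (x - mu) / sigma + b) >= 0.5 else 0 for x, _ in test]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
```

Computing the scaling statistics on the training portion alone keeps information from the test samples out of the fitted model, which is what makes the held-out accuracy meaningful.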

BENCHMARK OF MACHINE LEARNING METHODS FOR CLASSIFICATION OF A SENTINEL-2 IMAGE

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xli-b7-335-2016 ◽

2016 ◽

Vol XLI-B7 ◽

pp. 335-340 ◽

Cited By ~ 4

Author(s):

F. Pirotti ◽

F. Sunar ◽

M. Piragnolo

Keyword(s):

Machine Learning ◽

Land Cover ◽

Cross Validation ◽

Training Dataset ◽

Support Vector ◽

Machine Learning Methods ◽

Control Dataset ◽

Vector Machines ◽

Sentinel 2

Thanks mainly to ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi layered perceptron, multi layered perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an independent classification in 11 land-cover classes of an area of about 60 km², obtained by manual visual interpretation of high resolution images (20 cm ground sampling distance) by experts. In this study five out of the eleven classes are used since the others have too few samples (pixels) for testing and validating subsets. The classes used are the following: (i) urban (ii) sowable areas (iii) water (iv) tree plantations (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying cross-validation with the k-fold method (kfold) and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds. Results from validation of predictions of the whole dataset (full) show the random forests method with the highest values; kappa index ranging from 0.55 to 0.42 respectively with the most and least number of pixels for training. The two neural networks (multi layered perceptron and its ensemble) and the support vector machines - with default radial basis function kernel - methods follow closely with comparable performance.
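The kappa index reported above measures agreement between predicted and control labels after discounting the agreement expected by chance. The following is a minimal sketch of Cohen's kappa computed from a confusion matrix; the three-class matrix is hypothetical and is not taken from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix
    (rows = control labels, columns = predicted labels)."""
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected if predictions were independent of the control labels.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical pixel counts for three land-cover classes.
confusion = [
    [30, 5, 5],
    [4, 25, 1],
    [6, 0, 24],
]
kappa = cohen_kappa(confusion)
```

In this toy matrix the raw agreement is 0.79, but the chance-corrected kappa is lower, which is why kappa values such as those in the abstract look modest next to overall accuracy figures.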

BENCHMARK OF MACHINE LEARNING METHODS FOR CLASSIFICATION OF A SENTINEL-2 IMAGE

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsarchives-xli-b7-335-2016 ◽

2016 ◽

Vol XLI-B7 ◽

pp. 335-340 ◽

Cited By ~ 12

Author(s):

F. Pirotti ◽

F. Sunar ◽

M. Piragnolo

Keyword(s):

Machine Learning ◽

Land Cover ◽

Cross Validation ◽

Training Dataset ◽

Support Vector ◽

Machine Learning Methods ◽

Control Dataset ◽

Vector Machines ◽

Sentinel 2

EEG-Based Classification of the Driver Alertness State

Current Directions in Biomedical Engineering ◽

10.1515/cdbme-2020-3091 ◽

2020 ◽

Vol 6 (3) ◽

pp. 353-356

Author(s):

Martin Golz ◽

Sebastian Thomas ◽

Adolf Schenka

Keyword(s):

Machine Learning ◽

Gradient Boosting ◽

Support Vector ◽

Weighting Matrix ◽

Machine Learning Methods ◽

Young Drivers ◽

Eeg Data ◽

Vector Machines ◽

Generalized Matrix

GMLVQ (Generalized Matrix Relevance Learning Vector Quantization) is a method of machine learning with an adaptive metric. During training, the prototype vectors as well as the weight matrix of the metric are adapted simultaneously. The method is presented in more detail and compared with other machine learning methods employing a fixed metric. It was investigated how accurately the methods can assign the 6-channel EEG of 25 young drivers, who drove overnight in the simulation lab, to the two classes of mild and severe drowsiness. Results of cross-validation show that GMLVQ achieves a mean classification accuracy of 81.7 ± 1.3 %. It is not as accurate as support-vector machines (SVM) and gradient boosting machines (GBM) and cannot exploit the potential of learning adaptive metrics in the case of EEG data. However, the weighting matrix provides information on the relevance of each signal feature.

Download Full-text

Classification of RNA-Seq Data via Bagging Support Vector Machines

10.1101/007526 ◽

2014 ◽

Cited By ~ 4

Author(s):

Gokmen Zararsiz ◽

Dincer Goksuluk ◽

Selcuk Korkmaz ◽

Vahap Eldem ◽

Izzet Parug Duru ◽

...

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Negative Binomial ◽

Majority Voting ◽

Support Vector ◽

Rna Seq ◽

Linear Discriminant ◽

Vector Machines ◽

Cart Algorithm

Background: RNA sequencing (RNA-Seq) is a powerful technique for transcriptome profiling of organisms that uses the capabilities of next-generation sequencing (NGS) technologies. Recent advances in NGS make it possible to measure the expression levels of tens to thousands of transcripts simultaneously. Using such information, developing expression-based classification algorithms is an emerging powerful method for diagnosis, disease classification and monitoring at the molecular level, as well as for providing potential markers of disease. Here, we present bagging support vector machines (bagSVM), a machine learning approach based on bagged ensembles of support vector machines (SVM), for classification of RNA-Seq data. The bagSVM uses the bootstrap technique and trains each single SVM separately; next, it combines the results of each SVM model using the majority-voting technique. Results: We demonstrate the performance of the bagSVM on simulated and real datasets. Simulated datasets are generated from the negative binomial distribution under different scenarios and real datasets are obtained from publicly available resources. A DESeq normalization and variance stabilizing transformation (vst) were applied to all datasets. We compared the results with several classifiers including Poisson linear discriminant analysis (PLDA), single SVM, classification and regression trees (CART), and random forests (RF). In slightly overdispersed data, all methods except the CART algorithm performed well. The performance of PLDA seemed to be best, and that of RF second best, for very slightly and substantially overdispersed datasets. As the data become more dispersed, bagSVM turned out to be the best classifier. Overall, bagSVM and PLDA had the highest accuracies. Conclusions: According to our results, the bagSVM algorithm after vst transformation can be a good choice of classifier for RNA-Seq datasets, mostly for overdispersed ones. Thus, we recommend that researchers use the bagSVM algorithm for the classification of RNA-Seq data. The PLDA algorithm should be a method of choice for slightly and moderately overdispersed datasets. An R/BIOCONDUCTOR package MLSeq with a vignette is freely available at http://www.bioconductor.org/packages/2.14/bioc/html/MLSeq.html Keywords: Bagging, machine learning, RNA-Seq classification, support vector machines, transcriptomics
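The bagging scheme described above (draw bootstrap samples, train one base classifier per sample, combine predictions by majority vote) can be sketched as follows. This is not the MLSeq implementation: to stay dependency-free, a nearest-centroid classifier is swapped in for the SVM base learner, and the two-feature toy data, the number of models and the seed are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(sample):
    """Base learner (stand-in for an SVM): one mean point per class."""
    sums, counts = {}, Counter()
    for (x1, x2), label in sample:
        s1, s2 = sums.get(label, (0.0, 0.0))
        sums[label] = (s1 + x1, s2 + x2)
        counts[label] += 1
    return {label: (s1 / counts[label], s2 / counts[label])
            for label, (s1, s2) in sums.items()}


def predict_centroid(model, point):
    """Assign the class whose centroid is closest to the point."""
    return min(model, key=lambda label: (point[0] - model[label][0]) ** 2
                                        + (point[1] - model[label][1]) ** 2)


def fit_bagging(data, n_models=25, seed=0):
    """Train one base learner per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    classes = {label for _, label in data}
    models = []
    while len(models) < n_models:
        sample = [rng.choice(data) for _ in data]
        # Redraw the rare bootstrap sample that misses a class entirely.
        if {label for _, label in sample} == classes:
            models.append(fit_centroids(sample))
    return models


def predict_bagging(models, point):
    """Majority vote over the base learners' predictions."""
    votes = Counter(predict_centroid(model, point) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature data for two well-separated classes.
data = [((i * 0.1, i * 0.1), 0) for i in range(10)] + \
       [((5 + i * 0.1, 5 + i * 0.1), 1) for i in range(10)]
models = fit_bagging(data)
low = predict_bagging(models, (0.5, 0.4))
high = predict_bagging(models, (5.2, 5.6))
```

An odd number of base learners avoids tied votes in the two-class case; any classifier with a fit and predict step could replace the centroid learner without changing the bagging logic.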
