Classification of Oncologic Data with Genetic Programming

2009 ◽  
Vol 2009 ◽  
pp. 1-13 ◽  
Author(s):  
Leonardo Vanneschi ◽  
Francesco Archetti ◽  
Mauro Castelli ◽  
Ilaria Giordani

Discovering the models explaining the hidden relationship between genetic material and tumor pathologies is one of the most important open challenges in biology and medicine. Given the large amount of data made available by the DNA Microarray technique, Machine Learning is becoming a popular tool for this kind of investigation. In the last few years, we have been particularly involved in the study of Genetic Programming for mining large sets of biomedical data. In this paper, we present a comparison between four variants of Genetic Programming for the classification of two different oncologic datasets: the first one contains data from healthy colon tissues and colon tissues affected by cancer; the second one contains data from patients affected by two kinds of leukemia (acute myeloid leukemia and acute lymphoblastic leukemia). We report experimental results obtained using two different fitness criteria: the receiver operating characteristic and the percentage of correctly classified instances. These results, and their comparison with those obtained by three nonevolutionary Machine Learning methods (Support Vector Machines, MultiBoosting, and Random Forests) on the same data, suggest that Genetic Programming is a promising technique for this kind of classification.
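
The two fitness criteria named in this abstract can be sketched as scoring functions. This is a minimal illustration, not the authors' implementation: it assumes binary labels (1 for one class, 0 for the other), summarizes the receiver operating characteristic by the area under its curve (AUC), and uses hypothetical function names.

```python
def accuracy(labels, predictions):
    """Fraction of instances whose predicted class matches the true class."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen positive instance
    receives a higher score than a randomly chosen negative one;
    ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

In a Genetic Programming setting, either function could score the raw output of an evolved program on the training samples; accuracy needs a decision threshold first, whereas the AUC works directly on the continuous outputs.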

2021 ◽  
Vol 163 (A3) ◽  
Author(s):  
B Shabani ◽  
J Ali-Lavroff ◽  
D S Holloway ◽  
S Penev ◽  
D Dessi ◽  
...  

An onboard monitoring system can measure features such as stress cycle counts and provide warnings related to slamming. Considering current technology trends, there is an opportunity to incorporate machine learning methods into monitoring systems. A hull monitoring system has been developed and installed on a 111 m wave piercing catamaran (Hull 091) to remotely monitor the ship kinematics and hull structural responses. In parallel, an existing dataset of a similar vessel (Hull 061) was analysed using unsupervised and supervised learning models; these were found to be beneficial for the classification of bow entry events according to key kinematic parameters. A comparison of different algorithms, including linear support vector machines, naïve Bayes and decision trees, was conducted for the bow entry classification. In addition, using empirical probability distributions, the likelihood of wet-deck slamming was estimated given a vertical bow acceleration threshold of 1  in head seas, clustering the feature space with the approximate probabilities of 0.001, 0.030 and 0.25.


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5896
Author(s):  
Eddi Miller ◽  
Vladyslav Borysenko ◽  
Moritz Heusinger ◽  
Niklas Niedner ◽  
Bastian Engelmann ◽  
...  

Changeover times are an important element when evaluating the Overall Equipment Effectiveness (OEE) of a production machine. The article presents a machine learning (ML) approach that is based on an external sensor setup to automatically detect changeovers in a shopfloor environment. The door statuses, coolant flow, power consumption, and operator indoor GPS data of a milling machine were used in the ML approach. As ML methods, Decision Trees, Support Vector Machines, (Balanced) Random Forest algorithms, and Neural Networks were chosen, and their performance was compared. The best results were achieved with the Random Forest ML model (97% F1 score, 99.72% AUC score). It was also found that model performance is optimal when only a binary classification into a changeover phase and a production phase is considered and fewer subphases of the changeover process are applied.
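
For readers unfamiliar with the F1 score reported here, the following sketch shows how it is computed for a binary changeover/production labelling. The label encoding (1 for changeover, 0 for production) and the function name are assumptions made for illustration only.

```python
def f1_score(labels, predictions, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```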


Author(s):  
Jebasonia Jebamony ◽  
Dheeba Jacob

Background: Breast cancer is one of the leading causes of cancer deaths among women. Early detection of cancer increases the survival rate of the affected women. Machine learning approaches used for the classification of breast cancer usually take a lot of processing time during training. This paper proposes a machine learning approach for breast cancer detection in mammograms that does not depend on the number of training samples. Objective: The paper aims to develop a core vector machine-based diagnosis system for breast cancer detection using data from MIAS. The main motivation for using this system is to reduce the computational and memory requirements for large training data and to improve classification accuracy. Methods: The proposed method has four stages: 1) pre-processing to extract the breast region using global thresholding, with enhancement using histogram equalization; 2) identification of potential masses using Otsu thresholding; 3) feature extraction using Laws texture energy measures; and 4) mass detection using a core vector machine (CVM) classifier. Results: Comparative analysis was done with existing algorithms: Artificial Neural Network (ANN), Support Vector Machine (SVM), and Fuzzy Support Vector Machine (FSVM). The results illustrate that the proposed CVM classifier produced promising results in terms of sensitivity (96.9%), misclassification rate (0.0443) and accuracy (95.89%). The time taken for the training process is 0.0443, which is less than that of the other machine learning algorithms. Conclusion: Performance analysis shows that the CVM classifier is superior to other classifiers such as ANN, SVM and FSVM. The computational time of the CVM classifier during training was also analysed and found to be better than that of the other algorithms discussed. The results show that the CVM classifier is the best of the compared algorithms for breast mass detection in mammograms.
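
Stage 2 of the method relies on Otsu thresholding. The sketch below shows the general technique on a flat list of 8-bit grayscale values; it is not the authors' code, and the function name and the 256-level assumption are illustrative.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes between-class variance.

    Pixels with value <= threshold form the low class, the rest the
    high class. `pixels` is a flat list of integers in [0, levels).
    """
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(levels):
        weight_low += hist[t]
        sum_low += t * hist[t]
        weight_high = total - weight_low
        if weight_low == 0:
            continue  # low class still empty
        if weight_high == 0:
            break  # high class empty from here on
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_threshold, best_variance = t, variance
    return best_threshold
```

Candidate mass pixels would then be those with `value > otsu_threshold(pixels)`, assuming brighter regions correspond to denser tissue.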


Author(s):  
Phuong T. Nguyen ◽  
Juri Di Rocco ◽  
Ludovico Iovino ◽  
Davide Di Ruscio ◽  
Alfonso Pierantonio

Modeling is a ubiquitous activity in the process of software development. In recent years, such an activity has reached a high degree of intricacy, guided by the heterogeneity of the components, data sources, and tasks. The democratized use of models has led to the necessity for suitable machinery for mining modeling repositories. Among others, the classification of metamodels into independent categories facilitates personalized searches by boosting the visibility of metamodels. Nevertheless, the manual classification of metamodels is not only a tedious but also an error-prone task. According to our observation, misclassification is the norm, which leads to a reduction in reachability as well as reusability of metamodels. Handling such complexity requires suitable tooling to turn raw data into practical knowledge that can help modelers with their daily tasks. In our previous work, we proposed AURORA as a machine learning classifier for metamodel repositories. In this paper, we present a thorough evaluation of the system by taking into consideration different settings as well as evaluation metrics. More importantly, we improve the original AURORA tool by changing its internal design. Experimental results demonstrate that the proposed amendment is beneficial to the classification of metamodels. We also compared our approach with two baseline algorithms, namely gradient boosted decision trees and support vector machines. Overall, we see that AURORA outperforms the baselines with respect to various quality metrics.


Author(s):  
Muhamad Soleh ◽  
Naufal Ammar ◽  
Indrati Sukmadi

Machine learning is a field of computer science that studies how computers can learn from data to improve their intelligence. Machine learning includes many classification methods, such as Neural Networks, Support Vector Machines, Logistic Regression, and others. In this study, classification was carried out using the Logistic Regression method for cases of diabetes. Diabetes is an increase in glucose in the bloodstream due to a lack of insulin, which is responsible for the transfer of glucose from the blood to tissues or cells. This study was conducted with the aim of improving on a previous paper. The data used are the same as in previous studies: the Pima Indian Diabetes Dataset. The study comprises several stages: pre-processing, processing, evaluation, and development of a website-based application. The data were divided into two parts: 75% for training and 25% for testing. The study achieves an accuracy of 80%, which is better than the 75.97% reported in the previous paper.
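
The 75%/25% division described in this abstract can be sketched as a seeded shuffle followed by a cut. The function name, the seed, and the use of shuffling are assumptions; the abstract does not say how the rows were assigned to each part.

```python
import random


def train_test_split(rows, train_fraction=0.75, seed=42):
    """Shuffle a copy of `rows` reproducibly and split it into two parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Fixing the seed keeps the split reproducible, which matters when an accuracy figure is compared against an earlier result on the same data.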


Author(s):  
F. Pirotti ◽  
F. Sunar ◽  
M. Piragnolo

Thanks mainly to ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi layered perceptron, multi layered perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an independent classification in 11 land-cover classes of an area of about 60 km², obtained by manual visual interpretation of high resolution images (20 cm ground sampling distance) by experts. In this study five out of the eleven classes are used since the others have too few samples (pixels) for testing and validating subsets. The classes used are the following: (i) urban (ii) sowable areas (iii) water (iv) tree plantations (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying cross-validation with the k-fold method (kfold) and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds. Results from validation of predictions of the whole dataset (full) show the random forests method with the highest values, with the kappa index ranging from 0.55 to 0.42 for the largest and smallest number of training pixels, respectively. The two neural networks (multi layered perceptron and its ensemble) and the support vector machines, with the default radial basis function kernel, follow closely with comparable performance.
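
The kappa index quoted in this abstract measures agreement between predicted and control labels beyond what chance alone would produce. A minimal sketch of Cohen's kappa follows, with a hypothetical function name and class labels given as strings; the abstract does not specify which kappa variant was used.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected).

    Undefined (division by zero) when both label lists contain only a
    single, identical class.
    """
    n = len(reference)
    observed = sum(1 for r, p in zip(reference, predicted) if r == p) / n
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Chance agreement: product of the marginal class frequencies.
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    return (observed - expected) / (1 - expected)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so the reported range of 0.42 to 0.55 indicates moderate agreement with the control classification.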


2020 ◽  
Vol 6 (3) ◽  
pp. 353-356
Author(s):  
Martin Golz ◽  
Sebastian Thomas ◽  
Adolf Schenka

GMLVQ (Generalized Matrix Relevance Learning Vector Quantization) is a method of machine learning with an adaptive metric. While training, the prototype vectors as well as the weight matrix of the metric are adapted simultaneously. The method is presented in more detail and compared with other machine learning methods employing a fixed metric. It was investigated how accurately the methods can assign the 6-channel EEG of 25 young drivers, who drove overnight in the simulation lab, to the two classes of mild and severe drowsiness. Results of cross-validation show that GMLVQ is at 81.7 ± 1.3 % mean classification accuracy. It is not as accurate as support-vector machines (SVM) and gradient boosting machines (GBM) and cannot exploit the potential of learning adaptive metrics in the case of EEG data. However, information is provided on the relevance of each signal feature from the weighting matrix.
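
The adaptive metric at the core of GMLVQ can be sketched as follows. In the standard formulation the distance between a sample x and a prototype w is (x − w)ᵀΛ(x − w) with Λ = ΩᵀΩ, and the diagonal of Λ is read as per-feature relevance. The function names are hypothetical, and the training updates for Ω and the prototypes are omitted.

```python
def gmlvq_distance(x, w, omega):
    """Squared distance (x - w)^T Omega^T Omega (x - w).

    `omega` is a list of rows; computing Omega (x - w) first and then
    its squared norm keeps the distance non-negative by construction.
    """
    diff = [a - b for a, b in zip(x, w)]
    projected = [sum(o * d for o, d in zip(row, diff)) for row in omega]
    return sum(v * v for v in projected)


def feature_relevances(omega):
    """Diagonal of Lambda = Omega^T Omega, one value per input feature."""
    n_features = len(omega[0])
    return [sum(row[i] ** 2 for row in omega) for i in range(n_features)]
```

With Ω set to the identity matrix the distance reduces to the squared Euclidean distance, which corresponds to the fixed-metric case the abstract compares against.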


2014 ◽  
Author(s):  
Gokmen Zararsiz ◽  
Dincer Goksuluk ◽  
Selcuk Korkmaz ◽  
Vahap Eldem ◽  
Izzet Parug Duru ◽  
...  

Background: RNA sequencing (RNA-Seq) is a powerful technique for transcriptome profiling of organisms that uses the capabilities of next-generation sequencing (NGS) technologies. Recent advances in NGS make it possible to measure the expression levels of tens to thousands of transcripts simultaneously. Using such information, developing expression-based classification algorithms is an emerging powerful method for diagnosis, disease classification and monitoring at the molecular level, as well as for providing potential markers of disease. Here, we present bagging support vector machines (bagSVM), a machine learning approach based on bagged ensembles of support vector machines (SVM), for classification of RNA-Seq data. The bagSVM uses the bootstrap technique and trains each single SVM separately; it then combines the results of the SVM models using majority voting. Results: We demonstrate the performance of the bagSVM on simulated and real datasets. Simulated datasets are generated from a negative binomial distribution under different scenarios and real datasets are obtained from publicly available resources. DESeq normalization and a variance stabilizing transformation (vst) were applied to all datasets. We compared the results with several classifiers including Poisson linear discriminant analysis (PLDA), single SVM, classification and regression trees (CART), and random forests (RF). In slightly overdispersed data, all methods except the CART algorithm performed well. PLDA seemed to perform best and RF second best for very slightly and substantially overdispersed datasets. As the data become more spread out, bagSVM turned out to be the best classifier. In overall results, bagSVM and PLDA had the highest accuracies. Conclusions: According to our results, the bagSVM algorithm after vst transformation can be a good choice of classifier for RNA-Seq datasets, mostly for overdispersed ones. Thus, we recommend that researchers use the bagSVM algorithm for the classification of RNA-Seq data. The PLDA algorithm should be a method of choice for slightly and moderately overdispersed datasets. An R/BIOCONDUCTOR package MLSeq with a vignette is freely available at http://www.bioconductor.org/packages/2.14/bioc/html/MLSeq.html Keywords: Bagging, machine learning, RNA-Seq classification, support vector machines, transcriptomics
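
The bagging scheme described in this abstract (bootstrap resampling, one model per resample, majority voting) can be sketched independently of the base learner. To stay self-contained, this example substitutes a simple nearest-centroid classifier for the SVM; that substitution, the function names, and the parameter defaults are all assumptions for illustration and are not part of MLSeq.

```python
import random
from collections import Counter


def train_centroid(samples):
    """Stand-in base learner (NOT an SVM): nearest class centroid."""
    sums, counts = {}, Counter()
    for features, label in samples:
        counts[label] += 1
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def bagged_predictor(samples, train, n_models=11, seed=0):
    """Train `n_models` base learners on bootstrap resamples and vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(samples) items with replacement.
        resample = [rng.choice(samples) for _ in samples]
        models.append(train(resample))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]  # majority vote

    return predict
```

Passing the training routine in as `train` keeps the resampling and voting logic separate from the base learner, so an SVM trainer could be supplied in place of the centroid stand-in.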

