Genetic Algorithm Based Approach in Attribute Weighting for a Medical Data Set

2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Kirsi Varpa ◽  
Kati Iltanen ◽  
Martti Juhola

Genetic algorithms have been utilized in many complex optimization and simulation tasks because of their powerful search capability. In this research, we studied whether the classification performance of attribute-weighted methods based on nearest neighbour search can be improved by using a genetic algorithm to evolve the attribute weights. The attribute weights in the starting population were based on weights set by application-area experts and by machine learning methods, instead of random initialization. The genetic algorithm improved the total classification accuracy and the median true positive rate of the attribute-weighted k-nearest neighbour method that uses neighbour's class-based attribute weighting. With the other methods, the changes after applying the genetic algorithm were moderate.
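The evolution loop described above can be sketched in a few lines: a population seeded from expert-derived weights (here a hypothetical seed of [1.0, 1.0] and a tiny made-up data set), with leave-one-out accuracy of a weighted 1-NN classifier as the fitness function. This is a generic illustration, not the authors' implementation.

```python
import random

# Toy data set: attribute 0 separates the classes, attribute 1 is noise
# that misleads an unweighted metric (hypothetical values).
DATA = [((0.1, 0.9), 0), ((0.2, 0.1), 0),
        ((0.8, 0.15), 1), ((0.9, 0.85), 1)]

def weighted_dist(w, a, b):
    return sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b))

def loo_accuracy(w):
    """Fitness: leave-one-out accuracy of a weighted 1-NN classifier."""
    hits = 0
    for i, (x, y) in enumerate(DATA):
        rest = [p for j, p in enumerate(DATA) if j != i]
        nearest = min(rest, key=lambda p: weighted_dist(w, x, p[0]))
        hits += (nearest[1] == y)
    return hits / len(DATA)

def evolve(seed_weights, generations=30, pop_size=10, seed=1):
    rng = random.Random(seed)
    # Start from expert/ML-derived weights instead of a random population.
    pop = [list(seed_weights)]
    while len(pop) < pop_size:
        pop.append([max(0.0, w + rng.gauss(0, 0.3)) for w in seed_weights])
    for _ in range(generations):
        pop.sort(key=loo_accuracy, reverse=True)
        parents = pop[:pop_size // 2]          # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(a))     # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:             # Gaussian mutation
                k = rng.randrange(len(child))
                child[k] = max(0.0, child[k] + rng.gauss(0, 0.3))
            children.append(child)
        pop = parents + children
    return max(pop, key=loo_accuracy)

best = evolve([1.0, 1.0])
```

Because the seed individual survives via elitism, the evolved weights can never score worse than the expert seed on the fitness function.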

2014 ◽  
Author(s):  
Andreas Tuerk ◽  
Gregor Wiktorin ◽  
Serhat Güler

Quantification of RNA transcripts with RNA-Seq is inaccurate due to positional fragment bias, which is not represented appropriately by current statistical models of RNA-Seq data. This article introduces the Mix² (read "mix-square") model, which uses a mixture of probability distributions to model transcript-specific positional fragment bias. The parameters of the Mix² model can be trained efficiently with the Expectation-Maximization (EM) algorithm, yielding simultaneous estimates of the transcript abundances and the transcript-specific positional biases. Experiments are conducted on synthetic data and on the Universal Human Reference (UHR) and Human Brain Reference (HBR) samples from the MicroArray Quality Control (MAQC) data set. Comparing the correlation between qPCR and FPKM values against the state-of-the-art methods Cufflinks and PennSeq, we obtain an increase in R² value from 0.44 to 0.6 and from 0.34 to 0.54, respectively. In the detection of differential expression between UHR and HBR, the true positive rate increases from 0.44 to 0.71 at a false positive rate of 0.1. Finally, the Mix² model is used to investigate biases present in the MAQC data. This reveals 5 dominant biases which deviate from the common assumption of a uniform fragment distribution. The Mix² software is available at http://www.lexogen.com/fileadmin/uploads/bioinfo/mix2model.tgz.
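As a rough illustration of the EM training step, the sketch below fits a two-component 1-D Gaussian mixture by alternating expectation and maximization steps. The actual Mix² model mixes transcript-specific bias distributions, so the synthetic data, the Gaussian component family, and the fixed shared sigma here are all simplifying assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(xs, mu1, mu2, sigma=1.0, pi1=0.5, iters=50):
    """EM for a two-component 1-D Gaussian mixture (shared, fixed sigma)."""
    for _ in range(iters):
        # E-step: responsibility of component 1 for each observation.
        resp = []
        for x in xs:
            p1 = pi1 * normal_pdf(x, mu1, sigma)
            p2 = (1.0 - pi1) * normal_pdf(x, mu2, sigma)
            resp.append(p1 / (p1 + p2))
        # M-step: re-estimate the means and the mixing weight.
        n1 = sum(resp)
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / (len(xs) - n1)
        pi1 = n1 / len(xs)
    return mu1, mu2, pi1

# Two well-separated synthetic clusters (made-up values).
data = [0.1, -0.2, 0.3, 0.0, 4.8, 5.1, 5.3, 4.9]
m1, m2, w = em_two_gaussians(data, mu1=0.0, mu2=4.0)
```

Each iteration provably does not decrease the data likelihood, which is why EM yields the simultaneous parameter estimates mentioned above.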


Author(s):  
Lawrence Hall ◽  
Dmitry Goldgof ◽  
Rahul Paul ◽  
Gregory M. Goldgof

<p>Testing for COVID-19 has been unable to keep up with the demand. Further, the false negative rate is projected to be as high as 30%, and test results can take some time to obtain. X-ray machines are widely available and provide images for diagnosis quickly. This paper explores how useful chest X-ray images can be in diagnosing COVID-19 disease. We have obtained 135 chest X-rays of COVID-19 and 320 chest X-rays of viral and bacterial pneumonia.</p><p>A pre-trained deep convolutional neural network, Resnet50, was tuned on 102 COVID-19 cases and 102 other pneumonia cases in a 10-fold cross validation. The results were an overall accuracy of 89.2% with a COVID-19 true positive rate of 0.8039 and an AUC of 0.95. Pre-trained Resnet50 and VGG16 plus our own small CNN were tuned or trained on a balanced set of COVID-19 and pneumonia chest X-rays. An ensemble of the three types of CNN classifiers was applied to a test set of 33 unseen COVID-19 and 218 pneumonia cases. The overall accuracy was 91.24%, with a true positive rate for COVID-19 of 0.7879 and 6.88% false positives, for a true negative rate of 0.9312 and an AUC of 0.94.</p><p>This preliminary study has flaws, most critically a lack of information about where in the disease process the COVID-19 cases were, and the small data set size. More COVID-19 case images at good resolution will enable a better answer to the question of how useful chest X-rays can be for diagnosing COVID-19.</p>
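The reported ensemble figures are mutually consistent, which can be checked by back-deriving confusion-matrix counts. The counts below (26/7/203/15) are inferred from the stated rates, not given in the paper.

```python
def rates(tp, fn, tn, fp):
    """Standard detection metrics from confusion-matrix counts."""
    tpr = tp / (tp + fn)                  # sensitivity / recall
    tnr = tn / (tn + fp)                  # specificity
    fpr = fp / (fp + tn)
    acc = (tp + tn) / (tp + fn + tn + fp)
    return tpr, tnr, fpr, acc

# 33 COVID-19 cases with 26 detected, and 218 pneumonia cases with
# 203 correctly rejected (counts inferred from the reported rates).
tpr, tnr, fpr, acc = rates(tp=26, fn=7, tn=203, fp=15)
```

These counts reproduce the reported 0.7879 true positive rate, 6.88% false positives, 0.9312 true negative rate, and 91.24% overall accuracy.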


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3005
Author(s):  
A. N. M. Bazlur Rashid ◽  
Mohiuddin Ahmed ◽  
Al-Sakib Khan Pathan

While anomaly detection is very important in many domains, such as cybersecurity, cybersecurity datasets contain many rare anomalies or infrequent patterns, and detecting them is computationally expensive. These datasets also consist of many features, mostly irrelevant ones, which lowers the classification performance of machine learning algorithms. Hence, a feature selection (FS) approach, i.e., selecting only the relevant features, is an essential preprocessing step in cybersecurity data analysis. Although many FS approaches have been proposed in the literature, cooperative co-evolution (CC)-based FS approaches can be more suitable for cybersecurity data preprocessing in a Big Data scenario. Accordingly, in this paper, we have applied our previously proposed CC-based FS with random feature grouping (CCFSRFG) to a benchmark cybersecurity dataset as the preprocessing step. Both the dataset with the original features and the dataset with a reduced number of features were used for infrequent pattern detection, evaluated with 10 unsupervised anomaly detection techniques; the proposed approach is therefore termed Unsupervised Infrequent Pattern Detection (UIPD). We then compared the experimental results with and without FS in terms of true positive rate (TPR). The analysis indicates that the highest TPR improvement, 385.91%, was achieved with FS by the cluster-based local outlier factor (CBLOF) for backdoor infrequent pattern detection. Furthermore, the highest overall infrequent pattern detection TPR across all infrequent patterns improved by 61.47% using the clustering-based multivariate Gaussian outlier score (CMGOS) with FS.
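A minimal sketch of the random feature grouping at the heart of CCFSRFG, plus the relative TPR-improvement measure used in the comparison. The feature count, group count, and example TPR values are arbitrary, and this is not the authors' implementation.

```python
import random

def random_feature_grouping(n_features, n_groups, rng):
    """Randomly partition feature indices into subproblems, as in the
    decomposition step of cooperative co-evolution."""
    idx = list(range(n_features))
    rng.shuffle(idx)
    return [idx[i::n_groups] for i in range(n_groups)]

def tpr_improvement(tpr_without_fs, tpr_with_fs):
    """Relative true-positive-rate improvement (%) gained by FS."""
    return (tpr_with_fs - tpr_without_fs) / tpr_without_fs * 100.0

# Hypothetical example: 42 features decomposed into 5 subproblems.
groups = random_feature_grouping(42, 5, random.Random(0))
```

Each subproblem is then optimised by its own subpopulation, and candidate feature subsets are assembled from the best individual of each group.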


Author(s):  
Cansu Görürgöz ◽  
Kaan Orhan ◽  
Ibrahim Sevki Bayrakdar ◽  
Özer Çelik ◽  
Elif Bilgir ◽  
...  

Objectives: The present study aimed to evaluate the performance of a Faster Region-based Convolutional Neural Network (R-CNN) algorithm for tooth detection and numbering on periapical images. Methods: A data set of 1686 randomly selected periapical radiographs of patients was collected retrospectively. A pre-trained model (GoogLeNet Inception v3 CNN) was employed for pre-processing, and transfer learning techniques were applied for training on the data set. The algorithm consisted of: (1) the Jaw classification model, (2) Region detection models, and (3) the Final algorithm using all models. The performance of this final model was then analysed alongside the component models. The sensitivity, precision, true-positive rate, and false-positive/negative rate were computed from a confusion matrix to analyze the performance of the algorithm. Results: An artificial intelligence algorithm (CranioCatch, Eskisehir, Turkey) was designed based on Faster R-CNN Inception architecture to automatically detect and number the teeth on periapical images. Of 864 teeth in 156 periapical radiographs, 668 were correctly numbered in the test data set. The F1 score, precision, and sensitivity were 0.8720, 0.7812, and 0.9867, respectively. Conclusion: The study demonstrated the potential accuracy and efficiency of the CNN algorithm for detecting and numbering teeth. Deep learning-based methods can help clinicians reduce workloads, improve dental records, and reduce turnaround time for urgent cases. This architecture might also contribute to forensic science.
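The reported F1 score can be reproduced from the reported precision and sensitivity with the standard harmonic-mean formula:

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)

score = f1_score(0.7812, 0.9867)  # matches the reported 0.8720
```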


Author(s):  
Hongxing He ◽  
◽  
Simon Hawkins ◽  
Warwick Graco ◽  
Xin Yao ◽  
...  

In the k-Nearest Neighbour (kNN) algorithm, the classification of a new sample is determined by the class of its k nearest neighbours. The performance of the kNN algorithm is influenced by three main factors: (1) the distance metric used to locate the nearest neighbours; (2) the decision rule used to derive a classification from the k nearest neighbours; and (3) the number of neighbours used to classify the new sample. Using k = 1, 3, or 5 nearest neighbours, this study applies a Genetic Algorithm (GA) to find the optimal non-Euclidean distance metric for the kNN algorithm and examines two alternative methods (Majority Rule and Bayes Rule) of deriving a classification from the k nearest neighbours. This modified algorithm was evaluated on two real-world medical fraud problems. The General Practitioner (GP) database is a 2-class problem in which GPs are classified as either practising appropriately or inappropriately. The 'Doctor-Shoppers' database is a 5-class problem in which patients are classified according to the likelihood that they are 'doctor-shoppers': patients who consult many physicians in order to obtain multiple prescriptions of drugs of addiction in excess of their own therapeutic need. In both applications, classification accuracy was improved by optimising the distance metric. The agreement rate on the GP dataset improved from around 70% (using Euclidean distance) to 78% (using an optimised distance metric), and from about 55% to 82% on the Doctor-Shoppers dataset. Differences in either the decision rule or the number of nearest neighbours had little or no impact on the classification performance of the kNN algorithm. The excellent performance of the kNN algorithm when the distance metric is optimised by a genetic algorithm paves the way for its application in the real-world fraud detection problems faced by the Health Insurance Commission (HIC).
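The two decision rules compared in the study can be illustrated schematically. The class labels, the priors, and the inverse-prior weighting used for Bayes Rule below are simplifying assumptions of ours, not the paper's exact formulation.

```python
from collections import Counter

def majority_rule(neighbour_labels):
    """Classify by the most common class among the k neighbours."""
    return Counter(neighbour_labels).most_common(1)[0][0]

def bayes_rule(neighbour_labels, class_priors):
    """Weight each class's neighbour count by the inverse of its prior,
    favouring rare classes (a simplified reading of Bayes Rule)."""
    counts = Counter(neighbour_labels)
    return max(counts, key=lambda c: counts[c] / class_priors[c])

# Hypothetical k = 3 neighbourhood dominated by the common class.
labels = ["appropriate", "appropriate", "inappropriate"]
priors = {"appropriate": 0.9, "inappropriate": 0.1}
```

On this example the two rules disagree: Majority Rule returns the common class, while the prior-adjusted rule flags the rare "inappropriate" class, illustrating why the choice of decision rule can matter for imbalanced fraud data even though the study found its effect to be small.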


2019 ◽  
Vol 14 (3) ◽  
pp. 628-661 ◽  
Author(s):  
Bikash Kanti Sarkar ◽  
Shib Sankar Sana

Purpose
To promote patients' health via early prediction of diseases, knowledge extraction using data mining approaches forms an integral part of an e-health system. However, medical databases are highly imbalanced, voluminous, conflicting and complex in nature, which can lead to erroneous diagnosis of diseases (i.e. detecting the class values of diseases). Numerous standard disease decision support systems (DDSSs) have been proposed in the literature, but most of them are disease specific, and they usually suffer from several drawbacks, such as lack of understandability, inability to handle rare cases, and inefficiency in making quick and correct decisions. The purpose of this study is to alleviate these issues to a great extent.

Design/methodology/approach
Addressing the limitations of the existing systems, the present research introduces a two-step framework for designing a DDSS. The first step (data-level optimization) identifies an optimal data partition (Popt) for each disease data set and then the best training set for Popt, in a parallel manner. The second step builds a generic predictive model (integrating the C4.5 and PRISM learners) over the discovered information for effective diagnosis of disease. The designed model is generic (i.e. not disease specific).

Findings
The empirical results (in terms of three measures: accuracy, true positive rate and false positive rate) obtained over 14 benchmark medical data sets (collected from https://archive.ics.uci.edu/ml) demonstrate that the hybrid model outperforms the base learners in almost all cases for initial diagnosis of the diseases. The proposed DDSS may thus work as an e-doctor to detect diseases.

Originality/value
The model designed in this study is original, and the necessary parallelized methods are implemented in C on a FUJITSU HPC cluster with 256 cores in total (under one master node).


Author(s):  
Lawrence Hall ◽  
Dmitry Goldgof ◽  
Rahul Paul ◽  
Gregory M. Goldgof

<p>Testing for COVID-19 has been unable to keep up with the demand. Further, the false negative rate is projected to be as high as 30%, and test results can take some time to obtain. X-ray machines are widely available and provide images for diagnosis quickly. This paper explores how useful chest X-ray images can be in diagnosing COVID-19 disease. We have obtained 135 chest X-rays of COVID-19 and 320 chest X-rays of viral and bacterial pneumonia.</p><p>A pre-trained deep convolutional neural network, Resnet50, was tuned on 102 COVID-19 cases and 102 other pneumonia cases in a 10-fold cross validation. The results were an overall accuracy of 90.7% with a COVID-19 true positive rate of 0.83 and an AUC of 0.987. Pre-trained Resnet50 and VGG16 plus our own small CNN were tuned or trained on a balanced set of COVID-19 and pneumonia chest X-rays. An ensemble of the three types of CNN classifiers was applied to a test set of 33 unseen COVID-19 and 208 pneumonia cases. The overall accuracy was 94.4%, with a true positive rate for COVID-19 of 0.969 and 6% false positives, for a true negative rate of 0.94 and an AUC of 0.99.</p><p>This preliminary study has flaws, most critically a lack of information about where in the disease process the COVID-19 cases were, and the small data set size. More COVID-19 case images at good resolution will enable a better answer to the question of how useful chest X-rays can be for diagnosing COVID-19.</p><p>Note: an earlier version of this work inadvertently used chest X-rays of viral and bacterial pneumonia that came from a dataset of children under 5 years old; those results should be ignored.</p>


2019 ◽  
Vol 8 (4) ◽  
pp. 7455-7458

Our aim is to assess classification accuracy under different types of normalisation and different feature selection techniques for diabetes mellitus, using the Pima Indian Diabetes dataset. Data mining is the process of extracting previously unknown, valid and important information from large databases, so that crucial decisions can be made using that information. The classification methods K-Nearest Neighbour and J48 decision tree are applied both to the original dataset and to the pre-processed dataset. The full pre-processing pipeline is applied to the Pima Indian Diabetes dataset to analyse classification performance in terms of accuracy rate. The performance metrics used to evaluate classification are recall, F-measure, sensitivity, specificity, precision, and accuracy. The simulation is performed with the R tool.
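Two common normalisation choices for this kind of pre-processing can be sketched as follows. The glucose values are made-up examples, and the paper's experiments use the R tool rather than Python.

```python
def min_max(xs):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_score(xs):
    """Centre values on the mean and scale by the standard deviation."""
    mean = sum(xs) / len(xs)
    sd = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / sd for x in xs]

# Made-up plasma glucose readings in the style of the Pima attributes.
glucose = [85, 168, 183, 89, 137]
scaled = min_max(glucose)
standardised = z_score(glucose)
```

Normalisation matters especially for K-Nearest Neighbour, whose distance computation would otherwise be dominated by attributes with large numeric ranges.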



2018 ◽  
Vol 164 ◽  
pp. 01023 ◽  
Author(s):  
Kanyanut Homsapaya ◽  
Ohm Sornil

Classification performance is adversely impacted by noisy data. Selecting features relevant to the problem is thus a critical step in classification, yet it is difficult to achieve an accurate solution, especially for large data sets. In this article, we propose a novel filter-based floating search technique for feature selection that selects an optimal set of features for classification purposes. A genetic algorithm is utilized to increase the quality of the features selected at each iteration, and a criterion function is applied to choose relevant, high-quality features that can improve classification accuracy. The method is evaluated on 20 standard machine learning datasets of various sizes and complexities. Experimental results show that the proposed method is effective and performs well in comparison with previously reported techniques.
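A floating search for feature selection can be sketched as sequential floating forward selection (SFFS) driven by a criterion function. The toy score table below is hypothetical, and the paper's genetic-algorithm refinement step is omitted from this sketch.

```python
def sffs(features, criterion, k):
    """Sequential floating forward selection: greedily add the best
    feature, then conditionally remove features while that improves
    the criterion (the 'floating' backward step)."""
    selected = []
    while len(selected) < k:
        remaining = [f for f in features if f not in selected]
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
        improved = True
        while improved and len(selected) > 2:
            improved = False
            for f in list(selected):
                trial = [g for g in selected if g != f]
                if criterion(trial) > criterion(selected):
                    selected = trial
                    improved = True
    return selected

# Toy criterion: features 0 and 2 are jointly informative (hypothetical).
SCORES = {(): 0.0, (0,): 0.6, (1,): 0.3, (2,): 0.5,
          (0, 2): 0.9, (0, 1): 0.65, (1, 2): 0.55}
def crit(subset):
    return SCORES.get(tuple(sorted(subset)), 0.0)

chosen = sffs([0, 1, 2], crit, k=2)
```

The backward step is what distinguishes floating search from plain greedy forward selection: a feature added early can be dropped later once a better-interacting subset emerges.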

