Genetic Algorithm Based Approach in Attribute Weighting for a Medical Data Set

2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Kirsi Varpa ◽  
Kati Iltanen ◽  
Martti Juhola

Genetic algorithms have been utilized in many complex optimization and simulation tasks because of their powerful search capability. In this research, we studied whether the classification performance of attribute-weighted methods based on nearest neighbour search can be improved by using a genetic algorithm to evolve the attribute weights. The attribute weights in the starting population were based on weights set by application-area experts and by machine learning methods, instead of random initialization. The genetic algorithm improved the total classification accuracy and the median true positive rate of the attribute-weighted k-nearest neighbour method that uses neighbour's class-based attribute weighting. With the other methods, the changes after applying the genetic algorithm were moderate.
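The evolution loop described above can be sketched in a few lines: a population seeded from expert-derived weights (here a hypothetical seed of [1.0, 1.0] and a tiny made-up data set), with leave-one-out accuracy of a weighted 1-NN classifier as the fitness function. This is a generic illustration, not the authors' implementation.

```python
import random

# Toy data set: attribute 0 separates the classes, attribute 1 is noise
# that misleads an unweighted metric (hypothetical values).
DATA = [((0.1, 0.9), 0), ((0.2, 0.1), 0),
        ((0.8, 0.15), 1), ((0.9, 0.85), 1)]

def weighted_dist(w, a, b):
    return sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b))

def loo_accuracy(w):
    """Fitness: leave-one-out accuracy of a weighted 1-NN classifier."""
    hits = 0
    for i, (x, y) in enumerate(DATA):
        rest = [p for j, p in enumerate(DATA) if j != i]
        nearest = min(rest, key=lambda p: weighted_dist(w, x, p[0]))
        hits += (nearest[1] == y)
    return hits / len(DATA)

def evolve(seed_weights, generations=30, pop_size=10, seed=1):
    rng = random.Random(seed)
    # Start from expert/ML-derived weights instead of a random population.
    pop = [list(seed_weights)]
    while len(pop) < pop_size:
        pop.append([max(0.0, w + rng.gauss(0, 0.3)) for w in seed_weights])
    for _ in range(generations):
        pop.sort(key=loo_accuracy, reverse=True)
        parents = pop[:pop_size // 2]          # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(a))     # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:             # Gaussian mutation
                k = rng.randrange(len(child))
                child[k] = max(0.0, child[k] + rng.gauss(0, 0.3))
            children.append(child)
        pop = parents + children
    return max(pop, key=loo_accuracy)

best = evolve([1.0, 1.0])
```

Because the seed individual survives via elitism, the evolved weights can never score worse than the expert seed on the fitness function.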

2014 ◽  
Author(s):  
Andreas Tuerk ◽  
Gregor Wiktorin ◽  
Serhat Güler

Quantification of RNA transcripts with RNA-Seq is inaccurate due to positional fragment bias, which is not represented appropriately by current statistical models of RNA-Seq data. This article introduces the Mix² (read "mix-square") model, which uses a mixture of probability distributions to model transcript-specific positional fragment bias. The parameters of the Mix² model can be trained efficiently with the Expectation-Maximization (EM) algorithm, yielding simultaneous estimates of the transcript abundances and the transcript-specific positional biases. Experiments are conducted on synthetic data and on the Universal Human Reference (UHR) and Human Brain Reference (HBR) samples from the MicroArray Quality Control (MAQC) data set. Comparing the correlation between qPCR and FPKM values against the state-of-the-art methods Cufflinks and PennSeq, we obtain an increase in R² value from 0.44 to 0.6 and from 0.34 to 0.54, respectively. In the detection of differential expression between UHR and HBR, the true positive rate increases from 0.44 to 0.71 at a false positive rate of 0.1. Finally, the Mix² model is used to investigate biases present in the MAQC data. This reveals 5 dominant biases which deviate from the common assumption of a uniform fragment distribution. The Mix² software is available at http://www.lexogen.com/fileadmin/uploads/bioinfo/mix2model.tgz.
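As a rough illustration of the EM training step, the sketch below fits a two-component 1-D Gaussian mixture by alternating expectation and maximization steps. The actual Mix² model mixes transcript-specific bias distributions, so the synthetic data, the Gaussian component family, and the fixed shared sigma here are all simplifying assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(xs, mu1, mu2, sigma=1.0, pi1=0.5, iters=50):
    """EM for a two-component 1-D Gaussian mixture (shared, fixed sigma)."""
    for _ in range(iters):
        # E-step: responsibility of component 1 for each observation.
        resp = []
        for x in xs:
            p1 = pi1 * normal_pdf(x, mu1, sigma)
            p2 = (1.0 - pi1) * normal_pdf(x, mu2, sigma)
            resp.append(p1 / (p1 + p2))
        # M-step: re-estimate the means and the mixing weight.
        n1 = sum(resp)
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / (len(xs) - n1)
        pi1 = n1 / len(xs)
    return mu1, mu2, pi1

# Two well-separated synthetic clusters (made-up values).
data = [0.1, -0.2, 0.3, 0.0, 4.8, 5.1, 5.3, 4.9]
m1, m2, w = em_two_gaussians(data, mu1=0.0, mu2=4.0)
```

Each iteration provably does not decrease the data likelihood, which is why EM yields the simultaneous parameter estimates mentioned above.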


Author(s):  
Lawrence Hall ◽  
Dmitry Goldgof ◽  
Rahul Paul ◽  
Gregory M. Goldgof

<p>Testing for COVID-19 has been unable to keep up with the demand. Further, the false negative rate is projected to be as high as 30%, and test results can take some time to obtain. X-ray machines are widely available and provide images for diagnosis quickly. This paper explores how useful chest X-ray images can be in diagnosing COVID-19 disease. We have obtained 135 chest X-rays of COVID-19 and 320 chest X-rays of viral and bacterial pneumonia.</p><p>A pre-trained deep convolutional neural network, Resnet50, was tuned on 102 COVID-19 cases and 102 other pneumonia cases in a 10-fold cross validation. The results were an overall accuracy of 89.2% with a COVID-19 true positive rate of 0.8039 and an AUC of 0.95. Pre-trained Resnet50 and VGG16 plus our own small CNN were tuned or trained on a balanced set of COVID-19 and pneumonia chest X-rays. An ensemble of the three types of CNN classifiers was applied to a test set of 33 unseen COVID-19 and 218 pneumonia cases. The overall accuracy was 91.24%, with a true positive rate for COVID-19 of 0.7879 and 6.88% false positives, for a true negative rate of 0.9312 and an AUC of 0.94.</p><p>This preliminary study has flaws, most critically a lack of information about where in the disease process the COVID-19 cases were, and the small data set size. More COVID-19 case images at good resolution will enable a better answer to the question of how useful chest X-rays can be for diagnosing COVID-19.</p>
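The reported ensemble figures are mutually consistent, which can be checked by back-deriving confusion-matrix counts. The counts below (26/7/203/15) are inferred from the stated rates, not given in the paper.

```python
def rates(tp, fn, tn, fp):
    """Standard detection metrics from confusion-matrix counts."""
    tpr = tp / (tp + fn)                  # sensitivity / recall
    tnr = tn / (tn + fp)                  # specificity
    fpr = fp / (fp + tn)
    acc = (tp + tn) / (tp + fn + tn + fp)
    return tpr, tnr, fpr, acc

# 33 COVID-19 cases with 26 detected, and 218 pneumonia cases with
# 203 correctly rejected (counts inferred from the reported rates).
tpr, tnr, fpr, acc = rates(tp=26, fn=7, tn=203, fp=15)
```

These counts reproduce the reported 0.7879 true positive rate, 6.88% false positives, 0.9312 true negative rate, and 91.24% overall accuracy.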


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3005
Author(s):  
A. N. M. Bazlur Rashid ◽  
Mohiuddin Ahmed ◽  
Al-Sakib Khan Pathan

While anomaly detection is very important in many domains, such as cybersecurity, cybersecurity datasets contain many rare anomalies or infrequent patterns, and detecting them is computationally expensive. These datasets also consist of many features, mostly irrelevant ones, which lowers the classification performance of machine learning algorithms. Hence, a feature selection (FS) approach, i.e., selecting only the relevant features, is an essential preprocessing step in cybersecurity data analysis. Although many FS approaches have been proposed in the literature, cooperative co-evolution (CC)-based FS approaches can be more suitable for cybersecurity data preprocessing in a Big Data scenario. Accordingly, in this paper, we have applied our previously proposed CC-based FS with random feature grouping (CCFSRFG) to a benchmark cybersecurity dataset as the preprocessing step. Both the dataset with the original features and the dataset with a reduced number of features were used for infrequent pattern detection, evaluated with 10 unsupervised anomaly detection techniques; the proposed approach is therefore termed Unsupervised Infrequent Pattern Detection (UIPD). We then compared the experimental results with and without FS in terms of true positive rate (TPR). The analysis indicates that the highest TPR improvement, 385.91%, was achieved with FS by the cluster-based local outlier factor (CBLOF) for backdoor infrequent pattern detection. Furthermore, the highest overall infrequent pattern detection TPR across all infrequent patterns improved by 61.47% using the clustering-based multivariate Gaussian outlier score (CMGOS) with FS.
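A minimal sketch of the random feature grouping at the heart of CCFSRFG, plus the relative TPR-improvement measure used in the comparison. The feature count, group count, and example TPR values are arbitrary, and this is not the authors' implementation.

```python
import random

def random_feature_grouping(n_features, n_groups, rng):
    """Randomly partition feature indices into subproblems, as in the
    decomposition step of cooperative co-evolution."""
    idx = list(range(n_features))
    rng.shuffle(idx)
    return [idx[i::n_groups] for i in range(n_groups)]

def tpr_improvement(tpr_without_fs, tpr_with_fs):
    """Relative true-positive-rate improvement (%) gained by FS."""
    return (tpr_with_fs - tpr_without_fs) / tpr_without_fs * 100.0

# Hypothetical example: 42 features decomposed into 5 subproblems.
groups = random_feature_grouping(42, 5, random.Random(0))
```

Each subproblem is then optimised by its own subpopulation, and candidate feature subsets are assembled from the best individual of each group.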


Author(s):  
Cansu Görürgöz ◽  
Kaan Orhan ◽  
Ibrahim Sevki Bayrakdar ◽  
Özer Çelik ◽  
Elif Bilgir ◽  
...  

Objectives: The present study aimed to evaluate the performance of a Faster Region-based Convolutional Neural Network (R-CNN) algorithm for tooth detection and numbering on periapical images. Methods: A data set of 1686 randomly selected periapical radiographs of patients was collected retrospectively. A pre-trained model (GoogLeNet Inception v3 CNN) was employed for pre-processing, and transfer learning techniques were applied for training on the data set. The algorithm consisted of: (1) the Jaw classification model, (2) Region detection models, and (3) the Final algorithm using all models. The performance of this final model was then analysed alongside the component models. The sensitivity, precision, true-positive rate, and false-positive/negative rate were computed from a confusion matrix to analyze the performance of the algorithm. Results: An artificial intelligence algorithm (CranioCatch, Eskisehir, Turkey) was designed based on Faster R-CNN Inception architecture to automatically detect and number the teeth on periapical images. Of 864 teeth in 156 periapical radiographs, 668 were correctly numbered in the test data set. The F1 score, precision, and sensitivity were 0.8720, 0.7812, and 0.9867, respectively. Conclusion: The study demonstrated the potential accuracy and efficiency of the CNN algorithm for detecting and numbering teeth. Deep learning-based methods can help clinicians reduce workloads, improve dental records, and reduce turnaround time for urgent cases. This architecture might also contribute to forensic science.
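The reported F1 score can be reproduced from the reported precision and sensitivity with the standard harmonic-mean formula:

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)

score = f1_score(0.7812, 0.9867)  # matches the reported 0.8720
```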


Author(s):  
Hongxing He ◽  
◽  
Simon Hawkins ◽  
Warwick Graco ◽  
Xin Yao ◽  
...  

In the k-Nearest Neighbour (kNN) algorithm, the classification of a new sample is determined by the class of its k nearest neighbours. The performance of the kNN algorithm is influenced by three main factors: (1) the distance metric used to locate the nearest neighbours; (2) the decision rule used to derive a classification from the k nearest neighbours; and (3) the number of neighbours used to classify the new sample. Using k = 1, 3, or 5 nearest neighbours, this study applies a Genetic Algorithm (GA) to find the optimal non-Euclidean distance metric for the kNN algorithm and examines two alternative methods (Majority Rule and Bayes Rule) of deriving a classification from the k nearest neighbours. This modified algorithm was evaluated on two real-world medical fraud problems. The General Practitioner (GP) database is a 2-class problem in which GPs are classified as either practising appropriately or inappropriately. The 'Doctor-Shoppers' database is a 5-class problem in which patients are classified according to the likelihood that they are 'doctor-shoppers': patients who consult many physicians in order to obtain multiple prescriptions of drugs of addiction in excess of their own therapeutic need. In both applications, classification accuracy was improved by optimising the distance metric. The agreement rate on the GP dataset improved from around 70% (using Euclidean distance) to 78% (using an optimised distance metric), and from about 55% to 82% on the Doctor-Shoppers dataset. Differences in either the decision rule or the number of nearest neighbours had little or no impact on the classification performance of the kNN algorithm. The excellent performance of the kNN algorithm when the distance metric is optimised by a genetic algorithm paves the way for its application in the real-world fraud detection problems faced by the Health Insurance Commission (HIC).
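The two decision rules compared in the study can be illustrated schematically. The class labels, the priors, and the inverse-prior weighting used for Bayes Rule below are simplifying assumptions of ours, not the paper's exact formulation.

```python
from collections import Counter

def majority_rule(neighbour_labels):
    """Classify by the most common class among the k neighbours."""
    return Counter(neighbour_labels).most_common(1)[0][0]

def bayes_rule(neighbour_labels, class_priors):
    """Weight each class's neighbour count by the inverse of its prior,
    favouring rare classes (a simplified reading of Bayes Rule)."""
    counts = Counter(neighbour_labels)
    return max(counts, key=lambda c: counts[c] / class_priors[c])

# Hypothetical k = 3 neighbourhood dominated by the common class.
labels = ["appropriate", "appropriate", "inappropriate"]
priors = {"appropriate": 0.9, "inappropriate": 0.1}
```

On this example the two rules disagree: Majority Rule returns the common class, while the prior-adjusted rule flags the rare "inappropriate" class, illustrating why the choice of decision rule can matter for imbalanced fraud data even though the study found its effect to be small.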


2019 ◽  
Vol 14 (3) ◽  
pp. 628-661 ◽  
Author(s):  
Bikash Kanti Sarkar ◽  
Shib Sankar Sana

Purpose
To promote patients' health via early prediction of diseases, knowledge extraction using data mining approaches forms an integral part of an e-health system. However, medical databases are highly imbalanced, voluminous, conflicting and complex in nature, which can lead to erroneous diagnosis of diseases (i.e. detecting the class values of diseases). Numerous standard disease decision support systems (DDSSs) have been proposed in the literature, but most of them are disease specific, and they usually suffer from several drawbacks, such as lack of understandability, inability to handle rare cases, and inefficiency in making quick and correct decisions. The purpose of this study is to alleviate these issues to a great extent.

Design/methodology/approach
Addressing the limitations of the existing systems, the present research introduces a two-step framework for designing a DDSS. The first step (data-level optimization) identifies an optimal data partition (Popt) for each disease data set and then the best training set for Popt, in a parallel manner. The second step builds a generic predictive model (integrating the C4.5 and PRISM learners) over the discovered information for effective diagnosis of disease. The designed model is generic (i.e. not disease specific).

Findings
The empirical results (in terms of three measures: accuracy, true positive rate and false positive rate) obtained over 14 benchmark medical data sets (collected from https://archive.ics.uci.edu/ml) demonstrate that the hybrid model outperforms the base learners in almost all cases for initial diagnosis of the diseases. The proposed DDSS may thus work as an e-doctor to detect diseases.

Originality/value
The model designed in this study is original, and the necessary parallelized methods are implemented in C on a FUJITSU HPC cluster with 256 cores in total (under one master node).


Author(s):  
Lawrence Hall ◽  
Dmitry Goldgof ◽  
Rahul Paul ◽  
Gregory M. Goldgof

<p>Testing for COVID-19 has been unable to keep up with the demand. Further, the false negative rate is projected to be as high as 30%, and test results can take some time to obtain. X-ray machines are widely available and provide images for diagnosis quickly. This paper explores how useful chest X-ray images can be in diagnosing COVID-19 disease. We have obtained 135 chest X-rays of COVID-19 and 320 chest X-rays of viral and bacterial pneumonia.</p><p>A pre-trained deep convolutional neural network, Resnet50, was tuned on 102 COVID-19 cases and 102 other pneumonia cases in a 10-fold cross validation. The results were an overall accuracy of 90.7% with a COVID-19 true positive rate of 0.83 and an AUC of 0.987. Pre-trained Resnet50 and VGG16 plus our own small CNN were tuned or trained on a balanced set of COVID-19 and pneumonia chest X-rays. An ensemble of the three types of CNN classifiers was applied to a test set of 33 unseen COVID-19 and 208 pneumonia cases. The overall accuracy was 94.4%, with a true positive rate for COVID-19 of 0.969 and 6% false positives, for a true negative rate of 0.94 and an AUC of 0.99.</p><p>This preliminary study has flaws, most critically a lack of information about where in the disease process the COVID-19 cases were, and the small data set size. More COVID-19 case images at good resolution will enable a better answer to the question of how useful chest X-rays can be for diagnosing COVID-19.</p><p>Note: an earlier version of this work inadvertently used chest X-rays of viral and bacterial pneumonia that came from a dataset of children under 5 years old; those results should be ignored.</p>


2019 ◽  
Vol 8 (4) ◽  
pp. 7455-7458

Our aim is to assess classification accuracy under different types of normalisation and different feature selection techniques for diabetes mellitus, using the Pima Indian Diabetes dataset. Data mining is the process of extracting previously unknown, valid and important information from large databases, so that crucial decisions can be made using that information. The classification methods K-Nearest Neighbour and J48 decision tree are applied both to the original dataset and to the pre-processed dataset. The full pre-processing pipeline is applied to the Pima Indian Diabetes dataset to analyse classification performance in terms of accuracy rate. The performance metrics used to evaluate classification are recall, F-measure, sensitivity, specificity, precision, and accuracy. The simulation is performed with the R tool.
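Two common normalisation choices for this kind of pre-processing can be sketched as follows. The glucose values are made-up examples, and the paper's experiments use the R tool rather than Python.

```python
def min_max(xs):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_score(xs):
    """Centre values on the mean and scale by the standard deviation."""
    mean = sum(xs) / len(xs)
    sd = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / sd for x in xs]

# Made-up plasma glucose readings in the style of the Pima attributes.
glucose = [85, 168, 183, 89, 137]
scaled = min_max(glucose)
standardised = z_score(glucose)
```

Normalisation matters especially for K-Nearest Neighbour, whose distance computation would otherwise be dominated by attributes with large numeric ranges.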



2018 ◽  
Vol 164 ◽  
pp. 01023 ◽  
Author(s):  
Kanyanut Homsapaya ◽  
Ohm Sornil

Classification performance is adversely impacted by noisy data. Selecting features relevant to the problem is thus a critical step in classification, yet it is difficult to achieve an accurate solution, especially for large data sets. In this article, we propose a novel filter-based floating search technique for feature selection that selects an optimal set of features for classification purposes. A genetic algorithm is utilized to increase the quality of the features selected at each iteration, and a criterion function is applied to choose relevant, high-quality features that can improve classification accuracy. The method is evaluated on 20 standard machine learning datasets of various sizes and complexities. Experimental results show that the proposed method is effective and performs well in comparison with previously reported techniques.
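A floating search for feature selection can be sketched as sequential floating forward selection (SFFS) driven by a criterion function. The toy score table below is hypothetical, and the paper's genetic-algorithm refinement step is omitted from this sketch.

```python
def sffs(features, criterion, k):
    """Sequential floating forward selection: greedily add the best
    feature, then conditionally remove features while that improves
    the criterion (the 'floating' backward step)."""
    selected = []
    while len(selected) < k:
        remaining = [f for f in features if f not in selected]
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
        improved = True
        while improved and len(selected) > 2:
            improved = False
            for f in list(selected):
                trial = [g for g in selected if g != f]
                if criterion(trial) > criterion(selected):
                    selected = trial
                    improved = True
    return selected

# Toy criterion: features 0 and 2 are jointly informative (hypothetical).
SCORES = {(): 0.0, (0,): 0.6, (1,): 0.3, (2,): 0.5,
          (0, 2): 0.9, (0, 1): 0.65, (1, 2): 0.55}
def crit(subset):
    return SCORES.get(tuple(sorted(subset)), 0.0)

chosen = sffs([0, 1, 2], crit, k=2)
```

The backward step is what distinguishes floating search from plain greedy forward selection: a feature added early can be dropped later once a better-interacting subset emerges.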

