correlation based feature selection
Recently Published Documents


Total documents: 130 (five years: 31)

H-index: 12 (five years: 0)

2021 ◽  
Vol 14 (1) ◽  
pp. 16
Author(s):  
Chandrashekar Jatoth ◽  
Rishabh Jain ◽  
Ugo Fiore ◽  
Subrahmanyam Chatharasupalli

Although blockchain technology is gaining widespread adoption across multiple sectors, its most popular application is in cryptocurrency. The decentralized and anonymous nature of transactions in a cryptocurrency blockchain has attracted a multitude of participants, and significant amounts of money are now exchanged every day. This raises the need to analyze the blockchain to discover information about the nature of the participants in transactions. This study focuses on the identification of risky and non-risky blocks in a blockchain. The proposed approach uses ensemble learning, with or without correlation-based feature selection. Ensemble learning yielded good results in the experiments, but class-wise analysis reveals that ensemble learning with feature selection improves the results even further. After training machine learning classifiers on the dataset, we observe an improvement in accuracy of 2–3% and in F-score of 7–8%.


Author(s):  
Samreen Naeem ◽  
Aqib Ali ◽  
Jamal Abdul Nasir ◽  
Arooj Fatima ◽  
Farrukh Jamal ◽  
...  

The purpose of this study is to detect corn seed Fusarium disease using a hybrid feature space and conventional machine learning (ML) approaches. A novel machine learning approach is employed to classify six types of collected corn seed, comprising seeds infected with Fusarium (moniliforme, graminearum, gibberella, verticillioides, kernel) as well as healthy corn seed, based on a multi-feature dataset that combines geometric, texture, and histogram features extracted from digital images. For each corn seed image, a total of twenty-five multi-features were developed on every area of interest (AOI), with sizes (50 × 50), (100 × 100), (150 × 150), and (200 × 200). A total of seven optimized features were selected using a machine learning-based algorithm named “Correlation-based Feature Selection”. For experimentation, “Random forest” (RF), “BayesNet” (BN), and “LogitBoost” (LB) were employed on the optimized multi-feature user-supplied dataset, divided into 70% training and 30% testing. In a comparative analysis, the three ML classifiers RF, BN, and LB achieved very high classification rates of 96.67%, 97.22%, and 97.78%, respectively, when the AOI size (200 × 200) was supplied to the classifiers.


2021 ◽  
Vol 11 (23) ◽  
pp. 11400
Author(s):  
Andra-Maria Mircea-Vicoveanu ◽  
Elena Rezuș ◽  
Florin Leon ◽  
Silvia Curteanu

This study is based on the consideration that patients with rheumatoid arthritis and ankylosing spondylitis undergoing biological therapy have a higher risk of developing tuberculosis. The QuantiFERON-TB Gold test result was the output of the models, and a series of features related to the patients and their treatments were chosen as inputs. The distribution of patients by gender and by the biological therapy followed, both at the time of inclusion in the study and at the end of the study, is presented for rheumatoid arthritis and for ankylosing spondylitis. A series of classification algorithms (random forest, nearest neighbor, k-nearest neighbors, C4.5 decision trees, non-nested generalized exemplars, and support vector machines) and attribute selection algorithms (ReliefF, InfoGain, and correlation-based feature selection) were successfully applied. Useful information was obtained regarding the influence of biological and classical treatments on tuberculosis risk, and most of it agreed with medical studies.


Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 2984
Author(s):  
Masurah Mohamad ◽  
Ali Selamat ◽  
Ondrej Krejcar ◽  
Ruben Gonzalez Crespo ◽  
Enrique Herrera-Viedma ◽  
...  

This study proposes an alternative data extraction method that combines three well-known feature selection methods for handling large and problematic datasets: the correlation-based feature selection (CFS), best first search (BFS), and dominance-based rough set approach (DRSA) methods. This study aims to enhance the classifier’s performance in decision analysis by eliminating uncorrelated and inconsistent data values. The proposed method, named CFS-DRSA, comprises several phases executed in sequence, with the main phases incorporating two crucial feature extraction tasks. First, data reduction applies a CFS method with a BFS algorithm. Second, a data selection process applies DRSA to generate the optimized dataset. In this way, the study aims to reduce computational time complexity and increase classification accuracy. Several datasets with various characteristics and volumes were used in the experimental process to evaluate the proposed method’s credibility. The method’s performance was validated using standard evaluation measures and benchmarked against other established methods such as deep learning (DL). Overall, the proposed work showed that it could assist the classifier in returning a significant result, with an accuracy rate of 82.1% for the neural network (NN) classifier, compared with 66.5% for the support vector machine (SVM) and 49.96% for DL. The one-way analysis of variance (ANOVA) statistical result indicates that the proposed method is an alternative extraction tool for those who have difficulty acquiring expensive big data analysis tools and those who are new to the data analysis field.
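To make the CFS step concrete, the following is a minimal sketch of the standard CFS merit score, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation of a k-feature subset. It is not the CFS-DRSA method described above. Two simplifications are assumptions made here for brevity: absolute Pearson correlation is used as the correlation measure, and a plain greedy forward search stands in for best first search. The function names `cfs_merit` and `cfs_forward` and the synthetic data are hypothetical.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    # Mean absolute correlation between each selected feature and the target.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    # Mean absolute correlation over all pairs of selected features.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(X, y):
    """Greedy forward search (assumption: a simpler stand-in for best first)."""
    remaining = list(range(X.shape[1]))
    selected, best = [], 0.0
    while remaining:
        # Score every one-feature extension of the current subset.
        score, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break  # no extension improves the merit
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

# Synthetic data (hypothetical): feature 1 duplicates feature 0,
# feature 2 is independently useful, feature 3 is pure noise.
rng = np.random.default_rng(0)
n = 2000
f0 = rng.normal(size=n)
f1 = f0 + 0.05 * rng.normal(size=n)
f2 = rng.normal(size=n)
f3 = rng.normal(size=n)
X = np.column_stack([f0, f1, f2, f3])
y = f0 + f2

selected, merit = cfs_forward(X, y)
print(selected, round(float(merit), 3))
```

The merit rewards features that correlate with the target and penalizes features that correlate with each other, so the search should keep only one of the two near-duplicate columns and skip the noise column.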


2021 ◽  
Author(s):  
Karna Vishnu Vardhana Reddy ◽  
Irraivan Elamvazuthi ◽  
Azrina Abd Aziz ◽  
Sivajothi Paramasivam ◽  
Hui Na Chua ◽  
...  

2021 ◽  
Author(s):  
Ahmed Ali Dawud ◽  
Bheema Lingaiah ◽  
Towfik Jemal

Abstract. Background: Nowadays, cardiovascular diseases are a major cause of death in the world. The heart sound is still the primary tool used for screening and diagnosing many pathological conditions of the human heart. Abnormalities in the heart sounds start appearing much earlier than the symptoms of the disease. In this study, the phonocardiography signal has been studied and classified into three classes, namely normal signal, murmur signal, and extra sound signal. A total of 15 features from different domains were extracted and then reduced to 7 features. The features were selected using a correlation-based feature selection technique. The selected features are used to classify the signal into the predefined classes using a multi-class SVM classifier. The performance of the proposed denoising algorithm is evaluated using the signal-to-noise ratio, percentage root mean square difference, and root mean square error. For this work, a publicly available database for researchers, Partnership Among South Carolina Academic Libraries (PASCAL), and MATLAB 2018a were used to develop the proposed algorithm. Results: Our experimental results show that the 4th level of decomposition with the Db10 wavelets yields the highest SNR values with both soft and hard thresholding. The overall accuracy, sensitivity, and specificity of the developed algorithm are 97.96%, 97.92%, and 98.0%, respectively. Conclusion: Although the proposed algorithm is useful for murmur detection, mainly for valve-related diseases, and its efficiency is improved, future work will aim to generalize the algorithm by using hybrid classifiers on a larger dataset. Since all experiments used the PASCAL datasets, additional experiments on new datasets will be needed, with implementation on the latest mobile phones, which can work as an electronic stethoscope or phonocardiogram. In addition, the case of continuous murmur and the types of murmur are to be included for classification in further studies.
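The three denoising metrics named in this abstract can be sketched as follows. The abstract does not give its formulas, so the commonly used definitions are assumed here: SNR in decibels as the ratio of signal energy to residual energy, PRD as the residual energy relative to signal energy expressed in percent, and RMSE as the root of the mean squared residual. The function names and the toy signal are hypothetical.

```python
import numpy as np

def snr_db(clean, denoised):
    """Signal-to-noise ratio in dB (assumed definition: signal vs. residual energy)."""
    residual = clean - denoised
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))

def prd_percent(clean, denoised):
    """Percentage root mean square difference (assumed, non-mean-removed form)."""
    residual = clean - denoised
    return 100.0 * np.sqrt(np.sum(residual ** 2) / np.sum(clean ** 2))

def rmse(clean, denoised):
    """Root mean square error between the reference and the denoised signal."""
    return np.sqrt(np.mean((clean - denoised) ** 2))

# Toy example (hypothetical values): a short reference signal and a
# denoised estimate that is off by 0.1 at every sample.
clean = np.array([1.0, 2.0, 3.0, 4.0])
denoised = clean + 0.1 * np.array([1.0, -1.0, 1.0, -1.0])

print(snr_db(clean, denoised), prd_percent(clean, denoised), rmse(clean, denoised))
```

Under these definitions SNR and PRD carry the same information, since SNR = −20·log10(PRD / 100), so a higher SNR always corresponds to a lower PRD.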


Sensors ◽  
2021 ◽  
Vol 21 (21) ◽  
pp. 6997
Author(s):  
Mahsa Sadat Afzali Arani ◽  
Diego Elias Costa ◽  
Emad Shihab

Inertial sensors are widely used in the field of human activity recognition (HAR), since this source of information is the most informative time series among non-visual datasets. HAR researchers are actively exploring other approaches and different sources of signals to improve the performance of HAR systems. In this study, we investigate the impact of combining bio-signals with a dataset acquired from inertial sensors on recognizing human daily activities. To achieve this aim, we used the PPG-DaLiA dataset, consisting of 3D-accelerometer (3D-ACC), electrocardiogram (ECG), and photoplethysmogram (PPG) signals acquired from 15 individuals while performing daily activities. We extracted hand-crafted time and frequency domain features and then applied a correlation-based feature selection approach to reduce the dimensionality of the feature set. After introducing early fusion scenarios, we trained and tested random forest models with subject-dependent and subject-independent setups. Our results indicate that combining features extracted from the 3D-ACC signal with the ECG signal improves the classifier’s F1-scores by 2.72% and 3.00% (from 94.07% to 96.80%, and 83.16% to 86.17%) for subject-dependent and subject-independent approaches, respectively.

