Dysphonic Voice Pattern Analysis of Patients in Parkinson’s Disease Using Minimum Interclass Probability Risk Feature Selection and Bagging Ensemble Learning Methods

2017 ◽  
Vol 2017 ◽  
pp. 1-11 ◽  
Author(s):  
Yunfeng Wu ◽  
Pinnan Chen ◽  
Yuchen Yao ◽  
Xiaoquan Ye ◽  
Yugui Xiao ◽  
...  

Analysis of quantified voice patterns is useful in the detection and assessment of dysphonia and related phonation disorders. In this paper, we first study the linear correlations among 22 voice parameters covering fundamental frequency variability, amplitude variations, and nonlinear measures. The highly correlated vocal parameters are combined using linear discriminant analysis. Based on the probability density functions estimated with the Parzen-window technique, we propose an interclass probability risk (ICPR) method that selects the vocal parameters with small ICPR values as dominant features, and we compare it with the modified Kullback-Leibler divergence (MKLD) feature selection approach. The experimental results show that generalized logistic regression analysis (GLRA), the support vector machine (SVM), and the Bagging ensemble algorithm all give better classification results with the ICPR-selected features than with the MKLD-selected features. The SVM is notably better at recognizing normal vocal patterns, with a specificity of 0.8542. Among the three classification methods, the Bagging ensemble algorithm with ICPR features correctly identifies 90.77% of the vocal patterns, with the highest sensitivity (0.9796) and the largest area under the receiver operating characteristic curve (0.9558). The classification results demonstrate the effectiveness of our feature selection and pattern analysis methods for dysphonic voice detection and measurement.
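The abstract does not define ICPR formally. As a minimal sketch, assuming a Gaussian Parzen window with a fixed bandwidth `h` and taking a feature's "risk" to be the Bayes overlap, the integral of min(P0·p0(x), P1·p1(x)) over x (our stand-in, not necessarily the authors' ICPR), features can be ranked so that the smallest-risk, most discriminative ones come first. The function names and toy data are hypothetical.

```python
import math

def parzen_pdf(samples, h):
    """Return a Gaussian Parzen-window density estimate built from 1-D samples."""
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return pdf

def overlap_risk(class0, class1, h=0.5, steps=2000):
    """Approximate the integral of min(P0*p0(x), P1*p1(x)) with the midpoint rule.

    Assumption: this Bayes-overlap score stands in for ICPR; smaller values
    mean the two classes are easier to separate on this feature.
    """
    p0, p1 = parzen_pdf(class0, h), parzen_pdf(class1, h)
    n0, n1 = len(class0), len(class1)
    prior0, prior1 = n0 / (n0 + n1), n1 / (n0 + n1)
    lo = min(min(class0), min(class1)) - 5 * h   # cover the kernel tails
    hi = max(max(class0), max(class1)) + 5 * h
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += min(prior0 * p0(x), prior1 * p1(x)) * dx
    return total

def rank_features(features):
    """Sort feature names by ascending overlap risk (most discriminative first)."""
    scores = {name: overlap_risk(c0, c1) for name, (c0, c1) in features.items()}
    return sorted(scores, key=scores.get), scores

# Hypothetical per-class values of two voice parameters (normal, dysphonic).
features = {
    "separated": ([0.0, 0.2, 0.1, -0.1], [3.0, 3.1, 2.9, 3.2]),
    "mixed": ([0.0, 0.5, 1.0, 1.5], [0.2, 0.7, 1.1, 1.4]),
}
order, scores = rank_features(features)
print(order)  # ['separated', 'mixed']
```

The bandwidth `h` controls smoothing; in practice it would be tuned rather than fixed at 0.5.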

2020 ◽  
Vol 16 (2) ◽  
pp. 155014772090523
Author(s):  
ZhenLong Li ◽  
HaoXin Wang ◽  
YaoWei Zhang ◽  
XiaoHua Zhao

A drunk-driving detection method using feature selection based on the random forest was proposed. First, driving behavior data were collected using a driving simulator at Beijing University of Technology. Second, features were selected according to their feature importance in the random forest. Third, a dummy variable was introduced to encode the geometric characteristics of different roads, so that drunk driving under different road conditions could be detected with the same random-forest-based classifier. Finally, linear discriminant analysis, support vector machine, and AdaBoost classifiers were compared with the random forest. Accuracy, F1 score, the receiver operating characteristic curve, and the area under the curve were used to evaluate classifier performance. The results show that Accelerator Depth, Speed, Distance to the Center of the Lane, Acceleration, Engine Revolution, Brake Depth, and Steering Angle strongly influence identification of the driver's state and can be used to detect drunk driving. In particular, classifiers that used Accelerator Depth outperformed those that did not, indicating that Accelerator Depth is an important feature. Both the AdaBoost and random forest classifiers reached an accuracy of 81.48%, which verifies the effectiveness of the proposed method.
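The dummy-variable step lets a single classifier handle several road geometries. A minimal sketch, assuming k-1 coding with the first road type as the all-zeros baseline; the road categories and feature names below are hypothetical, since the abstract does not list them.

```python
# Hypothetical road categories and feature names, chosen for illustration only.
ROAD_TYPES = ("straight", "left_curve", "right_curve")
NUMERIC_FEATURES = ("accelerator_depth", "speed", "lane_offset")

def encode_road(record, road_types=ROAD_TYPES):
    """Return numeric features followed by k-1 dummy columns for the road type.

    The first road type is the baseline and is encoded as all zeros.
    """
    road = record["road_type"]
    if road not in road_types:
        raise ValueError(f"unknown road type: {road!r}")
    numeric = [record[name] for name in NUMERIC_FEATURES]
    dummies = [1 if road == t else 0 for t in road_types[1:]]
    return numeric + dummies

row = {"road_type": "left_curve", "accelerator_depth": 0.3, "speed": 62.0, "lane_offset": 0.4}
print(encode_road(row))  # [0.3, 62.0, 0.4, 1, 0]
```

Tree ensembles such as random forests could also use one column per category; k-1 coding mainly matters for linear models such as LDA, where a full set of dummies is collinear with the intercept.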


2021 ◽  
Vol 11 ◽  
Author(s):  
Qi Wan ◽  
Jiaxuan Zhou ◽  
Xiaoying Xia ◽  
Jianfeng Hu ◽  
Peng Wang ◽  
...  

Objective: To evaluate the performance of 2D and 3D radiomics features with different machine learning approaches for classifying SPLs based on magnetic resonance (MR) T2-weighted imaging (T2WI).

Materials and Methods: A total of 132 patients with pathologically confirmed SPLs were examined and randomly divided into training (n = 92) and test (n = 40) datasets. For each patient, 1692 3D and 1231 2D radiomics features were extracted. Both radiomics features and clinical data were evaluated. A total of 1260 classification models were compared, combining 3 normalization methods, 2 dimension reduction algorithms, 3 feature selection methods, and 10 classifiers with 7 different feature numbers (3 to 9). Ten-fold cross-validation on the training dataset was used to choose the candidate final model. The area under the receiver operating characteristic curve (AUC), the precision-recall plot, and the Matthews correlation coefficient (MCC) were used to evaluate the performance of the machine learning approaches.

Results: The 3D features were significantly superior to the 2D features, yielding many more machine learning combinations with AUC greater than 0.7 in both the validation and test groups (129 vs. 11). The feature selection methods analysis of variance (ANOVA) and recursive feature elimination (RFE), and the classifiers logistic regression (LR), linear discriminant analysis (LDA), support vector machine (SVM), and Gaussian process (GP), performed relatively better. The best performance of the 3D radiomics features on the test dataset (AUC = 0.824, AUC-PR = 0.927, MCC = 0.514) was higher than that of the 2D features (AUC = 0.740, AUC-PR = 0.846, MCC = 0.404). The joint 3D and 2D features (AUC = 0.813, AUC-PR = 0.926, MCC = 0.563) showed results similar to those of the 3D features. Incorporating clinical features with the 3D and 2D radiomics features slightly improved the AUC to 0.836 (AUC-PR = 0.918, MCC = 0.620) and 0.780 (AUC-PR = 0.900, MCC = 0.574), respectively.

Conclusions: After algorithm optimization, 2D feature-based radiomics models yield favorable results in differentiating malignant and benign SPLs, but 3D features are still preferred because more machine learning algorithm combinations achieve better performance with them. The feature selection methods ANOVA and RFE, and the classifiers LR, LDA, SVM, and GP, are more likely to demonstrate better diagnostic performance for 3D features in the current study.
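The model count follows directly from the grid: 3 × 2 × 3 × 10 × 7 = 1260. The sketch below enumerates such a grid and adds the standard MCC formula used in the evaluation. Method names other than ANOVA, RFE, LR, LDA, SVM, and GP are placeholders, because the abstract does not name them.

```python
import math
from itertools import product

# Placeholder names: the abstract gives the counts but names only some methods.
normalizers = ["norm_1", "norm_2", "norm_3"]              # 3 normalization methods
reducers = ["reduce_1", "reduce_2"]                       # 2 dimension reduction algorithms
selectors = ["ANOVA", "RFE", "selector_3"]                # 3 feature selection methods
classifiers = ["LR", "LDA", "SVM", "GP"] + [f"clf_{i}" for i in range(5, 11)]  # 10 classifiers
feature_numbers = range(3, 10)                            # 3 to 9 features: 7 options

grid = list(product(normalizers, reducers, selectors, classifiers, feature_numbers))
print(len(grid))  # 1260

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal is empty, a common convention for the
    otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

MCC ranges from -1 (total disagreement) to +1 (perfect prediction), which is why it complements AUC on a small, possibly imbalanced test set.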


Author(s):  
Ahmed Abdullah Farid ◽  
Gamal Selim ◽  
Hatem Khater

Breast cancer is a significant health issue worldwide. It is the most widely diagnosed cancer in women, and early-stage diagnosis and therapy increase patient safety. This paper proposes a hybrid feature selection model based on an optimized genetic algorithm (CHFS-BOGA) to predict breast cancer. This hybrid approach combines the advantages of three filter feature selection approaches with an optimized genetic algorithm (OGA) to select the best features, improving both the performance and the scalability of the classification process. The OGA improves initial population generation and the genetic operators by using the results of the filter approaches as prior information, and it uses the C4.5 decision tree classifier as the fitness function instead of probability-based random selection. The authors used updated data from the Wisconsin breast cancer dataset in the UCI Machine Learning Repository, with a total of 569 rows and 32 columns. The dataset was evaluated using the Explorer interface of the open-source Weka data mining software. The results show that the proposed hybrid feature selection approach significantly outperforms the single filter approaches and principal component analysis (PCA) for optimal feature selection, and the selected features are good indicators for prediction. Before CHFS-BOGA was applied, the highest accuracy achieved with the support vector machine (SVM) classifier was 97.3%. With CHFS-BOGA-SVM, the highest accuracy was 98.25% on a 70%/30% train/test split and 100% on the full training set. Moreover, the area under the receiver operating characteristic (ROC) curve was 1.0. The results show that the proposed CHFS-BOGA-SVM system accurately classified breast tumors as malignant or benign.
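The abstract describes a GA whose initial population is informed by filter results and whose fitness is C4.5 accuracy, but it does not specify the operators. The sketch below assumes truncation selection, one-point crossover, bit-flip mutation, and elitism; it seeds one individual with filter-selected features and uses a toy fitness function in place of C4.5. All names are hypothetical.

```python
import random

def ga_select(n_features, fitness, prior=(), pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Genetic search over 0/1 feature masks (requires n_features >= 2).

    prior: indices favoured by filter methods; they seed one initial mask
    (an assumption about how the 'prior information' is used).
    fitness: callable mask -> float, e.g. validation accuracy of a classifier.
    """
    rng = random.Random(seed)
    seeded = [1 if i in prior else 0 for i in range(n_features)]
    population = [seeded] + [[rng.randint(0, 1) for _ in range(n_features)]
                             for _ in range(pop_size - 1)]

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]        # truncation selection by fitness
        children = [ranked[0][:]]                # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)

useful = {0, 3, 5}  # hypothetical informative features

def toy_fitness(mask):
    """Toy stand-in for C4.5 accuracy: reward useful features, penalize others."""
    hits = sum(mask[i] for i in useful)
    return hits - 0.5 * (sum(mask) - hits)

best = ga_select(8, toy_fitness, prior=(0, 3))
print(best)
```

Because the best mask is always carried forward, the returned mask is never worse than the filter-seeded starting mask.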


Author(s):  
A. Gaspar-Cunha ◽  
F. Mendes ◽  
J. Duarte ◽  
A. Vieira ◽  
B. Ribeiro ◽  
...  

In this work, a Multi-Objective Evolutionary Algorithm (MOEA) was applied to feature selection for bankruptcy prediction. The algorithm maximizes the accuracy of the classifier while keeping the number of features low. The two-objective problem, minimizing the number of features and maximizing accuracy, was fully analyzed using Logistic Regression (LR) and Support Vector Machine (SVM) classifiers. The parameters required by both classifiers were optimized simultaneously, and the validity of the proposed methodology was tested on a database containing the financial statements of 1200 medium-sized private French companies. Extensive tests show that MOEA is an efficient feature selection approach. The best results were obtained when both the accuracy and the classifiers' parameters were optimized. The proposed method can provide useful information for decision makers in characterizing the financial health of a company.
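With two objectives, an MOEA does not return a single winner. It returns the set of non-dominated trade-offs. The sketch below shows only that Pareto-dominance test for (number of features, accuracy) pairs. The candidate values are hypothetical, and the abstract does not say which MOEA variant was used.

```python
def dominates(a, b):
    """True if solution a dominates b; each is (n_features, accuracy).

    Fewer features and higher accuracy are both better.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

def pareto_front(solutions):
    """Return the non-dominated solutions, sorted by number of features."""
    front = {s for s in solutions if not any(dominates(o, s) for o in solutions)}
    return sorted(front)

# Hypothetical (n_features, accuracy) pairs produced during an MOEA run.
candidates = [(5, 0.80), (3, 0.78), (8, 0.82), (6, 0.79), (3, 0.70)]
front = pareto_front(candidates)
print(front)  # [(3, 0.78), (5, 0.8), (8, 0.82)]
```

Here (6, 0.79) is dropped because (5, 0.80) uses fewer features and is more accurate; a decision maker then picks a point on the front.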


Author(s):  
Gaddam Venu Gopal ◽  
Gatram Rama Mohan Babu

Feature selection is the process of identifying a relevant feature subset that allows the machine learning algorithm to operate in a well-defined manner. In this paper, a novel ensemble feature selection approach that combines Relief Attribute Evaluation with a hybrid kernel-based support vector machine (HK-SVM) is proposed as a feature selection method for network intrusion detection systems (NIDS). A hybrid of the Gaussian and polynomial kernels is used as the kernel for the support vector machine (SVM). The key issue is to select a feature subset that yields good accuracy at minimal computational cost. The proposed approach is implemented and compared with the classical SVM using a simple kernel. Kyoto2006+, a benchmark intrusion detection dataset, is used for the experimental evaluation, and observations are drawn from the results.
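The abstract states that Gaussian and polynomial kernels are combined but not how. A minimal sketch, assuming a convex combination with a hypothetical weight `weight` and illustrative default parameters. A non-negative combination of valid kernels is itself a valid (positive semidefinite) kernel, so the resulting Gram matrix can be supplied to any SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree

def hybrid_kernel(x, y, weight=0.7, gamma=0.5, degree=2, coef0=1.0):
    """Convex combination of Gaussian and polynomial kernels (assumed form)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * rbf_kernel(x, y, gamma) + (1 - weight) * poly_kernel(x, y, degree, coef0)

def gram_matrix(rows, **params):
    """Pairwise hybrid-kernel values for a list of feature vectors."""
    return [[hybrid_kernel(a, b, **params) for b in rows] for a in rows]

print(hybrid_kernel([1.0, 0.0], [1.0, 0.0]))
```

For identical inputs [1, 0], the RBF term is 1 and the polynomial term is (1 + 1)^2 = 4, so the result is approximately 0.7 · 1 + 0.3 · 4 = 1.9.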


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 1255 ◽  
Author(s):  
Malik Yousef ◽  
Burcu Bakir-Gungor ◽  
Amhar Jabeer ◽  
Gokhan Goy ◽  
Rehman Qureshi ◽  
...  

In our earlier study, we proposed a novel feature selection approach, Recursive Cluster Elimination with Support Vector Machines (SVM-RCE), and implemented it in Matlab. Interest in this approach has grown over time, and several researchers have incorporated SVM-RCE into their studies, resulting in a substantial number of scientific publications. This increased interest encouraged us to reconsider how feature selection, particularly in biological datasets, can benefit from considering the relationships among genes in the selection process; this led to our development of SVM-RCE-R. The usefulness of SVM-RCE-R is further supported by the development of the maTE tool, which uses a similar approach to identify microRNA (miRNA) targets. We have now implemented the SVM-RCE-R algorithm in Knime to make it easier to apply and more accessible to the biomedical community. Using SVM-RCE-R in Knime is simple and intuitive, allowing researchers to begin their data analysis immediately without consulting an information technology specialist. The input for the Knime tool is an Excel file (or a text or CSV file) with a simple structure, and the output is also an Excel file. The Knime version also incorporates new features not available in the previous version. One of these is a user-specific ranking function that lets the user assign weights to accuracy, sensitivity, specificity, f-measure, area under the curve, and precision, so that greater sensitivity or greater specificity can be favored as needed. The results show that the ranking function affects the performance of SVM-RCE-R. Some clusters that achieve high scores under a specified ranking can also score highly on other metrics. This finding motivates future studies to identify the optimal ranking function.
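The abstract lists the metrics the user can weight but not the formula. As a sketch, assuming the ranking score is a normalized weighted average of those metrics, changing the weights can reorder clusters. The cluster names and metric values are hypothetical.

```python
METRICS = ("accuracy", "sensitivity", "specificity", "f_measure", "auc", "precision")

def rank_score(metrics, weights):
    """Weighted average of the metrics; missing weights count as zero."""
    total = sum(weights.get(m, 0.0) for m in METRICS)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights.get(m, 0.0) * metrics[m] for m in METRICS) / total

def rank_clusters(clusters, weights):
    """Return cluster names from highest to lowest weighted score."""
    return sorted(clusters, key=lambda name: rank_score(clusters[name], weights), reverse=True)

# Hypothetical per-cluster performance metrics.
clusters = {
    "cluster_1": {"accuracy": 0.80, "sensitivity": 0.90, "specificity": 0.70,
                  "f_measure": 0.82, "auc": 0.85, "precision": 0.75},
    "cluster_2": {"accuracy": 0.80, "sensitivity": 0.70, "specificity": 0.90,
                  "f_measure": 0.78, "auc": 0.85, "precision": 0.85},
}
print(rank_clusters(clusters, {"sensitivity": 1.0}))  # ['cluster_1', 'cluster_2']
print(rank_clusters(clusters, {"specificity": 1.0}))  # ['cluster_2', 'cluster_1']
```

Weighting sensitivity favors the first cluster, and weighting specificity favors the second. This mirrors the choice the ranking function offers the user.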

