Software Fault Proneness Prediction with Group Lasso Regression: On Factors that Affect Classification Performance

Author(s): Katerina Goseva-Popstojanova, Mohammad Ahmad, Yasser Alshehri

2018, Vol 14 (5), pp. 530-539
Author(s): Gaia T Koster, T Truc My Nguyen, Erik W van Zwet, Bjarty L Garcia, Hannah R Rowling, ...

Background: A clinical large anterior vessel occlusion (LAVO)-prediction scale could reduce treatment delays by allocating intra-arterial thrombectomy (IAT)-eligible patients directly to a comprehensive stroke center.
Aim: To extract, validate, and compare existing LAVO-prediction scales, and to develop a straightforward decision support tool to assess IAT eligibility.
Methods: We performed a systematic literature search to identify LAVO-prediction scales. Performance was compared in a prospective, multicenter validation cohort of the Dutch acute Stroke study (DUST) by calculating the area under the receiver operating characteristic curve (AUROC). Using group lasso regression analysis, we constructed a prediction model incorporating patient characteristics in addition to National Institutes of Health Stroke Scale (NIHSS) items. Finally, we developed a decision tree algorithm based on dichotomized NIHSS items.
Results: We identified seven LAVO-prediction scales. From DUST, 1316 patients (35.8% LAVO rate) from 14 centers were available for validation. FAST-ED and RACE had the highest AUROCs (both >0.81; p < 0.01 for comparison with the other scales). Group lasso analysis yielded a LAVO-prediction model containing seven NIHSS items (AUROC 0.84). With the GACE (Gaze, facial Asymmetry, level of Consciousness, Extinction/inattention) decision tree, LAVO is predicted (AUROC 0.76) for 61% of patients by assessing only two dichotomized NIHSS items, and for all patients with four items.
Conclusion: External validation of seven LAVO-prediction scales showed AUROCs between 0.75 and 0.83. Most scales, however, appear too complex for use by Emergency Medical Services, and prehospital validation is generally lacking. GACE is the first LAVO-prediction scale to use a simple decision tree, thereby increasing feasibility while maintaining high accuracy. Prospective prehospital validation is planned.
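The AUROC used to compare the scales has a direct rank interpretation: it is the probability that a randomly chosen patient with LAVO scores higher on a scale than a randomly chosen patient without LAVO, with ties counted as one half. A minimal Python sketch of that definition follows; the function name `auroc` and the toy scores are hypothetical and are not taken from the DUST data.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 = positive case (e.g. LAVO present), 0 = negative case.
    Returns P(score_pos > score_neg), counting ties as one half.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:          # compare every positive case ...
        for n in neg:      # ... with every negative case
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scale scores for six patients and their LAVO status.
scores = [5, 3, 4, 1, 2, 0]
labels = [1, 1, 0, 0, 1, 0]
print(auroc(scores, labels))
```

The double loop is O(n_pos × n_neg), which is fine for illustration; for a cohort the size of DUST, a sort-based rank computation or a library routine such as scikit-learn's `roc_auc_score` is the usual choice.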


2020, Vol 19 (03), pp. 2040009
Author(s): Abhijeet R Patil, Bong-Jin Choi, Sangjin Kim

A high-throughput, correlated DNA methylation (DNAmeth) dataset was generated with the Illumina Infinium Human Methylation 27 (IIHM 27K) BeadChip assay. In DNAmeth data, each gene has several CpG sites, and the CpG sites grouped within a gene are highly correlated. Most current filtering-based ranking (FBR) methods do not take these group correlation structures into account. Selecting significant features with FBR methods and feeding them to classifiers to achieve the best classification accuracy on highly correlated DNAmeth data is therefore a challenging task. In this research, we introduce a resampling of group least absolute shrinkage and selection operator (glasso) FBR method that discards unrelated features while accounting for the group correlation among features. Training various classifiers, such as random forests (RF), Naive Bayes (NB), and support vector machines (SVM), on the significant CpGs obtained from the proposed resampling of group lasso-based ranking (RGLR) method boosted classification accuracy. Using simulated and experimental prostate DNAmeth data, we showed that discarding unimportant CpG sites with the RGLR method achieves higher accuracy, sensitivity, specificity, and geometric mean.
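The general idea of ranking groups of correlated CpG sites by repeatedly fitting a group lasso on resampled data can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' RGLR implementation: it assumes a logistic loss fitted by proximal gradient descent, a group penalty weighted by the square root of group size, random 80% subsamples, and a ranking by how often each gene's group keeps a non-zero coefficient. All names (`group_lasso_logistic`, `resampled_group_ranking`), the penalty `lam`, and the synthetic data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def group_lasso_logistic(X: List[List[float]], y: List[int],
                         groups: Sequence[Sequence[int]], lam: float,
                         iters: int = 300) -> List[float]:
    """Group-lasso logistic regression fitted by proximal gradient (ISTA).

    Minimises mean logistic loss + lam * sum_g sqrt(|g|) * ||beta_g||_2 with an
    unpenalised intercept; every feature index is assumed to belong to a group.
    """
    n, p = len(X), len(X[0])
    beta, b0 = [0.0] * p, 0.0
    # 4n / ||[1, X]||_F^2 <= 1/L for the mean logistic loss, so this step is safe.
    step = 4.0 * n / (n + sum(v * v for row in X for v in row))
    for _ in range(iters):
        resid = [_sigmoid(b0 + sum(r[j] * beta[j] for j in range(p))) - yi
                 for r, yi in zip(X, y)]
        grad = [sum(resid[i] * X[i][j] for i in range(n)) / n for j in range(p)]
        b0 -= step * sum(resid) / n  # plain gradient step for the intercept
        z = [beta[j] - step * grad[j] for j in range(p)]
        for g in groups:
            # Block soft-thresholding: shrink the whole group or set it to zero.
            norm = math.sqrt(sum(z[j] ** 2 for j in g))
            thresh = step * lam * math.sqrt(len(g))
            scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
            for j in g:
                beta[j] = scale * z[j]
    return beta


def resampled_group_ranking(X, y, groups, lam, n_resamples=20, frac=0.8,
                            seed=0) -> Tuple[List[int], List[int]]:
    """Rank groups by how often group lasso keeps them over random subsamples.

    Ties in selection frequency are broken by the mean group-coefficient norm.
    """
    rng = random.Random(seed)
    n = len(X)
    freq = [0] * len(groups)
    mean_norm = [0.0] * len(groups)
    for _ in range(n_resamples):
        idx = rng.sample(range(n), int(frac * n))
        beta = group_lasso_logistic([X[i] for i in idx], [y[i] for i in idx],
                                    groups, lam)
        for k, g in enumerate(groups):
            norm = math.sqrt(sum(beta[j] ** 2 for j in g))
            if norm > 1e-8:  # the group counts as selected if it is non-zero
                freq[k] += 1
            mean_norm[k] += norm / n_resamples
    order = sorted(range(len(groups)), key=lambda k: (-freq[k], -mean_norm[k]))
    return order, freq


# Hypothetical toy data: gene 0 has two highly correlated CpG sites that
# determine the class label; genes 1 and 2 are pure noise.
rng = random.Random(42)
groups = [[0, 1], [2, 3], [4, 5]]
X, y = [], []
for _ in range(60):
    a = rng.gauss(0.0, 1.0)
    X.append([a, a + 0.1 * rng.gauss(0.0, 1.0)]
             + [rng.gauss(0.0, 1.0) for _ in range(4)])
    y.append(1 if a > 0 else 0)

order, freq = resampled_group_ranking(X, y, groups, lam=0.05)
print("group ranking:", order, "selection counts:", freq)
```

Because the penalty acts on each group's Euclidean norm, the CpG sites of one gene enter or leave the model together, which suits the within-gene correlation described above; counting selections across subsamples makes the ranking less dependent on any single fit.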

