A Large-Scale Study of the Impact of Feature Selection Techniques on Defect Classification Models

Author(s):  
Baljinder Ghotra ◽  
Shane McIntosh ◽  
Ahmed E. Hassan
2018 ◽  
Vol 97 (4) ◽  
pp. 375-377
Author(s):  
Irina V. Egorysheva

The article is devoted to the participation of the outstanding hygienist F. F. Erisman in the development of the Moscow zemstvo sanitary organization. Under his leadership, a large-scale study of the impact of working and living conditions on the health of plant workers was carried out; it served as a model for similar sanitary-statistical research in a number of rural provinces. F. F. Erisman actively participated in the work of the sanitary organization of the Moscow gubernia Zemstvo and in the creation of the first district sanitary Bureau.


2020 ◽  
Author(s):  
Jan Grau ◽  
Florian Schmidt ◽  
Marcel H. Schulz

Abstract: Several studies suggested that transcription factor (TF) binding to DNA may be impaired or enhanced by DNA methylation. We present MeDeMo, a toolbox for TF motif analysis that combines information about DNA methylation with models capturing intra-motif dependencies. In a large-scale study using ChIP-seq data for 335 TFs, we identify novel TFs that are affected by DNA methylation. Overall, we find that CpG methylation decreases the likelihood of binding for the majority of TFs. For a considerable subset of TFs, we show that intra-motif dependencies are pivotal for accurately modelling the impact of DNA methylation on TF binding.


Author(s):  
Abhishek Bhattacharya ◽  
Radha Tamal Goswami ◽  
Kuntal Mukherjee ◽  
Nhu Gia Nguyen

Each Android application requests a set of permissions at installation time, and these permissions can be used as features for permission-based identification of Android malware. Recently, ensemble feature selection techniques have received increasing attention over conventional techniques in different applications. In this work, a cluster-based voted ensemble feature selection technique, combining five base wrapper approaches from R libraries, is proposed for identifying the most prominent set of features in the predictive modeling of Android malware. The proposed method preserves both desirable properties of an ensemble feature selector: accuracy and diversity. Moreover, five different data partitioning ratios are considered, and the impact of those ratios on the predictive model is measured using the coefficient of determination (r-square) and root mean square error. The proposed strategy yields significantly better results in terms of the number of selected features and classification accuracy.
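The voting step of an ensemble feature selector can be illustrated with a minimal sketch. The abstract does not specify the clustering step, the five wrapper methods, or the vote threshold, so the function `vote_features`, the `min_votes` parameter, and the permission names below are hypothetical and show only a plain majority vote, not the authors' method.

```python
from collections import Counter
from typing import Iterable, List, Set


def vote_features(selections: Iterable[Set[str]], min_votes: int) -> List[str]:
    """Keep features chosen by at least `min_votes` base selectors.

    The result is ordered by descending vote count, then alphabetically.
    """
    votes: Counter = Counter()
    for selected in selections:
        votes.update(set(selected))  # each selector casts one vote per feature
    kept = [feature for feature, count in votes.items() if count >= min_votes]
    return sorted(kept, key=lambda feature: (-votes[feature], feature))


# Hypothetical outputs of five base selectors (illustrative permission names).
base_selections = [
    {"SEND_SMS", "READ_CONTACTS", "INTERNET"},
    {"SEND_SMS", "INTERNET", "CAMERA"},
    {"SEND_SMS", "READ_CONTACTS"},
    {"SEND_SMS", "RECORD_AUDIO", "INTERNET"},
    {"READ_CONTACTS", "SEND_SMS"},
]

# A feature survives if at least three of the five selectors picked it.
selected = vote_features(base_selections, min_votes=3)
print(selected)
```

Raising `min_votes` trades diversity for agreement: a threshold of five keeps only the features every base selector agrees on.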


2021 ◽  
Vol 15 (1) ◽  
pp. 1-15
Author(s):  
Behrooz Abbaszadeh ◽  
Cesar Alexandre Domingues Teixeira ◽  
Mustapha C.E. Yagoub

Background: Because about 30% of epileptic patients suffer from refractory epilepsy, an efficient automatic seizure prediction tool is in great demand to improve their quality of life. Methods: In this work, time-domain features that discriminate between preictal and interictal states were extracted from the intracranial electroencephalogram of twelve patients, i.e., six with temporal and six with frontal lobe epilepsy. The performance of three types of feature selection methods was compared using the Matthews correlation coefficient (MCC). Results: Kruskal-Wallis, a non-parametric approach, was found to perform better than the other approaches because it is simple and less resource-consuming while maintaining the highest MCC score. The impact of dividing the electroencephalogram signals into various sub-bands was investigated as well. The strong performance of Kruskal-Wallis suggests the importance of univariate features such as complexity and interquartile ratio (IQR), along with autoregressive (AR) model parameters and the maximum (MAX) cross-correlation, for efficiently predicting epileptic seizures. Conclusion: The proposed approach has the potential to be implemented on a low-power device by considering a few simple time-domain characteristics for a specific sub-band. It should be noted that, as there is not a great deal of literature on frontal lobe epilepsy, the results of this work can be considered promising.
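As a rough illustration of the two quantities named in this abstract, the sketch below ranks features by the Kruskal-Wallis H statistic and scores a classifier with the MCC. It is a minimal sketch under stated assumptions, not the authors' pipeline: the feature names and values are made up for illustration, the helper names are hypothetical, and the H statistic omits the usual tie correction.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic (assumption: no tie correction applied)."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Illustrative values only: per-feature samples for the two classes.
features: Dict[str, List[List[float]]] = {
    "feature_a": [[5.0, 6.0, 7.0], [1.0, 2.0, 3.0]],  # classes well separated
    "feature_b": [[1.0, 4.0, 5.0], [2.0, 3.0, 6.0]],  # classes overlap
}

# Higher H means the feature separates the classes more strongly.
ranking = sorted(features, key=lambda name: kruskal_h(features[name]), reverse=True)
print(ranking)
```

Because each feature is scored independently, this kind of univariate filter needs only a sort per feature, which is consistent with the abstract's point about low resource consumption.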


2015 ◽  
Vol 3 (1) ◽  
pp. 187
Author(s):  
Pablo Beneitone ◽  
Maria Yarosh

Deusto International Tuning Academy is undertaking a large-scale study to analyse the impact Tuning projects may have had in participating universities. More particularly, the study hopes to provide an unambiguous answer regarding the presence or absence of the implementation of a competence-based student-centred approach in the different world regions where Tuning projects have taken place. The present article focuses only on Latin America where two Tuning projects have been developed. It describes the findings of the first two stages of the study. After reporting the data, the authors argue that there is evidence of a Tuning impact in each of three intended impact domains: (1) understanding of the importance of a shift from content- to competence-based education; (2) provision of institutional support necessary to facilitate this change; and (3) appropriate teaching, learning and assessment within the general framework of the study plans and degree profiles.


2011 ◽  
Vol 2011 ◽  
pp. 1-9
Author(s):  
G. Doquire ◽  
G. de Lannoy ◽  
D. François ◽  
M. Verleysen

Supervised and interpatient classification of heart beats is essential in many applications requiring long-term monitoring of cardiac function. Several classification models able to cope with the strong class imbalance and a large variety of feature sets have been proposed for this task. In practice, over 200 features are often considered, and the features retained in the final model are chosen using either domain knowledge or an exhaustive search in the feature sets, without evaluating the relevance of each individual feature included in the classifier. As a consequence, the results obtained by these models can be suboptimal and difficult to interpret. In this work, feature selection techniques are considered to extract optimal feature subsets for state-of-the-art ECG classification models. The performances are evaluated on real ambulatory recordings and compared to previously reported feature choices using the same models. Results indicate that only a small number of individual features actually serve the classification and that better performance can be achieved by removing useless features.


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Bryce Hallmark ◽  
Jing Dong

Inclement winter weather such as snow, sleet, and freezing rain significantly impacts roadway safety. To assess the safety implications of winter weather, maintenance operations, and traffic operations, various crash frequency models have been developed. In this study, several datasets, including weather, snowplow operations, and traffic information, were combined to develop a robust crash frequency model for winter weather conditions. When developing statistical models using such large-scale multivariate datasets, one of the challenges is to determine which explanatory variables should be included in the model. This paper presents a feature selection framework that uses a machine-learning algorithm known as the Boruta algorithm together with exhaustive search to select a list of variables to be included in a negative binomial crash frequency model. The proposed feature selection framework generates consistent and intuitive results because the feature selection process reduces the complexity of interactions among different variables in the dataset. This enables our crash frequency model to better help agencies identify effective ways to improve roadway safety via winter maintenance operations. For example, increased plowing operations before the start of storms are associated with a decrease in crash rates. Thus, pretreatment operations can play a significant role in mitigating the impact of winter storms.

