Mutual information based input feature selection for classification problems
2012 ◽ Vol 54 (1) ◽ pp. 691-698
Author(s): Shuang Cang ◽ Hongnian Yu

Author(s): M. Vidyasagar

The objectives of this Perspective paper are to review some recent advances in sparse feature selection for regression and classification, as well as compressed sensing, and to discuss how these might be used to develop tools to advance personalized cancer therapy. As an illustration of the possibilities, a new algorithm for sparse regression is presented and is applied to predict the time to tumour recurrence in ovarian cancer. A new algorithm for sparse feature selection in classification problems is presented, and its validation in endometrial cancer is briefly discussed. Some open problems are also presented.
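The sparse regression the abstract refers to is typically cast as an l1-penalised least-squares problem, which drives the weights of uninformative features to exactly zero. As a minimal sketch (not the paper's own algorithm, which is not given here), the following coordinate-descent Lasso on synthetic data recovers only the features that actually drive the response; all names and data are illustrative.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso: minimises (1/2n)||y - Xw||^2 + lam*||w||_1.
    The soft-thresholding step sets weak coordinates exactly to zero,
    which is what makes the method a feature selector."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(d):
            # partial residual with feature j's contribution removed
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r / n
            # soft-threshold: zero unless the correlation exceeds lam
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

# toy data: the response depends on only the first two of five features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

w = lasso_cd(X, y, lam=0.1)
selected = np.flatnonzero(np.abs(w) > 1e-6)  # indices of surviving features
```

With this penalty the three noise features receive weight exactly zero, so `selected` contains only the informative indices.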


2018 ◽ Vol 45 (1) ◽ pp. 53-67
Author(s): Néstor Barraza ◽ Sérgio Moro ◽ Marcelo Ferreyra ◽ Adolfo de la Peña

Feature selection is a highly relevant task in any data-driven knowledge discovery project. The present research analyses the advantages and disadvantages of using mutual information (MI) and data-based sensitivity analysis (DSA) for feature selection in classification problems, applying both to a bank telemarketing case. A logistic regression model is built on the tuned set of features that each technique identifies as most influencing the success of a telemarketing contact: 13 features for MI and 9 for DSA. The latter performs better at lower false-positive rates, while the former is slightly better at higher false-positive rates. Thus, MI is the better choice when the aim is to reduce the cost of contacts slightly without risking the loss of many successes. DSA, however, achieved good prediction results with fewer features.
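MI-based feature selection ranks each candidate feature by its mutual information with the class label, I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))]. As a minimal sketch of the plug-in estimator for discrete features (not the paper's exact estimator; the feature names and data below are invented for illustration):

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete
    sequences, using plug-in probability estimates from joint and
    marginal counts."""
    n = len(x)
    pxy = Counter(zip(x, y))   # joint counts
    px = Counter(x)            # marginal counts of x
    py = Counter(y)            # marginal counts of y
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), with the n's folded in
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi

# toy telemarketing-style data: 'contacted_before' tracks contact success,
# 'region' is pure noise (independent of the label)
success = [1, 1, 1, 1, 0, 0, 0, 0]
contacted_before = [1, 1, 1, 0, 0, 0, 0, 1]
region = [0, 1, 0, 1, 0, 1, 0, 1]

mi_informative = mutual_information(contacted_before, success)  # positive
mi_noise = mutual_information(region, success)                  # 0.0 (independent)
```

Ranking features by this score and keeping the top k is the usual filter-style selection step before fitting a classifier such as logistic regression.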

