Identifying and Fusing Duplicate Features for Data Mining
Keyword(s):
This work addresses the problem of identifying and fusing duplicate features in machine learning datasets. Our goal is to evaluate the hypothesis that fusing duplicate features can improve the predictive power of the data while reducing training time. We propose a simple method for duplicate detection and fusion based on a small set of features. An evaluation comparing the duplicate detection against a manually generated ground truth obtained F1 of 0.91. Then,the effects of fusion were measured on a mortality prediction test. The results were inferior to the ones obtained with the original dataset. Thus we concluded that the investigated hypothesis does not hold.