New Online Streaming Feature Selection Based on Neighborhood Rough Set for Medical Data

Dingfei Lei; Pei Liang; Junhua Hu; Yuan Yuan

doi:10.3390/sym12101635

Symmetry ◽

10.3390/sym12101635 ◽

2020 ◽

Vol 12 (10) ◽

pp. 1635

Author(s):

Dingfei Lei ◽

Pei Liang ◽

Junhua Hu ◽

Yuan Yuan

Keyword(s):

Feature Selection ◽

Rough Set ◽

Rough Set Theory ◽

Imbalanced Data ◽

Computation Method ◽

Feature Subset ◽

Neighborhood Rough Set ◽

Neighborhood Relation ◽

Neighborhood Relations ◽

Online Streaming

In many real-world applications, such as medical diagnosis and fraud detection, not all features are available from the start. They are generated and flow in individually over time. Online streaming feature selection (OSFS) has recently attracted much attention because it can select the best feature subset as the feature set grows. Rough set theory, and the neighborhood rough set in particular, is widely used as an effective tool for feature selection. However, the two main neighborhood relations, namely k-neighborhood and neighborhood, cannot efficiently deal with unevenly distributed data, and the traditional method of dependency calculation does not take the structure of the neighborhood covering into account. In this study, a novel neighborhood relation that combines the k-neighborhood and neighborhood relations is first defined. Then, we propose a weighted dependency degree computation method that considers the structure of the neighborhood relation. In addition, we propose a new OSFS approach named OSFS-KW that addresses the challenge of learning from class-imbalanced data. OSFS-KW requires no adjustable parameters and no pretraining. The experimental results on 19 datasets demonstrate that OSFS-KW not only outperforms traditional methods but also exceeds the state-of-the-art OSFS approaches.
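To make the contrast between the two neighborhood relations concrete, the following is a minimal sketch and not the OSFS-KW method itself. It assumes that "neighborhood" in the abstract means a fixed-radius (delta) neighborhood and that "k-neighborhood" means the k nearest samples; the function names, the toy data, and the values delta = 0.2 and k = 1 are all illustrative assumptions. The dependency degree here is the plain unweighted one (the share of samples whose neighborhood is label-consistent), and the paper's combined relation and weighting scheme are not reproduced.

```python
import math

def delta_neighbors(X, i, delta):
    """Indices of samples within distance delta of sample i (i included)."""
    return {j for j in range(len(X)) if math.dist(X[i], X[j]) <= delta}

def k_nearest_neighbors(X, i, k):
    """Sample i together with its k nearest other samples."""
    others = sorted((j for j in range(len(X)) if j != i),
                    key=lambda j: math.dist(X[i], X[j]))
    return {i, *others[:k]}

def positive_region(X, y, neighborhood):
    """Samples whose whole neighborhood carries their own label."""
    return [i for i in range(len(X))
            if all(y[j] == y[i] for j in neighborhood(X, i))]

def dependency(X, y, neighborhood):
    """Unweighted dependency degree: |positive region| / |U|."""
    return len(positive_region(X, y, neighborhood)) / len(X)

# Toy, unevenly distributed data (assumed): a dense group, a second group,
# and one isolated sample at 5.0.
X = [(0.0,), (0.1,), (0.25,), (1.0,), (1.1,), (5.0,)]
y = [0, 0, 1, 1, 1, 0]

by_delta = lambda X, i: delta_neighbors(X, i, 0.2)
by_knn = lambda X, i: k_nearest_neighbors(X, i, 1)

pos_delta = positive_region(X, y, by_delta)  # isolated sample is trivially consistent
pos_knn = positive_region(X, y, by_knn)      # isolated sample is forced onto a far neighbor
```

On this toy data both relations give the same dependency degree but disagree on which samples count as consistent: the fixed radius leaves the isolated sample alone with itself, while the k-nearest rule pairs it with a distant sample of another class. This is one way uneven density can distort either relation.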

Intuitionistic Fuzzy Neighborhood Rough Set Model for Feature Selection

International Journal of Fuzzy System Applications ◽

10.4018/ijfsa.2018040104 ◽

2018 ◽

Vol 7 (2) ◽

pp. 75-84 ◽

Cited By ~ 3

Author(s):

Shivam Shreevastava ◽

Anoop Kumar Tiwari ◽

Tanmoy Som

Keyword(s):

Feature Selection ◽

Set Theory ◽

Rough Set ◽

Rough Set Theory ◽

Continuous Data ◽

Feature Subset ◽

Data Set ◽

Intuitionistic Fuzzy ◽

Neighborhood Models ◽

Neighborhood Rough Set

Feature selection is one of the most widely used pre-processing techniques for dealing with large data sets. In this context, rough set theory has been successfully applied to feature selection on discrete data sets, but continuous data sets require discretization, which may cause information loss. Fuzzy rough set approaches have also been used successfully to resolve this issue, as they can handle continuous data directly. Moreover, almost all feature selection techniques are designed for homogeneous data sets. This article focuses on heterogeneous feature subset reduction. Novel intuitionistic fuzzy neighborhood models are proposed by combining intuitionistic fuzzy sets with neighborhood rough set models, taking an appropriate pair of lower and upper approximations, and are generalized for feature selection, supported by theory and its validation. An appropriate algorithm, along with an application to a data set, is also presented.

Rough Set-Based Feature Selection

Rough Computing ◽

10.4018/978-1-59904-552-8.ch003 ◽

2011 ◽

pp. 70-107 ◽

Cited By ~ 17

Author(s):

Richard Jensen

Keyword(s):

Feature Selection ◽

Rough Set ◽

Rough Sets ◽

Rough Set Theory ◽

Hill Climbing ◽

Feature Subset ◽

Selection Methods ◽

Data Dependencies ◽

Additional Information ◽

Related Feature

Feature selection aims to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. Rough set theory (RST) has been used as such a tool with much success. RST enables the discovery of data dependencies and the reduction of the number of attributes contained in a dataset using the data alone, requiring no additional information. This chapter describes the fundamental ideas behind RST-based approaches and reviews related feature selection methods that build on these ideas. Extensions to the traditional rough set approach are discussed, including recent selection methods based on tolerance rough sets, variable precision rough sets and fuzzy-rough sets. Alternative search mechanisms are also highly important in rough set feature selection. The chapter includes the latest developments in this area, including RST strategies based on hill-climbing, genetic algorithms and ant colony optimization.
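As an illustration of how data dependencies can drive a hill-climbing search, the sketch below computes the classical rough set dependency degree on discrete data and greedily adds the feature that raises it most. It is a simplified example under stated assumptions, not the chapter's own algorithm: the function names, the dictionary-based row format, and the toy table are hypothetical.

```python
from collections import defaultdict

def dependency(rows, features, decision):
    """Share of rows lying in equivalence classes with a single decision value."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[f] for f in features)].append(row[decision])
    consistent = sum(len(b) for b in blocks.values() if len(set(b)) == 1)
    return consistent / len(rows)

def greedy_reduct(rows, candidates, decision):
    """Hill-climbing: add the best feature until the full-set dependency is reached."""
    selected = []
    target = dependency(rows, candidates, decision)
    while dependency(rows, selected, decision) < target:
        best = max((f for f in candidates if f not in selected),
                   key=lambda f: dependency(rows, selected + [f], decision))
        selected.append(best)
    return selected

# Toy decision table (assumed): feature "b" alone determines decision "d".
rows = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]

reduct = greedy_reduct(rows, ["a", "b", "c"], "d")
```

The search uses only the table itself, which matches the abstract's point that no additional information is required. A greedy search of this kind is not guaranteed to find a minimal subset in general.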

Online streaming feature selection using adapted Neighborhood Rough Set

Information Sciences ◽

10.1016/j.ins.2018.12.074 ◽

2019 ◽

Vol 481 ◽

pp. 258-279 ◽

Cited By ~ 11

Author(s):

Peng Zhou ◽

Xuegang Hu ◽

Peipei Li ◽

Xindong Wu

Keyword(s):

Feature Selection ◽

Rough Set ◽

Neighborhood Rough Set ◽

Online Streaming

Data Reduction with Rough Sets

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch087 ◽

2011 ◽

pp. 556-560 ◽

Cited By ~ 1

Author(s):

Richard Jensen

Keyword(s):

Feature Selection ◽

Knowledge Discovery ◽

Set Theory ◽

Rough Set ◽

Data Reduction ◽

Rough Sets ◽

Rough Set Theory ◽

Feature Subset ◽

Data Dependencies ◽

Additional Information

Data reduction is an important step in knowledge discovery from data. The high dimensionality of databases can be reduced using suitable techniques, depending on the requirements of the data mining processes. These techniques fall into one of the following categories: those that transform the underlying meaning of the data features and those that are semantics-preserving. Feature selection (FS) methods belong to the latter category, where a smaller set of the original features is chosen based on a subset evaluation function. The process aims to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. In knowledge discovery, feature selection methods are particularly desirable as they facilitate the interpretability of the resulting knowledge. For this, rough set theory has been successfully used as a tool that enables the discovery of data dependencies and the reduction of the number of features contained in a dataset using the data alone, while requiring no additional information.

A Fast Feature Selection Algorithm by Accelerating Computation of Fuzzy Rough Set-Based Information Entropy

Entropy ◽

10.3390/e20100788 ◽

2018 ◽

Vol 20 (10) ◽

pp. 788 ◽

Cited By ~ 4

Author(s):

Xiao Zhang ◽

Xia Liu ◽

Yanyan Yang

Keyword(s):

Feature Selection ◽

Set Theory ◽

Rough Set ◽

Information Entropy ◽

Fast Algorithm ◽

Rough Set Theory ◽

Computer Applications ◽

Feature Subset ◽

Fuzzy Rough Set ◽

Effective Measure

The information entropy developed by Shannon is an effective measure of uncertainty in data, and rough set theory is a useful tool in computer applications for dealing with vague and uncertain data. At present, information entropy has been extensively applied in rough set theory, and different information entropy models have also been proposed for rough sets. In this paper, based on an existing feature selection method that uses a fuzzy rough set-based information entropy, a corresponding fast algorithm is provided to achieve an efficient implementation, in which the fuzzy rough set-based information entropy, taken as the evaluation measure for selecting features, is computed by an improved mechanism with lower complexity. The essence of the acceleration algorithm is to use iteratively reduced instances to compute the lambda-conditional entropy. Numerical experiments are further conducted to show the performance of the proposed fast algorithm, and the results demonstrate that the algorithm obtains the same feature subset as its original counterpart, but in significantly less time.
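For readers unfamiliar with entropy as a feature evaluation measure, here is a minimal sketch using ordinary Shannon entropy on discrete values. This is a deliberately simpler stand-in: it is not the fuzzy rough set-based lambda-conditional entropy from the paper, and it contains none of the paper's acceleration. The function names and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(target, given):
    """H(target | given), computed as H(target, given) - H(given)."""
    return entropy(list(zip(target, given))) - entropy(given)

# Toy data (assumed): f1 determines the label, f2 carries no information about it.
labels = ["y", "y", "n", "n"]
f1 = [1, 1, 0, 0]
f2 = [0, 1, 0, 1]

h_label = conditional_entropy(labels, [0] * len(labels))  # equals H(labels)
h_given_f1 = conditional_entropy(labels, f1)
h_given_f2 = conditional_entropy(labels, f2)
```

A lower conditional entropy means the feature leaves less uncertainty about the label, so an entropy-guided selector would prefer f1 over f2 here.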

Online streaming feature selection based on neighborhood rough set

Applied Soft Computing ◽

10.1016/j.asoc.2021.108025 ◽

2021 ◽

pp. 108025

Author(s):

Shuangjie Li ◽

Kaixiang Zhang ◽

Yali Li ◽

Shuqin Wang ◽

Shaoqiang Zhang

Keyword(s):

Feature Selection ◽

Rough Set ◽

Neighborhood Rough Set ◽

Online Streaming

Efficient feature selection for inconsistent heterogeneous information systems based on a grey wolf optimizer and rough set theory

Soft Computing ◽

10.1007/s00500-021-06375-z ◽

2021 ◽

Author(s):

Ahmed Hamed ◽

Hamed Nassar

Keyword(s):

Feature Selection ◽

Information Systems ◽

Set Theory ◽

Rough Set ◽

Rough Set Theory ◽

Grey Wolf Optimizer ◽

Grey Wolf ◽

Heterogeneous Information ◽

Selection For ◽

Heterogeneous Information Systems

Redefining core preliminary concepts of classic Rough Set Theory for feature selection

Engineering Applications of Artificial Intelligence ◽

10.1016/j.engappai.2017.08.003 ◽

2017 ◽

Vol 65 ◽

pp. 375-387 ◽

Cited By ~ 5

Author(s):

Muhammad Summair Raza ◽

Usman Qamar

Keyword(s):

Feature Selection ◽

Set Theory ◽

Rough Set ◽

Rough Set Theory

An Ensemble Classification Method for High-Dimensional Data Using Neighborhood Rough Set

Complexity ◽

10.1155/2021/8358921 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Jing Zhang ◽

Guang Lu ◽

Jiaquan Li ◽

Chuanwen Li

Keyword(s):

Feature Selection ◽

Rough Set ◽

Small Sample Size ◽

High Dimensional Data ◽

Classification Performance ◽

Small Sample ◽

Ensemble Classification ◽

High Dimensional ◽

Sample Classification ◽

Neighborhood Rough Set

Mining useful knowledge from high-dimensional data is a hot research topic. Efficient and effective sample classification and feature selection are challenging tasks because of the high dimensionality and small sample size of microarray data. Feature selection is necessary when constructing the model in order to reduce time and space consumption. Therefore, a feature selection model based on prior knowledge and rough sets is proposed. Pathway knowledge is used to select feature subsets, and a rough set based on the intersection neighborhood is then used to select the important features in each subset, since it can select features without redundancy and deal with numerical features directly. To improve the diversity among base classifiers and the efficiency of classification, only a subset of the base classifiers is selected. Classifiers are grouped into several clusters by k-means clustering using the proposed combination distance of Kappa-based diversity and accuracy. The base classifier with the best classification performance in each cluster is selected to generate the final ensemble model. Experimental results on three Arabidopsis thaliana stress response datasets show that the proposed method achieves better classification performance than existing ensemble models.
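The Kappa-based diversity mentioned above rests on Cohen's kappa, which measures how far two classifiers agree beyond chance. The sketch below shows only that pairwise statistic; how the paper combines it with accuracy into a clustering distance is not specified in the abstract and is not reproduced here. The function name and the prediction vectors are hypothetical.

```python
from collections import Counter

def cohen_kappa(pred_a, pred_b):
    """Cohen's kappa between two classifiers' predictions on the same samples."""
    n = len(pred_a)
    observed = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    freq_a, freq_b = Counter(pred_a), Counter(pred_b)
    # Chance agreement from each classifier's own label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both classifiers always predict the same single label
    return (observed - expected) / (1 - expected)

# Hypothetical predictions from three base classifiers on four samples.
clf_1 = [1, 1, 0, 0]
clf_2 = [1, 1, 0, 0]
clf_3 = [1, 0, 1, 0]

kappa_same = cohen_kappa(clf_1, clf_2)     # identical outputs
kappa_diverse = cohen_kappa(clf_1, clf_3)  # agreement at chance level
```

A low kappa between two classifiers indicates that they err on different samples, which is the kind of diversity an ensemble benefits from.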

Online early terminated streaming feature selection based on Rough Set theory

Applied Soft Computing ◽

10.1016/j.asoc.2021.107993 ◽

2021 ◽

pp. 107993

Author(s):

Peng Zhou ◽

Peipei Li ◽

Shu Zhao ◽

Yanping Zhang

Keyword(s):

Feature Selection ◽

Set Theory ◽

Rough Set ◽

Rough Set Theory
