Ensemble and Quick Strategy for Searching Reduct: A Hybrid Mechanism

Information ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 25
Author(s):  
Wangwang Yan ◽  
Yan Chen ◽  
Jinlong Shi ◽  
Hualong Yu ◽  
Xibei Yang

Attribute reduction is commonly regarded as a key topic in rough set research. Concerning strategies for searching a reduct, although various heuristic-based forward greedy searches have been developed, most of them were designed to pursue one and only one characteristic closely related to the performance of the reduct. Nevertheless, a justifiable search is frequently expected to explicitly involve three main characteristics: (1) obtaining the reduct with low time consumption; (2) generating a reduct with high stability; (3) acquiring a reduct with competent classification ability. To fill this gap, a hybrid searching mechanism is designed that takes the above characteristics into account. Such a mechanism not only adopts multiple fitness functions to evaluate candidate attributes, but also queries the distance between attributes to determine whether two or more attributes can be added into the reduct simultaneously. The former may help derive a reduct with higher stability and competent classification ability, and the latter may contribute to lower time consumption in deriving the reduct. Compared with 5 state-of-the-art algorithms for searching reducts, the experimental results over 20 UCI data sets demonstrate the effectiveness of our new mechanism. This study suggests a new trend of attribute reduction that achieves a balance among various characteristics.
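As a rough sketch of such a mechanism (not the authors' exact algorithm), the toy Python below combines a dependency-based fitness with a granule-count tie-breaker, and uses a hypothetical pairwise attribute distance to decide whether a second attribute may join the reduct in the same round. All function names and the distance measure are illustrative assumptions.

```python
from collections import defaultdict

def positive_region(rows, attrs, label_idx):
    """Fraction of samples whose equivalence class on `attrs` is label-pure."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[a] for a in attrs)].append(r[label_idx])
    pure = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return pure / len(rows)

def granule_count(rows, attrs):
    """Number of equivalence classes induced by `attrs` (coarser = fewer)."""
    return len({tuple(r[a] for a in attrs) for r in rows})

def attr_distance(rows, a, b):
    """Toy distance: 1 - fraction of sample pairs on which a and b agree in
    discerning behaviour (a hypothetical stand-in for the paper's measure)."""
    n, same, total = len(rows), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
            if (rows[i][a] == rows[j][a]) == (rows[i][b] == rows[j][b]):
                same += 1
    return 1 - same / total if total else 0.0

def hybrid_reduct(rows, n_attrs, label_idx, dist_threshold=0.3):
    reduct, remaining = [], list(range(n_attrs))
    while remaining and positive_region(rows, reduct, label_idx) < 1.0:
        # multiple fitness functions: dependency first, granularity as tie-break
        scored = sorted(remaining,
                        key=lambda a: (-positive_region(rows, reduct + [a], label_idx),
                                       granule_count(rows, reduct + [a])))
        batch = [scored[0]]
        # distance query: admit a second attribute this round if far enough
        for a in scored[1:]:
            if all(attr_distance(rows, a, b) >= dist_threshold for b in batch):
                batch.append(a)
                break
        for a in batch:
            reduct.append(a)
            remaining.remove(a)
    return reduct
```

Adding several sufficiently distant attributes per round is what would cut the number of evaluation rounds, and hence the time consumption the abstract mentions.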

Information ◽  
2018 ◽  
Vol 9 (11) ◽  
pp. 282 ◽  
Author(s):  
Yuan Gao ◽  
Xiangjian Chen ◽  
Xibei Yang ◽  
Pingxin Wang

In the rough-set field, the objective of attribute reduction is to regulate the variations of measures by removing redundant attributes. However, most previous concepts of attribute reduction were designed with one and only one measure, which means the obtained reduct may fail to meet the constraints given by other measures. In addition, the widely used heuristic algorithm for computing a reduct requires scanning all samples in the data, so the time consumption may be unacceptably high when the data are large. To alleviate these problems, a framework of attribute reduction based on multiple criteria with sample selection is proposed in this paper. Firstly, cluster centroids are derived from the data, and the samples that are far away from the cluster centroids are selected; this step completes the sample selection for reducing data size. Secondly, a multiple-criteria attribute reduction is designed, and the heuristic algorithm is applied over the selected samples to compute the reduct in terms of the multiple criteria. Finally, the experimental results over 12 UCI data sets show that the reducts obtained by our framework not only satisfy the constraints given by multiple criteria, but also provide better classification performance and lower time consumption.
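The sample-selection step described above might be sketched as follows, under toy assumptions: a minimal k-means in plain Python finds the centroids, and the `keep` fraction of samples farthest from their nearest centroid is retained (the names `kmeans` and `select_samples` are illustrative, not the authors' API):

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's algorithm: returns k centroids."""
    rng = random.Random(seed)
    cents = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist2(p, cents[i]))].append(p)
        cents = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else cents[i]
                 for i, cl in enumerate(clusters)]
    return cents

def select_samples(points, k=2, keep=0.5):
    """Keep the `keep` fraction of samples farthest from their nearest
    centroid, as in the abstract's sample-selection step."""
    cents = kmeans(points, k)
    ranked = sorted(points, key=lambda p: -min(dist2(p, c) for c in cents))
    return ranked[:max(1, int(keep * len(points)))]
```

The heuristic reduct search would then run only over the returned subset, which is where the claimed reduction in time consumption comes from.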


2020 ◽  
Vol 2020 ◽  
pp. 1-13 ◽  
Author(s):  
Yan Chen ◽  
Jingjing Song ◽  
Keyu Liu ◽  
Yaojin Lin ◽  
Xibei Yang

In the field of neighborhood rough sets, attribute reduction is considered a key topic. The neighborhood relation and the rough approximation play crucial roles in the process of obtaining the reduct. Presently, many strategies have been proposed to accelerate this process from the viewpoint of samples. However, these methods speed up the process only through the binary relation or the rough approximation, and so the time consumption may not be fully improved. To fill this gap, a combined acceleration strategy based on compressing the scanning space of both the neighborhood and the lower approximation is proposed, which aims to further reduce the time consumption of obtaining the reduct. In addition, 15 UCI data sets have been selected, and the experimental results show the following: (1) our proposed approach significantly reduces the elapsed time of obtaining the reduct; (2) compared with previous approaches, our combined acceleration strategy does not change the resulting reduct. This research suggests a new trend of attribute reduction from multiple views.
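To make the idea concrete, here is a small sketch (not the authors' implementation) of a neighborhood lower approximation, plus one sample-side compression: with a Euclidean neighborhood, adding attributes can only shrink a neighborhood, so a sample whose neighborhood is already label-pure stays pure and need not be rescanned in later rounds. The function names are assumptions for illustration.

```python
def neighborhood(i, data, attrs, delta):
    """Indices of samples within distance delta of sample i on `attrs`."""
    xi = data[i]
    return [j for j, xj in enumerate(data)
            if sum((xi[a] - xj[a]) ** 2 for a in attrs) ** 0.5 <= delta]

def lower_approximations(data, labels, attrs, delta):
    """Per-class lower approximation: a sample belongs iff its whole
    neighborhood shares its label."""
    lower = {}
    for i in range(len(data)):
        nb = neighborhood(i, data, attrs, delta)
        if all(labels[j] == labels[i] for j in nb):
            lower.setdefault(labels[i], set()).add(i)
    return lower

def accelerated_positive_region(data, labels, attrs_seq, delta):
    """Scan only still-unconfirmed samples as the attribute set grows:
    Euclidean neighborhoods shrink monotonically with more attributes,
    so once a sample's neighborhood is label-pure it stays pure."""
    pending = set(range(len(data)))
    confirmed = set()
    for attrs in attrs_seq:
        for i in list(pending):
            nb = neighborhood(i, data, attrs, delta)
            if all(labels[j] == labels[i] for j in nb):
                confirmed.add(i)
                pending.discard(i)
    return confirmed
```

The paper's combined strategy additionally compresses the scanning space of the neighborhood computation itself; the sketch shows only the approximation side.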


2019 ◽  
Vol 2019 ◽  
pp. 1-16 ◽  
Author(s):  
Xun Wang ◽  
Wendong Zhang ◽  
Dun Liu ◽  
Hualong Yu ◽  
Xibei Yang ◽  
...  

In decision-theoretic rough sets (DTRS), the decision costs are used to generate the thresholds for characterizing the probabilistic approximations. As with other rough sets, many generalized DTRS models can be formed by using different binary relations. Nevertheless, it should be noticed that most procedures for calculating binary relations do not take the labels of samples into account, which may lead to lower discrimination; for example, samples with different labels are regarded as indistinguishable. To fill this gap, the main contribution of this paper is a pseudolabel strategy for constructing a new DTRS. Firstly, a pseudolabel neighborhood relation is presented, which differentiates samples not only by the neighborhood technique but also by the pseudolabels of samples. Accordingly, the pseudolabel neighborhood decision-theoretic rough set (PLNDTRS) can be constructed. Secondly, the problem of attribute reduction is explored, which aims to further reduce the PLNDTRS-related decision costs; a heuristic algorithm is designed to find such a reduct. Finally, the clustering technique is employed to generate the pseudolabels of samples. The experimental results over 15 UCI data sets show that PLNDTRS is superior to DTRS without pseudolabels because the former generates lower decision costs. Moreover, the proposed heuristic algorithm is also effective in providing satisfactory reducts. This study suggests new trends concerning cost-sensitivity problems in rough data analysis.
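A minimal sketch of the two ingredients, under toy assumptions: a pseudolabel neighborhood that relates samples only when they are both delta-close and share a pseudolabel, and the standard DTRS cost-to-threshold formulas (α, β) with a three-way region rule. The cost-parameter ordering in `dtrs_thresholds` follows the usual λ notation; the function names themselves are illustrative.

```python
def pseudo_neighborhood(i, data, pseudo, delta):
    """Samples are related iff they are delta-close AND share a pseudolabel,
    which is what lifts the discrimination over a plain neighborhood."""
    xi = data[i]
    return [j for j, xj in enumerate(data)
            if pseudo[j] == pseudo[i]
            and sum((a - b) ** 2 for a, b in zip(xi, xj)) ** 0.5 <= delta]

def dtrs_thresholds(lpp, lbp, lnp, lnn, lbn, lpn):
    """Standard DTRS thresholds from misclassification costs, where e.g.
    lpn is the cost of deciding positive when the sample is negative."""
    alpha = (lpn - lbn) / ((lpn - lbn) + (lbp - lpp))
    beta = (lbn - lnn) / ((lbn - lnn) + (lnp - lbp))
    return alpha, beta

def region(prob, alpha, beta):
    """Three-way decision on the conditional probability Pr(X | n(x))."""
    return 'POS' if prob >= alpha else 'NEG' if prob <= beta else 'BND'
```

In PLNDTRS, `prob` would be the fraction of a sample's pseudolabel neighborhood falling in the target class, and the reduct search would try to shrink the total decision cost implied by these regions.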


Author(s):  
Qing-Hua Zhang ◽  
Long-Yang Yao ◽  
Guan-Sheng Zhang ◽  
Yu-Ke Xin

In this paper, a new incremental knowledge acquisition method is proposed based on rough set theory, decision trees, and granular computing. In order to process dynamic data effectively, we first analyze how to describe the data with rough set theory and how to compute equivalence classes and the positive region with a hash algorithm. Then attribute reduction, value reduction, and the extraction of the rule set are completed efficiently using the hash algorithm. Finally, for each batch of newly added data, the incremental knowledge acquisition method is applied to update the original rules. Both algorithm analysis and experiments show that, for processing dynamic information systems, the time complexity of the proposed algorithm is lower than that of traditional algorithms and of incremental knowledge acquisition algorithms based on granular computing, owing to the efficiency of the hash algorithm; the proposed algorithm is also more effective on huge data sets.
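The hash-based computation of equivalence classes that the abstract relies on can be sketched in a few lines: hashing each sample's attribute tuple buckets the data in one expected-linear pass, and the positive region is then the union of label-pure buckets. Function names here are illustrative, not the authors'.

```python
from collections import defaultdict

def equivalence_classes(rows, attrs):
    """One dict pass: hashing the attribute tuple buckets each sample into
    its equivalence class in O(n) expected time, instead of pairwise
    comparison in O(n^2)."""
    classes = defaultdict(list)
    for idx, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(idx)
    return classes

def positive_region_hash(rows, attrs, label_idx):
    """Union of equivalence classes that are pure on the decision attribute."""
    pos = []
    for members in equivalence_classes(rows, attrs).values():
        if len({rows[i][label_idx] for i in members}) == 1:
            pos.extend(members)
    return sorted(pos)
```

For the incremental case, each newly arrived sample would be hashed into its bucket directly, so only the affected classes (and the rules derived from them) need updating.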


2014 ◽  
Vol 644-650 ◽  
pp. 2120-2123 ◽  
Author(s):  
De Zhi An ◽  
Guang Li Wu ◽  
Jun Lu

At present there are many data mining methods. This paper studies the application of rough set methods in data mining, mainly the application of rough-set-based attribute reduction algorithms in the rule extraction stage of data mining. In data mining, rough sets are often used for knowledge reduction and thus for rule extraction. Attribute reduction is one of the core research topics of rough set theory. In this paper, the traditional attribute reduction algorithm based on rough sets is studied and improved, and a new attribute reduction algorithm is proposed for data mining over large data sets.


2014 ◽  
Vol 513-517 ◽  
pp. 973-977
Author(s):  
Zhi Li Pei ◽  
Jian Hong Qi ◽  
Li Sha Liu ◽  
Qing Hu Wang ◽  
Ming Yang Jiang ◽  
...  

In 2012, Wang Zuofei proposed a granularity function and applied it to measuring attribute importance and to attribute reduction. On this basis, a granularity function based on pessimistic and optimistic multi-granulation rough sets is constructed and applied to the calculation of attribute importance and to attribute reduction. According to the experimental results, the method can reduce the feature dimension and obviously improve classification accuracy and efficiency.


2016 ◽  
Vol 16 (4) ◽  
pp. 13-28 ◽  
Author(s):  
Cao Chinh Nghia ◽  
Demetrovics Janos ◽  
Nguyen Long Giang ◽  
Vu Duc Thi

According to the traditional rough set theory approach, attribute reduction methods are performed on decision tables with a discretized value domain, i.e., decision tables obtained by data discretization methods. In recent years, researchers have proposed methods based on the fuzzy rough set approach to solve the problem of attribute reduction in decision tables with a numerical value domain. In this paper, we propose a fuzzy distance between two partitions and an attribute reduction method for numerical decision tables based on the proposed fuzzy distance. Experiments on data sets show that the proposed method achieves better classification accuracy than the methods based on fuzzy entropy.
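The paper's exact fuzzy distance is not reproduced here; as one natural sketch (an assumption, not the authors' definition), a fuzzy similarity relation can be built per attribute subset, and the distance between two such relations taken as the mean absolute difference of their membership matrices:

```python
def fuzzy_relation(data, attrs):
    """Fuzzy similarity matrix on `attrs` (1 - mean absolute difference),
    assuming attribute values are normalized to [0, 1]."""
    n = len(data)
    def sim(x, y):
        return 1 - sum(abs(x[a] - y[a]) for a in attrs) / max(len(attrs), 1)
    return [[sim(data[i], data[j]) for j in range(n)] for i in range(n)]

def fuzzy_distance(R1, R2):
    """Mean absolute difference between two fuzzy relation matrices --
    a simple stand-in for the paper's fuzzy distance between partitions."""
    n = len(R1)
    return sum(abs(R1[i][j] - R2[i][j])
               for i in range(n) for j in range(n)) / (n * n)
```

A greedy reduction would then add, at each step, the attribute whose induced relation moves the current relation closest (in this distance) to the relation induced by the full attribute set.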


2014 ◽  
Vol 40 (2) ◽  
pp. 269-310 ◽  
Author(s):  
Yanir Seroussi ◽  
Ingrid Zukerman ◽  
Fabian Bohnert

Authorship attribution deals with identifying the authors of anonymous texts. Traditionally, research in this field has focused on formal texts, such as essays and novels, but recently more attention has been given to texts generated by on-line users, such as e-mails and blogs. Authorship attribution of such on-line texts is a more challenging task than traditional authorship attribution, because such texts tend to be short, and the number of candidate authors is often larger than in traditional settings. We address this challenge by using topic models to obtain author representations. In addition to exploring novel ways of applying two popular topic models to this task, we test our new model that projects authors and documents to two disjoint topic spaces. Utilizing our model in authorship attribution yields state-of-the-art performance on several data sets, containing either formal texts written by a few authors or informal texts generated by tens to thousands of on-line users. We also present experimental results that demonstrate the applicability of topical author representations to two other problems: inferring the sentiment polarity of texts, and predicting the ratings that users would give to items such as movies.


2015 ◽  
Vol 24 (03) ◽  
pp. 1550003 ◽  
Author(s):  
Armin Daneshpazhouh ◽  
Ashkan Sami

The task of semi-supervised outlier detection is to find the instances that are exceptional from other data, using some labeled examples. In many applications such as fraud detection and intrusion detection, this issue becomes more important. Most existing techniques are unsupervised. On the other hand, semi-supervised approaches use both negative and positive instances to detect outliers. However, in many real world applications, very few positive labeled examples are available. This paper proposes an innovative approach to address this problem. The proposed method works as follows. First, some reliable negative instances are extracted by a kNN-based algorithm. Afterwards, fuzzy clustering using both negative and positive examples is utilized to detect outliers. Experimental results on real data sets demonstrate that the proposed approach outperforms the previous unsupervised state-of-the-art methods in detecting outliers.
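The first step of the pipeline, extracting reliable negatives with a kNN-based rule, might be sketched as follows (a toy reading of the abstract, with illustrative names: an unlabeled point is kept as a reliable negative when none of its k nearest neighbours is a labeled positive):

```python
def knn(point, pool, k):
    """The k nearest points to `point` in `pool` (squared Euclidean)."""
    return sorted(pool,
                  key=lambda q: sum((a - b) ** 2 for a, b in zip(point, q)))[:k]

def reliable_negatives(unlabeled, positives, k=3):
    """Unlabeled points whose k nearest neighbours (among all other points)
    contain no labeled positive -- a simple kNN-style extraction step."""
    allpts = unlabeled + positives
    negs = []
    for p in unlabeled:
        nbrs = knn(p, [q for q in allpts if q != p], k)
        if not any(q in positives for q in nbrs):
            negs.append(p)
    return negs
```

The second stage would then run fuzzy clustering over the positives plus these reliable negatives and flag points with weak membership in the negative clusters as outliers.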


Author(s):  
Yi Fan ◽  
Nan Li ◽  
Chengqian Li ◽  
Zongjie Ma ◽  
Longin Jan Latecki ◽  
...  

The Maximum Vertex Weight Clique (MVWC) problem is NP-hard and important in real-world applications. In this paper we propose to use restart and random-walk strategies to improve local search for MVWC. If a solution is revisited in a particular situation, the search restarts. In addition, when the local search has no options other than dropping vertices, it takes a random walk. Experimental results show that our solver outperforms state-of-the-art solvers on DIMACS and finds a new best-known solution. It is also the only solver that is comparable with state-of-the-art methods on both BHOSLIB and large crafted graphs. Furthermore, we evaluated our solver on clustering aggregation. Experimental results on a number of real data sets demonstrate that our solver outperforms the state of the art on the derived MVWC problem and helps improve the final clustering results.
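The restart and random-walk ideas can be illustrated with a deliberately tiny local search (a sketch under simplifying assumptions, not the paper's solver): grow the clique greedily by weight; when only drop moves remain, drop a random vertex (random walk); when a clique configuration repeats, restart from scratch.

```python
import random

def mvwc_local_search(adj, weights, steps=2000, seed=1):
    """Toy MVWC local search. `adj` is a list of neighbour sets,
    `weights` the vertex weights; returns (best_clique, best_weight)."""
    rng = random.Random(seed)
    n = len(weights)
    best, best_w = set(), 0
    seen, clique = set(), set()
    for _ in range(steps):
        addable = [v for v in range(n) if v not in clique
                   and all(u in adj[v] for u in clique)]
        if addable:
            clique.add(max(addable, key=lambda v: weights[v]))
        elif clique:
            # only drop moves remain: random walk
            clique.discard(rng.choice(sorted(clique)))
        w = sum(weights[v] for v in clique)
        if w > best_w:
            best, best_w = set(clique), w
        key = frozenset(clique)
        if key in seen:
            # revisited configuration: restart
            clique, _ = set(), seen.clear()
        else:
            seen.add(key)
    return best, best_w
```

The real solver's revisit test applies only in a particular stagnation situation and its move selection is far more refined, but the control flow above shows where restart and random walk slot into the search loop.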

