Ensemble and Quick Strategy for Searching Reduct: A Hybrid Mechanism

Information ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 25
Author(s):  
Wangwang Yan ◽  
Yan Chen ◽  
Jinlong Shi ◽  
Hualong Yu ◽  
Xibei Yang

Attribute reduction is commonly regarded as a key topic in rough set research. Concerning strategies for searching a reduct, although various heuristic-based forward greedy searches have been developed, most of them were designed to pursue one and only one characteristic closely related to the performance of the reduct. Nevertheless, a justifiable search is frequently expected to explicitly involve three main characteristics: (1) obtaining the reduct with low time consumption; (2) generating a reduct with high stability; (3) acquiring a reduct with competent classification ability. To fill this gap, a hybrid searching mechanism is designed that takes the above characteristics into account. Such a mechanism not only adopts multiple fitness functions to evaluate candidate attributes, but also queries the distance between attributes to determine whether two or more attributes can be added into the reduct simultaneously. The former may help derive a reduct with higher stability and competent classification ability, and the latter may contribute to lower time consumption in deriving the reduct. Compared with 5 state-of-the-art algorithms for searching reducts, the experimental results over 20 UCI data sets demonstrate the effectiveness of our new mechanism. This study suggests a new trend of attribute reduction that achieves a balance among various characteristics.
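As a rough sketch of such a mechanism (not the authors' exact algorithm), the toy Python below combines a dependency-based fitness with a granule-count tie-breaker, and uses a hypothetical pairwise attribute distance to decide whether a second attribute may join the reduct in the same round. All function names and the distance measure are illustrative assumptions.

```python
from collections import defaultdict

def positive_region(rows, attrs, label_idx):
    """Fraction of samples whose equivalence class on `attrs` is label-pure."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[a] for a in attrs)].append(r[label_idx])
    pure = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return pure / len(rows)

def granule_count(rows, attrs):
    """Number of equivalence classes induced by `attrs` (coarser = fewer)."""
    return len({tuple(r[a] for a in attrs) for r in rows})

def attr_distance(rows, a, b):
    """Toy distance: 1 - fraction of sample pairs on which a and b agree in
    discerning behaviour (a hypothetical stand-in for the paper's measure)."""
    n, same, total = len(rows), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
            if (rows[i][a] == rows[j][a]) == (rows[i][b] == rows[j][b]):
                same += 1
    return 1 - same / total if total else 0.0

def hybrid_reduct(rows, n_attrs, label_idx, dist_threshold=0.3):
    reduct, remaining = [], list(range(n_attrs))
    while remaining and positive_region(rows, reduct, label_idx) < 1.0:
        # multiple fitness functions: dependency first, granularity as tie-break
        scored = sorted(remaining,
                        key=lambda a: (-positive_region(rows, reduct + [a], label_idx),
                                       granule_count(rows, reduct + [a])))
        batch = [scored[0]]
        # distance query: admit a second attribute this round if far enough
        for a in scored[1:]:
            if all(attr_distance(rows, a, b) >= dist_threshold for b in batch):
                batch.append(a)
                break
        for a in batch:
            reduct.append(a)
            remaining.remove(a)
    return reduct
```

Adding several sufficiently distant attributes per round is what would cut the number of evaluation rounds, and hence the time consumption the abstract mentions.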

Information ◽  
2018 ◽  
Vol 9 (11) ◽  
pp. 282 ◽  
Author(s):  
Yuan Gao ◽  
Xiangjian Chen ◽  
Xibei Yang ◽  
Pingxin Wang

In the rough-set field, the objective of attribute reduction is to regulate the variations of measures by removing redundant attributes. However, most previous concepts of attribute reduction were designed with one and only one measure, which means the obtained reduct may fail to meet the constraints given by other measures. In addition, the widely used heuristic algorithm for computing a reduct requires scanning all samples in the data, so the time consumption may be unacceptably high when the data are large. To alleviate these problems, a framework of attribute reduction based on multiple criteria with sample selection is proposed in this paper. Firstly, cluster centroids are derived from the data, and the samples that are far away from the cluster centroids are selected; this step completes the sample selection for reducing data size. Secondly, a multiple-criteria attribute reduction is designed, and the heuristic algorithm is applied over the selected samples to compute the reduct in terms of the multiple criteria. Finally, the experimental results over 12 UCI data sets show that the reducts obtained by our framework not only satisfy the constraints given by multiple criteria, but also provide better classification performance and lower time consumption.
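The sample-selection step described above might be sketched as follows, under toy assumptions: a minimal k-means in plain Python finds the centroids, and the `keep` fraction of samples farthest from their nearest centroid is retained (the names `kmeans` and `select_samples` are illustrative, not the authors' API):

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's algorithm: returns k centroids."""
    rng = random.Random(seed)
    cents = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist2(p, cents[i]))].append(p)
        cents = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else cents[i]
                 for i, cl in enumerate(clusters)]
    return cents

def select_samples(points, k=2, keep=0.5):
    """Keep the `keep` fraction of samples farthest from their nearest
    centroid, as in the abstract's sample-selection step."""
    cents = kmeans(points, k)
    ranked = sorted(points, key=lambda p: -min(dist2(p, c) for c in cents))
    return ranked[:max(1, int(keep * len(points)))]
```

The heuristic reduct search would then run only over the returned subset, which is where the claimed reduction in time consumption comes from.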


2020 ◽  
Vol 2020 ◽  
pp. 1-13 ◽  
Author(s):  
Yan Chen ◽  
Jingjing Song ◽  
Keyu Liu ◽  
Yaojin Lin ◽  
Xibei Yang

In the field of neighborhood rough sets, attribute reduction is considered a key topic. The neighborhood relation and the rough approximation play crucial roles in the process of obtaining the reduct. Presently, many strategies have been proposed to accelerate this process from the viewpoint of samples. However, these methods speed up the process only through the binary relation or the rough approximation, and so the time consumption may not be fully improved. To fill this gap, a combined acceleration strategy based on compressing the scanning space of both the neighborhood and the lower approximation is proposed, which aims to further reduce the time consumption of obtaining the reduct. In addition, 15 UCI data sets have been selected, and the experimental results show the following: (1) our proposed approach significantly reduces the elapsed time of obtaining the reduct; (2) compared with previous approaches, our combined acceleration strategy does not change the resulting reduct. This research suggests a new trend of attribute reduction from multiple views.
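To make the idea concrete, here is a small sketch (not the authors' implementation) of a neighborhood lower approximation, plus one sample-side compression: with a Euclidean neighborhood, adding attributes can only shrink a neighborhood, so a sample whose neighborhood is already label-pure stays pure and need not be rescanned in later rounds. The function names are assumptions for illustration.

```python
def neighborhood(i, data, attrs, delta):
    """Indices of samples within distance delta of sample i on `attrs`."""
    xi = data[i]
    return [j for j, xj in enumerate(data)
            if sum((xi[a] - xj[a]) ** 2 for a in attrs) ** 0.5 <= delta]

def lower_approximations(data, labels, attrs, delta):
    """Per-class lower approximation: a sample belongs iff its whole
    neighborhood shares its label."""
    lower = {}
    for i in range(len(data)):
        nb = neighborhood(i, data, attrs, delta)
        if all(labels[j] == labels[i] for j in nb):
            lower.setdefault(labels[i], set()).add(i)
    return lower

def accelerated_positive_region(data, labels, attrs_seq, delta):
    """Scan only still-unconfirmed samples as the attribute set grows:
    Euclidean neighborhoods shrink monotonically with more attributes,
    so once a sample's neighborhood is label-pure it stays pure."""
    pending = set(range(len(data)))
    confirmed = set()
    for attrs in attrs_seq:
        for i in list(pending):
            nb = neighborhood(i, data, attrs, delta)
            if all(labels[j] == labels[i] for j in nb):
                confirmed.add(i)
                pending.discard(i)
    return confirmed
```

The paper's combined strategy additionally compresses the scanning space of the neighborhood computation itself; the sketch shows only the approximation side.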


2019 ◽  
Vol 2019 ◽  
pp. 1-16 ◽  
Author(s):  
Xun Wang ◽  
Wendong Zhang ◽  
Dun Liu ◽  
Hualong Yu ◽  
Xibei Yang ◽  
...  

In decision-theoretic rough sets (DTRS), the decision costs are used to generate the thresholds for characterizing the probabilistic approximations. As with other rough sets, many generalized DTRS models can be formed by using different binary relations. Nevertheless, it should be noticed that most procedures for calculating binary relations do not take the labels of samples into account, which may lead to lower discrimination; for example, samples with different labels are regarded as indistinguishable. To fill this gap, the main contribution of this paper is a pseudolabel strategy for constructing a new DTRS. Firstly, a pseudolabel neighborhood relation is presented, which differentiates samples not only by the neighborhood technique but also by the pseudolabels of samples. Accordingly, the pseudolabel neighborhood decision-theoretic rough set (PLNDTRS) can be constructed. Secondly, the problem of attribute reduction is explored, which aims to further reduce the PLNDTRS-related decision costs; a heuristic algorithm is designed to find such a reduct. Finally, the clustering technique is employed to generate the pseudolabels of samples. The experimental results over 15 UCI data sets show that PLNDTRS is superior to DTRS without pseudolabels because the former generates lower decision costs. Moreover, the proposed heuristic algorithm is also effective in providing satisfactory reducts. This study suggests new trends concerning cost-sensitivity problems in rough data analysis.
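A minimal sketch of the two ingredients, under toy assumptions: a pseudolabel neighborhood that relates samples only when they are both delta-close and share a pseudolabel, and the standard DTRS cost-to-threshold formulas (α, β) with a three-way region rule. The cost-parameter ordering in `dtrs_thresholds` follows the usual λ notation; the function names themselves are illustrative.

```python
def pseudo_neighborhood(i, data, pseudo, delta):
    """Samples are related iff they are delta-close AND share a pseudolabel,
    which is what lifts the discrimination over a plain neighborhood."""
    xi = data[i]
    return [j for j, xj in enumerate(data)
            if pseudo[j] == pseudo[i]
            and sum((a - b) ** 2 for a, b in zip(xi, xj)) ** 0.5 <= delta]

def dtrs_thresholds(lpp, lbp, lnp, lnn, lbn, lpn):
    """Standard DTRS thresholds from misclassification costs, where e.g.
    lpn is the cost of deciding positive when the sample is negative."""
    alpha = (lpn - lbn) / ((lpn - lbn) + (lbp - lpp))
    beta = (lbn - lnn) / ((lbn - lnn) + (lnp - lbp))
    return alpha, beta

def region(prob, alpha, beta):
    """Three-way decision on the conditional probability Pr(X | n(x))."""
    return 'POS' if prob >= alpha else 'NEG' if prob <= beta else 'BND'
```

In PLNDTRS, `prob` would be the fraction of a sample's pseudolabel neighborhood falling in the target class, and the reduct search would try to shrink the total decision cost implied by these regions.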


Author(s):  
Qing-Hua Zhang ◽  
Long-Yang Yao ◽  
Guan-Sheng Zhang ◽  
Yu-Ke Xin

In this paper, a new incremental knowledge acquisition method is proposed based on rough set theory, decision trees, and granular computing. In order to process dynamic data effectively, we first analyze how to describe the data with rough set theory and how to compute equivalence classes and the positive region with a hash algorithm. Then attribute reduction, value reduction, and the extraction of the rule set are completed efficiently using the hash algorithm. Finally, for each batch of newly added data, the incremental knowledge acquisition method is applied to update the original rules. Both algorithm analysis and experiments show that, for processing dynamic information systems, the time complexity of the proposed algorithm is lower than that of traditional algorithms and of incremental knowledge acquisition algorithms based on granular computing, owing to the efficiency of the hash algorithm; the proposed algorithm is also more effective on huge data sets.
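The hash-based computation of equivalence classes that the abstract relies on can be sketched in a few lines: hashing each sample's attribute tuple buckets the data in one expected-linear pass, and the positive region is then the union of label-pure buckets. Function names here are illustrative, not the authors'.

```python
from collections import defaultdict

def equivalence_classes(rows, attrs):
    """One dict pass: hashing the attribute tuple buckets each sample into
    its equivalence class in O(n) expected time, instead of pairwise
    comparison in O(n^2)."""
    classes = defaultdict(list)
    for idx, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(idx)
    return classes

def positive_region_hash(rows, attrs, label_idx):
    """Union of equivalence classes that are pure on the decision attribute."""
    pos = []
    for members in equivalence_classes(rows, attrs).values():
        if len({rows[i][label_idx] for i in members}) == 1:
            pos.extend(members)
    return sorted(pos)
```

For the incremental case, each newly arrived sample would be hashed into its bucket directly, so only the affected classes (and the rules derived from them) need updating.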


2014 ◽  
Vol 644-650 ◽  
pp. 2120-2123 ◽  
Author(s):  
De Zhi An ◽  
Guang Li Wu ◽  
Jun Lu

At present there are many data mining methods. This paper studies the application of rough set methods in data mining, mainly the application of rough-set-based attribute reduction algorithms in the rule extraction stage of data mining. In data mining, rough sets are often used for knowledge reduction and thus for rule extraction. Attribute reduction is one of the core research topics of rough set theory. In this paper, the traditional attribute reduction algorithm based on rough sets is studied and improved, and a new attribute reduction algorithm is proposed for data mining over large data sets.


2014 ◽  
Vol 513-517 ◽  
pp. 973-977
Author(s):  
Zhi Li Pei ◽  
Jian Hong Qi ◽  
Li Sha Liu ◽  
Qing Hu Wang ◽  
Ming Yang Jiang ◽  
...  

In 2012, Wang Zuofei proposed a granularity function and applied it to measuring attribute importance and to attribute reduction. On this basis, a granularity function based on pessimistic and optimistic multi-granulation rough sets is constructed and applied to the calculation of attribute importance and to attribute reduction. According to the experimental results, the method can reduce the feature dimension and obviously improve classification accuracy and efficiency.


2016 ◽  
Vol 16 (4) ◽  
pp. 13-28 ◽  
Author(s):  
Cao Chinh Nghia ◽  
Demetrovics Janos ◽  
Nguyen Long Giang ◽  
Vu Duc Thi

According to the traditional rough set theory approach, attribute reduction methods are performed on decision tables with a discretized value domain, i.e., decision tables obtained by data discretization methods. In recent years, researchers have proposed methods based on the fuzzy rough set approach to solve the problem of attribute reduction in decision tables with a numerical value domain. In this paper, we propose a fuzzy distance between two partitions and an attribute reduction method for numerical decision tables based on the proposed fuzzy distance. Experiments on data sets show that the proposed method achieves better classification accuracy than the methods based on fuzzy entropy.
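The paper's exact fuzzy distance is not reproduced here; as one natural sketch (an assumption, not the authors' definition), a fuzzy similarity relation can be built per attribute subset, and the distance between two such relations taken as the mean absolute difference of their membership matrices:

```python
def fuzzy_relation(data, attrs):
    """Fuzzy similarity matrix on `attrs` (1 - mean absolute difference),
    assuming attribute values are normalized to [0, 1]."""
    n = len(data)
    def sim(x, y):
        return 1 - sum(abs(x[a] - y[a]) for a in attrs) / max(len(attrs), 1)
    return [[sim(data[i], data[j]) for j in range(n)] for i in range(n)]

def fuzzy_distance(R1, R2):
    """Mean absolute difference between two fuzzy relation matrices --
    a simple stand-in for the paper's fuzzy distance between partitions."""
    n = len(R1)
    return sum(abs(R1[i][j] - R2[i][j])
               for i in range(n) for j in range(n)) / (n * n)
```

A greedy reduction would then add, at each step, the attribute whose induced relation moves the current relation closest (in this distance) to the relation induced by the full attribute set.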


2014 ◽  
Vol 40 (2) ◽  
pp. 269-310 ◽  
Author(s):  
Yanir Seroussi ◽  
Ingrid Zukerman ◽  
Fabian Bohnert

Authorship attribution deals with identifying the authors of anonymous texts. Traditionally, research in this field has focused on formal texts, such as essays and novels, but recently more attention has been given to texts generated by on-line users, such as e-mails and blogs. Authorship attribution of such on-line texts is a more challenging task than traditional authorship attribution, because such texts tend to be short, and the number of candidate authors is often larger than in traditional settings. We address this challenge by using topic models to obtain author representations. In addition to exploring novel ways of applying two popular topic models to this task, we test our new model that projects authors and documents to two disjoint topic spaces. Utilizing our model in authorship attribution yields state-of-the-art performance on several data sets, containing either formal texts written by a few authors or informal texts generated by tens to thousands of on-line users. We also present experimental results that demonstrate the applicability of topical author representations to two other problems: inferring the sentiment polarity of texts, and predicting the ratings that users would give to items such as movies.


2015 ◽  
Vol 24 (03) ◽  
pp. 1550003 ◽  
Author(s):  
Armin Daneshpazhouh ◽  
Ashkan Sami

The task of semi-supervised outlier detection is to find the instances that are exceptional from other data, using some labeled examples. In many applications such as fraud detection and intrusion detection, this issue becomes more important. Most existing techniques are unsupervised. On the other hand, semi-supervised approaches use both negative and positive instances to detect outliers. However, in many real world applications, very few positive labeled examples are available. This paper proposes an innovative approach to address this problem. The proposed method works as follows. First, some reliable negative instances are extracted by a kNN-based algorithm. Afterwards, fuzzy clustering using both negative and positive examples is utilized to detect outliers. Experimental results on real data sets demonstrate that the proposed approach outperforms the previous unsupervised state-of-the-art methods in detecting outliers.
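The first step of the pipeline, extracting reliable negatives with a kNN-based rule, might be sketched as follows (a toy reading of the abstract, with illustrative names: an unlabeled point is kept as a reliable negative when none of its k nearest neighbours is a labeled positive):

```python
def knn(point, pool, k):
    """The k nearest points to `point` in `pool` (squared Euclidean)."""
    return sorted(pool,
                  key=lambda q: sum((a - b) ** 2 for a, b in zip(point, q)))[:k]

def reliable_negatives(unlabeled, positives, k=3):
    """Unlabeled points whose k nearest neighbours (among all other points)
    contain no labeled positive -- a simple kNN-style extraction step."""
    allpts = unlabeled + positives
    negs = []
    for p in unlabeled:
        nbrs = knn(p, [q for q in allpts if q != p], k)
        if not any(q in positives for q in nbrs):
            negs.append(p)
    return negs
```

The second stage would then run fuzzy clustering over the positives plus these reliable negatives and flag points with weak membership in the negative clusters as outliers.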


Author(s):  
Yi Fan ◽  
Nan Li ◽  
Chengqian Li ◽  
Zongjie Ma ◽  
Longin Jan Latecki ◽  
...  

The Maximum Vertex Weight Clique (MVWC) problem is NP-hard and important in real-world applications. In this paper we propose to use restart and random-walk strategies to improve local search for MVWC. If a solution is revisited in a particular situation, the search restarts. In addition, when the local search has no options other than dropping vertices, it takes a random walk. Experimental results show that our solver outperforms state-of-the-art solvers on DIMACS and finds a new best-known solution. It is also the only solver that is comparable with state-of-the-art methods on both BHOSLIB and large crafted graphs. Furthermore, we evaluated our solver on clustering aggregation. Experimental results on a number of real data sets demonstrate that our solver outperforms the state of the art on the derived MVWC problem and helps improve the final clustering results.
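The restart and random-walk ideas can be illustrated with a deliberately tiny local search (a sketch under simplifying assumptions, not the paper's solver): grow the clique greedily by weight; when only drop moves remain, drop a random vertex (random walk); when a clique configuration repeats, restart from scratch.

```python
import random

def mvwc_local_search(adj, weights, steps=2000, seed=1):
    """Toy MVWC local search. `adj` is a list of neighbour sets,
    `weights` the vertex weights; returns (best_clique, best_weight)."""
    rng = random.Random(seed)
    n = len(weights)
    best, best_w = set(), 0
    seen, clique = set(), set()
    for _ in range(steps):
        addable = [v for v in range(n) if v not in clique
                   and all(u in adj[v] for u in clique)]
        if addable:
            clique.add(max(addable, key=lambda v: weights[v]))
        elif clique:
            # only drop moves remain: random walk
            clique.discard(rng.choice(sorted(clique)))
        w = sum(weights[v] for v in clique)
        if w > best_w:
            best, best_w = set(clique), w
        key = frozenset(clique)
        if key in seen:
            # revisited configuration: restart
            clique, _ = set(), seen.clear()
        else:
            seen.add(key)
    return best, best_w
```

The real solver's revisit test applies only in a particular stagnation situation and its move selection is far more refined, but the control flow above shows where restart and random walk slot into the search loop.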

