Heuristic Approaches to Attribute Reduction for Generalized Decision Preservation

2019 ◽  
Vol 9 (14) ◽  
pp. 2841 ◽  
Author(s):  
Nan Zhang ◽  
Xueyi Gao ◽  
Tianyou Yu

Attribute reduction is a challenging problem in rough set theory, which has been applied in many research fields, including knowledge representation, machine learning, and artificial intelligence. The main objective of attribute reduction is to obtain a minimal attribute subset that can retain the same classification or discernibility properties as the original information system. Recently, many attribute reduction algorithms, such as positive region preservation, generalized decision preservation, and distribution preservation, have been proposed. The existing attribute reduction algorithms for generalized decision preservation are mainly based on the discernibility matrix and are, thus, computationally very expensive and hard to use in large-scale and high-dimensional data sets. To overcome this problem, we introduce the similarity degree for generalized decision preservation. On this basis, the inner and outer significance measures are proposed. By using heuristic strategies, we develop two quick reduction algorithms for generalized decision preservation. Finally, theoretical and experimental results show that the proposed heuristic reduction algorithms are effective and efficient.
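The heuristic strategy the abstract describes — rank candidate attributes by a significance measure and add the best one until the reduct preserves the original discernibility — can be sketched as follows. This is a minimal illustration only: the paper's similarity-degree-based inner and outer significance measures are not reproduced here, and a plain positive-region dependency function stands in for them.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices by their values on the given attribute indices."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, attrs, decision):
    """Fraction of objects whose attrs-block is pure w.r.t. the decision
    (a stand-in significance measure, not the paper's similarity degree)."""
    pos = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            pos += len(block)
    return pos / len(rows)

def greedy_reduct(rows, cond_attrs, decision):
    """Forward greedy heuristic: repeatedly add the attribute whose
    (outer) significance is largest until the reduct discerns as much
    as the full condition attribute set."""
    reduct = []
    target = dependency(rows, cond_attrs, decision)
    while dependency(rows, reduct, decision) < target:
        best = max((a for a in cond_attrs if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct
```

Because the significance measure is evaluated on equivalence classes rather than on a discernibility matrix, each greedy step is linear in the number of objects, which is the source of the speedup the abstract claims over matrix-based methods.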

Author(s):  
Hao Ge ◽  
Chuanjian Yang ◽  
Longshu Li

Attribute reduction is one of the key issues in rough set theory, and the positive region reduct is a classical type of reduct. However, many reduction algorithms incur high time costs when dealing with high-volume, high-dimensional data sets. To overcome this shortcoming, this paper studies a relative discernibility reduction method that obtains a positive region reduct from a simplified version of the original decision table. Moreover, to further improve reduction performance, we develop an accelerator for attribute reduction that reduces the number of radix sorts performed during the reduction process. Using the accelerator, two positive region reduction algorithms based on relative discernibility, FARA-RS and BARA-RS, are designed: FARA-RS simultaneously shrinks the universe and reduces the number of radix sorts to achieve a speedup, while BARA-RS only reduces the number of radix sorts. The experimental results show that the proposed reduction algorithms are effective and feasible for high-dimensional and large data sets.
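The core computation the sorting accelerates — finding the positive region by grouping objects with identical condition values and keeping the groups with a single decision value — can be sketched like this. Python's built-in sort stands in for the radix sort used by the paper's accelerator; the FARA-RS/BARA-RS bookkeeping is not reproduced.

```python
def positive_region(table, cond, dec):
    """Positive region via one sort pass: sort object indices by their
    condition values so that each equivalence class is a contiguous run,
    then keep the runs that carry a single decision value."""
    key = lambda i: [table[i][a] for a in cond]
    idx = sorted(range(len(table)), key=key)
    pos, run = [], []
    for i in idx:
        if run and key(i) != key(run[0]):
            # close the previous run; keep it only if its decision is pure
            if len({table[j][dec] for j in run}) == 1:
                pos.extend(run)
            run = []
        run.append(i)
    if run and len({table[j][dec] for j in run}) == 1:
        pos.extend(run)
    return sorted(pos)
```

Each evaluation is one sort plus one linear scan, which is why cutting the number of sort passes (the accelerator's job) directly cuts the cost of the whole reduction.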


2014 ◽  
Vol 1 (1) ◽  
pp. 1-14 ◽  
Author(s):  
Sharmistha Bhattacharya Halder

The concept of the rough set was first developed by Pawlak (1982). Since then it has been successfully applied in many research fields, such as pattern recognition, machine learning, knowledge acquisition, economic forecasting, and data mining. But the original rough set model cannot effectively deal with noisy data, and latent useful knowledge in the boundary region may not be fully captured. To overcome these limitations, extended rough set models have been put forward that combine the original model with other soft computing technologies. Many researchers have been motivated to investigate probabilistic approaches to rough set theory. The variable precision rough set model (VPRSM) is one of the most important extensions. The Bayesian rough set model (BRSM) (Slezak & Ziarko, 2002), a hybrid of rough set theory and Bayesian reasoning, can deal with many practical problems that the original rough set model could not handle effectively. Based on the Bayesian decision procedure with minimum risk, Yao (1990) put forward a new model, called the decision theoretic rough set model (DTRSM), which brings new insights into probabilistic approaches to rough set theory. Throughout this paper, the concept of the decision theoretic rough set is studied and a new concept of the Bayesian decision theoretic rough set is introduced. Lastly, a comparative study is made between the Bayesian decision theoretic rough set and the rough set defined by Pawlak (1982).
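The decision-theoretic model mentioned above replaces Pawlak's all-or-nothing regions with a probabilistic three-way rule. A minimal sketch, assuming illustrative threshold values (in Yao's model, alpha and beta are derived from Bayesian decision costs, not chosen by hand):

```python
def three_way_decision(prob, alpha=0.7, beta=0.3):
    """Three-way rule of the decision theoretic rough set model:
    accept an object into the positive region when Pr(X | [x]) >= alpha,
    assign it to the negative region when Pr(X | [x]) <= beta, and
    defer it to the boundary region otherwise."""
    if prob >= alpha:
        return 'positive'
    if prob <= beta:
        return 'negative'
    return 'boundary'
```

With alpha = 1 and beta = 0 this collapses back to the Pawlak model, which is exactly the comparison the abstract's final sentence sets up.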


Author(s):  
Qing-Hua Zhang ◽  
Long-Yang Yao ◽  
Guan-Sheng Zhang ◽  
Yu-Ke Xin

In this paper, a new incremental knowledge acquisition method is proposed based on rough set theory, decision trees, and granular computing. To process dynamic data effectively, we first analyze describing the data with rough set theory, computing equivalence classes, and calculating the positive region with a hash algorithm. Then attribute reduction, value reduction, and the extraction of the rule set are completed efficiently with the hash algorithm. Finally, for each batch of newly added data, the incremental knowledge acquisition method is used to update the original rules. Both the algorithm analysis and the experiments show that, for dynamic information systems, the time complexity of the proposed algorithm is lower than that of traditional algorithms and of incremental knowledge acquisition algorithms based on granular computing, owing to the efficiency of the hash algorithm, and that the algorithm is more effective on huge data sets.
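The hash-based equivalence-class computation the abstract builds on amounts to one linear pass that buckets objects by their tuple of attribute values; a Python dict serves as the hash table in this sketch.

```python
def equivalence_classes(table, attrs):
    """One linear pass: hash each object's tuple of attribute values so
    that objects with identical descriptions land in the same bucket.
    Each bucket is one equivalence class of the indiscernibility relation."""
    classes = {}
    for i, row in enumerate(table):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(i)
    return classes
```

Because lookups are expected O(1), the classes are obtained in O(n) time rather than the O(n log n) of sorting-based methods, which is the efficiency gain the abstract attributes to the hash algorithm.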


2014 ◽  
Vol 644-650 ◽  
pp. 2120-2123 ◽  
Author(s):  
De Zhi An ◽  
Guang Li Wu ◽  
Jun Lu

At present there are many data mining methods. This paper studies the application of the rough set method in data mining, focusing on attribute reduction algorithms based on rough sets in the rule extraction stage of data mining. In data mining, rough sets are often used for knowledge reduction and thus for rule extraction. Attribute reduction is one of the core research topics of rough set theory. In this paper, a traditional attribute reduction algorithm based on rough sets is studied and improved, and a new attribute reduction algorithm is proposed for data mining on large data sets.


2016 ◽  
Vol 16 (4) ◽  
pp. 13-28 ◽  
Author(s):  
Cao Chinh Nghia ◽  
Demetrovics Janos ◽  
Nguyen Long Giang ◽  
Vu Duc Thi

Abstract In the traditional rough set theory approach, attribute reduction methods are performed on decision tables with a discretized value domain, i.e., decision tables obtained by data discretization methods. In recent years, researchers have proposed methods based on the fuzzy rough set approach to solve the attribute reduction problem in decision tables with a numerical value domain. In this paper, we propose a fuzzy distance between two partitions and an attribute reduction method for numerical decision tables based on the proposed fuzzy distance. Experiments on data sets show that the classification accuracy of the proposed method is higher than that of methods based on fuzzy entropy.
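The abstract does not give the authors' fuzzy distance explicitly, so the following is only an illustrative stand-in: it represents each fuzzy partition by its fuzzy similarity matrix (entry [i, j] is the degree to which objects i and j belong to the same block) and measures the normalized mean absolute difference between the two matrices.

```python
import numpy as np

def fuzzy_partition_distance(R1, R2):
    """Illustrative fuzzy distance between two fuzzy partitions given as
    n x n fuzzy similarity matrices with entries in [0, 1]. The value is
    0 when the partitions coincide and grows as their similarity degrees
    diverge. This is a stand-in, not the measure proposed in the paper."""
    R1 = np.asarray(R1, dtype=float)
    R2 = np.asarray(R2, dtype=float)
    return float(np.abs(R1 - R2).mean())
```

In a distance-based reduction, an attribute is kept when dropping it would move the conditional partition too far (in this distance) from the partition induced by the full attribute set.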


Author(s):  
Rana Aamir Raza

In the area of fuzzy rough set theory (FRST), researchers have shown much interest in handling high-dimensional data. Rough set theory (RST) is an important tool for pre-processing data and helps to obtain a better predictive model, but in RST the discretization process may lose useful information; fuzzy rough set theory therefore works well with real-valued data. In this paper, an efficient technique based on FRST is presented to pre-process large-scale data sets and increase the efficacy of the predictive model. A fuzzy rough set-based feature selection (FRSFS) technique is combined with a random weight neural network (RWNN) classifier to obtain better generalization ability. Results on different data sets show that the proposed technique performs well and provides better speed and accuracy than combining FRSFS with other machine learning classifiers (i.e., KNN, naive Bayes, SVM, decision tree, and backpropagation neural network).
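The speed advantage of an RWNN comes from its training scheme: hidden-layer weights are drawn at random and never trained, and only the output weights are fit, in closed form, by least squares. A minimal sketch for regression, with the layer size and tanh activation as illustrative choices not taken from the paper:

```python
import numpy as np

def train_rwnn(X, y, hidden=50, seed=0):
    """Random weight neural network: random, untrained hidden layer;
    output weights fit in closed form by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], hidden))  # fixed random input weights
    b = rng.standard_normal(hidden)                # fixed random biases
    H = np.tanh(X @ W + b)                         # random hidden features
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # closed-form output fit
    return W, b, beta

def predict_rwnn(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Since there is no iterative backpropagation, training cost is dominated by one least-squares solve, which explains the speed comparison against the backpropagation network in the abstract.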


Author(s):  
DIANXUN SHUAI ◽  
XUE FANGLIANG

Data clustering has been widely used in many areas, such as data mining, statistics, and machine learning. A variety of clustering approaches have been proposed so far, but most of them cannot quickly cluster a large-scale, high-dimensional database. This paper is devoted to a novel data clustering approach based on a generalized particle model (GPM). The GPM transforms the data clustering process into a stochastic process over the configuration space of a GPM array. The proposed approach is characterized by self-organizing clustering and by advantages that include insensitivity to noise, robustness to the quality of the clustered data, suitability for high-dimensional and massive data sets, learning ability, openness, and easier hardware implementation with VLSI systolic technology. Analysis and simulations have shown the effectiveness and good performance of the proposed GPM approach to data clustering.


2016 ◽  
Vol 16 (2) ◽  
pp. 3-15 ◽  
Author(s):  
Demetrovics Janos ◽  
Nguyen Thi Lan Huong ◽  
Vu Duc Thi ◽  
Nguyen Long Giang

Abstract Feature selection is a vital problem that needs to be solved effectively in knowledge discovery in databases and pattern recognition, for two basic reasons: minimizing costs and accurately classifying data. Feature selection using rough set theory is also called attribute reduction. It has attracted a lot of attention from researchers and numerous promising results have been obtained. However, most of them apply to static data, and attribute reduction in dynamic databases is still in its early stages. This paper focuses on developing incremental methods and algorithms to derive reducts, employing a distance measure, when decision systems vary in their condition attribute set. We also conduct experiments on UCI data sets, and the experimental results show that the proposed algorithms are better in terms of time consumption and reduct cardinality than the non-incremental heuristic algorithm and the incremental approach using information entropy proposed by the authors in [17].
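The incremental idea — when a condition attribute is added, refine the existing equivalence classes rather than repartition the whole universe — can be sketched as below. This shows only the refinement step; the paper's algorithms additionally maintain a distance measure to decide whether the reduct itself must change.

```python
def refine_classes(classes, table, new_attr):
    """Incremental refinement: when the decision system gains a condition
    attribute, split each existing equivalence class by the new attribute's
    values instead of recomputing the partition from scratch."""
    refined = []
    for block in classes:
        buckets = {}
        for i in block:
            buckets.setdefault(table[i][new_attr], []).append(i)
        refined.extend(buckets.values())
    return refined
```

Each refinement touches only the objects in the affected blocks, which is where the time savings over the non-incremental heuristic come from.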


2021 ◽  
Author(s):  
Tung Dang ◽  
Kie Kumaishi ◽  
Erika Usui ◽  
Shungo Kobori ◽  
Takumi Sato ◽  
...  

Abstract. Background: The rapid and accurate identification of a minimal-size core set of representative microbial species plays an important role in the clustering of microbial community data and in the interpretation of clustering results. However, the huge dimensionality of microbial metagenomics data sets is a major challenge for existing methods such as Dirichlet multinomial mixture (DMM) models. Within the framework of the existing methods, the computational burden of identifying a small number of representative species from a huge number of observed species remains a challenge.
Results: We propose a novel framework that improves the performance of the widely used DMM approach by combining three ideas: (i) we extend the finite DMM model to the infinite case by considering Dirichlet process mixtures and estimating the number of clusters as a random variable; (ii) we propose an indicator variable to identify representative operational taxonomic units that substantially contribute to the differentiation among clusters; and (iii) to address the computational burden of high-dimensional microbiome data, we propose stochastic variational inference, which approximates the posterior distribution using a controllable distribution called the variational distribution, together with stochastic optimization algorithms for fast computation. With the proposed method, named stochastic variational variable selection (SVVS), we analyzed root microbiome data collected in our soybean field experiment and human gut microbiome data from three published data sets of large-scale case-control studies.
Conclusions: SVVS demonstrated better performance and significantly faster computation than existing methods on all test data sets. In particular, SVVS is the only method that can analyze massive high-dimensional microbial data with more than 50,000 microbial species and 1,000 samples. Furthermore, the results suggest that the microbial species selected as a core set have played important roles in recent microbiome studies.
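Extending the finite DMM to a Dirichlet process mixture typically relies on the stick-breaking construction of the mixture weights, truncated at a fixed number of components so that variational inference stays tractable. A minimal sketch of that standard construction (not the paper's full model):

```python
import numpy as np

def stick_breaking(alpha, truncation, seed=0):
    """Truncated stick-breaking weights for a Dirichlet process mixture:
    repeatedly break off a Beta(1, alpha) fraction of the remaining stick.
    Setting the last fraction to 1 closes off the stick so the truncated
    weights sum exactly to one."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=truncation)
    v[-1] = 1.0
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining
```

Smaller concentration parameters alpha push most of the mass onto the first few sticks, which is how the model can effectively infer the number of clusters as a random quantity instead of fixing it in advance.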


Entropy ◽  
2019 ◽  
Vol 21 (2) ◽  
pp. 155 ◽  
Author(s):  
Lin Sun ◽  
Xiaoyu Zhang ◽  
Jiucheng Xu ◽  
Shiguang Zhang

Attribute reduction is an important preprocessing step for data mining and has become a hot research topic in rough set theory. Neighborhood rough set theory can overcome the shortcoming that classical rough set theory may lose useful information when discretizing continuous-valued data sets. In this paper, to improve the classification performance on complex data, a novel attribute reduction method using neighborhood entropy measures in neighborhood rough sets, combining the algebra view with the information view, is proposed; it can deal with continuous data while maintaining the classification information of the original attributes. First, to efficiently analyze the uncertainty of knowledge in neighborhood rough sets, a new average neighborhood entropy is presented, combining neighborhood approximate precision with neighborhood entropy and building on the strong complementarity between the algebraic definition of attribute significance and the information-view definition. Then, the concept of decision neighborhood entropy is investigated for handling the uncertainty and noisiness of neighborhood decision systems; it integrates the credibility degree with the coverage degree of neighborhood decision systems to fully reflect the decision ability of attributes. Moreover, some of their properties are derived and the relationships among these measures are established, which helps in understanding the essence of knowledge content and the uncertainty of neighborhood decision systems. Finally, a heuristic attribute reduction algorithm is proposed to improve the classification performance on complex data sets. Experimental results on an illustrative example and several public data sets demonstrate that the proposed method is very effective at selecting the most relevant attributes with great classification performance.
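The neighborhood construction that lets these methods skip discretization replaces equivalence classes with delta-neighborhoods under a distance metric. A minimal sketch, with Euclidean distance and the radius delta as illustrative choices (the entropy measures built on top of these neighborhoods are not reproduced):

```python
import numpy as np

def delta_neighborhood(X, i, delta):
    """Indices of samples within Euclidean distance delta of sample i."""
    d = np.linalg.norm(X - X[i], axis=1)
    return np.where(d <= delta)[0]

def lower_approximation(X, y, target, delta):
    """Neighborhood lower approximation of class `target`: samples whose
    entire delta-neighborhood carries that label. Continuous attributes
    are used directly, with no discretization step."""
    return [i for i in range(len(X))
            if np.all(y[delta_neighborhood(X, i, delta)] == target)]
```

Samples outside every class's lower approximation form the boundary, and it is the uncertainty of that boundary that the paper's neighborhood entropy measures quantify.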

