A Distributed Rough Set Theory Algorithm based on Locality Sensitive Hashing for an Efficient Big Data Pre-processing

Author(s):  
Zaineb Chelly Dagdia ◽  
Christine Zarges ◽  
Gael Beck ◽  
Hanene Azzag ◽  
Mustapha Lebbah
2021 ◽  
Vol 182 (2) ◽  
pp. 111-179
Author(s):  
Zaineb Chelly Dagdia ◽  
Christine Zarges

In the context of big data, granular computing has recently been implemented with mathematical tools, especially Rough Set Theory (RST). As a key topic of RST, feature selection has been investigated to adapt the related granular concepts to large amounts of data, leading to the development of a distributed RST version. However, despite its scalability, the distributed RST version faces a key challenge: partitioning the feature search space in the distributed environment while guaranteeing data dependency. Therefore, in this manuscript, we propose a new distributed RST version based on Locality Sensitive Hashing (LSH), named LSH-dRST, for big data feature selection. LSH-dRST uses LSH to match similar features into the same bucket and maps the generated buckets into partitions, so that the universe is split more efficiently. More precisely, in this paper, we perform a detailed analysis of the performance of LSH-dRST by comparing it to the standard distributed RST version, which is based on a random partitioning of the universe. We demonstrate that LSH-dRST is scalable when dealing with large amounts of data. We also demonstrate that LSH-dRST partitions the high-dimensional feature search space more reliably, hence better preserving data dependency in the distributed environment and ensuring a lower computational cost.
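The bucket-then-partition idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which hash family LSH-dRST uses, so the random-hyperplane (sign projection) hash, the function names, and the greedy bucket-to-partition assignment below are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_bits, dim, seed=0):
    # Assumption: random-hyperplane LSH; one Gaussian vector per signature bit.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(column, hyperplanes):
    # One bit per hyperplane: the side of the hyperplane the column falls on.
    return tuple(
        1 if sum(h * x for h, x in zip(plane, column)) >= 0 else 0
        for plane in hyperplanes
    )


def bucket_features(columns, n_bits=4, seed=0):
    # columns: dict mapping feature name -> list of values (one per object).
    dim = len(next(iter(columns.values())))
    planes = make_hyperplanes(n_bits, dim, seed)
    buckets = defaultdict(list)
    for name, column in columns.items():
        buckets[signature(column, planes)].append(name)
    return dict(buckets)


def buckets_to_partitions(buckets, n_partitions):
    # Hypothetical mapping: keep each bucket whole and place the largest
    # buckets first on the currently least loaded partition.
    partitions = [[] for _ in range(n_partitions)]
    for sig in sorted(buckets, key=lambda s: (-len(buckets[s]), s)):
        target = min(range(n_partitions), key=lambda i: len(partitions[i]))
        partitions[target].extend(buckets[sig])
    return partitions
```

Under these assumptions, features whose value vectors point in similar directions tend to share a signature, so they land in the same bucket and therefore in the same partition, whereas a random split offers no such tendency.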


2021 ◽  
pp. 08-20
Author(s):  
Manal .. ◽  
Ahmed N. Al Al-Masri

In recent years, with the rapid development of the domestic economy, the concept of sustainable development has received increasing attention. Ecological environment protection is increasingly important and is closely related to economic development, so measuring the relationship between the two matters. Whether the goal is to protect the ecological environment or to ensure sustainable economic development, the green development concept should serve as the guiding principle for promoting ecological economic development, and studying the integration of ecological data is of great significance for solving these problems. This paper studies a multi-source heterogeneous (MSH) ecological big data (BD) adaptive fusion method (FM) based on symmetric encryption. A comparative experiment is set up against two baselines: a multi-sensor (MS) data fusion method (DFM) based on rough set theory, and an MSH data fusion method based on data information conversion. The results show that the MSH DFM based on rough set theory reaches a highest confidence of 0.812, the MSH DFM based on data information conversion reaches 0.68, and the proposed MSH BD adaptive FM based on symmetric encryption reaches 0.965, making it the superior method.


2020 ◽  
Vol 62 (8) ◽  
pp. 3321-3386
Author(s):  
Zaineb Chelly Dagdia ◽  
Christine Zarges ◽  
Gaël Beck ◽  
Mustapha Lebbah

Author(s):  
Vamsidhar Talasila ◽  
Kotakonda Madhubabu ◽  
Meghana Mahadasyam ◽  
Naga Atchala ◽  
...  
