Geographic spatiotemporal big data correlation analysis via the Hilbert–Huang transformation

2017 · Vol. 89 · pp. 130–141
Author(s): Weijing Song, Lizhe Wang, Yang Xiang, Albert Y. Zomaya
2018 · Vol. 2018 · pp. 1–14
Author(s): Hao Hu, Yuling Liu, Hongqi Zhang, Yuchen Zhang

Network security metrics allow quantitatively evaluating the overall resilience of networked systems against attacks. To this end, security metrics are of great importance to the security-related decision-making process of enterprises. In this paper, we employ an absorbing Markov chain (AMC), combined with big data correlation analysis, to estimate network security. Specifically, we construct the AMC model from a large amount of alert data to describe real-world multistep attack scenarios. In addition, we apply big data correlation analysis to generate the transition probability matrix from the alert stream; this matrix defines the probabilities of moving from one attack action to another in a given scenario before one of the attack targets is reached. Based on probabilistic reasoning, two metric algorithms are designed to evaluate the attack scenario as well as the attackers: the expected number of visits (ENV) and the expected success probability (ESP). The advantage of the proposed model and algorithms is that they assist the administrator in building new scenarios, prioritizing alerts, and ranking them.
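The two metrics named above are standard quantities of absorbing Markov chain theory: ENV comes from the fundamental matrix N = (I − Q)⁻¹, and ESP from the absorption probabilities B = NR. The sketch below illustrates this with a made-up toy transition matrix (three transient attack actions, two absorbing targets); in the paper these probabilities would be mined from the alert stream, not hand-written:

```python
import numpy as np

# Toy AMC over attack states in canonical form P = [[Q, R], [0, I]]:
# 3 transient attack actions, 2 absorbing attack targets.
# All probabilities are illustrative placeholders.
Q = np.array([[0.0, 0.5, 0.2],
              [0.1, 0.0, 0.4],
              [0.0, 0.3, 0.0]])   # transient -> transient
R = np.array([[0.3, 0.0],
              [0.0, 0.5],
              [0.2, 0.5]])        # transient -> absorbing (targets)

# Fundamental matrix N = (I - Q)^{-1}: N[i, j] is the expected number
# of visits (ENV) to attack action j when starting from action i.
N = np.linalg.inv(np.eye(3) - Q)

# B = N R: B[i, k] is the probability of eventually reaching target k
# from action i -- the expected success probability (ESP).
# Each row of B sums to 1, since absorption is certain in an AMC.
B = N @ R
```

Ranking the columns of N orders attack actions by how often an attacker is expected to revisit them, while rows of B rank attackers (starting states) by their chance of reaching each target.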


2021 · Vol. 8 (1)
Author(s): Sreemoyee Biswas, Nilay Khare, Pragati Agrawal, Priyank Jain

Abstract With data becoming a salient asset worldwide, dependence amongst data has kept growing; hence, the real-world datasets one works with today are highly correlated. Over the past few years, researchers have paid attention to this aspect of data privacy and have found correlations among data. The privacy guarantees of existing algorithms were sufficient when no relation existed between records in a dataset; once data correlation is taken into account, those guarantees no longer hold as expected. There is therefore a dire need to reconsider privacy algorithms in light of data correlation. Some research has utilized a well-known machine learning concept, data correlation analysis, to better understand the relationships between data, with promising results. Though the body of work is still small, researchers have done a considerable amount of research on correlated data privacy, providing solutions based on probabilistic models, behavioral analysis, sensitivity analysis, information-theoretic models, statistical correlation analysis, exhaustive combination analysis, temporal privacy leakage, and weighted hierarchical graphs. Nevertheless, researchers work on real-world datasets that are often large (technically termed big data) and house a high amount of data correlation. The data correlation in big data must first be studied; researchers are exploring different analysis techniques to find the most suitable one, after which they can propose measures that guarantee privacy for correlated big data. This survey paper presents a detailed survey of the methods proposed by different researchers to deal with the problem of correlated data privacy and correlated big data privacy, and highlights the future scope in this area. The quantitative analysis of the reviewed articles suggests that data correlation is a significant threat to data privacy, a threat that is further magnified by big data. When data correlation is considered and analyzed, parameters such as the maximum number of queries executed and the mean average error show better results than other methods. Hence, there is a grave need to understand and propose solutions for correlated big data privacy.
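As a toy illustration of why correlation undermines standard privacy guarantees, consider the Laplace mechanism for a counting query: noise calibrated to per-record sensitivity under-protects when groups of records are perfectly correlated, and the noise scale must grow with the group size. The group size of 10 and the ε value below are arbitrary choices for the sketch, not values from any surveyed paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_count(data, epsilon, sensitivity=1.0):
    """Laplace mechanism for a counting query: release the true count
    plus Laplace noise of scale sensitivity / epsilon."""
    return float(data.sum() + rng.laplace(scale=sensitivity / epsilon))

# Correlated dataset: 10 groups of 10 identical binary records
# (e.g., family members sharing an attribute). Changing one underlying
# individual flips 10 records at once, so the counting query's true
# sensitivity is 10, not 1.
groups = rng.integers(0, 2, size=10)
correlated = np.repeat(groups, 10)

eps = 0.5
naive = noisy_count(correlated, eps, sensitivity=1.0)    # under-protects
aware = noisy_count(correlated, eps, sensitivity=10.0)   # correlation-aware
```

The correlation-aware release pays a 10× larger noise scale for the same ε; the surveyed approaches aim to quantify such dependence more precisely than this worst-case group bound.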


2021 · Vol. 9 (1) · pp. 95–104
Author(s): Fubo Shao, Hui Liu

Abstract In the era of big data, correlation analysis is significant because it can quickly detect correlations between factors, and it has therefore received much attention. Due to its generality and equitability, the maximal information coefficient (MIC) is a hotspot in correlation analysis research. However, if the original approximate algorithm for MIC is applied directly to mining correlations in big data, the computation time is very long. We therefore analyze the theoretical time complexity of the original approximate algorithm in depth and show that it is O(n^2.4) with default parameters. Experiments further show that it is the large number of candidate partitions of random relationships that results in the long computation time. This analysis is a good preparation for the next step of designing new, faster algorithms.
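To make the object of the complexity analysis concrete, the sketch below brute-forces the MIC definition: maximize normalized grid mutual information over all grids with nx·ny ≤ n^0.6 (the default grid-size bound). For simplicity it uses fixed equal-frequency bins on both axes, whereas the original approximate algorithm (the one the O(n^2.4) bound refers to) dynamically optimizes the partition on one axis, so this is an illustration of the definition rather than that algorithm:

```python
import numpy as np

def grid_mutual_info(x, y, nx, ny):
    """Mutual information (bits) of an equal-frequency nx-by-ny grid."""
    xe = np.quantile(x, np.linspace(0, 1, nx + 1))
    ye = np.quantile(y, np.linspace(0, 1, ny + 1))
    pxy, _, _ = np.histogram2d(x, y, bins=[xe, ye])
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x, shape (nx, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, ny)
    nz = pxy > 0                             # skip empty cells (0 log 0 = 0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def mic_sketch(x, y):
    """Brute-force MIC: max over grids with nx*ny <= n**0.6 of
    MI normalized by log2(min(nx, ny))."""
    n = len(x)
    bound = n ** 0.6
    best = 0.0
    for nx in range(2, int(bound // 2) + 1):
        for ny in range(2, int(bound // nx) + 1):
            mi = grid_mutual_info(x, y, nx, ny) / np.log2(min(nx, ny))
            best = max(best, mi)
    return best
```

On a noiseless linear relationship this score reaches 1 (MIC's equitability anchor), while statistically independent data scores close to 0; the cost of enumerating every candidate grid, multiplied over many variable pairs, is exactly what makes the naive approach impractical on big data.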


2021 · Vol. 8 (1)
Author(s): Sara Migliorini, Alberto Belussi, Elisa Quintarelli, Damiano Carra

Abstract The MapReduce programming paradigm is frequently used to process and analyse huge amounts of data. This paradigm relies on the ability to apply the same operation in parallel to independent chunks of data. Consequently, overall performance greatly depends on the way data are partitioned among the various computation nodes. The default partitioning technique provided by systems like Hadoop or Spark basically performs a random subdivision of the input records, without considering their nature or the correlations between them. While such an approach can be appropriate in the simplest case, where all input records always have to be analyzed, it becomes a limitation for sophisticated analyses, in which correlations between records can be exploited to prune unnecessary computations in advance. In this paper we design a context-based multi-dimensional partitioning technique, called CoPart, which takes data correlation into account in order to determine how records are subdivided between splits (i.e., units of work assigned to a computation node). More specifically, it considers not only the correlation of data w.r.t. contextual attributes, but also the distribution of each contextual dimension in the dataset. We experimentally compare our approach with existing ones, considering both quality criteria and query execution times.
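The flavor of context-based multi-dimensional partitioning can be sketched as follows: derive a split id from each record's contextual key using per-dimension equal-frequency cut points, so records that are close in context are co-located and splits stay balanced even when a contextual dimension is skewed. This is an illustrative simplification under assumed two-dimensional keys, not CoPart's actual algorithm:

```python
import numpy as np

def context_partition(keys, splits_per_dim):
    """Assign each record to a split based on its contextual key.
    Each dimension is cut at equal-frequency quantiles, so splits stay
    balanced under skewed context distributions -- unlike a random/hash
    partitioner, which ignores context entirely."""
    keys = np.asarray(keys, dtype=float)          # shape (n, d)
    n, d = keys.shape
    cell = np.zeros(n, dtype=int)
    for j in range(d):
        cuts = np.quantile(keys[:, j],
                           np.linspace(0, 1, splits_per_dim + 1)[1:-1])
        cell = cell * splits_per_dim + np.searchsorted(cuts, keys[:, j])
    return cell                                   # id in [0, splits_per_dim**d)

rng = np.random.default_rng(42)
# Hypothetical skewed contextual attributes (e.g., timestamp and a
# sensor reading); a random subdivision would scatter correlated
# records, forcing a contextual range query to touch every split.
keys = np.column_stack([rng.exponential(size=1000),
                        rng.normal(size=1000)])
splits = context_partition(keys, 4)               # 4 x 4 = 16 splits
sizes = np.bincount(splits, minlength=16)
```

With this layout, a query selecting a small contextual range maps to a few split ids and the remaining splits can be pruned before any computation runs.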

