Quick multivariate kernel density estimation for massive data sets

2006 ◽  
Vol 22 (5-6) ◽  
pp. 533-546 ◽  
Author(s):  
K. F. Cheng ◽  
C. K. Chu ◽  
Dennis K. J. Lin
2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Wenzhong Shi ◽  
Chengzhuo Tong ◽  
Anshu Zhang ◽  
Bin Wang ◽  
Zhicheng Shi ◽  
...  

A Correction to this paper has been published: https://doi.org/10.1038/s42003-021-01924-6


Author(s):  
A Salman Avestimehr ◽  
Seyed Mohammadreza Mousavi Kalan ◽  
Mahdi Soltanolkotabi

Abstract Dealing with the sheer size and complexity of today’s massive data sets requires computational platforms that can analyze data in a parallelized and distributed fashion. A major bottleneck that arises in such modern distributed computing environments is that some of the worker nodes may run slow. These nodes, a.k.a. stragglers, can significantly slow down computation, as the slowest node may dictate the overall computational time. A recent computational framework, called encoded optimization, creates redundancy in the data to mitigate the effect of stragglers. In this paper, we develop novel mathematical understanding for this framework, demonstrating its effectiveness in much broader settings than was previously understood. We also analyze the convergence behavior of iterative encoded optimization algorithms, allowing us to characterize fundamental trade-offs between convergence rate, size of data set, accuracy, computational load (or data redundancy) and straggler toleration in this framework.
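
To make the idea of data redundancy concrete, here is a minimal toy sketch in Python. It is not the algorithm analyzed in the paper: the least-squares problem, the Hadamard-based encoding matrix `S`, the four simulated workers, the step size, and all names are assumptions chosen for illustration. The data are encoded with redundancy factor 2, and each gradient step ignores one randomly chosen straggler.

```python
import random

def hadamard_entry(i, j):
    # Sylvester Hadamard entry: -1 if i and j share an odd number of 1-bits.
    return -1.0 if bin(i & j).count("1") % 2 else 1.0

# Toy noiseless least-squares problem (assumed for illustration): b = A x_true.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
x_true = [2.0, -1.0]
b = [row[0] * x_true[0] + row[1] * x_true[1] for row in A]

# Encoding matrix S (8 x 4): four columns of an 8 x 8 Hadamard matrix,
# scaled so that S^T S = I. Eight encoded rows for four data rows means
# a redundancy factor of 2.
cols = [1, 2, 4, 7]
scale = 8 ** -0.5
S = [[scale * hadamard_entry(i, j) for j in cols] for i in range(8)]

# Encoded data S A and S b; each simulated worker stores two encoded rows.
SA = [[sum(S[i][k] * A[k][c] for k in range(4)) for c in range(2)] for i in range(8)]
Sb = [sum(S[i][k] * b[k] for k in range(4)) for i in range(8)]
workers = [(0, 1), (2, 3), (4, 5), (6, 7)]

def encoded_gradient_descent(steps=100, step_size=0.3, seed=0):
    """Gradient descent on 0.5 * ||S(Ax - b)||^2, skipping one straggler per step."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(steps):
        straggler = rng.randrange(len(workers))  # this worker's result never arrives
        grad = [0.0, 0.0]
        for w, rows in enumerate(workers):
            if w == straggler:
                continue
            for i in rows:
                residual = SA[i][0] * x[0] + SA[i][1] * x[1] - Sb[i]
                grad[0] += SA[i][0] * residual
                grad[1] += SA[i][1] * residual
        x = [x[0] - step_size * grad[0], x[1] - step_size * grad[1]]
    return x

x_hat = encoded_gradient_descent()
```

In this toy setting the iterates still approach `x_true` even though a quarter of the encoded rows are dropped at every step, because the remaining rows continue to span the original data. The trade-offs characterized in the paper concern far more general settings than this example.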


2021 ◽  
Vol 13 (1) ◽  
pp. 796-806
Author(s):  
Zhen Shuo ◽  
Zhang Jingyu ◽  
Zhang Zhengxiang ◽  
Zhao Jianjun

Abstract Understanding the risk of grassland fire occurrence associated with historical fire point events is critical for implementing effective management of grasslands. This may require a model to convert the fire point records into continuous spatial distribution data. Kernel density estimation (KDE) can be used to represent the spatial distribution of grassland fire occurrences and to decrease the influence of historical records in point format with inaccurate positions. The bandwidth is the most important parameter because it dominates the amount of variation in the KDE estimate. In this study, the spatial distribution characteristics of the points were considered to determine the bandwidth of KDE with the Ripley’s K function method. For high, medium, and low concentration scenes of grassland fire points, kernel density surfaces were produced by using the kernel function with four bandwidth parameter selection methods. To obtain the best maps, the estimated density surfaces were compared using the mean integrated squared error. The results show that Ripley’s K function method is the best bandwidth selection method for mapping and analyzing the risk of grassland fire occurrence with the dependent or inaccurate point variable, considering the spatial distribution characteristics.
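
The role of the bandwidth can be illustrated with a small Python sketch. It assumes an isotropic Gaussian kernel and made-up point coordinates; neither the kernel choice nor the data come from the study, and no bandwidth selection method (such as Ripley’s K function) is implemented here. The sketch only shows how a narrow and a wide bandwidth change the estimated surface.

```python
import math

def gaussian_kde_2d(points, bandwidth):
    """Return a function that estimates the density at (x, y) from 2-D points."""
    n = len(points)
    # Normalizing constant of an isotropic 2-D Gaussian kernel, averaged over n points.
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            d2 = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density

# Hypothetical fire-point coordinates (illustrative only, not data from the study).
fire_points = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1), (3.0, 3.0)]

narrow = gaussian_kde_2d(fire_points, bandwidth=0.2)  # follows individual points closely
wide = gaussian_kde_2d(fire_points, bandwidth=2.0)    # smooths across the whole area

# Density inside the cluster of three points and in the empty area between clusters.
cluster_narrow, cluster_wide = narrow(0.1, 0.0), wide(0.1, 0.0)
gap_narrow, gap_wide = narrow(1.5, 1.5), wide(1.5, 1.5)
```

With the narrow bandwidth the surface peaks sharply at the cluster and is close to zero in the gap; with the wide bandwidth the peak flattens and density spreads into the gap. This sensitivity is why the choice of bandwidth selection method matters for the resulting maps.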

