SP-Partitioner: A novel partition method to handle intermediate data skew in spark streaming

2018 ◽  
Vol 86 ◽  
pp. 1054-1063 ◽  
Author(s):  
Guipeng Liu ◽  
Xiaomin Zhu ◽  
Ji Wang ◽  
Deke Guo ◽  
Weidong Bao ◽  
...  
2020 ◽  
Vol 100 ◽  
pp. 102699
Author(s):  
Zhongming Fu ◽  
Zhuo Tang ◽  
Li Yang ◽  
Kenli Li ◽  
Keqin Li
Author(s):  
Chetana Tukkoji ◽  
Seetharam K

There is a growing need for ad-hoc analysis of extremely large data sets, especially at web-based companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products offer a solution, but are usually prohibitively expensive at this scale. Moreover, many of the people who analyze this data are procedural programmers. The success of the more procedural map-reduce programming model, and of its scalable implementations on low-cost commodity hardware, is evidence of this. However, the map-reduce paradigm is too low-level and rigid, and leads to a great deal of custom user code that is hard to maintain and reuse. Map-reduce is an effective tool for parallel data processing, but one significant issue in practical map-reduce applications is data skew: an imbalance in the amount of data assigned to each task causes some tasks to take much longer to finish than others. We therefore propose a framework that solves the data skew problem on the reduce side of map-reduce applications. It uses an innovative sampling method that produces an accurate approximation of the distribution of the intermediate data by sampling only a small fraction of it, and it does not prevent the overlap between the map and reduce stages.
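The sampling-based approach described above can be sketched roughly as follows: sample a small fraction of the map-side records to approximate the intermediate key distribution, then assign heavy keys greedily to the least-loaded reducer instead of relying on plain hash partitioning. This is a minimal illustrative sketch, not the paper's actual implementation; the function names and the greedy assignment strategy are assumptions.

```python
import random
from collections import Counter

def estimate_key_distribution(records, sample_fraction=0.01, seed=42):
    """Approximate the intermediate key distribution by sampling only a
    small fraction of the (key, value) map outputs (hypothetical helper)."""
    rng = random.Random(seed)
    sample = [key for key, _ in records if rng.random() < sample_fraction]
    return Counter(sample)

def build_partition_table(key_counts, num_reducers):
    """Greedily assign the heaviest keys to the currently least-loaded
    reducer, so a skewed key does not pile extra work onto one task."""
    loads = [0] * num_reducers
    table = {}
    for key, count in key_counts.most_common():
        target = loads.index(min(loads))  # least-loaded reducer so far
        table[key] = target
        loads[target] += count
    return table

def partition(key, table, num_reducers):
    """Route sampled keys via the table; fall back to hashing for
    keys the sample never saw."""
    return table.get(key, hash(key) % num_reducers)
```

For example, with a skewed input where one key dominates, the heavy key ends up alone on one reducer while the lighter keys share another, which is the balancing effect plain hash partitioning cannot guarantee.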


2013 ◽  
Vol 34 (9) ◽  
pp. 2078-2084 ◽  
Author(s):  
Yun-fei Wang ◽  
Du-yan Bi ◽  
De-qin Shi ◽  
Tian-jun Huang ◽  
Di Liu
