A Study on the Performance and Scalability of Apache Flink Over Hadoop MapReduce

2019 ◽  
Vol 2 (1) ◽  
pp. 61-73
Author(s):  
Pankaj Lathar ◽  
K. G. Srinivasa

With the advancements in science and technology, data is being generated at a staggering rate. The raw data generated is generally of high value and may conceal important information with the potential to solve several real-world problems. In order to extract this information, the raw data available must be processed and analysed efficiently. It has, however, been observed that such raw data is generated at a rate faster than it can be processed by traditional methods. This has led to the emergence of the popular parallel processing programming model, MapReduce. In this study, the authors perform a comparative analysis of two popular data processing engines, Apache Flink and Hadoop MapReduce. The analysis is based on the parameters of scalability, reliability and efficiency. The results reveal that Flink unambiguously outperforms Hadoop's MapReduce. Flink's edge over MapReduce can be attributed to the following features: Active Memory Management, Dataflow Pipelining and an Inline Optimizer. It can be concluded that as the complexity and magnitude of real-time raw data continuously increase, it is essential to explore newer platforms that are capable of processing such data adequately and efficiently.
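The dataflow-pipelining advantage the abstract attributes to Flink can be illustrated with a plain-Python sketch (an illustration of the idea only, not Flink or Hadoop code): a staged job materializes each intermediate result in full before the next phase begins, while a pipelined dataflow streams records through its operators one at a time.

```python
# Sketch: staged (MapReduce-like) vs pipelined (Flink-like) processing
# of the same word-count task, in plain Python.

from collections import Counter

lines = ["big data big value", "raw data"]

# Staged: the map phase materializes its full output before reduction starts.
mapped = [(w, 1) for line in lines for w in line.split()]   # map phase
counts_staged = Counter()
for word, n in mapped:                                      # reduce phase
    counts_staged[word] += n

# Pipelined: a generator streams records through the operator chain,
# so no intermediate collection is ever materialized.
def mapper(lines):
    for line in lines:
        for w in line.split():
            yield (w, 1)

counts_pipelined = Counter()
for word, n in mapper(lines):
    counts_pipelined[word] += n
```

Both variants produce identical counts; the difference is memory behavior, which is what makes pipelining attractive for large inputs.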

Author(s):  
David Gelernter

We’ve installed the foundation piles and are ready to start building Mirror Worlds. In this chapter we discuss (so to speak) the basement, in the next chapter we get to the attic, and the chapter after that fills in the middle region and glues the whole thing together. The basement we are about to describe is filled with lots of a certain kind of ensemble program. This kind of program, called a Trellis, makes the connection between external data and internal mirror-reality. The Trellis is, accordingly, a key player in the Mirror World cast. It’s also a good example of ensemble programming in general, and, I’ll argue, a highly significant gadget in itself. The hulking problem with which the Trellis does battle on the Mirror World’s behalf is a problem that the real world, too, will be confronting directly and in person very soon. Floods of data are pounding down all around us in torrents. How will we cope? What will we do with all this stuff? When the encroaching electronification of the world pushes the downpour rate higher by a thousand or a million times or more, what will we do then? Concretely: I’m talking about realtime data processing. The subject in this chapter is fresh data straight from the sensor. We’d like to analyze this fresh data in “realtime”, to achieve some understanding of data values as they emerge. Raw data pours into a Mirror World and gets refined by a data distillery in the basement. The processed, refined, one-hundred-percent pure stuff gets stored upstairs in the attic, where it ferments slowly into history. (In the next chapter we move upstairs.) Trellis programs are the topic here: how they are put together, how they work. But there’s an initial question that’s too important to ignore. We need to take a brief trip outside into the deluge, to establish what this stuff is and where it’s coming from. Data-gathering instruments are generally electronic.
They are sensors in the field, dedicated to the non-stop, automatic gathering of measurements; or they are full-blown infomachines, waiting for people to sit down, log on and enter data by hand.


2018 ◽  
Vol 7 (3.8) ◽  
pp. 16
Author(s):  
Md Tahsir Ahmed Munna ◽  
Shaikh Muhammad Allayear ◽  
Mirza Mohtashim Alam ◽  
Sheikh Shah Mohammad Motiur Rahman ◽  
Md Samadur Rahman ◽  
...  

MapReduce has become a popular programming model for processing and running large-scale data sets with a parallel, distributed paradigm on a cluster. Hadoop MapReduce is especially important for large-scale workloads such as big-data processing. In this paper, we modify the Hadoop MapReduce algorithm and implement the modification to reduce processing time.
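The MapReduce model referred to above can be sketched in a few lines. The following is a minimal single-machine Python illustration of the map, shuffle, and reduce phases, not Hadoop code; a real framework distributes each phase across a cluster.

```python
# Minimal single-process sketch of the MapReduce programming model:
# map emits key/value pairs, shuffle groups them by key, reduce aggregates.

from itertools import groupby
from operator import itemgetter

def map_phase(records):
    for line in records:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Group pairs by key; a real framework does this across the cluster.
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, [v for _, v in group]

def reduce_phase(grouped):
    for key, values in grouped:
        yield key, sum(values)

result = dict(reduce_phase(shuffle(map_phase(["a b a", "b c"]))))
# result == {"a": 2, "b": 2, "c": 1}
```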


1998 ◽  
Vol 1 (05) ◽  
pp. 400-407 ◽  
Author(s):  
G.S. Shiralkar ◽  
R.E. Stephenson ◽  
Wayne Joubert ◽  
Olaf Lubeck ◽  
Bart van Bloemen Waanders

This paper (SPE 51969) was revised for publication from paper SPE 37975, first presented at the 1997 SPE Reservoir Simulation Symposium, Dallas, 8-11 June. Original manuscript received for review 30 June 1997. Revised manuscript received 30 March 1998. Paper peer approved 6 July 1998. Summary We describe a new production model, Falcon, that has achieved speeds on parallel computers that are 100 times faster on real-world problems than current production models on a vector computer. Falcon has been used to conduct the largest geostatistical reservoir study ever conducted within Amoco. In this paper we discuss the following: Falcon's data parallel paradigm with Fortran 90 and High Performance Fortran (HPF); its single program, multiple data (SPMD) paradigm with message passing; efficient memory management that enables simulation of enormous studies; and a numerical formulation that reconciles the generalized compositional approach (based on component masses and pressure) with earlier approaches (based on pressures and saturations) in a manner that is both more general and more efficient. We also discuss Falcon's scalability up to 512 processor nodes and the performance (timings and memory) achieved on a number of parallel platforms, including Cray Research's T3D and T3E, SGI's Power Challenge and Origin 2000, Thinking Machines' CM5, and IBM's SP2. Falcon also runs on single-processor computers such as PCs and IBM's RS6000. We discuss a new parallel linear solver technology based on a fully parallel, scalable implementation of incomplete lower-upper (ILU) preconditioning coupled with a GMRES or Orthomin iteration process. This naturally ordered global ILU preconditioner is scalable to hundreds of processors, efficiently solving the matrix problems arising from large-scale simulations. The use of the techniques described in this paper has enabled us to run problem sizes of up to 16.5 million gridblocks.
Falcon was used to simulate fifty geostatistically derived realizations of a large black-oil waterflood system. The realizations, each with 2.3 million cells and 1,039 wells, took an average of 4.2 hours to execute on a 128-node CM5 computer, enabling the simulation study to finish in less than a month. In this field study, we bypassed upscaling through the use of fine vertical-resolution gridding. Our focus has been on the applicability of Falcon to real-world problems. Falcon can be used for modeling both small and very large reservoirs, including reservoirs characterized by geostatistics. It can be used to simulate black-oil, gas/water, and dry-gas reservoirs, and a fully compositional feature is being developed.
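The solver strategy the summary describes, an ILU preconditioner driving a GMRES iteration, can be demonstrated on a small sparse system. The following is a generic SciPy sketch on a stand-in tridiagonal matrix, not Falcon's parallel in-house implementation.

```python
# Generic sketch of ILU-preconditioned GMRES on a small sparse system
# (illustrative only; Falcon's solver is a parallel implementation).

import numpy as np
from scipy.sparse import csc_matrix, diags
from scipy.sparse.linalg import spilu, gmres, LinearOperator

# A small 1-D diffusion-like tridiagonal system as a stand-in for the
# matrices arising from reservoir simulation.
n = 100
A = csc_matrix(diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)))
b = np.ones(n)

# Incomplete LU factorization used as the preconditioner M ~ A^{-1}.
ilu = spilu(A, drop_tol=1e-5)
M = LinearOperator((n, n), matvec=ilu.solve)

x, info = gmres(A, b, M=M)       # info == 0 signals convergence
residual = np.linalg.norm(A @ x - b)
```

A good preconditioner collapses the GMRES iteration count; on a tridiagonal matrix the incomplete factorization is nearly exact, so convergence is almost immediate.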


2021 ◽  
Vol 13 (10) ◽  
pp. 5491
Author(s):  
Melissa Robson-Williams ◽  
Bruce Small ◽  
Roger Robson-Williams ◽  
Nick Kirk

The socio-environmental challenges the world faces are ‘swamps’: situations that are messy, complex, and uncertain. The aim of this paper is to help disciplinary scientists navigate these swamps. To achieve this, the paper evaluates an integrative framework designed for researching complex real-world problems: the Integration and Implementation Science (i2S) framework. As a pilot study, we examine seven inter- and transdisciplinary agri-environmental case studies against the concepts presented in the i2S framework, and we hypothesise that considering concepts in the i2S framework during the planning and delivery of agri-environmental research will increase the usefulness of the research for next users. We found that for the types of complex, real-world research done in the case studies, increasing attention to the i2S dimensions correlated with increased usefulness for the end users. We conclude that using the i2S framework could provide handrails for researchers, to help them navigate the swamps when engaging with the complexity of socio-environmental problems.


Mathematics ◽  
2021 ◽  
Vol 9 (5) ◽  
pp. 534
Author(s):  
F. Thomas Bruss

This paper presents two-person games involving optimal stopping. As far as we are aware, the type of problems we study is new. We confine our interest to such games in discrete time. Two players are to choose, with randomised choice-priority, between two games G1 and G2. Each game consists of two parts with well-defined targets, and each part consists of a sequence of random variables which determines when the decisive part of the game will begin. In each game the horizon is bounded, and if the two parts are not finished within the horizon, the game is lost by definition. Otherwise the decisive part begins, in which each player is entitled to apply their strategy to reach the second target. If only one player achieves the two targets, this player is the winner. If both win or both lose, the outcome is seen as “deuce”. We motivate interest in such problems in the context of real-world problems. A few representative problems are solved in detail. The main objective of this article is to serve as a preliminary manual to guide the reader through possible approaches and to discuss under which circumstances we can obtain solutions or approximate solutions.
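For readers new to discrete-time optimal stopping, the classical secretary problem is the standard single-player illustration. The following Monte Carlo sketch is background only; it is not one of the games studied in the paper.

```python
# Classical discrete-time optimal stopping illustration: the secretary
# problem with the 1/e observation rule (background only, not a game
# from the paper above).

import random

def secretary_trial(n, rng):
    # Observe the first ~n/e candidates without stopping, then stop at
    # the first candidate better than everything seen so far.
    ranks = list(range(n))
    rng.shuffle(ranks)            # ranks[i] is candidate i's quality; n-1 is best
    cutoff = int(n / 2.718281828)
    best_seen = max(ranks[:cutoff])
    for r in ranks[cutoff:]:
        if r > best_seen:
            return r == n - 1     # success iff we stopped at the overall best
    return ranks[-1] == n - 1     # otherwise forced to take the last candidate

rng = random.Random(0)
wins = sum(secretary_trial(50, rng) for _ in range(20000))
print(wins / 20000)  # close to the asymptotic success probability 1/e ≈ 0.368
```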


2021 ◽  
Vol 52 (1) ◽  
pp. 12-15
Author(s):  
S.V. Nagaraj

This book is on algorithms for network flows. Network flow problems are optimization problems in which, given a flow network, the aim is to construct a flow that respects the capacity constraints of the edges of the network, so that incoming flow equals outgoing flow at every vertex of the network except the designated vertices known as the source and the sink. Network flow algorithms solve many real-world problems. This book is intended to serve graduate students and to act as a reference. The book is also available in eBook (ISBN 9781316952894/US$32.00) and hardback (ISBN 9781107185890/US$99.99) formats. The book has a companion website, www.networkflowalgs.com, where a pre-publication version of the book can be downloaded gratis.
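The capacity and flow-conservation constraints described above can be made concrete with a small maximum-flow computation. The following is a generic Edmonds-Karp sketch (BFS augmenting paths on a residual graph), not code from the book.

```python
# Generic Edmonds-Karp maximum-flow sketch: repeatedly find a shortest
# augmenting path by BFS in the residual graph and push flow along it,
# respecting edge capacities and conserving flow at interior vertices.

from collections import deque

def max_flow(capacity, source, sink):
    n = len(capacity)
    residual = [row[:] for row in capacity]
    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return total            # no augmenting path left: flow is maximal
        # Find the bottleneck capacity along the path, then augment.
        bottleneck, v = float("inf"), sink
        while v != source:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

# 0 = source, 3 = sink; edge capacities on a 4-vertex network.
cap = [
    [0, 3, 2, 0],
    [0, 0, 1, 2],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]
print(max_flow(cap, 0, 3))  # 4
```

The returned value equals the capacity of a minimum source/sink cut, here the two edges into the sink (2 + 2 = 4).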


AI Matters ◽  
2019 ◽  
Vol 5 (3) ◽  
pp. 12-14
Author(s):  
Tara Chklovski
