A Study on the Performance and Scalability of Apache Flink Over Hadoop MapReduce

2019 ◽  
Vol 2 (1) ◽  
pp. 61-73
Author(s):  
Pankaj Lathar ◽  
K. G. Srinivasa

With the advancements in science and technology, data is being generated at a staggering rate. The raw data generated is generally of high value and may conceal important information with the potential to solve several real-world problems. In order to extract this information, the raw data available must be processed and analysed efficiently. It has, however, been observed that such raw data is generated at a rate faster than it can be processed by traditional methods. This has led to the emergence of the popular parallel processing programming model, MapReduce. In this study, the authors perform a comparative analysis of two popular data processing engines, Apache Flink and Hadoop MapReduce. The analysis is based on the parameters of scalability, reliability and efficiency. The results reveal that Flink unambiguously outperforms Hadoop's MapReduce. Flink's edge over MapReduce can be attributed to the following features: Active Memory Management, Dataflow Pipelining and an Inline Optimizer. It can be concluded that as the complexity and magnitude of real-time raw data continuously increase, it is essential to explore newer platforms that are capable of processing such data adequately and efficiently.
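The dataflow-pipelining advantage the abstract attributes to Flink can be illustrated with a plain-Python sketch (an illustration of the idea only, not Flink or Hadoop code): a staged job materializes each intermediate result in full before the next phase begins, while a pipelined dataflow streams records through its operators one at a time.

```python
# Sketch: staged (MapReduce-like) vs pipelined (Flink-like) processing
# of the same word-count task, in plain Python.

from collections import Counter

lines = ["big data big value", "raw data"]

# Staged: the map phase materializes its full output before reduction starts.
mapped = [(w, 1) for line in lines for w in line.split()]   # map phase
counts_staged = Counter()
for word, n in mapped:                                      # reduce phase
    counts_staged[word] += n

# Pipelined: a generator streams records through the operator chain,
# so no intermediate collection is ever materialized.
def mapper(lines):
    for line in lines:
        for w in line.split():
            yield (w, 1)

counts_pipelined = Counter()
for word, n in mapper(lines):
    counts_pipelined[word] += n
```

Both variants produce identical counts; the difference is memory behavior, which is what makes pipelining attractive for large inputs.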

Author(s):  
David Gelernter

We’ve installed the foundation piles and are ready to start building Mirror Worlds. In this chapter we discuss (so to speak) the basement, in the next chapter we get to the attic, and the chapter after that fills in the middle region and glues the whole thing together. The basement we are about to describe is filled with lots of a certain kind of ensemble program. This kind of program, called a Trellis, makes the connection between external data and internal mirror-reality. The Trellis is, accordingly, a key player in the Mirror World cast. It’s also a good example of ensemble programming in general, and, I’ll argue, a highly significant gadget in itself. The hulking problem with which the Trellis does battle on the Mirror World’s behalf is a problem that the real world, too, will be confronting directly and in person very soon. Floods of data are pounding down all around us in torrents. How will we cope? What will we do with all this stuff? When the encroaching electronification of the world pushes the downpour rate higher by a thousand or a million times or more, what will we do then? Concretely: I’m talking about realtime data processing. The subject in this chapter is fresh data straight from the sensor. We’d like to analyze this fresh data in “realtime”, to achieve some understanding of data values as they emerge. Raw data pours into a Mirror World and gets refined by a data distillery in the basement. The processed, refined, one-hundred-percent pure stuff gets stored upstairs in the attic, where it ferments slowly into history. (In the next chapter we move upstairs.) Trellis programs are the topic here: how they are put together, how they work. But there’s an initial question that’s too important to ignore. We need to take a brief trip outside into the deluge, to establish what this stuff is and where it’s coming from. Data-gathering instruments are generally electronic.
They are sensors in the field, dedicated to the non-stop, automatic gathering of measurements; or they are full-blown infomachines, waiting for people to sit down, log on and enter data by hand.


2018 ◽  
Vol 7 (3.8) ◽  
pp. 16
Author(s):  
Md Tahsir Ahmed Munna ◽  
Shaikh Muhammad Allayear ◽  
Mirza Mohtashim Alam ◽  
Sheikh Shah Mohammad Motiur Rahman ◽  
Md Samadur Rahman ◽  
...  

MapReduce has become a popular programming model for processing and running large-scale data sets with a parallel, distributed paradigm on a cluster. Hadoop MapReduce is especially important for large-scale workloads such as big-data processing. In this paper, we modify the Hadoop MapReduce algorithm and implement the modification to reduce processing time.
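The MapReduce model referred to above can be sketched in a few lines. The following is a minimal single-machine Python illustration of the map, shuffle, and reduce phases, not Hadoop code; a real framework distributes each phase across a cluster.

```python
# Minimal single-process sketch of the MapReduce programming model:
# map emits key/value pairs, shuffle groups them by key, reduce aggregates.

from itertools import groupby
from operator import itemgetter

def map_phase(records):
    for line in records:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Group pairs by key; a real framework does this across the cluster.
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, [v for _, v in group]

def reduce_phase(grouped):
    for key, values in grouped:
        yield key, sum(values)

result = dict(reduce_phase(shuffle(map_phase(["a b a", "b c"]))))
# result == {"a": 2, "b": 2, "c": 1}
```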


1998 ◽  
Vol 1 (05) ◽  
pp. 400-407 ◽  
Author(s):  
G.S. Shiralkar ◽  
R.E. Stephenson ◽  
Wayne Joubert ◽  
Olaf Lubeck ◽  
Bart van Bloemen Waanders

This paper (SPE 51969) was revised for publication from paper SPE 37975, first presented at the 1997 SPE Reservoir Simulation Symposium, Dallas, 8-11 June. Original manuscript received for review 30 June 1997. Revised manuscript received 30 March 1998. Paper peer approved 6 July 1998. Summary We describe a new production model, Falcon, that has achieved speeds on parallel computers that are 100 times faster on real-world problems than current production models on a vector computer. Falcon has been used to conduct the largest geostatistical reservoir study ever conducted within Amoco. In this paper we discuss the following: Falcon's data parallel paradigm with Fortran 90 and High Performance Fortran (HPF); its single program, multiple data (SPMD) paradigm with message passing; efficient memory management that enables simulation of enormous studies; and a numerical formulation that reconciles the generalized compositional approach (based on component masses and pressure) with earlier approaches (based on pressures and saturations) in a manner that is both more general and more efficient. We also discuss Falcon's scalability up to 512 processor nodes and the performance (timings and memory) achieved on a number of parallel platforms, including Cray Research's T3D and T3E, SGI's Power Challenge and Origin 2000, Thinking Machines' CM5, and IBM's SP2. Falcon also runs on single-processor computers such as PCs and IBM's RS6000. We discuss a new parallel linear solver technology based on a fully parallel, scalable implementation of incomplete lower-upper (ILU) preconditioning coupled with a GMRES or Orthomin iteration process. This naturally ordered global ILU preconditioner is scalable to hundreds of processors, efficiently solving the matrix problems arising from large-scale simulations. The use of the techniques described in this paper has enabled us to run problem sizes of up to 16.5 million gridblocks.
Falcon was used to simulate fifty geostatistically derived realizations of a large black-oil waterflood system. The realizations, each with 2.3 million cells and 1,039 wells, took an average of 4.2 hours to execute on a 128-node CM5 computer, enabling the simulation study to finish in less than a month. In this field study, we bypassed upscaling through the use of fine vertical-resolution gridding. Our focus has been on the applicability of Falcon to real-world problems. Falcon can be used for modeling both small and very large reservoirs, including reservoirs characterized by geostatistics. It can be used to simulate black-oil, gas/water, and dry-gas reservoirs, and a fully compositional feature is being developed.
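The solver strategy the summary describes, an ILU preconditioner driving a GMRES iteration, can be demonstrated on a small sparse system. The following is a generic SciPy sketch on a stand-in tridiagonal matrix, not Falcon's parallel in-house implementation.

```python
# Generic sketch of ILU-preconditioned GMRES on a small sparse system
# (illustrative only; Falcon's solver is a parallel implementation).

import numpy as np
from scipy.sparse import csc_matrix, diags
from scipy.sparse.linalg import spilu, gmres, LinearOperator

# A small 1-D diffusion-like tridiagonal system as a stand-in for the
# matrices arising from reservoir simulation.
n = 100
A = csc_matrix(diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)))
b = np.ones(n)

# Incomplete LU factorization used as the preconditioner M ~ A^{-1}.
ilu = spilu(A, drop_tol=1e-5)
M = LinearOperator((n, n), matvec=ilu.solve)

x, info = gmres(A, b, M=M)       # info == 0 signals convergence
residual = np.linalg.norm(A @ x - b)
```

A good preconditioner collapses the GMRES iteration count; on a tridiagonal matrix the incomplete factorization is nearly exact, so convergence is almost immediate.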


2021 ◽  
Vol 13 (10) ◽  
pp. 5491
Author(s):  
Melissa Robson-Williams ◽  
Bruce Small ◽  
Roger Robson-Williams ◽  
Nick Kirk

The socio-environmental challenges the world faces are ‘swamps’: situations that are messy, complex, and uncertain. The aim of this paper is to help disciplinary scientists navigate these swamps. To achieve this, the paper evaluates an integrative framework designed for researching complex real-world problems: the Integration and Implementation Science (i2S) framework. As a pilot study, we examine seven inter- and transdisciplinary agri-environmental case studies against the concepts presented in the i2S framework, and we hypothesise that considering concepts in the i2S framework during the planning and delivery of agri-environmental research will increase the usefulness of the research for next users. We found that for the types of complex, real-world research done in the case studies, increasing attention to the i2S dimensions correlated with increased usefulness for the end users. We conclude that using the i2S framework could provide handrails for researchers, to help them navigate the swamps when engaging with the complexity of socio-environmental problems.


Mathematics ◽  
2021 ◽  
Vol 9 (5) ◽  
pp. 534
Author(s):  
F. Thomas Bruss

This paper presents two-person games involving optimal stopping. As far as we are aware, the type of problems we study is new. We confine our interest to such games in discrete time. Two players are to choose, with randomised choice-priority, between two games G1 and G2. Each game consists of two parts with well-defined targets, and each part consists of a sequence of random variables which determines when the decisive part of the game will begin. In each game the horizon is bounded, and if the two parts are not finished within the horizon, the game is lost by definition. Otherwise the decisive part begins, in which each player is entitled to apply their strategy to reach the second target. If only one player achieves the two targets, this player is the winner. If both win or both lose, the outcome is seen as “deuce”. We motivate interest in such problems in the context of real-world problems. A few representative problems are solved in detail. The main objective of this article is to serve as a preliminary manual to guide the reader through possible approaches and to discuss under which circumstances we can obtain solutions or approximate solutions.
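For readers new to discrete-time optimal stopping, the classical secretary problem is the standard single-player illustration. The following Monte Carlo sketch is background only; it is not one of the games studied in the paper.

```python
# Classical discrete-time optimal stopping illustration: the secretary
# problem with the 1/e observation rule (background only, not a game
# from the paper above).

import random

def secretary_trial(n, rng):
    # Observe the first ~n/e candidates without stopping, then stop at
    # the first candidate better than everything seen so far.
    ranks = list(range(n))
    rng.shuffle(ranks)            # ranks[i] is candidate i's quality; n-1 is best
    cutoff = int(n / 2.718281828)
    best_seen = max(ranks[:cutoff])
    for r in ranks[cutoff:]:
        if r > best_seen:
            return r == n - 1     # success iff we stopped at the overall best
    return ranks[-1] == n - 1     # otherwise forced to take the last candidate

rng = random.Random(0)
wins = sum(secretary_trial(50, rng) for _ in range(20000))
print(wins / 20000)  # close to the asymptotic success probability 1/e ≈ 0.368
```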


2021 ◽  
Vol 52 (1) ◽  
pp. 12-15
Author(s):  
S.V. Nagaraj

This book is on algorithms for network flows. Network flow problems are optimization problems in which, given a flow network, the aim is to construct a flow that respects the capacity constraints of the edges of the network, so that incoming flow equals outgoing flow at every vertex of the network except the designated vertices known as the source and the sink. Network flow algorithms solve many real-world problems. This book is intended to serve graduate students and to act as a reference. The book is also available in eBook (ISBN 9781316952894/US$32.00) and hardback (ISBN 9781107185890/US$99.99) formats. The book has a companion website, www.networkflowalgs.com, where a pre-publication version of the book can be downloaded gratis.
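The capacity and flow-conservation constraints described above can be made concrete with a small maximum-flow computation. The following is a generic Edmonds-Karp sketch (BFS augmenting paths on a residual graph), not code from the book.

```python
# Generic Edmonds-Karp maximum-flow sketch: repeatedly find a shortest
# augmenting path by BFS in the residual graph and push flow along it,
# respecting edge capacities and conserving flow at interior vertices.

from collections import deque

def max_flow(capacity, source, sink):
    n = len(capacity)
    residual = [row[:] for row in capacity]
    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return total            # no augmenting path left: flow is maximal
        # Find the bottleneck capacity along the path, then augment.
        bottleneck, v = float("inf"), sink
        while v != source:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

# 0 = source, 3 = sink; edge capacities on a 4-vertex network.
cap = [
    [0, 3, 2, 0],
    [0, 0, 1, 2],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]
print(max_flow(cap, 0, 3))  # 4
```

The returned value equals the capacity of a minimum source/sink cut, here the two edges into the sink (2 + 2 = 4).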


AI Matters ◽  
2019 ◽  
Vol 5 (3) ◽  
pp. 12-14
Author(s):  
Tara Chklovski
