Accelerating Spark-Based Applications with MPI and OpenACC

Complexity ◽

10.1155/2021/9943289 ◽

2021 ◽

Vol 2021 ◽

pp. 1-17

Author(s):

Saeed Alshahrani ◽

Waleed Al Shehri ◽

Jameel Almalki ◽

Ahmed M. Alghamdi ◽

Abdullah M. Alammari

Keyword(s):

Big Data ◽

Power Consumption ◽

Parallel Programming ◽

Graphics Processing Units ◽

Message Passing Interface ◽

Programming Model ◽

Programming Models ◽

Mapping Technique ◽

Big Data Applications ◽

Parallel Programming Models

The amount of data produced in scientific and commercial fields is growing dramatically. Correspondingly, big data technologies such as Hadoop and Spark have emerged to tackle the challenges of collecting, processing, and storing such large-scale data. Unfortunately, big data applications usually have performance issues and do not fully exploit the hardware infrastructure. One reason is that these applications are developed in high-level programming languages that do not provide the low-level system control, and hence the performance, of highly parallel programming models such as the message passing interface (MPI). Moreover, big data is considered a barrier to parallel programming models or accelerators (e.g., CUDA and OpenCL). Therefore, the aim of this study is to investigate how the performance of big data applications can be enhanced without sacrificing the power consumption of the hardware infrastructure. A Hybrid Spark MPI OpenACC (HSMO) system is proposed that integrates Spark, as a big data programming model, with MPI and OpenACC, as parallel programming models. Such integration brings together the advantages of each programming model and provides greater effectiveness. To enhance performance without sacrificing power consumption, the integration approach needs to exploit the hardware infrastructure in an intelligent manner. To achieve this performance enhancement, a mapping technique is proposed that is built on the application’s virtual topology as well as the physical topology of the underlying resources. To the best of our knowledge, there is no existing method in big data applications for utilizing graphics processing units (GPUs), which are now an essential part of high-performance computing (HPC) as a powerful resource for fast computation.

Task-based programming in COMPSs to converge from HPC to big data

The International Journal of High Performance Computing Applications ◽

10.1177/1094342017701278 ◽

2017 ◽

Vol 32 (1) ◽

pp. 45-60 ◽

Cited By ~ 11

Author(s):

Javier Conejero ◽

Sandra Corella ◽

Rosa M Badia ◽

Jesus Labarta

Keyword(s):

Big Data ◽

High Performance ◽

Programming Model ◽

Good Alternative ◽

Programming Models ◽

Suitable Model ◽

Advantages And Disadvantages ◽

Big Data Applications ◽

And Performance ◽

The Right

Task-based programming has proven to be a suitable model for high-performance computing (HPC) applications. Different implementations have been good demonstrators of this fact and have promoted the acceptance of task-based programming in the OpenMP standard. Furthermore, in recent years, Apache Spark has gained wide popularity in business and research environments as a programming model for addressing emerging big data problems. COMP Superscalar (COMPSs) is a task-based environment that tackles distributed computing (including Clouds) and is a good alternative as a task-based programming model for big data applications. This article describes why we consider task-based programming models a good approach for big data applications. The article includes a comparison of Spark and COMPSs in terms of architecture, programming model, and performance. It focuses on the structural differences between the two frameworks, on their programmability interfaces, and on their efficiency, measured with three widely known benchmarking kernels: Wordcount, Kmeans, and Terasort. These kernels enable the evaluation of the more important functionalities of both programming models and the analysis of different workflows and conditions. The main results of this comparison are that (1) COMPSs is able to extract the inherent parallelism from the user code with minimal coding effort, as opposed to Spark, which requires existing algorithms to be adapted and rewritten explicitly using its predefined functions, (2) COMPSs improves performance when compared with Spark, and (3) COMPSs has been shown to scale better than Spark in most cases. Finally, we discuss the advantages and disadvantages of both frameworks, highlighting the differences that make them unique, thereby helping to choose the right framework for each particular objective.
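As a rough illustration of the task-based style the abstract refers to, the Wordcount kernel can be written as independent per-chunk tasks whose partial results are merged afterwards. The sketch below uses only the Python standard library; it is not the COMPSs or Spark API, and the names `count_words` and `wordcount` are hypothetical helpers chosen for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """One task: count the words in a single chunk of text."""
    return Counter(chunk.split())


def wordcount(chunks, max_workers=4):
    """Run one counting task per chunk, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


result = wordcount(["big data big", "data models", "big models"])
print(result.most_common())
```

The per-chunk function is ordinary sequential code; only the driver decides how the tasks are scheduled, which is the property the abstract attributes to task-based models in general.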

Study of parallel programming models on computer clusters with Intel MIC coprocessors

The International Journal of High Performance Computing Applications ◽

10.1177/1094342015580864 ◽

2015 ◽

Vol 31 (4) ◽

pp. 303-315 ◽

Cited By ~ 3

Author(s):

Miaoqing Huang ◽

Chenggang Lai ◽

Xuan Shi ◽

Zhijun Hao ◽

Haihang You

Keyword(s):

Parallel Programming ◽

High Performance ◽

Programming Model ◽

Fixed Number ◽

Parallel Applications ◽

Programming Models ◽

Communication Overhead ◽

Computer Clusters ◽

Parallel Programming Models ◽

Intel Mic

Coprocessors based on the Intel Many Integrated Core (MIC) Architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve parallelism. In this work, we conduct a detailed study of the performance and scalability of the MIC processors under different programming models using the Beacon computer cluster. Our findings are as follows. (1) The native MPI programming model on the MIC processors is typically better than the offload programming model, which offloads the workload to MIC cores using OpenMP. (2) On top of the native MPI programming model, multithreading inside each MPI process can further improve the performance of parallel applications on computer clusters with MIC coprocessors. (3) Given a fixed number of MPI processes, it is a good strategy to schedule these MPI processes on as few MIC processors as possible to reduce the cross-processor communication overhead. (4) The hybrid MPI programming model, in which data processing is distributed to both MIC cores and CPU cores, can outperform the native MPI programming model.
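Finding (3) can be made concrete with a small counting exercise. The sketch below compares a packed placement (fill one device before using the next) with a round-robin placement and counts how many pairs of ranks end up on different devices. It is a toy model under stated assumptions: every pair of ranks is assumed to communicate, and the function names and the figures of 8 ranks and 4 slots per device are hypothetical, not taken from the paper.

```python
from itertools import combinations


def pack_ranks(num_ranks, slots_per_device):
    """Fill one device completely before moving on to the next."""
    return [rank // slots_per_device for rank in range(num_ranks)]


def spread_ranks(num_ranks, num_devices):
    """Round-robin placement across all available devices."""
    return [rank % num_devices for rank in range(num_ranks)]


def cross_device_pairs(placement):
    """Count pairs of ranks that are placed on different devices."""
    return sum(1 for a, b in combinations(placement, 2) if a != b)


packed = pack_ranks(num_ranks=8, slots_per_device=4)
spread = spread_ranks(num_ranks=8, num_devices=4)
print("packed:", packed, cross_device_pairs(packed))
print("spread:", spread, cross_device_pairs(spread))
```

With these assumed numbers, the packed placement uses two devices and leaves 16 cross-device pairs, while the round-robin placement uses four devices and leaves 24, which is the direction of the effect the abstract describes.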

Concurrent Collections

Scientific Programming ◽

10.1155/2010/521797 ◽

2010 ◽

Vol 18 (3-4) ◽

pp. 203-217 ◽

Cited By ~ 61

Author(s):

Zoran Budimlić ◽

Michael Burke ◽

Vincent Cavé ◽

Kathleen Knobe ◽

Geoff Lowney ◽

...

Keyword(s):

Parallel Programming ◽

Programming Model ◽

Programming Models ◽

Data Parallelism ◽

Parallel Programming Models ◽

Ordering Constraints ◽

Cnc Programming ◽

High Level ◽

Execution Semantics ◽

Better Than

We introduce the Concurrent Collections (CnC) programming model. CnC supports flexible combinations of task and data parallelism while retaining determinism. CnC is implicitly parallel, with the user providing high-level operations along with semantic ordering constraints that together form a CnC graph. We formally describe the execution semantics of CnC and prove that the model guarantees deterministic computation. We evaluate the performance of CnC implementations on several applications and show that CnC offers performance and scalability equivalent to or better than that offered by lower-level parallel programming models.
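The idea of high-level operations plus ordering constraints forming a graph can be sketched in a few lines. The example below is not the CnC API: it is a minimal single-assignment dataflow runner built on Python's standard `graphlib` module (Python 3.9 or later), and the names `run_graph`, `steps`, and `deps` are hypothetical. Each step writes exactly one item, so any execution order that respects the constraints produces the same items.

```python
from graphlib import TopologicalSorter


def run_graph(steps, deps, inputs):
    """Run each step once, in an order that respects the constraints.

    steps:  maps a step name to a function of the items produced so far
    deps:   maps a step name to the set of names it must wait for
    inputs: items that are available before any step runs
    """
    items = dict(inputs)
    for name in TopologicalSorter(deps).static_order():
        if name not in steps:
            continue  # an input item, not a step
        if name in items:
            raise ValueError(f"item {name!r} written twice")
        items[name] = steps[name](items)
    return items


steps = {
    "double": lambda items: items["x"] * 2,
    "inc": lambda items: items["x"] + 1,
    "sum": lambda items: items["double"] + items["inc"],
}
deps = {"double": {"x"}, "inc": {"x"}, "sum": {"double", "inc"}}

print(run_graph(steps, deps, {"x": 3})["sum"])
```

Here `double` and `inc` have no constraint between them, so a runtime would be free to execute them in either order or in parallel; the single-assignment check is what keeps the outcome independent of that choice in this sketch.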

A Review on Large Scale Graph Processing Using Big Data Based Parallel Programming Models

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2017.02.07 ◽

2017 ◽

Vol 9 (2) ◽

pp. 49-57 ◽

Cited By ~ 1

Author(s):

Anuraj Mohan ◽

Remya G

Keyword(s):

Big Data ◽

Parallel Programming ◽

Large Scale ◽

Programming Models ◽

Graph Processing ◽

Parallel Programming Models

APPROACHING DEVELOPMENTS ON PARALLEL PROGRAMMING MODELS THROUGH JAVA

i-manager’s Journal on Software Engineering ◽

10.26634/jse.10.3.4900 ◽

2016 ◽

Vol 10 (3) ◽

pp. 14

Author(s):

VEERASAMY BALA DHANDAYUTHAPANI ◽

NASIRA G.M

Keyword(s):

Parallel Programming ◽

Programming Models ◽

Parallel Programming Models

Dynamic clustering for distinct parallel programming models on NoC-based MPSoCs

Proceedings of the 4th International Workshop on Network on Chip Architectures - NoCArc '11 ◽

10.1145/2076501.2076514 ◽

2011 ◽

Cited By ~ 1

Author(s):

Gustavo Girão ◽

Thiago Santini ◽

Flávio R. Wagner

Keyword(s):

Parallel Programming ◽

Programming Models ◽

Dynamic Clustering ◽

Parallel Programming Models

Evaluating attainable memory bandwidth of parallel programming models via BabelStream

International Journal of Computational Science and Engineering ◽

10.1504/ijcse.2017.10011352 ◽

2017 ◽

Vol 1 (1) ◽

pp. 1

Author(s):

Matt Martineau ◽

Simon McIntosh-Smith ◽

James Price ◽

Tom Deakin

Keyword(s):

Parallel Programming ◽

Programming Models ◽

Memory Bandwidth ◽

Parallel Programming Models

Tying Memory Management to Parallel Programming Models

Euro-Par 2006 Parallel Processing - Lecture Notes in Computer Science ◽

10.1007/11823285_69 ◽

2006 ◽

pp. 666-675

Author(s):

Ioannis E. Venetis ◽

Theodore S. Papatheodorou

Keyword(s):

Parallel Programming ◽

Memory Management ◽

Programming Models ◽

Parallel Programming Models

Fifth International Workshop on High-level Parallel Programming Models and Supportive Environments HIPS 2000

Lecture Notes in Computer Science - Parallel and Distributed Processing ◽

10.1007/3-540-45591-4_34 ◽

2000 ◽

pp. 257-260

Author(s):

Martin Schulz

Keyword(s):

Parallel Programming ◽

International Workshop ◽

Programming Models ◽

Parallel Programming Models ◽

Supportive Environments ◽

High Level

On the adequacy of lightweight thread approaches for high-level parallel programming models

Future Generation Computer Systems ◽

10.1016/j.future.2018.02.016 ◽

2018 ◽

Vol 84 ◽

pp. 22-31 ◽

Cited By ~ 3

Author(s):

Adrián Castelló ◽

Rafael Mayo ◽

Kevin Sala ◽

Vicenç Beltran ◽

Pavan Balaji ◽

...

Keyword(s):

Parallel Programming ◽

Programming Models ◽

Parallel Programming Models ◽

High Level
