Improving Execution Time of Parallel Programs on Large Scale Chip Multiprocessors with Constant Average Power Processing

This chapter describes experiences using Cloud infrastructures for scientific computing, both serial and parallel. Amazon’s High Performance Computing (HPC) Cloud computing resources were compared to traditional HPC resources to quantify performance and to assess the complexity and cost of using the Cloud. Furthermore, a shared Cloud infrastructure is compared to standard desktop resources for scientific simulations. Whilst this is only a small-scale evaluation of these Cloud offerings, it does allow some conclusions to be drawn, particularly that the Cloud cannot currently match the parallel performance of dedicated HPC machines for large-scale parallel programs but can match the serial performance of standard computing resources for serial and small-scale parallel programs. Also, the shared Cloud infrastructure cannot match dedicated computing resources for low-level benchmarks, although for an actual scientific code, performance is comparable.

Automatic Partitioning of Large Scale Simulation in Grid Computing for Run Time Reduction

Innovations in Information Systems for Business Functionality and Operations Management · 10.4018/978-1-4666-0933-4.ch014 · 2012 · pp. 225-252

Author(s): Nurcin Celik, Esfandyar Mazhari, John Canby, Omid Kazemi, Parag Sarfare, ...

Keyword(s): Execution Time, Large Scale, Time Synchronization, Computational Grid, Experimental Results, Time Interval, Computational Power, Large Scale Systems, Large Scale Simulations, Reduce Execution Time

Simulating large-scale systems usually entails extensive computational power and lengthy execution times. The goal of this research is to reduce the execution time of large-scale simulations without sacrificing their accuracy by automatically partitioning a monolithic model into multiple pieces and executing them in a distributed computing environment. While this partitioning allows us to distribute the required computational power across multiple computers, it creates a new challenge of synchronizing the partitioned models. In this article, a partitioning methodology based on a modified Prim’s algorithm is proposed to minimize the overall simulation execution time considering 1) internal computation in each of the partitioned models and 2) time synchronization between them. In addition, the authors seek to find the most advantageous number of partitioned models from the monolithic model by evaluating the tradeoff between reduced computation and increased time synchronization requirements. Epoch-based synchronization is employed to synchronize the logical times of the partitioned simulations, where an appropriate time interval is determined from off-line simulation analyses. A computational grid framework is employed for execution of the simulations partitioned by the proposed methodology. The experimental results reveal that the proposed approach reduces simulation execution time significantly while maintaining accuracy as compared with the monolithic simulation execution approach.
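
The abstract does not say how Prim’s algorithm is modified, so the following is only a minimal sketch of one plausible reading, not the authors’ method. It assumes the model is a connected graph whose edge weights measure how much two components interact, grows a maximum spanning tree with Prim’s algorithm, and then drops the k-1 lightest tree edges so that strongly interacting components stay in the same partition. The names `prim_max_spanning_tree` and `partition`, and the weight interpretation, are hypothetical.

```python
import heapq
from collections import defaultdict


def prim_max_spanning_tree(n, edges):
    """Prim's algorithm on nodes 0..n-1; edges are (u, v, weight).

    Returns the tree edges as (weight, u, v). Assumes a connected graph.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    visited = {0}
    # heapq is a min-heap, so weights are negated to pop the heaviest edge.
    heap = [(-w, 0, v) for w, v in adj[0]]
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < n:
        neg_w, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        tree.append((-neg_w, u, v))
        for w, nxt in adj[v]:
            if nxt not in visited:
                heapq.heappush(heap, (-w, v, nxt))
    return tree


def partition(n, edges, k):
    """Split the graph into k groups by cutting the k-1 lightest tree edges."""
    tree = sorted(prim_max_spanning_tree(n, edges), reverse=True)
    kept = tree[: max(0, len(tree) - (k - 1))]
    parent = list(range(n))  # union-find over the kept edges

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v in kept:
        parent[find(u)] = find(v)
    groups = defaultdict(list)
    for node in range(n):
        groups[find(node)].append(node)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical interaction weights between four model components.
    example = [(0, 1, 10), (1, 2, 1), (2, 3, 8), (0, 3, 2)]
    print(partition(4, example, 2))
```

Choosing k is the tradeoff the abstract describes: more partitions reduce per-partition computation but add cut edges, and every cut edge is interaction that must be synchronized across machines.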

Re-Running Large-Scale Parallel Programs Using Two Nodes

2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom) · 10.1109/bdcloud.2018.00079 · 2018

Author(s): Yayu Guo, Fang Lin, Yi Liu, Depei Qian

Keyword(s): Large Scale, Parallel Programs

Log Analysis-Based Resource and Execution Time Improvement in HPC: A Case Study

Applied Sciences · 10.3390/app10072634 · 2020 · Vol 10 (7) · pp. 2634

Author(s): JunWeon Yoon, TaeYoung Hong, ChanYeol Park, Seo-Young Noh, HeonChang Yu

Keyword(s): Execution Time, High Performance, Large Scale, Experimental Result, Optimization Approach, Root Cause, Large Systems, Job Scheduler, Performance Computing

High-performance computing (HPC) uses many distributed computing resources to solve large computational science problems through parallel computation. Such an approach can reduce overall job execution time and increase the capacity for solving large-scale and complex problems. In a supercomputer, the job scheduler, HPC’s flagship tool, is responsible for distributing and managing the resources of large systems. In this paper, we analyze the execution log of the job scheduler over a certain period of time and propose an optimization approach to reduce the idle time of jobs. Our experiment found that the main root cause of delayed jobs is closely tied to resource waiting. A large-scale job needs many resources to be ready at submission, so resources sit idle while they accumulate and the execution time of the entire job is significantly delayed. The backfilling algorithm can reduce the inefficiency of these idle resources and help to shorten the execution time of the job. We therefore propose applying the backfilling algorithm to the supercomputer. The experimental results show that the overall execution time is reduced.
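
The abstract does not state which backfilling variant is used. As an illustration only, here is a minimal sketch of one scheduling pass in the style commonly called EASY backfilling: the blocked job at the head of the queue gets a reservation, and later jobs may start immediately if they fit in the idle nodes and either finish before that reservation or use only nodes the head job will not need. The function name `easy_backfill`, the tuple layouts, and the assumption that user-supplied runtimes are exact are all hypothetical.

```python
def easy_backfill(now, total_nodes, running, queue):
    """Return the names of queued jobs that may start at time `now`.

    running: list of (end_time, nodes) for jobs already executing.
    queue:   list of (name, nodes, runtime) in first-come-first-served order.
    """
    free = total_nodes - sum(nodes for _, nodes in running)
    queue = list(queue)
    started = []

    # Plain FCFS: start jobs from the head of the queue while they fit.
    while queue and queue[0][1] <= free:
        name, nodes, runtime = queue.pop(0)
        free -= nodes
        running = running + [(now + runtime, nodes)]
        started.append(name)
    if not queue:
        return started

    # The head job is blocked. Find the earliest time enough nodes are free
    # for it (its reservation) and how many nodes are left over at that time.
    head_nodes = queue[0][1]
    avail = free
    reservation = None
    for end, nodes in sorted(running):
        avail += nodes
        if avail >= head_nodes:
            reservation = end
            break
    if reservation is None:
        raise ValueError("head job needs more nodes than the machine has")
    extra = avail - head_nodes

    # Backfill: a later job may jump ahead only if it cannot delay the head job.
    for name, nodes, runtime in queue[1:]:
        if nodes > free:
            continue
        if now + runtime <= reservation:
            free -= nodes          # finishes before the reservation
            started.append(name)
        elif nodes <= extra:
            free -= nodes          # runs long, but only on spare nodes
            extra -= nodes
            started.append(name)
    return started


if __name__ == "__main__":
    running_jobs = [(100, 6)]      # one job holds 6 of 10 nodes until t=100
    waiting = [("A", 8, 50), ("B", 4, 200), ("C", 2, 50), ("D", 2, 300)]
    print(easy_backfill(0, 10, running_jobs, waiting))
```

In this example the 8-node job A cannot start until t=100, and strict first-come-first-served scheduling would leave the 4 idle nodes unused until then. Backfilling lets the small jobs C and D use them without pushing back A’s start.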

SYMBOLIC PERFORMANCE ESTIMATION OF SPECULATIVE PARALLEL PROGRAMS

Parallel Processing Letters · 10.1142/s0129626403001471 · 2003 · Vol 13 (04) · pp. 513-524 · Cited By ~ 3

Author(s): H. GAUTAMA, A. J. C. VAN GEMUND

Keyword(s): Performance Prediction, Execution Time, Time Distribution, Prediction Accuracy, Analytical Approach, Parallel Programs, Performance Estimation, Program Execution, Empirical Distributions, Measurement Results

Speculative parallelism refers to searching in parallel for a solution, such as finding a pattern in a database, where finding the first solution terminates the whole parallel process. It requires different performance prediction methods than traditional parallelism. In this paper we introduce an analytical approach to predict the execution time distribution of data-dependent parallel programs that feature N-ary and binary speculative parallel compositions. The method is based on the use of statistical moments, which allows the program execution time distribution to be approximated at O(1) solution complexity. Measurement results for synthetic distributions indicate an accuracy that lies in the percent range, while for empirical distributions on internet search engines the prediction accuracy is acceptable, provided the workload is sufficiently unimodal.
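
To make the "first solution terminates the whole process" idea concrete: the execution time of an N-ary speculative composition is the minimum of the N branch times. The sketch below is not the paper’s moment-based method. It assumes, purely for illustration, that branch times are independent and exponentially distributed, in which case the expected minimum has the closed form mean/N, and checks that against a small Monte Carlo simulation. The function names are hypothetical.

```python
import random


def expected_min_exponential(mean, n):
    """Expected minimum of n independent exponential times with the given mean."""
    return mean / n


def simulate_speculative(mean, n, trials=20000, seed=0):
    """Monte Carlo estimate of the time until the first of n branches finishes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # The whole composition ends as soon as the fastest branch ends.
        total += min(rng.expovariate(1.0 / mean) for _ in range(n))
    return total / trials


if __name__ == "__main__":
    print(expected_min_exponential(8.0, 4))
    print(simulate_speculative(8.0, 4))
```

For other branch-time distributions the minimum generally has no such simple form, which is where an approximation from a few statistical moments, as the abstract describes, becomes attractive.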

Error detection in large-scale parallel programs with long runtimes

Future Generation Computer Systems · 10.1016/s0167-739x(02)00178-4 · 2003 · Vol 19 (5) · pp. 689-700

Author(s): Dieter Kranzlmüller, Nam Thoai, Jens Volkert

Keyword(s): Error Detection, Large Scale, Parallel Programs

Automatic Partitioning of Large Scale Simulation in Grid Computing for Run Time Reduction

International Journal of Operations Research and Information Systems · 10.4018/joris.2010040105 · 2010 · Vol 1 (2) · pp. 64-90 · Cited By ~ 6

Author(s): Nurcin Celik, Esfandyar Mazhari, John Canby, Omid Kazemi, Parag Sarfare, ...

Keyword(s): Execution Time, Large Scale, Time Synchronization, Computational Grid, Time Interval, Computational Power, Computing Environment, Large Scale Systems, Large Scale Simulations, Reduce Execution Time

Simulating large-scale systems usually entails extensive computational power and lengthy execution times. The goal of this research is to reduce the execution time of large-scale simulations without sacrificing their accuracy by automatically partitioning a monolithic model into multiple pieces and executing them in a distributed computing environment. While this partitioning allows us to distribute the required computational power across multiple computers, it creates a new challenge of synchronizing the partitioned models. In this article, a partitioning methodology based on a modified Prim’s algorithm is proposed to minimize the overall simulation execution time considering 1) internal computation in each of the partitioned models and 2) time synchronization between them. In addition, the authors seek to find the most advantageous number of partitioned models from the monolithic model by evaluating the tradeoff between reduced computation and increased time synchronization requirements. Epoch-based synchronization is employed to synchronize the logical times of the partitioned simulations, where an appropriate time interval is determined from off-line simulation analyses. A computational grid framework is employed for execution of the simulations partitioned by the proposed methodology. The experimental results reveal that the proposed approach reduces simulation execution time significantly while maintaining accuracy as compared with the monolithic simulation execution approach.

ELS: Emulation system for debugging and tuning large-scale parallel programs on small clusters

The Journal of Supercomputing · 10.1007/s11227-020-03319-6 · 2020

Author(s): Fang Lin, Yi Liu, Yayu Guo, Depei Qian

Keyword(s): Large Scale, Parallel Programs, Small Clusters

Design and performance analysis of global path planning techniques for autonomous mobile robots in grid environments

International Journal of Advanced Robotic Systems · 10.1177/1729881416663663 · 2017 · Vol 14 (2) · pp. 172988141666366 · Cited By ~ 18

Author(s): Imen Chaari, Anis Koubaa, Hachemi Bennaceur, Adel Ammar, Maram Alajlan, ...

Keyword(s): Path Planning, Execution Time, Large Scale, Optimal Path, Heuristic Methods, Solution Quality, Good Trade, And Performance, Path Planner, The Cost

This article presents the results of the 2-year iroboapp research project, which aims at devising path planning algorithms for large grid maps with much faster execution times while tolerating very small slacks with respect to the optimal path. We investigated both exact and heuristic methods, and contributed the design, analysis, evaluation, implementation and experimentation of several grid map path planning algorithms of both kinds. We also designed an innovative algorithm called relaxed A-star that has linear complexity with relaxed constraints and provides near-optimal solutions with an extremely reduced execution time as compared to A-star. We evaluated the performance of the different algorithms and concluded that relaxed A-star is the best path planner, as it provides a good trade-off among all the metrics, but we noticed that heuristic methods have good features that can be exploited to improve the solution of the relaxed exact method. This led us to design new hybrid algorithms that combine our relaxed A-star with heuristic methods; these improve the solution quality of relaxed A-star at the cost of slightly higher execution time, while remaining much faster than A-star for large-scale problems. Finally, we demonstrate how to integrate the relaxed A-star algorithm into the Robot Operating System as a global path planner and show that it outperforms the default path planner with an execution time 38% faster on average.
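
For orientation, the following is a minimal sketch of the standard A-star baseline on a 4-connected grid with a Manhattan-distance heuristic. It is not the relaxed A-star from the article, whose relaxations the abstract does not detail. The grid encoding ('#' for an obstacle, '.' for free space) and the function name `astar` are assumptions made for the example.

```python
import heapq


def astar(grid, start, goal):
    """Return the length of a shortest path in moves, or None if unreachable.

    grid: list of equal-length strings, '#' marks an obstacle.
    start, goal: (row, col) tuples on free cells.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_g = {start: 0}
    heap = [(heuristic(start), 0, start)]  # entries are (f, g, cell)
    while heap:
        _, g, cell = heapq.heappop(heap)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale entry: a cheaper route to this cell was found
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_g = g + 1
                if new_g < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = new_g
                    heapq.heappush(
                        heap, (new_g + heuristic((nr, nc)), new_g, (nr, nc))
                    )
    return None


if __name__ == "__main__":
    world = [
        "..#.",
        "..#.",
        "....",
    ]
    print(astar(world, (0, 0), (0, 3)))
```

The repeated cost updates and re-insertions into the priority queue are the bookkeeping that makes exact A-star expensive on large maps, which is the cost the article’s relaxed variant trades against a small loss of optimality.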
