Improving Execution Time of Parallel Programs on Large Scale Chip Multiprocessors with Constant Average Power Processing

Author(s):  
Kramer Straube ◽  
Christopher Nitta ◽  
Raj Amirtharajah ◽  
Matthew Farrens ◽  
Venkatesh Akella
Author(s):  
Adrian Jackson ◽  
Michèle Weiland

This chapter describes experiences using Cloud infrastructures for scientific computing, both for serial and parallel computing. Amazon’s High Performance Computing (HPC) Cloud computing resources were compared to traditional HPC resources to quantify performance as well as assessing the complexity and cost of using the Cloud. Furthermore, a shared Cloud infrastructure is compared to standard desktop resources for scientific simulations. Whilst this is only a small scale evaluation these Cloud offerings, it does allow some conclusions to be drawn, particularly that the Cloud can currently not match the parallel performance of dedicated HPC machines for large scale parallel programs but can match the serial performance of standard computing resources for serial and small scale parallel programs. Also, the shared Cloud infrastructure cannot match dedicated computing resources for low level benchmarks, although for an actual scientific code, performance is comparable.


Author(s):  
Nurcin Celik ◽  
Esfandyar Mazhari ◽  
John Canby ◽  
Omid Kazemi ◽  
Parag Sarfare ◽  
...  

Simulating large-scale systems usually entails exhaustive computational powers and lengthy execution times. The goal of this research is to reduce execution time of large-scale simulations without sacrificing their accuracy by partitioning a monolithic model into multiple pieces automatically and executing them in a distributed computing environment. While this partitioning allows us to distribute required computational power to multiple computers, it creates a new challenge of synchronizing the partitioned models. In this article, a partitioning methodology based on a modified Prim’s algorithm is proposed to minimize the overall simulation execution time considering 1) internal computation in each of the partitioned models and 2) time synchronization between them. In addition, the authors seek to find the most advantageous number of partitioned models from the monolithic model by evaluating the tradeoff between reduced computations vs. increased time synchronization requirements. In this article, epoch- based synchronization is employed to synchronize logical times of the partitioned simulations, where an appropriate time interval is determined based on the off-line simulation analyses. A computational grid framework is employed for execution of the simulations partitioned by the proposed methodology. The experimental results reveal that the proposed approach reduces simulation execution time significantly while maintaining the accuracy as compared with the monolithic simulation execution approach.


2020 ◽  
Vol 10 (7) ◽  
pp. 2634
Author(s):  
JunWeon Yoon ◽  
TaeYoung Hong ◽  
ChanYeol Park ◽  
Seo-Young Noh ◽  
HeonChang Yu

High-performance computing (HPC) uses many distributed computing resources to solve large computational science problems through parallel computation. Such an approach can reduce overall job execution time and increase the capacity of solving large-scale and complex problems. In the supercomputer, the job scheduler, the HPC’s flagship tool, is responsible for distributing and managing the resources of large systems. In this paper, we analyze the execution log of the job scheduler for a certain period of time and propose an optimization approach to reduce the idle time of jobs. In our experiment, it has been found that the main root cause of delayed job is highly related to resource waiting. The execution time of the entire job is affected and significantly delayed due to the increase in idle resources that must be ready when submitting the large-scale job. The backfilling algorithm can optimize the inefficiency of these idle resources and help to reduce the execution time of the job. Therefore, we propose the backfilling algorithm, which can be applied to the supercomputer. This experimental result shows that the overall execution time is reduced.


2003 ◽  
Vol 13 (04) ◽  
pp. 513-524 ◽  
Author(s):  
H. GAUTAMA ◽  
A. J. C. VAN GEMUND

Speculative parallelism refers to searching in parallel for a solution, such as finding a pattern in a data base, where finding the first solution terminates the whole parallel process. Different performance prediction methods are required as compared to traditional parallelism. In this paper we introduce an analytical approach to predict the execution time distribution of data-dependent parallel programs that feature N-ary and binary speculative parallel compositions. The method is based on the use of statistical moments which allows program execution time distribution to be approximated at O(1) solution complexity. Measurement results for synthetic distributions indicate an accuracy that lies in the percent range while for empirical distributions on internet search engines the prediction accuracy is acceptable, provided sufficient workload unimodality.


2003 ◽  
Vol 19 (5) ◽  
pp. 689-700
Author(s):  
Dieter Kranzlmüller ◽  
Nam Thoai ◽  
Jens Volkert

Author(s):  
Nurcin Celik ◽  
Esfandyar Mazhari ◽  
John Canby ◽  
Omid Kazemi ◽  
Parag Sarfare ◽  
...  

Simulating large-scale systems usually entails exhaustive computational powers and lengthy execution times. The goal of this research is to reduce execution time of large-scale simulations without sacrificing their accuracy by partitioning a monolithic model into multiple pieces automatically and executing them in a distributed computing environment. While this partitioning allows us to distribute required computational power to multiple computers, it creates a new challenge of synchronizing the partitioned models. In this article, a partitioning methodology based on a modified Prim’s algorithm is proposed to minimize the overall simulation execution time considering 1) internal computation in each of the partitioned models and 2) time synchronization between them. In addition, the authors seek to find the most advantageous number of partitioned models from the monolithic model by evaluating the tradeoff between reduced computations vs. increased time synchronization requirements. In this article, epoch- based synchronization is employed to synchronize logical times of the partitioned simulations, where an appropriate time interval is determined based on the off-line simulation analyses. A computational grid framework is employed for execution of the simulations partitioned by the proposed methodology. The experimental results reveal that the proposed approach reduces simulation execution time significantly while maintaining the accuracy as compared with the monolithic simulation execution approach.


2017 ◽  
Vol 14 (2) ◽  
pp. 172988141666366 ◽  
Author(s):  
Imen Chaari ◽  
Anis Koubaa ◽  
Hachemi Bennaceur ◽  
Adel Ammar ◽  
Maram Alajlan ◽  
...  

This article presents the results of the 2-year iroboapp research project that aims at devising path planning algorithms for large grid maps with much faster execution times while tolerating very small slacks with respect to the optimal path. We investigated both exact and heuristic methods. We contributed with the design, analysis, evaluation, implementation and experimentation of several algorithms for grid map path planning for both exact and heuristic methods. We also designed an innovative algorithm called relaxed A-star that has linear complexity with relaxed constraints, which provides near-optimal solutions with an extremely reduced execution time as compared to A-star. We evaluated the performance of the different algorithms and concluded that relaxed A-star is the best path planner as it provides a good trade-off among all the metrics, but we noticed that heuristic methods have good features that can be exploited to improve the solution of the relaxed exact method. This led us to design new hybrid algorithms that combine our relaxed A-star with heuristic methods which improve the solution quality of relaxed A-star at the cost of slightly higher execution time, while remaining much faster than A* for large-scale problems. Finally, we demonstrate how to integrate the relaxed A-star algorithm in the robot operating system as a global path planner and show that it outperforms its default path planner with an execution time 38% faster on average.


Sign in / Sign up

Export Citation Format

Share Document