From FORTRAN 77 to Locality-Aware High Productivity Languages for Peta-Scale Computing

2007 ◽  
Vol 15 (1) ◽  
pp. 45-65
Author(s):  
Hans P. Zima

When the first specification of the FORTRAN language was released in 1956, the goal was to provide an "automatic programming system" that would enhance the economy of programming by replacing assembly language with a notation closer to the domain of scientific programming. A key issue in this context, explicitly recognized by the authors of the language, was the requirement to produce efficient object programs that could compete with their hand-coded counterparts. More than 50 years later, a similar situation exists with respect to finding the right programming paradigm for high performance computing systems. FORTRAN, as the traditional language for scientific programming, has played a major role in the quest for high-productivity programming languages that satisfy very strict performance constraints. This paper focuses on high-level support for locality awareness, one of the most important requirements in this context. The discussion centers on the High Performance Fortran (HPF) family of languages, and their influence on current language developments for peta-scale computing. HPF is a data-parallel language that was designed to provide the user with a high-level interface for programming scientific applications, while delegating to the compiler the task of generating an explicitly parallel message-passing program. We outline developments that led to HPF, explain its major features, identify a set of weaknesses, and discuss subsequent languages that address these problems. The final part of the paper deals with Chapel, a modern object-oriented language developed in the High Productivity Computing Systems (HPCS) program sponsored by DARPA. A salient property of Chapel is its general framework for the support of user-defined distributions, which is related in many ways to ideas first described in Vienna Fortran. This framework is general enough to allow a concise specification of sparse data distributions. 
The paper concludes with an outlook on future research in this area.
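The data-distribution idea at the heart of HPF (and generalized by Chapel's user-defined distributions) can be illustrated with a minimal sketch. This is plain Python, and all function names here are hypothetical; it shows only the simplest case, an HPF-style BLOCK distribution that maps each array index to the processor owning it.

```python
def block_owner(i, n, p):
    """Return the processor (0..p-1) that owns index i of an n-element
    array under a BLOCK distribution: contiguous chunks of
    ceil(n/p) elements per processor, HPF-style."""
    block = -(-n // p)          # ceiling division
    return i // block

def local_indices(rank, n, p):
    """Indices owned by processor `rank` under the same distribution."""
    block = -(-n // p)
    return range(rank * block, min((rank + 1) * block, n))

# Example: 10 elements over 4 processors -> blocks of sizes 3, 3, 3, 1.
owners = [block_owner(i, 10, 4) for i in range(10)]
print(owners)  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
```

A user-defined distribution framework, as in Chapel or Vienna Fortran, essentially lets the programmer supply functions like these two (plus communication hooks), rather than being limited to a fixed set of built-in mappings.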

Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1275
Author(s):  
Changdao Du ◽  
Yoshiki Yamaguchi

Due to performance and energy requirements, FPGA-based accelerators have become a promising solution for high-performance computations. Meanwhile, with the help of high-level synthesis (HLS) compilers, FPGAs can be programmed using common programming languages such as C, C++, or OpenCL, thereby improving design efficiency and portability. Stencil computations are significant kernels in various scientific applications. In this paper, we introduce an architecture design for implementing stencil kernels on a state-of-the-art FPGA with high bandwidth memory (HBM). Traditional FPGAs are usually equipped with external memory, e.g., DDR3 or DDR4, which limits the design space exploration in the spatial domain of stencil kernels. Therefore, many previous studies relied mainly on exploiting parallelism in the temporal domain to eliminate the bandwidth limitations. In our approach, we scale up the design performance by considering the spatial and temporal parallelism of the stencil kernel equally. We also discuss the design portability among different HLS compilers. We use typical stencil kernels to evaluate our design on a Xilinx U280 FPGA board and compare the results with other existing studies. By adopting our method, developers can take broad parallelization strategies based on specific FPGA resources to improve performance.
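The spatial/temporal distinction can be made concrete with a sketch (plain Python, not the paper's HLS code; the 1D 3-point kernel and all names are illustrative assumptions). Spatial parallelism computes several output points of one time step concurrently; temporal parallelism chains several time steps into one pipeline pass, which is what bandwidth-limited designs exploit.

```python
def step(u):
    """One time step of a 3-point averaging stencil (fixed boundaries).
    Spatial parallelism would evaluate several i of this inner loop
    concurrently, one per processing element."""
    return [u[0]] + [(u[i - 1] + u[i] + u[i + 1]) / 3.0
                     for i in range(1, len(u) - 1)] + [u[-1]]

def run(u, t):
    """Baseline: t sequential time steps."""
    for _ in range(t):
        u = step(u)
    return u

def run_temporal_pipeline(u, t, depth=2):
    """Temporal parallelism: `depth` stencil stages are chained into one
    pass, as a pipeline of `depth` PEs on an FPGA would do; each pass
    reads the array once from memory instead of `depth` times, while
    the arithmetic result is identical to `depth` separate steps."""
    for _ in range(t // depth):
        for _ in range(depth):   # fused stages (one pipeline traversal)
            u = step(u)
    return u

u0 = [0.0] * 8
u0[3] = 1.0
assert run(u0, 4) == run_temporal_pipeline(u0, 4, depth=2)
```

The paper's point is that with HBM's many independent channels, the spatial dimension (more PEs per step) becomes worth scaling alongside the pipeline depth.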


2021 ◽  
Author(s):  
Roman Nuterman ◽  
Dion Häfner ◽  
Markus Jochum

Until recently, our pure Python, primitive equation ocean model Veros has been about 1.5x slower than a corresponding Fortran implementation. But thanks to a thriving scientific and machine learning library ecosystem, tremendous speed-ups on GPU, and to a lesser degree CPU, are within reach. Leveraging Google's JAX library, we find that our Python model code can reach a 2-5 times higher energy efficiency on GPU compared to a traditional Fortran model.

Therefore, we propose a new generation of geophysical models: one that combines high-level abstractions and user friendliness on the one hand, and leverages modern developments in high-performance computing and machine learning research on the other.

We discuss what there is to gain from building models in high-level programming languages, what we have achieved in Veros, and where we see the modelling community heading in the future.


Author(s):  
JOST BERTHOLD ◽  
HANS-WOLFGANG LOIDL ◽  
KEVIN HAMMOND

Over time, several competing approaches to parallel Haskell programming have emerged. Different approaches support parallelism at various different scales, ranging from small multicores to massively parallel high-performance computing systems. They also provide varying degrees of control, ranging from completely implicit approaches to ones providing full programmer control. Most current designs assume a shared memory model at the programmer, implementation and hardware levels. This is, however, becoming increasingly divorced from the reality at the hardware level. It also imposes significant unwanted runtime overheads in the form of garbage collection synchronisation, etc. What is needed is an easy way to abstract over the implementation and hardware levels, while presenting a simple parallelism model to the programmer. The PArallEl shAred Nothing runtime system design aims to provide a portable and high-level shared-nothing implementation platform for parallel Haskell dialects. It abstracts over major issues such as work distribution and data serialisation, consolidating existing, successful designs into a single framework. It also provides an optional virtual shared-memory programming abstraction for (possibly) shared-nothing parallel machines, such as modern multicore/manycore architectures or cluster/cloud computing systems. It builds on, unifies, and extends existing well-developed support for shared-memory parallelism that is provided by the widely used GHC Haskell compiler. This paper summarises the state of the art in shared-nothing parallel Haskell implementations, introduces the PArallEl shAred Nothing abstractions, shows how they can be used to implement three distinct parallel Haskell dialects, and demonstrates that good scalability can be obtained on recent parallel machines.
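The shared-nothing model the design targets can be mimicked in a single-process Python sketch (not the paper's Haskell runtime; `Node` and its methods are invented for illustration): each "node" owns a private heap, and all communication passes through serialisation, so no heap is ever shared.

```python
import pickle

class Node:
    """A shared-nothing 'node': private heap, no references shared with
    other nodes; all communication is by serialised messages."""
    def __init__(self):
        self.inbox = []
        self.heap = {}          # private state, never aliased outside

    def send(self, other, msg):
        # Serialisation forces a deep copy: the receiver can never
        # mutate the sender's heap, so nodes need no garbage-collection
        # or locking coordination -- the key shared-nothing property.
        other.inbox.append(pickle.dumps(msg))

    def receive(self):
        return pickle.loads(self.inbox.pop(0))

a, b = Node(), Node()
a.heap["data"] = [1, 2, 3]
a.send(b, a.heap["data"])
received = b.receive()
received.append(4)        # mutating the copy...
print(a.heap["data"])     # ...leaves the sender untouched: [1, 2, 3]
```

The runtime system described in the paper does the analogous work (serialising graph structures, distributing them as messages) beneath the high-level parallel Haskell dialects, so the programmer never sees this machinery.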


2020 ◽  
Author(s):  
Roman Nuterman ◽  
Dion Häfner ◽  
Markus Jochum ◽  
Brian Vinter

So far, our pure Python, primitive equation ocean model Veros has been about 50% slower than a corresponding Fortran implementation. But recent benchmarks show that, thanks to a thriving scientific and machine learning library ecosystem, tremendous speed-ups on GPU, and to a lesser degree CPU, are within reach. On GPU, we find that the same model code can reach a 2-5 times higher energy efficiency compared to a traditional Fortran model.

We thus propose a new generation of geophysical models: one that combines high-level abstractions and user friendliness on the one hand, and leverages modern developments in high-performance computing on the other.

We discuss what there is to gain from building models in high-level programming languages, what we have achieved, and what the future holds for us and the modelling community.


Symmetry ◽  
2020 ◽  
Vol 12 (6) ◽  
pp. 1029
Author(s):  
Anabi Hilary Kelechi ◽  
Mohammed H. Alsharif ◽  
Okpe Jonah Bameyi ◽  
Paul Joan Ezra ◽  
Iorshase Kator Joseph ◽  
...  

Power-consuming entities such as high-performance computing (HPC) sites and large data centers are growing with advances in information technology. In business, HPC is used to shorten product delivery time, reduce production cost, and decrease the time it takes to develop a new product. Today's high level of computing power from supercomputers comes at the expense of consuming large amounts of electric power. To minimize the energy utilized by HPC entities, it is necessary to reduce both the energy required by the computing systems and the resources needed to operate them. A database can support system energy efficiency: the power consumption of all components is sampled at regular intervals and stored in the database, and this stored information then serves as input data for energy-efficiency optimization. Device workload information and various usage metrics are also stored in the database. There has been strong momentum in the area of artificial intelligence (AI) as a tool for optimization and process automation, leveraging already existing information. This paper discusses ideas for improving energy efficiency in HPC using AI.
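The sampling-database idea can be sketched with Python's standard `sqlite3` module. The schema, component names, and wattage figures below are invented for illustration; the point is only that regularly stored samples become queryable input for an optimiser.

```python
import sqlite3

# Hypothetical schema: one row per (timestamp, component, watts) sample,
# as the proposed monitoring database would collect at regular intervals.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE power (ts INTEGER, component TEXT, watts REAL)")

samples = [
    (0, "cpu", 95.0), (0, "gpu", 250.0),
    (1, "cpu", 120.0), (1, "gpu", 240.0),
]
db.executemany("INSERT INTO power VALUES (?, ?, ?)", samples)

# The stored samples feed the energy-efficiency optimisation: e.g. find
# the component with the highest mean draw as a first tuning target.
row = db.execute(
    "SELECT component, AVG(watts) AS w FROM power "
    "GROUP BY component ORDER BY w DESC LIMIT 1").fetchone()
print(row)  # ('gpu', 245.0)
```

An AI-based optimiser, as the paper envisions, would consume such aggregates (together with workload metrics) rather than raw sensor streams.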


1999 ◽  
Vol 7 (1) ◽  
pp. 67-81 ◽  
Author(s):  
Siegfried Benkner

High Performance Fortran (HPF) offers an attractive high-level language interface for programming scalable parallel architectures, providing the user with directives for the specification of data distribution and delegating to the compiler the task of generating an explicitly parallel program. Available HPF compilers can handle regular codes quite efficiently, but dramatic performance losses may be encountered for applications which are based on highly irregular, dynamically changing data structures and access patterns. In this paper we introduce the Vienna Fortran Compiler (VFC), a new source-to-source parallelization system for HPF+, an optimized version of HPF which addresses the requirements of irregular applications. In addition to extended data distribution and work distribution mechanisms, HPF+ provides the user with language features for specifying certain information that decisively influences a program's performance. This comprises data locality assertions, non-local access specifications, and the possibility of reusing runtime-generated communication schedules of irregular loops. Performance measurements of kernels from advanced applications demonstrate that with a high-level data parallel language such as HPF+, a performance close to hand-written message-passing programs can be achieved even for highly irregular codes.
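Reusing a runtime-generated communication schedule follows the classic inspector/executor pattern, which can be sketched in plain Python (this is an illustration of the general technique, not VFC's implementation; the ownership function and index array are invented):

```python
def build_schedule(indices, owner):
    """Inspector: scan the irregular access pattern once and record, per
    owning processor, which remote elements must be fetched. This is
    the runtime-generated communication schedule."""
    sched = {}
    for i in indices:
        sched.setdefault(owner(i), set()).add(i)
    return {p: sorted(s) for p, s in sched.items()}

# Hypothetical irregular loop x[i] += y[edge[i]], with y distributed
# BLOCK-wise: 8 elements over 2 processors, owner(i) = i // 4.
owner = lambda i: i // 4
edges = [0, 5, 2, 7, 5]
schedule = build_schedule(edges, owner)
print(schedule)  # {0: [0, 2], 1: [5, 7]}

# Executor: every later execution of the loop replays this schedule
# (gather exactly these indices per processor), so as long as `edges`
# is unchanged the inspector cost is paid only once.
```

The HPF+ language feature lets the programmer assert that the access pattern is unchanged, which is what licenses the schedule reuse.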


2013 ◽  
Vol 2013 (1) ◽  
pp. 000753-000757
Author(s):  
Thomas A. Wassick

Over the past few years, lead-free solder interconnects have been widely incorporated into electronic products, and are increasingly found in high-performance computing systems and their associated power electronics. As power and current levels increase within these products, the overall reliability of a lead-free-solder-based system can be affected by a growing risk of electromigration (EM) degradation during the product lifetime, especially if the product operates at higher temperatures and very high current densities. This paper provides a high-level technical overview of lead-free electromigration and describes the key factors and issues that can influence the EM performance of lead-free interconnects, especially in the environments in which power electronics are typically found.
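The current-density and temperature dependence described here is commonly modelled by Black's equation, MTTF = A · J⁻ⁿ · exp(Ea / kT). The equation is standard in the EM literature, but the constants below (n = 2, Ea = 0.8 eV) are illustrative assumptions, not values from this paper.

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K

def black_mttf(j, t_kelvin, a=1.0, n=2.0, ea=0.8):
    """Black's equation for electromigration mean time to failure.
    j: current density; t_kelvin: absolute temperature; a, n, ea are
    empirically fitted constants (values here are illustrative only)."""
    return a * j ** (-n) * math.exp(ea / (K_B * t_kelvin))

# With n = 2, doubling current density at fixed temperature cuts the
# mean time to failure by a factor of four:
ratio = black_mttf(1.0, 350.0) / black_mttf(2.0, 350.0)
print(ratio)  # 4.0
```

This is why the abstract singles out very high current densities and elevated operating temperatures as the dominant risk factors for EM in power electronics.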


2021 ◽  
Vol 16 (9) ◽  
pp. 1934578X2110352
Author(s):  
Xian Zhou ◽  
Declan Power ◽  
Andrew Jones ◽  
Agustín Acquaviva ◽  
Gary R. Dennis ◽  
...  

Reaction flow (RF) chromatography is a powerful and efficient approach that utilizes conventional high-performance liquid chromatography (HPLC) with ultraviolet (UV)-visible detection. The technique exploits a novel column end-fitting and an extra HPLC pump that delivers a reagent for selective detection, in particular for the antioxidant profiling of natural products. This study employed RF for the first time to identify antioxidants in a commercial ginger sample, demonstrating the previously validated assay's ease and power in extracting information about the natural product's antioxidant properties. Owing to the simplicity of the data analysis and peak-matching process, the following relationships between the chemical and antioxidant profiles were revealed: three of the strongest antioxidant-activity peaks in the ginger sample (593 nm) did not correlate with the three most abundant chemical-profile peaks (UV absorbance at 254 and 280 nm); the ratio of seven antioxidant peaks may potentially be used for food-authenticity purposes, and future research should target these peaks for the early discovery of novel antioxidants sourced from ginger. Utilization of this previously validated assay resolved numerous peaks in the ginger extract and provided information on their antioxidant attributes and chemical abundance. This approach is more informative than total antioxidant assays, which lack compound-specificity information. Furthermore, it is superior to mass spectrometric (MS) assays, which cannot evaluate each compound's antioxidant strength; it also avoids the expense of acquiring and maintaining MS detection hardware, and does not require the high level of expertise needed for MS data analysis.
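The peak-matching step between the chemical profile (254/280 nm) and the antioxidant profile (593 nm) amounts to pairing peaks whose retention times agree within a tolerance. A minimal sketch, with retention times and tolerance invented purely for illustration (the paper reports no such numbers):

```python
def match_peaks(chem_peaks, antiox_peaks, tol=0.1):
    """Pair chemical-profile peaks with antioxidant-profile peaks whose
    retention times agree within `tol` minutes. Unmatched chemical
    peaks (abundant but inactive compounds) simply produce no pair."""
    matches = []
    for t_c in chem_peaks:
        for t_a in antiox_peaks:
            if abs(t_c - t_a) <= tol:
                matches.append((t_c, t_a))
    return matches

chem = [2.1, 5.4, 9.8]      # hypothetical UV retention times (min)
antiox = [2.15, 7.0, 9.75]  # hypothetical 593 nm retention times (min)
print(match_peaks(chem, antiox))  # [(2.1, 2.15), (9.8, 9.75)]
```

The study's key finding corresponds to the unmatched cases: strong 593 nm peaks with no abundant UV counterpart indicate potent antioxidants present at low abundance.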

