Author(s):  
Jack Dongarra ◽  
Laura Grigori ◽  
Nicholas J. Higham

A number of features of today’s high-performance computers make it challenging to exploit these machines fully for computational science. These include increasing core counts but stagnant clock frequencies; the high cost of data movement; use of accelerators (GPUs, FPGAs, coprocessors), making architectures increasingly heterogeneous; and multiple precisions of floating-point arithmetic, including half-precision. Moreover, as well as maximizing speed and accuracy, minimizing energy consumption is an important criterion. New generations of algorithms are needed to tackle these challenges. We discuss some approaches that we can take to develop numerical algorithms for high-performance computational science, with a view to exploiting the next generation of supercomputers. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
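To make the accuracy side of the multiple-precision trade-off concrete, here is a minimal sketch, not taken from the article, that simulates IEEE half precision in pure Python and compares a dot product accumulated entirely in half precision with one that stores inputs in half precision but accumulates in double. The function names (`to_half`, `dot_half`, `dot_mixed`) are hypothetical and chosen only for illustration.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_half(xs, ys) -> float:
    """Dot product with inputs, products, and the running sum all rounded to half."""
    acc = 0.0
    for x, y in zip(xs, ys):
        product = to_half(to_half(x) * to_half(y))
        acc = to_half(acc + product)
    return acc


def dot_mixed(xs, ys) -> float:
    """Dot product with inputs stored in half but accumulated in double."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += to_half(x) * to_half(y)
    return acc
```

Calling both functions on 4000 copies of 0.1 paired with 1.0 shows the effect: the half-precision accumulator stops growing once its spacing becomes too coarse to register the addend, while the mixed-precision version stays close to the sum of the rounded inputs. This only illustrates rounding behaviour; it says nothing about speed or energy, which depend on hardware.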


Systems ◽  
2019 ◽  
Vol 7 (1) ◽  
pp. 6
Author(s):  
Allen D. Parks ◽  
David J. Marchette

The Müller-Wichards model (MW) is an algebraic method that quantitatively estimates the performance of sequential and/or parallel computer applications. Because of category theory’s expressive power and mathematical precision, a category theoretic reformulation of MW, i.e., CMW, is presented in this paper. The CMW is effectively numerically equivalent to MW and can be used to estimate the performance of any system that can be represented as numerical sequences of arithmetic, data movement, and delay processes. The CMW fundamental symmetry group is introduced and CMW’s category theoretic formalism is used to facilitate the identification of associated model invariants. The formalism also yields a natural approach to dividing systems into subsystems in a manner that preserves performance. Closed form models are developed and studied statistically, and special case closed form models are used to abstractly quantify the effect of parallelization upon processing time vs. loading, as well as to establish a system performance stationary action principle.


2016 ◽  
Vol 12 (1) ◽  
pp. 1-17 ◽  
Author(s):  
Stephanie N. Jones ◽  
Ahmed Amer ◽  
Ethan L. Miller ◽  
Darrell D. E. Long ◽  
Rekha Pitchumani ◽  
...  

Author(s):  
Isaac Sánchez Barrera ◽  
Miquel Moretó ◽  
Eduard Ayguadé ◽  
Jesús Labarta ◽  
Mateo Valero ◽  
...  

Author(s):  
Daqi Lin ◽  
Elena Vasiou ◽  
Cem Yuksel ◽  
Daniel Kopta ◽  
Erik Brunvand

Bounding volume hierarchies (BVHs) are the most widely used acceleration structures for ray tracing due to their high construction and traversal performance. However, the bounding planes shared between parent and child bounding boxes are an inherent storage redundancy that limits further improvement in performance, because of the memory cost of reading these redundant planes. Dual-split trees can create the same space partitioning as BVHs, but in a compact form that uses less memory by eliminating the redundancies of the BVH representation. This reduction in memory storage and data movement translates to faster ray traversal and better energy efficiency. Yet the performance benefits of dual-split trees are undermined by the processing required to extract the necessary information from their compact representation. This involves bit manipulations and branching instructions, which are inefficient in software. We introduce hardware acceleration for dual-split trees and show that their performance advantages over BVHs are more pronounced in a hardware ray tracing context that can take advantage of such acceleration. We provide details on how the operations needed for decoding dual-split tree nodes can be implemented in hardware and present experiments in a number of scenes of different sizes using path tracing. In our experiments, we have observed up to a 31% reduction in render time and a 38% energy saving using dual-split trees compared to binary BVHs representing identical space partitioning.
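The redundancy described above can be illustrated with a small sketch. It is not the dual-split tree encoding from the paper; it only counts how many bounding planes of two child boxes coincide with planes of their parent in a binary BVH node. The `AABB` type and the helper names are hypothetical.

```python
from typing import NamedTuple, Tuple

Vec3 = Tuple[float, float, float]


class AABB(NamedTuple):
    """Axis-aligned bounding box given by its minimum and maximum corners."""
    lo: Vec3
    hi: Vec3


def union(a: AABB, b: AABB) -> AABB:
    """Smallest box enclosing both children, i.e. the parent box in a BVH."""
    lo = tuple(min(p, q) for p, q in zip(a.lo, b.lo))
    hi = tuple(max(p, q) for p, q in zip(a.hi, b.hi))
    return AABB(lo, hi)


def shared_planes(parent: AABB, child: AABB) -> int:
    """Count the child's six bounding planes that coincide with the parent's."""
    child_planes = child.lo + child.hi
    parent_planes = parent.lo + parent.hi
    return sum(c == p for c, p in zip(child_planes, parent_planes))


left = AABB((0.0, 0.0, 0.0), (1.0, 2.0, 1.0))
right = AABB((0.5, 0.0, 0.0), (3.0, 1.0, 1.0))
parent = union(left, right)
redundant = shared_planes(parent, left) + shared_planes(parent, right)
```

Because the parent is the union of its children, each of its six planes must coincide with a plane of at least one child, so at least 6 of the 12 child planes stored in a plain BVH node repeat information already held by the parent. In this example `redundant` is 9.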


2017 ◽  
Vol 46 (2) ◽  
pp. 207-224
Author(s):  
Ge Zhang ◽  
Wenwen Zhang ◽  
Subhrajit Guhathakurta ◽  
Nisha Botchwey

Open data have come of age with many cities, states, and other jurisdictions joining the open data movement by offering relevant information about their communities for free and easy access to the public. Despite the growing volume of open data, their use has been limited in planning scholarship and practice. The bottleneck is often the format in which the data are available and the organization of such data, which may be difficult to incorporate in existing analytical tools. The overall goal of this research is to develop an open data-based community planning support system that can collect related open data, analyze the data for specific objectives, and visualize the results to improve usability. To accomplish this goal, this study undertakes three research tasks. First, it describes the current state of open data analysis efforts in the community planning field. Second, it examines the challenges analysts experience when using open data in planning analysis. Third, it develops a new flow-based planning support system for examining neighborhood quality of life and health for the City of Atlanta as a prototype, which addresses many of these open data challenges.


Author(s):  
Pål Halvorsen ◽  
Tom Anders Dalseng ◽  
Carsten Griwodz

Distributed multimedia streaming systems are increasingly popular due to technological advances, and numerous streaming services are available today. On servers or proxy caches, there is a huge scaling challenge in supporting thousands of concurrent users that request delivery of high-rate, time-dependent data like audio and video, because this requires transfers of large amounts of data through several sub-systems within a streaming node. Unnecessary copy operations in the data path can therefore contribute significantly to the resource consumption of streaming operations. Despite previous research, off-the-shelf operating systems have only limited support for data paths that have been optimized for streaming. Additionally, system call overhead has grown with newer operating system editions, adding to the cost of data movement. It is frequently argued that these issues can be ignored because of the continuing growth of CPU speeds. However, such an argument fails to take the problems of modern streaming systems into account. The dissipation of heat generated by disks and high-end CPUs is a major problem for data centers, which would be alleviated if less power-hungry CPUs could be used. The power budget of mobile devices, which are increasingly used for streaming as well, is tight, and reduced power consumption is an important issue. In this paper, we demonstrate that these operations consume a large amount of resources, and we therefore revisit the data movement problem and provide a comprehensive evaluation of possible streaming data I/O paths in the Linux 2.6 kernel. We have implemented and evaluated several enhanced mechanisms and show how to provide support for more efficient memory usage and a reduction of user/kernel space switches for content download and streaming applications. In particular, we are able to reduce CPU usage by approximately 27% compared to the best approach without kernel modifications, by removing copy operations and system calls for a streaming scenario in which RTP headers must be added to stored data for sequence numbers and timing.
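The streaming scenario above, where an RTP header is added to stored data, can be sketched in user space. This is not the paper's in-kernel mechanism; it is a minimal illustration, under the assumption of a Unix-like system, of avoiding one extra copy by passing the header and a slice of the stored payload to the kernel in a single gather-write call (`socket.sendmsg`). The helper names, the payload type 96, and the timestamp step of 3000 ticks are assumptions made for the example.

```python
import socket
import struct

RTP_VERSION = 2


def rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int = 96) -> bytes:
    """Build a 12-byte fixed RTP header with no padding, extension, or CSRCs."""
    return struct.pack(
        "!BBHII",
        RTP_VERSION << 6,        # version in the top two bits, other flags zero
        payload_type & 0x7F,     # marker bit left clear
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )


def send_packets(sock: socket.socket, media: bytes, chunk: int, ssrc: int,
                 ticks_per_packet: int = 3000) -> int:
    """Send stored media as RTP packets, one datagram per chunk.

    The payload is sliced through a memoryview, so no user-space buffer is
    built to join header and payload; sendmsg gathers both pieces in one
    system call. The kernel still copies the data once into socket buffers.
    """
    view = memoryview(media)
    sent = 0
    for seq, start in enumerate(range(0, len(view), chunk)):
        payload = view[start:start + chunk]
        header = rtp_header(seq, seq * ticks_per_packet, ssrc)
        sock.sendmsg([header, payload])
        sent += 1
    return sent
```

A simple way to try it is with a datagram pair from `socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)`: each received datagram then starts with the 12-byte header followed by the corresponding slice of the stored data. Compared with concatenating `header + payload` before a plain `send`, this saves one copy per packet but not a system call; removing the remaining kernel-side copies and calls is what the kernel modifications in the abstract address.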

