GPU Accelerated Path Tracing of Massive Scenes

Milan Jaroš; Lubomír Říha; Petr Strakoš; Matěj Špeťko

doi:10.1145/3447807

ACM Transactions on Graphics ◽

10.1145/3447807 ◽

2021 ◽

Vol 40 (2) ◽

pp. 1-17

Author(s):

Milan Jaroš ◽

Lubomír Říha ◽

Petr Strakoš ◽

Matěj Špeťko

Keyword(s):

Data Structures ◽

Memory Management ◽

Memory Access ◽

Minimal Effect ◽

Proof Of Concept ◽

Access Pattern ◽

Multiple Gpus ◽

Management Level ◽

Path Tracing ◽

Memory Accesses

This article presents a solution to path tracing of massive scenes on multiple GPUs. Our approach analyzes the memory access pattern of a path tracer and defines how the scene data should be distributed across up to 16 GPUs with minimal effect on performance. The key concept is that the parts of the scene with the highest number of memory accesses are replicated on all GPUs. We propose two methods for maximizing the performance of path tracing when working with partially distributed scene data. Both methods operate at the memory management level, so the path tracer's data structures do not have to be redesigned, which makes our approach applicable to other path tracers with only minor changes to their code. As a proof of concept, we have enhanced the open-source Blender Cycles path tracer. The approach was validated on scenes of up to 169 GB. We show that for such large scenes only 1–5% of the scene data needs to be replicated on all machines. On smaller scenes, we have verified that the performance is very close to that of rendering a fully replicated scene. In terms of scalability, we have achieved a parallel efficiency of over 94% using up to 16 GPUs.
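The key concept in the abstract, replicating the most frequently accessed scene data on every GPU and distributing the rest, can be illustrated with a small planning sketch. This is not the authors' Cycles implementation: the `Chunk` type, the `plan_distribution` helper, the access counts (assumed to come from a profiling pass), the accesses-per-byte ranking, the per-GPU replication budget, and the round-robin placement of the remaining data are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str          # hypothetical label for a block of scene data
    size_bytes: int
    accesses: int      # access count from an assumed profiling pass


def plan_distribution(chunks, num_gpus, replication_budget_bytes):
    """Split scene chunks into a replicated set and a distributed set.

    Chunks are visited in order of access density (accesses per byte); each
    is replicated on every GPU if it still fits in the per-GPU replication
    budget, otherwise it is assigned to a single GPU round-robin.
    """
    ranked = sorted(chunks, key=lambda c: c.accesses / c.size_bytes, reverse=True)
    replicated, distributed = [], []
    used = 0
    for c in ranked:
        if used + c.size_bytes <= replication_budget_bytes:
            replicated.append(c.name)
            used += c.size_bytes
        else:
            distributed.append(c.name)
    # Each non-replicated chunk is owned by exactly one GPU.
    owner = {name: i % num_gpus for i, name in enumerate(distributed)}
    return replicated, owner


chunks = [
    Chunk("bvh_top", 64, 10_000),
    Chunk("textures", 4096, 500),
    Chunk("mesh_a", 1024, 2000),
    Chunk("mesh_b", 1024, 300),
]
replicated, owner = plan_distribution(chunks, num_gpus=2, replication_budget_bytes=1200)
print(replicated)  # ['bvh_top', 'mesh_a']
print(owner)       # {'mesh_b': 0, 'textures': 1}
```

Ranking by access density rather than raw access count favors small, hot structures (such as the top of an acceleration structure) that cost little memory to copy onto every GPU, which is consistent with the abstract's observation that only a few percent of a large scene needs to be replicated.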

Online Thread and Data Mapping Using a Sharing-Aware Memory Management Unit

ACM Transactions on Modeling and Performance Evaluation of Computing Systems ◽

10.1145/3433687 ◽

2021 ◽

Vol 5 (4) ◽

pp. 1-28

Author(s):

Eduardo H. M. Cruz ◽

Matthias Diener ◽

Laércio L. Pilla ◽

Philippe O. A. Navaux

Keyword(s):

Energy Efficiency ◽

Memory Management ◽

Substantial Reduction ◽

Management Unit ◽

Memory Access ◽

Parallel Applications ◽

Data Mapping ◽

Wide Range ◽

Memory Accesses ◽

Level Parallelism

Current and future architectures rely on thread-level parallelism to sustain performance growth. These architectures have introduced a complex memory hierarchy, consisting of several cores organized hierarchically with multiple cache levels and NUMA nodes. Such memory hierarchies can affect the performance and energy efficiency of parallel applications, as memory access locality becomes increasingly important. To improve locality, analyzing the memory access behavior of parallel applications is critical for mapping threads and data. Nevertheless, most previous work relies on indirect information about memory accesses, or does not combine thread and data mapping, resulting in less accurate mappings. In this paper, we propose the Sharing-Aware Memory Management Unit (SAMMU), an extension to the memory management unit that allows it to detect memory access behavior in hardware. With this information, the operating system can perform online mapping without any prior knowledge of the application's behavior. In an evaluation with a wide range of parallel applications (NAS Parallel Benchmarks and PARSEC Benchmark Suite), performance improved by up to 35.7% (10.0% on average) and energy efficiency improved by up to 11.9% (4.1% on average). These improvements result from a substantial reduction in cache misses and interconnection traffic.
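The combination of thread mapping and data mapping described above can be sketched in software, assuming a trace of (thread, page) accesses stands in for the information SAMMU gathers in hardware. The helpers `sharing_matrix`, `map_threads`, and `map_pages`, the greedy pairing heuristic, and the notion of a fixed-size "group" of cores are hypothetical illustrations, not the mapping algorithm from the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations


def sharing_matrix(trace):
    """Count, for every thread pair (a, b) with a < b, how many pages both touch.

    `trace` is an iterable of (thread_id, page_id) accesses.
    """
    threads_per_page = defaultdict(set)
    for tid, page in trace:
        threads_per_page[page].add(tid)
    sharing = Counter()
    for tids in threads_per_page.values():
        for a, b in combinations(sorted(tids), 2):
            sharing[(a, b)] += 1
    return sharing


def map_threads(threads, sharing, group_size):
    """Greedy thread mapping: visit pairs from most to least sharing and put
    each unplaced thread in its partner's group if that group has room,
    otherwise open a new group (e.g. a set of cores sharing a cache)."""
    group_of, size = {}, Counter()
    next_group = 0
    for (a, b), _ in sharing.most_common():
        for t, partner in ((a, b), (b, a)):
            if t in group_of:
                continue
            if partner in group_of and size[group_of[partner]] < group_size:
                g = group_of[partner]
            else:
                g, next_group = next_group, next_group + 1
            group_of[t] = g
            size[g] += 1
    for t in threads:  # threads that share no pages get their own group
        if t not in group_of:
            group_of[t], next_group = next_group, next_group + 1
    return group_of


def map_pages(trace, group_of):
    """Data mapping: place each page near the group that accesses it most."""
    counts = defaultdict(Counter)
    for tid, page in trace:
        counts[page][group_of[tid]] += 1
    return {page: c.most_common(1)[0][0] for page, c in counts.items()}


trace = [(0, "p1"), (1, "p1"), (0, "p2"), (1, "p2"),
         (2, "p3"), (3, "p3"), (2, "p4"), (2, "p4"), (0, "p4")]
sharing = sharing_matrix(trace)
group_of = map_threads(range(4), sharing, group_size=2)
pages = map_pages(trace, group_of)
print(group_of)  # {0: 0, 1: 0, 2: 1, 3: 1}
print(pages)     # {'p1': 0, 'p2': 0, 'p3': 1, 'p4': 1}
```

Threads 0 and 1 share two pages and end up in the same group, as do threads 2 and 3; page `p4` is touched by both groups but lands with group 1, which accesses it more often. Doing both steps from the same access information is the point the abstract makes about combining thread and data mapping.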

Memory access behavior of dynamically allocated data structures and programs with irregular access patterns

10.32920/ryerson.14648538.v1 ◽

2021 ◽

Author(s):

Zhen Yu

Keyword(s):

Data Structures ◽

Memory Management ◽

Memory Hierarchy ◽

Research Work ◽

Memory Access ◽

Mathematical Formula ◽

Dynamic Memory ◽

Dynamic Memory Management ◽

Management Policies ◽

Access Patterns

With the development of modern computers, memory latency has become a key bottleneck for the performance of computer systems, and much research has therefore targeted improving the performance of the memory hierarchy. In this thesis, we examine the behavior of dynamically allocated data structures (DADS) and programs with irregular access patterns (PIAP). DADS and PIAP rely on dynamic memory management or on algorithms with unpredictable behaviour. By simulating several DADS and PIAP applications, we found that general cache management policies cannot make effective use of the valuable cache resources for DADS and PIAP. We explored the use of mathematical formulas applied in signal processing to improve the performance of the memory hierarchy.
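Why a general-purpose policy such as LRU struggles with dynamically allocated, irregularly laid out data can be shown with a toy cache simulation. This is not the simulator used in the thesis: the 64-byte line, 16-byte node size, 2048-line fully associative LRU cache, and the `LRUCache`, `list_addresses`, and `misses` helpers are all assumptions chosen for illustration. The same 4096-node linked list is traversed twice, once with nodes packed contiguously and once with each node scattered onto its own cache line.

```python
import random
from collections import OrderedDict

LINE = 64  # bytes per cache line (assumed)


class LRUCache:
    """Fully associative cache with least-recently-used replacement."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, addr):
        tag = addr // LINE
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
        else:
            self.misses += 1
            self.lines[tag] = None
            if len(self.lines) > self.num_lines:
                self.lines.popitem(last=False)  # evict least recently used


def list_addresses(n, node_size, scattered, seed=0):
    """Addresses of n linked-list nodes in traversal order.

    Packed: consecutive nodes are adjacent in memory, like an array.
    Scattered: each node sits on its own cache line at a shuffled position,
    mimicking nodes allocated at unrelated times by a dynamic allocator.
    """
    if not scattered:
        return [i * node_size for i in range(n)]
    slots = list(range(n))
    random.Random(seed).shuffle(slots)
    return [s * LINE for s in slots]


def misses(addrs, cache_lines, passes=2):
    cache = LRUCache(cache_lines)
    for _ in range(passes):
        for a in addrs:
            cache.access(a)
    return cache.misses


packed = misses(list_addresses(4096, 16, scattered=False), cache_lines=2048)
scattered = misses(list_addresses(4096, 16, scattered=True), cache_lines=2048)
print(packed, scattered)  # 1024 8192
```

The packed list needs only 1024 lines, fits in the cache, and is reused on the second pass. The scattered list touches 4096 distinct lines in a fixed cyclic order, so LRU always evicts the line that will be needed next and every access misses on both passes.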

Memory access behavior of dynamically allocated data structures and programs with irregular access patterns

10.32920/ryerson.14648538 ◽

2021 ◽

Author(s):

Zhen Yu

Keyword(s):

Data Structures ◽

Memory Management ◽

Memory Hierarchy ◽

Research Work ◽

Memory Access ◽

Mathematical Formula ◽

Dynamic Memory ◽

Dynamic Memory Management ◽

Management Policies ◽

Access Patterns

DATA STRUCTURES AND MEMORY MANAGEMENT IN THE FREERTOS AND PREDICATE OS OPERATING SYSTEMS

Problems of Modeling and Design Automatization ◽

10.31474/2074-7888-2019-1-43-52 ◽

2019 ◽

Vol 21 (1) ◽

pp. 43-52

Author(s):

K. S. Haiduk ◽

O. H. Shevchenko ◽

Keyword(s):

Operating Systems ◽

Data Structures ◽

Memory Management

On the Applicability of PEBS based Online Memory Access Tracking for Heterogeneous Memory Management at Scale

Proceedings of the Workshop on Memory Centric High Performance Computing ◽

10.1145/3286475.3286477 ◽

2018 ◽

Author(s):

Aleix Roca Nonell ◽

Balazs Gerofi ◽

Leonardo Bautista-Gomez ◽

Dominique Martinet ◽

Vicenç Beltran Querol ◽

...

Keyword(s):

Memory Management ◽

Memory Access

Subblock-Based BPE Scheme to Conquer Mismatch in Memory Access Pattern

2008 International Conference on Intelligent Information Hiding and Multimedia Signal Processing ◽

10.1109/iih-msp.2008.171 ◽

2008 ◽

Author(s):

Bao-Feng Li ◽

Yong Dou

Keyword(s):

Memory Access ◽

Access Pattern

PINstruct – Efficient Memory Access to Data Structures

Facing the Multicore-Challenge III - Lecture Notes in Computer Science ◽

10.1007/978-3-642-35893-7_14 ◽

2013 ◽

pp. 127-128

Author(s):

Rainer Keller ◽

Shiqing Fan

Keyword(s):

Data Structures ◽

Memory Access ◽

Access To Data ◽

Efficient Memory

Nonblocking memory management support for dynamic-sized data structures

ACM Transactions on Computer Systems ◽

10.1145/1062247.1062249 ◽

2005 ◽

Vol 23 (2) ◽

pp. 146-196 ◽

Cited By ~ 58

Author(s):

Maurice Herlihy ◽

Victor Luchangco ◽

Paul Martin ◽

Mark Moir

Keyword(s):

Data Structures ◽

Memory Management ◽

Management Support

Evaluation of Memory Access Pattern Protection in a Practical Setting

10.14257/astl.2014.48.01 ◽

2014 ◽

Cited By ~ 1

Author(s):

Yuto Nakano ◽

Shinsaku Kiyomoto ◽

Yutaka Miyake

Keyword(s):

Memory Access ◽

Access Pattern

Data structures access model for remote shared memory

E3S Web of Conferences ◽

10.1051/e3sconf/202124407001 ◽

2021 ◽

Vol 244 ◽

pp. 07001

Author(s):

Anatoliy Nyrkov ◽

Konstantin Ianiushkin ◽

Andrey Nyrkov ◽

Yulia Romanova ◽

Vagiz Gaskarov

Keyword(s):

Shared Memory ◽

Data Structures ◽

Data Model ◽

High Performance ◽

Direct Memory Access ◽

Performance Comparison ◽

Memory Access ◽

Memory Storage ◽

Race Conditions ◽

Performance Computing

Recent achievements in high-performance computing significantly narrow the performance gap between single-node and multi-node computing and open up opportunities for systems with remote shared memory. The combination of in-memory storage, remote direct memory access, and remote calls requires rethinking how data is organized, protected, and queried in distributed systems. The reviewed models let us implement new interpretations of distributed algorithms, allowing us to validate different approaches to avoiding race conditions and reducing resource acquisition or synchronization time. In this paper, we describe the data model for mixed memory access together with an analysis of optimized data structures. We also provide experimental results, including a performance comparison of data structures operating with different approaches; we evaluate the limitations of these models and show that the model does not always meet expectations. The purpose of this paper is to assist developers in designing data structures that achieve architectural benefits, or in improving the design of existing distributed systems.
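One common way to avoid race conditions on remote shared memory without taking a lock is an optimistic read, compute, and compare-and-swap retry loop. The sketch below emulates this in a single process: `RemoteWord` is a hypothetical stand-in for a word of remote memory that supports an atomic compare-and-swap (similar in spirit to an RDMA atomic CAS), with a local lock playing the role of the hardware atomicity guarantee. It illustrates the general technique only and is not the data model proposed in the paper.

```python
import threading


class RemoteWord:
    """Emulates one word of remote memory that supports atomic compare-and-swap.

    A local lock stands in for the atomicity the network hardware would provide.
    """

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        return self._value  # plain (possibly stale) read, like a one-sided read

    def compare_and_swap(self, expected, new):
        """Set the word to `new` only if it equals `expected`; return the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old


def optimistic_increment(word):
    """Lock-free client update: read, compute, CAS, and retry on conflict."""
    while True:
        seen = word.read()
        if word.compare_and_swap(seen, seen + 1) == seen:
            return


counter = RemoteWord()


def worker():
    for _ in range(1000):
        optimistic_increment(counter)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.read())  # 8000
```

A stale read is harmless here: the CAS fails because the word no longer holds the value the client saw, and the client simply retries. The trade-off against a lock-based design is wasted retries under heavy contention, which is the kind of synchronization cost the abstract says its experiments compare.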
