An order of magnitude faster isosurface rendering in software on a PC than using dedicated, general purpose rendering hardware

2000 ◽  
Vol 6 (4) ◽  
pp. 335-345 ◽  
Author(s):  
G.J. Grevera ◽  
J.K. Udupa ◽  
D. Odhner


2020 ◽  
Vol 495 (4) ◽  
pp. 4306-4313 ◽  
Author(s):  
Michael Y Grudić ◽  
Philip F Hopkins

We describe a new adaptive time-step criterion for integrating gravitational motion, which uses the tidal tensor to estimate the local dynamical time-scale and scales the time-step proportionally. This provides a better candidate for a truly general-purpose gravitational time-step criterion than the usual prescription derived from the gravitational acceleration, which does not respect the equivalence principle, breaks down when $\boldsymbol{a}=0$, and does not obey the same dimensional scaling as the true time-scale of orbital motion. We implement the tidal time-step criterion in the simulation code gizmo, and examine controlled tests of collisionless galaxy and star cluster models, as well as galaxy merger simulations. The tidal criterion estimates the dynamical time faithfully, and generally provides a more efficient time-stepping scheme compared to an acceleration criterion. Specifically, the tidal criterion achieves order-of-magnitude smaller energy errors for the same number of force evaluations in potentials with inner profiles shallower than $\rho \propto r^{-1}$ (i.e. where $\boldsymbol{a}\rightarrow 0$), such as star clusters and cored galaxies. For a given problem these advantages must be weighed against the additional overhead of computing the tidal tensor on the fly, but in many cases this overhead is small.
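
The contrast between the two criteria comes down to dimensional analysis: the tidal tensor has units of inverse time squared, so its norm yields a time-scale directly, with no reference to a softening length or to $|\boldsymbol{a}|$. A minimal Python sketch of both criteria (the accuracy parameter eta and the exact prefactors are illustrative assumptions; the paper's normalization may differ):

```python
import numpy as np

def acc_timestep(a, softening, eta=0.025):
    # Conventional acceleration criterion: dt = sqrt(2*eta*eps/|a|).
    # It references the softening length eps and blows up as |a| -> 0,
    # e.g. at the centre of a cored density profile.
    return np.sqrt(2.0 * eta * softening / np.linalg.norm(a))

def tidal_timestep(T, eta=0.025):
    # Tidal criterion: the tidal tensor T_ij = -d2Phi/(dx_i dx_j) has
    # units of 1/time^2, so the inverse square root of its (Frobenius)
    # norm estimates the local dynamical time directly, with no
    # reference to a softening length.
    return np.sqrt(2.0 * eta / np.linalg.norm(T))
```

Note that acc_timestep returns an arbitrarily large step as the acceleration vanishes, which is precisely the failure mode the abstract describes, while tidal_timestep stays finite wherever the tidal field does.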


Computation ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 50
Author(s):  
Stephan Lenz ◽  
Martin Geier ◽  
Manfred Krafczyk

The simulation of fire is a challenging task due to its occurrence on multiple space-time scales and the non-linear interaction of multiple physical processes. Current state-of-the-art software such as the Fire Dynamics Simulator (FDS) implements most of the required physics, yet a significant drawback of this implementation is its limited scalability on modern massively parallel hardware. The current paper presents a massively parallel implementation of a Gas Kinetic Scheme (GKS) on General Purpose Graphics Processing Units (GPGPUs) as a potential alternative modeling and simulation approach. The implementation is validated against experimental data for turbulent natural convection. Subsequently, it is validated on two simulations of fire plumes, including a small-scale table-top setup and a fire on the scale of a few meters. We show that the present GKS achieves accuracy comparable to the results obtained by FDS. Yet, due to its parallel efficiency on dedicated hardware, our GKS implementation delivers a reduction in wall-clock time of more than an order of magnitude. This paper demonstrates the potential of explicit local schemes in massively parallel environments for the simulation of fire.
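
The closing claim rests on locality: in an explicit local scheme, each cell's update depends only on its immediate neighbours, so all cells can be updated concurrently. A minimal NumPy sketch of a generic explicit local update (a stand-in illustration, not the authors' GKS flux solver):

```python
import numpy as np

def explicit_local_step(f, dt, dx, kappa):
    # Generic explicit local update (here: 2-D diffusion with periodic
    # boundaries), standing in for one GKS time-step.  Each cell reads
    # only its four neighbours, so every cell can be updated
    # independently -- the property that maps well onto GPGPUs.
    lap = (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
           np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4.0 * f) / dx ** 2
    return f + dt * kappa * lap
```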


2020 ◽  
Vol 36 (13) ◽  
pp. 3975-3981
Author(s):  
Laurent David ◽  
Riccardo Vicedomini ◽  
Hugues Richard ◽  
Alessandra Carbone

Motivation: Making sense of the ever-increasing number of metagenomic sequences accumulating in our databases demands approaches that rapidly 'explore' the content of multiple and/or large metagenomic datasets with respect to specific domain targets, avoiding full domain annotation and full assembly. Results: S3A is a fast and accurate domain-targeted assembler designed for rapid functional profiling. It is based on a novel construction and a fast traversal of the Overlap-Layout-Consensus graph, designed to reconstruct coding regions from domain-annotated metagenomic sequence reads. S3A relies on high-quality domain annotation to efficiently assemble metagenomic sequences, and on a new confidence measure for fast evaluation of overlapping reads. Its implementation is highly generic and can be applied to any arbitrary type of annotation. On simulated data, S3A achieves a level of accuracy similar to that of classical metagenomics assembly tools while permitting faster and more sensitive profiling on domains of interest. When studying a few dozen functional domains (a typical scenario), S3A is up to an order of magnitude faster than general-purpose metagenomic assemblers, thus enabling the analysis of a larger number of datasets in the same amount of time. S3A opens new avenues for the fast exploration of the rapidly growing number of metagenomic datasets of ever-increasing size. Availability and implementation: S3A is available at http://www.lcqb.upmc.fr/S3A_ASSEMBLER/. Supplementary information: Supplementary data are available at Bioinformatics online.
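
To make the Overlap-Layout-Consensus idea concrete, here is a deliberately simplified Python sketch of greedy overlap-graph assembly. It uses exact suffix-prefix matching in place of S3A's annotation-guided overlap detection and confidence measure, and omits the consensus step; all names are illustrative:

```python
from itertools import permutations

def overlap(a, b, min_len=20):
    # Length of the longest suffix of read a that exactly matches a
    # prefix of read b (at least min_len), or 0 if none exists.
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def greedy_olc(reads, min_len=20):
    # Greedy layout: repeatedly merge the ordered pair of reads with
    # the best overlap until no qualifying overlap remains.
    reads = list(reads)
    while len(reads) > 1:
        a, b = max(permutations(reads, 2),
                   key=lambda p: overlap(p[0], p[1], min_len))
        k = overlap(a, b, min_len)
        if k == 0:
            break
        reads.remove(a)
        reads.remove(b)
        reads.append(a + b[k:])  # merge along the overlap
    return reads
```

For example, greedy_olc(["ATCGGA", "GGATTC"], min_len=3) yields ["ATCGGATTC"].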


1987 ◽  
Vol 41 (4) ◽  
pp. 613-620 ◽  
Author(s):  
B. W. Smith ◽  
M. J. Rutledge ◽  
J. D. Winefordner

A general-purpose computer program has been developed to calculate atomic fluorescence curves of growth (COG) for a wide variety of cases, and particularly for the experimentally interesting cases where the excitation source spectral bandwidth is not much different from the absorption linewidth in typical flames and plasmas. Calculations over a wide range of the variables which affect the shape of the COG in the high-number-density region show that the point of departure from linearity can be used to predict the absolute number density in the atomizer cell. For resonance atomic fluorescence, the point at which the experimental curve of growth is twofold below the low-density linear asymptote invariably occurs at a k₀L (peak absorption coefficient × absorption path length) product of 2 ± 1. Number densities can easily be determined to within an order of magnitude, even when such variables as source spectral width, collection geometry, and damping parameter are totally unknown. In favorable circumstances, where such variables are well known, accuracies of ±10% may be obtained. The calculated curves of growth are in excellent agreement with several COGs which have been obtained experimentally.
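
The practical recipe implied by the k₀L ≈ 2 result is short enough to write down. Assuming the standard relation k₀ = n·σ₀ between the peak absorption coefficient and the number density n (σ₀ being the peak absorption cross-section), a hypothetical Python helper:

```python
def number_density_from_cog(path_length_cm, sigma0_cm2, k0L=2.0):
    """Estimate absolute number density (cm^-3) from the curve-of-growth
    departure point.

    At the point where the measured COG falls a factor of two below its
    low-density linear asymptote, the paper reports k0 * L = 2 +/- 1.
    With k0 = n * sigma0 (peak absorption cross-section, assumed known),
    the number density at that point follows directly.  The +/- 1 spread
    in k0L is what limits the estimate to order-of-magnitude accuracy.
    """
    return k0L / (sigma0_cm2 * path_length_cm)
```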


SIMULATION ◽  
1966 ◽  
Vol 7 (6) ◽  
pp. 293-308 ◽  
Author(s):  
Arthur I. Rubin

A discussion of the implementation of a true hybrid function generator module is presented. Included is a detailed description of how two of these modules, when used in conjunction with a hybrid computer, are used simultaneously to generate a continuous function of two variables, and how four modules are used to generate a continuous function of three variables. From a variety of possible configurations, an optimum choice for a general-purpose HFG module is made, taking into account digital coefficient storage requirements and analog multiplier requirements. A detailed error analysis compares the errors produced in generating an arbitrary function using the HFG module with the errors produced by the classical hybrid (or purely digital) method of generating an arbitrary function by table look-up and interpolation for discrete values of the argument. The conclusion is reached that the hybrid method can be an order of magnitude more accurate than the classical discrete method for generating arbitrary functions, and has the further advantage that it does not require a 100% duty cycle in the digital processor for accurate function generation.
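
For reference, the "classical discrete method" that the error analysis compares against is table look-up with interpolation. A minimal Python sketch of the two-variable case (bilinear interpolation on a uniform grid; the grid origin and spacings are illustrative assumptions):

```python
import numpy as np

def table_lookup_2d(table, x, y, dx, dy):
    # Classical discrete method: look up the four stored grid values
    # surrounding (x, y) and interpolate bilinearly between them.
    # Grid origin at (0, 0) with spacings dx, dy; the caller must keep
    # (x, y) inside the tabulated range.
    i, j = int(x // dx), int(y // dy)
    tx, ty = x / dx - i, y / dy - j          # fractional position in cell
    return ((1 - tx) * (1 - ty) * table[i, j]
            + tx * (1 - ty) * table[i + 1, j]
            + (1 - tx) * ty * table[i, j + 1]
            + tx * ty * table[i + 1, j + 1])
```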


2012 ◽  
Vol 2 (4) ◽  
Author(s):  
Muhammad Hanif ◽  
Karl-Heinz Zimmermann

Alignment is the fundamental operation in molecular biology for comparing biomolecular sequences. The most widely used method for aligning groups of alignments is based on the alignment of the profiles corresponding to the groups. We show that profile-profile alignment can be significantly sped up by general-purpose computing on a modern commodity graphics card. Wavefront and matrix-matrix product approaches for implementing profile-profile alignment on the graphics processor are analyzed. The average speed-up obtained is one order of magnitude, even when overheads are considered. Thus the computational power of graphics cards can be exploited to develop improved solutions for multiple sequence alignment.
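
The wavefront approach exploits the fact that, in the alignment dynamic program, every cell on an anti-diagonal depends only on the two preceding anti-diagonals, so all cells on one anti-diagonal can be computed in parallel. A minimal NumPy sketch of this sweep (a linear gap model and a precomputed column-vs-column score matrix are simplifying assumptions; the paper's GPU kernels differ):

```python
import numpy as np

def wavefront_align(score, gap=-1.0):
    # score[i, j] is the substitution score for aligning column i of one
    # profile against column j of the other.  D is the usual
    # Needleman-Wunsch matrix, filled anti-diagonal by anti-diagonal.
    m, n = score.shape
    D = np.zeros((m + 1, n + 1))
    D[0, :] = np.arange(n + 1) * gap
    D[:, 0] = np.arange(m + 1) * gap
    for d in range(2, m + n + 1):            # sweep anti-diagonals i + j = d
        i = np.arange(max(1, d - n), min(m, d - 1) + 1)
        j = d - i
        # All cells below are independent: they read only diagonals
        # d-1 and d-2, so a GPU can assign one thread per cell.
        D[i, j] = np.maximum.reduce([
            D[i - 1, j - 1] + score[i - 1, j - 1],  # align the two columns
            D[i - 1, j] + gap,                       # gap in one profile
            D[i, j - 1] + gap,                       # gap in the other
        ])
    return D[m, n]
```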


2009 ◽  
Vol 17 (1-2) ◽  
pp. 153-172 ◽  
Author(s):  
Khaled Z. Ibrahim ◽  
François Bodin

Lattice Quantum Chromodynamics (QCD) models subatomic interactions on a four-dimensional discretized space-time continuum. The Lattice QCD computation is one of the grand challenges in physics, especially when modeling a lattice with small spacing. In this work, we study the implementation of the main kernel routine of Lattice QCD, which dominates the execution time, on the Cell Broadband Engine. We tackle the problem of efficient SIMD execution and the problem of limited bandwidth for data transfers to and from the off-chip memory. For efficient SIMD execution, we present a runtime data fusion technique that groups data processed similarly at runtime. We also introduce the analysis needed to reduce the pressure on the scarce memory bandwidth that limits the performance of this computation. We studied two implementations of the main kernel routine that exhibit different patterns of memory access and thus allow different sets of optimizations. We show the attributes that make one implementation more favorable in terms of performance. For lattice sizes significantly larger than the local store, our implementation achieves 31.2 GFlops for single-precision computations and 16.6 GFlops for double-precision computations on the PowerXCell 8i, an order of magnitude better than the performance achieved on most general-purpose processors.
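
The core of runtime data fusion is repacking: gather lattice sites that will undergo the same operation and lay their components out so that each SIMD lane holds one site. A hedged NumPy sketch of the repacking idea (the actual implementation uses Cell SPU intrinsics and the on-chip local store; the grouping and shapes here are illustrative):

```python
import numpy as np

def fuse_for_simd(site_data, width=4):
    # site_data has shape (n_sites, n_components): an array-of-structures
    # with one record per lattice site.  Repack groups of `width` sites
    # into a structure-of-arrays layout of shape
    # (n_groups, n_components, width), whose innermost axis is the SIMD
    # vector: each arithmetic operation then applies to `width` sites at
    # once, keeping the SIMD lanes full.
    data = np.asarray(site_data)
    n = (data.shape[0] // width) * width          # drop the ragged tail
    return data[:n].reshape(-1, width, data.shape[1]).swapaxes(1, 2)
```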


2021 ◽  
Author(s):  
Marek Kokot ◽  
Adam Gudys ◽  
Heng Li ◽  
Sebastian Deorowicz

The cost of maintaining the exabytes of data produced by sequencing experiments every year has become a major issue in today's genomics. Despite the increasing popularity of third-generation sequencing, the existing algorithms for compressing long reads offer only a minor advantage over general-purpose gzip. We present CoLoRd, an algorithm able to reduce third-generation sequencing data by an order of magnitude without affecting the accuracy of downstream analyses.


2014 ◽  
Vol 22 (2) ◽  
pp. 125-139 ◽  
Author(s):  
Myoungsoo Jung ◽  
Ellis H. Wilson ◽  
Wonil Choi ◽  
John Shalf ◽  
Hasan Metin Aktulga ◽  
...  

Drawing parallels to the rise of general-purpose graphics processing units (GPGPUs) as accelerators for specific high-performance computing (HPC) workloads, there is a rise in the use of non-volatile memory (NVM) as an accelerator for I/O-intensive scientific applications. However, existing work has explored the use of NVM within dedicated I/O nodes, which are distant from the compute nodes that actually need such acceleration. As NVM bandwidth begins to outpace point-to-point network capacity, we argue for the need to break from the archetype of completely separated storage. Therefore, in this work we investigate co-location of NVM and compute by varying I/O interfaces, file systems, types of NVM, and both current and future SSD architectures, uncovering numerous bottlenecks implicit at these various levels of the I/O stack. We present novel hardware and software solutions, including the new Unified File System (UFS), to enable fuller utilization of the new compute-local NVM storage. Our experimental evaluation, which employs a real-world out-of-core (OoC) HPC application, demonstrates throughput increases in excess of an order of magnitude over current approaches.

