An order of magnitude faster isosurface rendering in software on a PC than using dedicated, general purpose rendering hardware

2000 ◽  
Vol 6 (4) ◽  
pp. 335-345 ◽  
Author(s):  
G.J. Grevera ◽  
J.K. Udupa ◽  
D. Odhner


2020 ◽  
Vol 495 (4) ◽  
pp. 4306-4313 ◽  
Author(s):  
Michael Y Grudić ◽  
Philip F Hopkins

We describe a new adaptive time-step criterion for integrating gravitational motion, which uses the tidal tensor to estimate the local dynamical time-scale and scales the time-step proportionally. This provides a better candidate for a truly general-purpose gravitational time-step criterion than the usual prescription derived from the gravitational acceleration, which does not respect the equivalence principle, breaks down when $\boldsymbol{a}=0$, and does not obey the same dimensional scaling as the true time-scale of orbital motion. We implement the tidal time-step criterion in the simulation code gizmo, and examine controlled tests of collisionless galaxy and star cluster models, as well as galaxy merger simulations. The tidal criterion estimates the dynamical time faithfully, and generally provides a more efficient time-stepping scheme compared to an acceleration criterion. Specifically, the tidal criterion achieves order-of-magnitude smaller energy errors for the same number of force evaluations in potentials with inner profiles shallower than $\rho \propto r^{-1}$ (i.e. where $\boldsymbol{a}\rightarrow 0$), such as star clusters and cored galaxies. For a given problem these advantages must be weighed against the additional overhead of computing the tidal tensor on the fly, but in many cases this overhead is small.
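
The contrast between the two criteria comes down to dimensional analysis: the tidal tensor has units of inverse time squared, so its norm yields a time-scale directly, with no reference to a softening length or to $|\boldsymbol{a}|$. A minimal Python sketch of both criteria (the accuracy parameter eta and the exact prefactors are illustrative assumptions; the paper's normalization may differ):

```python
import numpy as np

def acc_timestep(a, softening, eta=0.025):
    # Conventional acceleration criterion: dt = sqrt(2*eta*eps/|a|).
    # It references the softening length eps and blows up as |a| -> 0,
    # e.g. at the centre of a cored density profile.
    return np.sqrt(2.0 * eta * softening / np.linalg.norm(a))

def tidal_timestep(T, eta=0.025):
    # Tidal criterion: the tidal tensor T_ij = -d2Phi/(dx_i dx_j) has
    # units of 1/time^2, so the inverse square root of its (Frobenius)
    # norm estimates the local dynamical time directly, with no
    # reference to a softening length.
    return np.sqrt(2.0 * eta / np.linalg.norm(T))
```

Note that acc_timestep returns an arbitrarily large step as the acceleration vanishes, which is precisely the failure mode the abstract describes, while tidal_timestep stays finite wherever the tidal field does.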


Computation ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 50
Author(s):  
Stephan Lenz ◽  
Martin Geier ◽  
Manfred Krafczyk

The simulation of fire is a challenging task due to its occurrence on multiple space-time scales and the non-linear interaction of multiple physical processes. Current state-of-the-art software such as the Fire Dynamics Simulator (FDS) implements most of the required physics, yet a significant drawback of this implementation is its limited scalability on modern massively parallel hardware. The current paper presents a massively parallel implementation of a Gas Kinetic Scheme (GKS) on General Purpose Graphics Processing Units (GPGPUs) as a potential alternative modeling and simulation approach. The implementation is validated against experimental data for turbulent natural convection. Subsequently, it is validated on two simulations of fire plumes, including a small-scale table-top setup and a fire on the scale of a few meters. We show that the present GKS achieves accuracy comparable to the results obtained by FDS. Yet, due to its parallel efficiency on dedicated hardware, our GKS implementation delivers a reduction in wall-clock time of more than an order of magnitude. This paper demonstrates the potential of explicit local schemes in massively parallel environments for the simulation of fire.
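
The closing claim rests on locality: in an explicit local scheme, each cell's update depends only on its immediate neighbours, so all cells can be updated concurrently. A minimal NumPy sketch of a generic explicit local update (a stand-in illustration, not the authors' GKS flux solver):

```python
import numpy as np

def explicit_local_step(f, dt, dx, kappa):
    # Generic explicit local update (here: 2-D diffusion with periodic
    # boundaries), standing in for one GKS time-step.  Each cell reads
    # only its four neighbours, so every cell can be updated
    # independently -- the property that maps well onto GPGPUs.
    lap = (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
           np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4.0 * f) / dx ** 2
    return f + dt * kappa * lap
```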


2020 ◽  
Vol 36 (13) ◽  
pp. 3975-3981
Author(s):  
Laurent David ◽  
Riccardo Vicedomini ◽  
Hugues Richard ◽  
Alessandra Carbone

Motivation: Making sense of the ever-increasing number of metagenomic sequences accumulating in our databases demands approaches that rapidly 'explore' the content of multiple and/or large metagenomic datasets with respect to specific domain targets, avoiding full domain annotation and full assembly. Results: S3A is a fast and accurate domain-targeted assembler designed for rapid functional profiling. It is based on a novel construction and a fast traversal of the Overlap-Layout-Consensus graph, designed to reconstruct coding regions from domain-annotated metagenomic sequence reads. S3A relies on high-quality domain annotation to efficiently assemble metagenomic sequences, and on a new confidence measure for fast evaluation of overlapping reads. Its implementation is highly generic and can be applied to any arbitrary type of annotation. On simulated data, S3A achieves a level of accuracy similar to that of classical metagenomics assembly tools while permitting faster and more sensitive profiling on domains of interest. When studying a few dozen functional domains (a typical scenario), S3A is up to an order of magnitude faster than general-purpose metagenomic assemblers, thus enabling the analysis of a larger number of datasets in the same amount of time. S3A opens new avenues for the fast exploration of the rapidly growing number of metagenomic datasets of ever-increasing size. Availability and implementation: S3A is available at http://www.lcqb.upmc.fr/S3A_ASSEMBLER/. Supplementary information: Supplementary data are available at Bioinformatics online.
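
To make the Overlap-Layout-Consensus idea concrete, here is a deliberately simplified Python sketch of greedy overlap-graph assembly. It uses exact suffix-prefix matching in place of S3A's annotation-guided overlap detection and confidence measure, and omits the consensus step; all names are illustrative:

```python
from itertools import permutations

def overlap(a, b, min_len=20):
    # Length of the longest suffix of read a that exactly matches a
    # prefix of read b (at least min_len), or 0 if none exists.
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def greedy_olc(reads, min_len=20):
    # Greedy layout: repeatedly merge the ordered pair of reads with
    # the best overlap until no qualifying overlap remains.
    reads = list(reads)
    while len(reads) > 1:
        a, b = max(permutations(reads, 2),
                   key=lambda p: overlap(p[0], p[1], min_len))
        k = overlap(a, b, min_len)
        if k == 0:
            break
        reads.remove(a)
        reads.remove(b)
        reads.append(a + b[k:])  # merge along the overlap
    return reads
```

For example, greedy_olc(["ATCGGA", "GGATTC"], min_len=3) yields ["ATCGGATTC"].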


1987 ◽  
Vol 41 (4) ◽  
pp. 613-620 ◽  
Author(s):  
B. W. Smith ◽  
M. J. Rutledge ◽  
J. D. Winefordner

A general-purpose computer program has been developed to calculate atomic fluorescence curves of growth (COG) for a wide variety of cases, and particularly for the experimentally interesting cases where the excitation source spectral bandwidth is not much different from the absorption linewidth in typical flames and plasmas. Calculations over a wide range of the variables which affect the shape of the COG in the high-number-density region show that the point of departure from linearity can be used to predict the absolute number density in the atomizer cell. For resonance atomic fluorescence, the point at which the experimental curve of growth is twofold below the low-density linear asymptote invariably occurs at a k₀L (peak absorption coefficient × absorption path length) product of 2 ± 1. Number densities can easily be determined to within an order of magnitude, even when such variables as source spectral width, collection geometry, and damping parameter are totally unknown. In favorable circumstances, where such variables are well known, accuracies of ±10% may be obtained. The calculated curves of growth are in excellent agreement with several COGs which have been obtained experimentally.
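
The practical recipe implied by the k₀L ≈ 2 result is short enough to write down. Assuming the standard relation k₀ = n·σ₀ between the peak absorption coefficient and the number density n (σ₀ being the peak absorption cross-section), a hypothetical Python helper:

```python
def number_density_from_cog(path_length_cm, sigma0_cm2, k0L=2.0):
    """Estimate absolute number density (cm^-3) from the curve-of-growth
    departure point.

    At the point where the measured COG falls a factor of two below its
    low-density linear asymptote, the paper reports k0 * L = 2 +/- 1.
    With k0 = n * sigma0 (peak absorption cross-section, assumed known),
    the number density at that point follows directly.  The +/- 1 spread
    in k0L is what limits the estimate to order-of-magnitude accuracy.
    """
    return k0L / (sigma0_cm2 * path_length_cm)
```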


SIMULATION ◽  
1966 ◽  
Vol 7 (6) ◽  
pp. 293-308 ◽  
Author(s):  
Arthur I. Rubin

A discussion of the implementation of a true hybrid function generator module is presented. Included is a detailed description of how two of these modules, when used in conjunction with a hybrid computer, are used simultaneously to generate a continuous function of two variables, and how four modules are used to generate a continuous function of three variables. From a variety of possible configurations, an optimum choice for a general-purpose HFG module is made, taking into account digital coefficient storage requirements and analog multiplier requirements. A detailed error analysis compares the errors produced in generating an arbitrary function using the HFG module with the errors produced by the classical hybrid (or purely digital) method of generating an arbitrary function by table look-up and interpolation for discrete values of the argument. The conclusion is reached that the hybrid method can be an order of magnitude more accurate than the classical discrete method for generating arbitrary functions, and has the further advantage that it does not require a 100% duty cycle in the digital processor for accurate function generation.
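
For reference, the "classical discrete method" that the error analysis compares against is table look-up with interpolation. A minimal Python sketch of the two-variable case (bilinear interpolation on a uniform grid; the grid origin and spacings are illustrative assumptions):

```python
import numpy as np

def table_lookup_2d(table, x, y, dx, dy):
    # Classical discrete method: look up the four stored grid values
    # surrounding (x, y) and interpolate bilinearly between them.
    # Grid origin at (0, 0) with spacings dx, dy; the caller must keep
    # (x, y) inside the tabulated range.
    i, j = int(x // dx), int(y // dy)
    tx, ty = x / dx - i, y / dy - j          # fractional position in cell
    return ((1 - tx) * (1 - ty) * table[i, j]
            + tx * (1 - ty) * table[i + 1, j]
            + (1 - tx) * ty * table[i, j + 1]
            + tx * ty * table[i + 1, j + 1])
```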


2012 ◽  
Vol 2 (4) ◽  
Author(s):  
Muhammad Hanif ◽  
Karl-Heinz Zimmermann

Alignment is the fundamental operation in molecular biology for comparing biomolecular sequences. The most widely used method for aligning groups of alignments is based on the alignment of the profiles corresponding to the groups. We show that profile-profile alignment can be significantly sped up by general-purpose computing on a modern commodity graphics card. Wavefront and matrix-matrix product approaches for implementing profile-profile alignment on the graphics processor are analyzed. The average speed-up obtained is one order of magnitude, even when overheads are considered. Thus the computational power of graphics cards can be exploited to develop improved solutions for multiple sequence alignment.
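
The wavefront approach exploits the fact that, in the alignment dynamic program, every cell on an anti-diagonal depends only on the two preceding anti-diagonals, so all cells on one anti-diagonal can be computed in parallel. A minimal NumPy sketch of this sweep (a linear gap model and a precomputed column-vs-column score matrix are simplifying assumptions; the paper's GPU kernels differ):

```python
import numpy as np

def wavefront_align(score, gap=-1.0):
    # score[i, j] is the substitution score for aligning column i of one
    # profile against column j of the other.  D is the usual
    # Needleman-Wunsch matrix, filled anti-diagonal by anti-diagonal.
    m, n = score.shape
    D = np.zeros((m + 1, n + 1))
    D[0, :] = np.arange(n + 1) * gap
    D[:, 0] = np.arange(m + 1) * gap
    for d in range(2, m + n + 1):            # sweep anti-diagonals i + j = d
        i = np.arange(max(1, d - n), min(m, d - 1) + 1)
        j = d - i
        # All cells below are independent: they read only diagonals
        # d-1 and d-2, so a GPU can assign one thread per cell.
        D[i, j] = np.maximum.reduce([
            D[i - 1, j - 1] + score[i - 1, j - 1],  # align the two columns
            D[i - 1, j] + gap,                       # gap in one profile
            D[i, j - 1] + gap,                       # gap in the other
        ])
    return D[m, n]
```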


2009 ◽  
Vol 17 (1-2) ◽  
pp. 153-172 ◽  
Author(s):  
Khaled Z. Ibrahim ◽  
François Bodin

Lattice Quantum Chromodynamics (QCD) models subatomic interactions on a four-dimensional discretized space-time continuum. The Lattice QCD computation is one of the grand challenges in physics, especially when modeling a lattice with small spacing. In this work, we study the implementation of the main kernel routine of Lattice QCD, which dominates the execution time, on the Cell Broadband Engine. We tackle the problem of efficient SIMD execution and the problem of limited bandwidth for data transfers to and from the off-chip memory. For efficient SIMD execution, we present a runtime data fusion technique that groups data processed similarly at runtime. We also introduce the analysis needed to reduce the pressure on the scarce memory bandwidth that limits the performance of this computation. We studied two implementations of the main kernel routine that exhibit different patterns of memory access and thus allow different sets of optimizations. We show the attributes that make one implementation more favorable in terms of performance. For lattice sizes significantly larger than the local store, our implementation achieves 31.2 GFlops for single-precision computations and 16.6 GFlops for double-precision computations on the PowerXCell 8i, an order of magnitude better than the performance achieved on most general-purpose processors.
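
The core of runtime data fusion is repacking: gather lattice sites that will undergo the same operation and lay their components out so that each SIMD lane holds one site. A hedged NumPy sketch of the repacking idea (the actual implementation uses Cell SPU intrinsics and the on-chip local store; the grouping and shapes here are illustrative):

```python
import numpy as np

def fuse_for_simd(site_data, width=4):
    # site_data has shape (n_sites, n_components): an array-of-structures
    # with one record per lattice site.  Repack groups of `width` sites
    # into a structure-of-arrays layout of shape
    # (n_groups, n_components, width), whose innermost axis is the SIMD
    # vector: each arithmetic operation then applies to `width` sites at
    # once, keeping the SIMD lanes full.
    data = np.asarray(site_data)
    n = (data.shape[0] // width) * width          # drop the ragged tail
    return data[:n].reshape(-1, width, data.shape[1]).swapaxes(1, 2)
```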


2021 ◽  
Author(s):  
Marek Kokot ◽  
Adam Gudys ◽  
Heng Li ◽  
Sebastian Deorowicz

The cost of maintaining the exabytes of data produced by sequencing experiments every year has become a major issue in today's genomics. Despite the increasing popularity of third-generation sequencing, the existing algorithms for compressing long reads offer only a minor advantage over general-purpose gzip. We present CoLoRd, an algorithm able to reduce third-generation sequencing data by an order of magnitude without affecting the accuracy of downstream analyses.


2014 ◽  
Vol 22 (2) ◽  
pp. 125-139 ◽  
Author(s):  
Myoungsoo Jung ◽  
Ellis H. Wilson ◽  
Wonil Choi ◽  
John Shalf ◽  
Hasan Metin Aktulga ◽  
...  

Drawing parallels to the rise of general-purpose graphics processing units (GPGPUs) as accelerators for specific high-performance computing (HPC) workloads, there is a rise in the use of non-volatile memory (NVM) as an accelerator for I/O-intensive scientific applications. However, existing work has explored the use of NVM within dedicated I/O nodes, which are distant from the compute nodes that actually need such acceleration. As NVM bandwidth begins to outpace point-to-point network capacity, we argue for the need to break from the archetype of completely separated storage. Therefore, in this work we investigate co-location of NVM and compute by varying I/O interfaces, file systems, types of NVM, and both current and future SSD architectures, uncovering numerous bottlenecks implicit at these various levels of the I/O stack. We present novel hardware and software solutions, including the new Unified File System (UFS), to enable fuller utilization of the new compute-local NVM storage. Our experimental evaluation, which employs a real-world out-of-core (OoC) HPC application, demonstrates throughput increases in excess of an order of magnitude over current approaches.

