A Fast CT Reconstruction Scheme for a General Multi-Core PC

2007 ◽  
Vol 2007 ◽  
pp. 1-9 ◽  
Author(s):  
Kai Zeng ◽  
Erwei Bai ◽  
Ge Wang

Expensive computational cost is a severe limitation in CT reconstruction for clinical applications that need real-time feedback. A primary example is bolus-chasing computed tomography (CT) angiography (BCA), which we have been developing for the past several years. To accelerate the reconstruction process using the filtered backprojection (FBP) method, specialized hardware or graphics cards can be used. However, specialized hardware is expensive and inflexible. The graphics processing unit (GPU) in a current graphics card can only reconstruct images at reduced precision and is not easy to program. In this paper, an acceleration scheme is proposed based on a multi-core PC. In the proposed scheme, several techniques are integrated, including utilization of geometric symmetry, optimization of data structures, single-instruction multiple-data (SIMD) processing, multithreaded computation, and an Intel C++ compiler. Our scheme maintains the original precision and involves no data exchange between the GPU and CPU. The merits of our scheme are demonstrated in numerical experiments against the traditional implementation. Our scheme achieves a speedup of about 40, which can be further improved severalfold using the latest quad-core processors.
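As an illustration of how geometric symmetry can cut backprojection work, the following is a minimal sketch and not the authors' implementation. It assumes a parallel-beam geometry, nearest-neighbour interpolation, an odd number of detector bins, and an already filtered sinogram; the paper's actual geometry and data layout are not stated in the abstract. The function names `backproject_naive` and `backproject_symmetric` are hypothetical. The idea shown is that a pixel and its mirror image through the image centre have detector coordinates `t` and `-t`, so the trigonometric arithmetic is done once for each pair.

```python
import math


def backproject_naive(sino, angles, n):
    """Nearest-neighbour backprojection onto an n x n grid, one pixel at a time."""
    d = len(sino[0])          # number of detector bins (assumed odd)
    c_img = (n - 1) / 2.0     # image centre in pixel units
    c_det = d // 2            # index of the central detector bin
    img = [[0.0] * n for _ in range(n)]
    for proj, theta in zip(sino, angles):
        cs, sn = math.cos(theta), math.sin(theta)
        for iy in range(n):
            for ix in range(n):
                # Signed detector coordinate of this pixel for this view.
                r = round((ix - c_img) * cs + (iy - c_img) * sn)
                k = c_det + r
                if 0 <= k < d:
                    img[iy][ix] += proj[k]
    return img


def backproject_symmetric(sino, angles, n):
    """Same result, but each detector coordinate is reused for the mirrored pixel."""
    d = len(sino[0])
    c_img = (n - 1) / 2.0
    c_det = d // 2
    total = n * n
    img = [[0.0] * n for _ in range(n)]
    for proj, theta in zip(sino, angles):
        cs, sn = math.cos(theta), math.sin(theta)
        # Visit only the first half of the pixels in row-major order.
        for p in range((total + 1) // 2):
            iy, ix = divmod(p, n)
            r = round((ix - c_img) * cs + (iy - c_img) * sn)
            k = c_det + r
            if 0 <= k < d:
                img[iy][ix] += proj[k]
            # The mirrored pixel sees the negated coordinate; skip the centre pixel.
            if total - 1 - p != p:
                km = c_det - r
                if 0 <= km < d:
                    img[n - 1 - iy][n - 1 - ix] += proj[km]
    return img
```

The symmetric variant evaluates the coordinate expression for half of the pixels per view. In a compiled implementation the same pairing would combine naturally with SIMD and per-thread partitioning of views or rows, but that combination is not shown here.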

Author(s):  
Franz Pichler ◽  
Gundolf Haase

A finite element code is developed in which all of the computationally expensive steps are performed on a graphics processing unit via the THRUST and the PARALUTION libraries. The code focuses on the simulation of transient problems where the repeated computations per time-step create the computational cost. It is used to solve partial and ordinary differential equations as they arise in thermal-runaway simulations of automotive batteries. The speed-up obtained by utilizing the graphics processing unit for every critical step is compared against the single core and the multi-threading solutions which are also supported by the chosen libraries. This way a high total speed-up on the graphics processing unit is achieved without the need for programming a single classical Compute Unified Device Architecture kernel.


Author(s):  
Liam Dunn ◽  
Patrick Clearwater ◽  
Andrew Melatos ◽  
Karl Wette

The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.


2018 ◽  
Vol 10 (8) ◽  
pp. 168781401879434 ◽  
Author(s):  
Bing Xu ◽  
Yong Cai

The purpose of this article is to improve the convergence efficiency of the traditional efficient global optimization method. Furthermore, we try a graphics processing unit–based parallel computing method to improve the computing efficiency of the efficient global optimization method for both mathematical and practical engineering problems. First, we propose a multiple-data-based efficient global optimization algorithm instead of the multiple-surrogates-based efficient global optimization algorithm. Second, a novel graphics processing unit–based general-purpose computing technology is adopted to accelerate the solution efficiency of our multiple-data-based efficient global optimization algorithm. Third, a hybrid parallel computing approach using the OpenMP and compute unified device architecture is adopted to further improve the solution efficiency of forward problems in practical application. This is accomplished by integrating the graphics processing unit–based finite element method numerical analysis system into the optimization software. The numerical results show that for the same problem, the optimal result of the multiple-data-based efficient global optimization algorithm is consistently better than the multiple-surrogates-based efficient global optimization algorithm with the same optimization iterations. In addition, the graphics processing unit–based parallel simulation system helps in the reduction of the calculation time for practical engineering problems. The multiple-data-based efficient global optimization method performs stably in both high-order mathematical functions and large-scale nonlinear practical engineering optimization problems. An added benefit is that the computational time and accuracy are no longer obstacles.


Geophysics ◽  
2019 ◽  
Vol 84 (1) ◽  
pp. A13-A17 ◽  
Author(s):  
Fredrik Andersson ◽  
Johan Robertsson

We have developed simple, fast, and accurate algorithms for the linear Radon ([Formula: see text]-[Formula: see text]) transform and its inverse. The algorithms have an [Formula: see text] computational complexity in contrast to the [Formula: see text] cost of a direct implementation in 2D and an [Formula: see text] computational complexity compared to the [Formula: see text] cost of a direct implementation in 3D. The methods use Bluestein’s algorithm to evaluate discrete nonstandard Fourier sums, and they need, apart from the fast Fourier transform (FFT), only multiplication of chirp functions and their Fourier transforms. The computational cost and accuracy are thus reduced to that inherited by the FFT. Fully working algorithms can be implemented in a couple of lines of code. Moreover, we find that efficient graphics processing unit (GPU) implementations could achieve processing speeds of approximately [Formula: see text], implying that the algorithms are I/O bound rather than compute bound.
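The chirp-and-FFT structure mentioned in the abstract can be sketched in one dimension. The code below is a generic illustration of Bluestein's algorithm, not the authors' Radon transform code: it evaluates the nonstandard Fourier sum X[k] = sum over n of x[n] exp(-2 pi i alpha n k) for an arbitrary real `alpha` by rewriting nk as (n^2 + k^2 - (k - n)^2) / 2, which turns the sum into a convolution with a chirp. The function names, the small recursive FFT, and the parameter `alpha` are assumptions made for illustration.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalised); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(a):
    """Inverse FFT with the 1/n normalisation applied."""
    n = len(a)
    return [v / n for v in fft(a, invert=True)]


def chirp(alpha, j):
    """Chirp factor exp(-i * pi * alpha * j^2)."""
    return cmath.exp(-1j * cmath.pi * alpha * j * j)


def direct_sum(x, alpha, m):
    """Reference O(N*M) evaluation of X[k] for k = 0..m-1."""
    return [
        sum(xn * cmath.exp(-2j * cmath.pi * alpha * n * k) for n, xn in enumerate(x))
        for k in range(m)
    ]


def bluestein(x, alpha, m):
    """Evaluate the same sums with chirp multiplications and three FFTs."""
    n = len(x)
    size = 1
    while size < n + m - 1:   # pad so the circular convolution does not wrap
        size *= 2
    # Input premultiplied by the chirp, zero-padded to the FFT size.
    a = [x[j] * chirp(alpha, j) for j in range(n)] + [0j] * (size - n)
    # Conjugate-chirp kernel for lags -(n-1)..m-1, stored circularly.
    b = [0j] * size
    for lag in range(m):
        b[lag] = chirp(alpha, lag).conjugate()
    for lag in range(1, n):
        b[size - lag] = chirp(alpha, lag).conjugate()
    conv = ifft([p * q for p, q in zip(fft(a), fft(b))])
    # Postmultiply by the chirp in the output index.
    return [chirp(alpha, k) * conv[k] for k in range(m)]
```

Choosing `alpha = 1 / len(x)` and `m = len(x)` reduces this to an ordinary discrete Fourier transform, which is a convenient sanity check; other values of `alpha` give the non-uniform frequency spacing that a plain FFT cannot provide directly.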


2020 ◽  
Vol 10 (24) ◽  
pp. 9121
Author(s):  
KyungWoon Cho ◽  
Hyokyung Bahn

A GPGPU (General-Purpose Graphics Processing Unit) consists of hardware resources that can execute tens of thousands of threads simultaneously. In practice, however, the parallelism is limited because resources are allocated in a base unit called the thread block, which current GPGPU systems do not manage judiciously. To schedule threads in a GPGPU, a specialized hardware scheduler allocates thread blocks to computing units called SMs (Streaming Multiprocessors) in a Round-Robin manner. Although scheduling in hardware is simple and fast, we observe that Round-Robin scheduling is not efficient in GPGPU, as it does not consider the workload characteristics of threads and the resource balance among SMs. In this article, we present a new thread block scheduling model that can analyze and quantify the performance of thread block scheduling. We implement our model as a GPGPU scheduling simulator and show that the conventional thread block scheduling provided in GPGPU hardware does not perform well as the workload becomes heavy. Specifically, we observe that the performance degradation of Round-Robin can be eliminated by adopting DFA (Depth First Allocation), which is simple but scalable. Moreover, because our simulator is modular and publicly available to other researchers, various scheduling policies can be incorporated into it to evaluate the performance of GPGPU schedulers.
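The difference between the two placement policies can be made concrete with a toy model. This is a sketch under stated assumptions, not the authors' simulator: each SM is assumed to have a fixed number of identical thread-block slots, and Depth First Allocation is assumed to mean filling one SM completely before moving to the next, which the abstract does not spell out. The function names and the `slots_per_sm` parameter are hypothetical.

```python
def round_robin(num_blocks, num_sms, slots_per_sm):
    """Place blocks on SMs cyclically, skipping SMs that are already full."""
    sms = [[] for _ in range(num_sms)]
    nxt = 0
    for block in range(num_blocks):
        for step in range(num_sms):
            sm = (nxt + step) % num_sms
            if len(sms[sm]) < slots_per_sm:
                sms[sm].append(block)
                nxt = (sm + 1) % num_sms
                break
        else:
            break  # every SM is full; remaining blocks would have to wait
    return sms


def depth_first(num_blocks, num_sms, slots_per_sm):
    """Fill the lowest-numbered SM with free slots before using the next one."""
    sms = [[] for _ in range(num_sms)]
    for block in range(num_blocks):
        for sm in range(num_sms):
            if len(sms[sm]) < slots_per_sm:
                sms[sm].append(block)
                break
        else:
            break  # every SM is full; remaining blocks would have to wait
    return sms


print(round_robin(6, 4, 2))   # [[0, 4], [1, 5], [2], [3]]
print(depth_first(6, 4, 2))   # [[0, 1], [2, 3], [4, 5], []]
```

The toy model only shows where blocks land. It says nothing about execution time, which in the article depends on workload characteristics and resource balance that are not modelled here.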


2020 ◽  
Author(s):  
Xiao Wen ◽  
Xiang Chen ◽  
Decheng Wan

In this paper, a new multiphase MPS-GPU method is proposed that combines the moving particle semi-implicit (MPS) method with a Graphics Processing Unit (GPU) acceleration technique. The new method not only inherits the advantage of the MPS method in capturing complex interface deformations, but also overcomes the huge computational cost of three-dimensional MPS simulation. With this method, both two-layer-liquid and three-layer-liquid sloshing problems are simulated three-dimensionally on the GPU device, with more than one million particles included. In the simulations, the sloshing patterns of each liquid layer under different external excitations are accurately captured. From the interface elevations and impact pressures calculated by the present method, it is found that an obvious discrepancy exists between the deformations of the free surface and the phase interfaces. The results obtained by the multiphase MPS-GPU method are then compared with experimental data and other numerical results in the open literature, and good agreement is achieved, which validates the accuracy and applicability of the present method in three-dimensional simulations of multi-layer-liquid sloshing flows.


2007 ◽  
Author(s):  
Fredrick H. Rothganger ◽  
Kurt W. Larson ◽  
Antonio Ignacio Gonzales ◽  
Daniel S. Myers

2021 ◽  
Vol 22 (10) ◽  
pp. 5212
Author(s):  
Andrzej Bak

A key question confronting computational chemists concerns the preferable ligand geometry that fits complementarily into the receptor pocket. Typically, the postulated ‘bioactive’ 3D ligand conformation is constructed as a ‘sophisticated guess’ (unnecessarily geometry-optimized) mirroring the pharmacophore hypothesis, sometimes based on an erroneous prerequisite. Hence, the 4D-QSAR scheme and its ‘dialects’ have been implemented in practice as a higher level of model abstraction that allows the examination of multiple molecular conformations, orientations and protonation representations. Nearly a quarter of a century has passed since the eminent work of Hopfinger appeared on the stage; therefore, the natural question arises whether the 4D-QSAR approach is still appealing to the scientific community. With no intention to be comprehensive, a review of the current state of the art in the field of receptor-independent (RI) and receptor-dependent (RD) 4D-QSAR methodology is provided with a brief examination of the ‘mainstream’ algorithms. In fact, a myriad of 4D-QSAR methods have been implemented and applied in practice to a diverse range of molecules. It seems that the 4D-QSAR approach has been experiencing a promising renaissance of interest that might be fuelled by the rising power of graphics processing unit (GPU) clusters applied to full-atom MD-based simulations of protein-ligand complexes.

