True 4D Image Denoising on the GPU

International Journal of Biomedical Imaging ◽

10.1155/2011/952819 ◽

2011 ◽

Vol 2011 ◽

pp. 1-16 ◽

Cited By ~ 14

Author(s):

Anders Eklund ◽

Mats Andersson ◽

Hans Knutsson

Keyword(s):

Image Denoising ◽

Processing Time ◽

Graphics Processing Unit ◽

Spatial Filtering ◽

Processing Unit ◽

Clinical Value ◽

CT Data ◽

Graphics Processing ◽

Common Application ◽

Work Done

Image denoising is an important part of many medical imaging applications. One common application is improving the image quality of low-dose (noisy) computed tomography (CT) data. While 3D image denoising has previously been applied to several volumes independently, little work has been done on true 4D image denoising, where the algorithm considers several volumes at the same time. The problem with 4D image denoising, compared to 2D and 3D denoising, is that the computational complexity increases exponentially with the number of dimensions. In this paper we describe a novel algorithm for true 4D image denoising, based on local adaptive filtering, and how to implement it on the graphics processing unit (GPU). The algorithm was applied to a 4D CT heart dataset with a resolution of 512 × 512 × 445 × 20. The GPU completes the denoising in about 25 minutes if spatial filtering is used and in about 8 minutes if FFT-based filtering is used. The CPU implementation requires several days of processing time for spatial filtering and about 50 minutes for FFT-based filtering. The short processing time significantly increases the clinical value of true 4D image denoising.
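
The abstract contrasts spatial filtering with FFT-based filtering. The following is a minimal sketch of why the two routes can give the same result, assuming a plain circular convolution with a fixed kernel. It is not the authors' local adaptive filtering algorithm; the function names, the array sizes, and the random test data are hypothetical and chosen only to keep the example small.

```python
import numpy as np


def spatial_filter(volume, kernel):
    """Circular convolution computed directly in the spatial domain."""
    out = np.zeros(volume.shape, dtype=float)
    axes = tuple(range(volume.ndim))
    # One shifted, weighted copy of the data per kernel coefficient,
    # so the cost grows with the number of kernel elements.
    for offset in np.ndindex(*kernel.shape):
        out += kernel[offset] * np.roll(volume, shift=offset, axis=axes)
    return out


def fft_filter(volume, kernel):
    """The same circular convolution computed as a product in the frequency domain."""
    padded = np.zeros(volume.shape, dtype=float)
    padded[tuple(slice(0, k) for k in kernel.shape)] = kernel  # zero-pad the kernel
    return np.real(np.fft.ifftn(np.fft.fftn(volume) * np.fft.fftn(padded)))


rng = np.random.default_rng(0)
volume = rng.standard_normal((8, 8, 8, 4))  # toy 4D data (x, y, z, t), hypothetical size
kernel = rng.standard_normal((3, 3, 3, 3))  # toy 4D filter, hypothetical size

max_diff = np.max(np.abs(spatial_filter(volume, kernel) - fft_filter(volume, kernel)))
print(f"max difference between the two routes: {max_diff:.2e}")
```

The spatial route performs one pass over the data per kernel coefficient, and the number of coefficients grows as the kernel width raised to the number of dimensions. The FFT route replaces those passes with a single element-wise product, which is one plausible reading of why the FFT-based timings reported above are shorter.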

Acceleration of a CFD Code with a GPU

Scientific Programming ◽

10.1155/2010/564806 ◽

2010 ◽

Vol 18 (3-4) ◽

pp. 193-201 ◽

Cited By ~ 11

Author(s):

Dennis C. Jespersen

Keyword(s):

Fluid Dynamics ◽

Computational Fluid Dynamics ◽

Graphics Processing Unit ◽

Computational Time ◽

Processing Unit ◽

Graphics Processing ◽

Work Done

The Computational Fluid Dynamics code OVERFLOW includes as one of its solver options an algorithm which is a fairly small piece of code but which accounts for a significant portion of the total computational time. This paper studies some of the issues in accelerating this piece of code by using a Graphics Processing Unit (GPU). The algorithm needs to be modified to be suitable for a GPU and attention needs to be given to 64-bit and 32-bit arithmetic. Interestingly, the work done for the GPU produced ideas for accelerating the CPU code and led to significant speedup on the CPU.
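
The abstract notes that attention must be given to 64-bit and 32-bit arithmetic. As a small illustration of why that matters, and not a reproduction of anything in OVERFLOW, the sketch below accumulates a tiny increment in both precisions. The increment and the step count are arbitrary values chosen for the demonstration.

```python
import numpy as np

increment = 1e-8  # smaller than half the float32 spacing near 1.0
steps = 1000

total32 = np.float32(1.0)
total64 = np.float64(1.0)
for _ in range(steps):
    # In float32 each addition rounds back to 1.0, so the updates are lost.
    total32 = total32 + np.float32(increment)
    # In float64 the same updates accumulate as expected.
    total64 = total64 + np.float64(increment)

print("float32 total:", total32)
print("float64 total:", total64)
```

The 32-bit total never moves away from 1.0, while the 64-bit total reaches roughly 1.00001. An iterative solver that applies many small corrections can be affected in the same way when it is moved to single precision.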

Time-Domain Interpolation on Graphics Processing Unit

Journal of Innovative Optical Health Sciences ◽

10.1142/s1793545811001277 ◽

2011 ◽

Vol 04 (01) ◽

pp. 89-95 ◽

Cited By ~ 3

Author(s):

XIQI LI ◽

GUOHUA SHI ◽

YUDONG ZHANG

Keyword(s):

Signal Processing ◽

Real Time ◽

Time Domain ◽

Processing Time ◽

Graphics Processing Unit ◽

Interpolation Method ◽

Time Signal ◽

Processing Unit ◽

Graphics Processing ◽

SD-OCT

The signal processing speed of spectral domain optical coherence tomography (SD-OCT) has become a bottleneck in many medical applications. Recently, a time-domain interpolation method was proposed. Compared with the commonly used zero-padding interpolation method, it achieves a better signal-to-noise ratio (SNR) with a much shorter signal processing time in SD-OCT data processing. Additionally, each resampled data point can be obtained from only a few data points and coefficients within the cutoff window, so many interpolations can be performed simultaneously and the method is well suited to parallel computing. By using a graphics processing unit (GPU) and the compute unified device architecture (CUDA) programming model, time-domain interpolation can be accelerated significantly. The computing capability exceeds 250,000 A-lines, 200,000 A-lines, and 160,000 A-lines per second for 2,048-pixel OCT when the cutoff length is L = 11, L = 21, and L = 31, respectively. A frame of SD-OCT data (400 A-lines × 2,048 pixels per line) is acquired and processed on the GPU in real time. The results show that signal processing of SD-OCT can be finished in 6.223 ms when the cutoff length is L = 21, which is much faster than on a central processing unit (CPU). Real-time signal processing of acquired data can thus be realized.
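
The point that each resampled value depends only on a few neighbors inside the cutoff window can be sketched as follows. This is a hedged illustration, not the paper's method: the Hann-tapered sinc kernel, the weight normalization, and the name `windowed_interp` are assumptions made for the example, and NumPy vectorization stands in for the one-thread-per-output mapping a CUDA kernel might use.

```python
import numpy as np


def windowed_interp(samples, positions, cutoff_length=21):
    """Resample `samples` at fractional `positions` using cutoff_length neighbors each."""
    half = cutoff_length // 2
    base = np.floor(positions).astype(int)
    offsets = np.arange(-half, half + 1)
    idx = base[:, None] + offsets[None, :]   # neighbor indices, shape (M, L)
    dist = positions[:, None] - idx          # signed distance to each neighbor
    # Assumed kernel: sinc weights with a Hann-style taper over the window.
    taper = 0.5 * (1.0 + np.cos(np.pi * dist / (half + 1)))
    weights = np.sinc(dist) * taper
    weights[(idx < 0) | (idx >= len(samples))] = 0.0  # ignore neighbors outside the signal
    weights /= weights.sum(axis=1, keepdims=True)
    # Every output row is an independent weighted sum, so rows can run in parallel.
    return np.sum(weights * samples[np.clip(idx, 0, len(samples) - 1)], axis=1)


n = np.arange(200)
samples = np.sin(2 * np.pi * n / 50)      # toy signal, hypothetical
positions = np.arange(20.0, 180.0, 0.5)   # interior resampling grid
resampled = windowed_interp(samples, positions, cutoff_length=21)
error = np.max(np.abs(resampled - np.sin(2 * np.pi * positions / 50)))
print(f"max interpolation error on the toy signal: {error:.2e}")
```

A shorter cutoff length means fewer multiply-adds per output, which is consistent with the higher A-line rates the abstract reports for L = 11 than for L = 31.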

Detection of Inflatable Boats and People in Thermal Infrared with Deep Learning Methods

Sensors ◽

10.3390/s21165330 ◽

2021 ◽

Vol 21 (16) ◽

pp. 5330

Author(s):

Marcin Łukasz Kowalski ◽

Norbert Pałka ◽

Jarosław Młyńczak ◽

Mateusz Karol ◽

Elżbieta Czerwińska ◽

...

Keyword(s):

Neural Networks ◽

Feature Extraction ◽

Deep Learning ◽

Processing Time ◽

Graphics Processing Unit ◽

Weather Conditions ◽

Processing Unit ◽

Long Time ◽

Financial Interests ◽

Graphics Processing

Smuggling of drugs and cigarettes in small inflatable boats across border rivers is a serious threat to the EU’s financial interests. Early detection of such threats is challenging due to difficult and changing environmental conditions. This study reports on the automatic detection of small inflatable boats and people in rough, wild terrain in the thermal infrared domain. Three acquisition campaigns were carried out during spring, summer, and fall under various weather conditions. Three deep learning algorithms, namely YOLOv2, YOLOv3, and Faster R-CNN, working with six different feature extraction neural networks, were trained and evaluated in terms of performance and processing time. The best performance was achieved with Faster R-CNN with ResNet101; however, processing requires a long time and a powerful graphics processing unit.

Fast iterative solvers for large compressed-sparse row linear systems on graphics processing unit

Pollack Periodica ◽

10.1556/pollack.10.2015.1.1 ◽

2015 ◽

Vol 10 (1) ◽

pp. 3-18 ◽

Cited By ~ 1

Author(s):

Frédéric Magoulès ◽

Abal-Kassim Cheik Ahamed ◽

Roman Putanowicz

Keyword(s):

Linear Systems ◽

Graphics Processing Unit ◽

Iterative Solvers ◽

Processing Unit ◽

Compressed Sparse Row ◽

Graphics Processing

Performance Analysis and Optimization of Graphics Processing Unit

SSRN Electronic Journal ◽

10.2139/ssrn.3350249 ◽

2019 ◽

Author(s):

Lokendra Singh Umrao ◽

Jay Prakash Pandey

Keyword(s):

Performance Analysis ◽

Graphics Processing Unit ◽

Processing Unit ◽

Graphics Processing

Implementing wide baseline matching algorithms on a graphics processing unit.

10.2172/921737 ◽

2007 ◽

Author(s):

Fredrick H. Rothganger ◽

Kurt W. Larson ◽

Antonio Ignacio Gonzales ◽

Daniel S. Myers

Keyword(s):

Graphics Processing Unit ◽

Processing Unit ◽

Wide Baseline Matching ◽

Graphics Processing

Two Decades of 4D-QSAR: A Dying Art or Staging a Comeback?

International Journal of Molecular Sciences ◽

10.3390/ijms22105212 ◽

2021 ◽

Vol 22 (10) ◽

pp. 5212

Author(s):

Andrzej Bak

Keyword(s):

Molecular Conformation ◽

Graphics Processing Unit ◽

Processing Unit ◽

Diverse Range ◽

Current State ◽

GPU Clusters ◽

Pharmacophore Hypothesis ◽

Rising Power ◽

Graphics Processing ◽

Ligand Conformation

A key question confronting computational chemists concerns the preferable ligand geometry that fits complementarily into the receptor pocket. Typically, the postulated ‘bioactive’ 3D ligand conformation is constructed as a ‘sophisticated guess’ (unnecessarily geometry-optimized) mirroring the pharmacophore hypothesis, sometimes based on an erroneous prerequisite. Hence, the 4D-QSAR scheme and its ‘dialects’ have been practically implemented as a higher level of model abstraction that allows the examination of multiple molecular conformations, orientations and protonation representations, respectively. Nearly a quarter of a century has passed since the eminent work of Hopfinger appeared on the stage; therefore the natural question arises whether the 4D-QSAR approach is still appealing to the scientific community. With no intention to be comprehensive, a review of the current state of the art in the field of receptor-independent (RI) and receptor-dependent (RD) 4D-QSAR methodology is provided with a brief examination of the ‘mainstream’ algorithms. In fact, a myriad of 4D-QSAR methods have been implemented and applied practically for a diverse range of molecules. It seems that the 4D-QSAR approach has been experiencing a promising renaissance of interest that might be fuelled by the rising power of graphics processing unit (GPU) clusters applied to full-atom MD-based simulations of protein-ligand complexes.

Parallelization of Global Sequence Alignment on Graphics Processing Unit

2020 International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI) ◽

10.1109/ccci49893.2020.9256747 ◽

2020 ◽

Author(s):

Kailash W. Kalare ◽

Mohammad S. Obaidat ◽

Jitendra V. Tembhurne ◽

Chandrashekhar Meshram ◽

Kuei-Fang Hsiao

Keyword(s):

Sequence Alignment ◽

Graphics Processing Unit ◽

Processing Unit ◽

Graphics Processing

Graphics processing unit acceleration of the island model genetic algorithm using the CUDA programming platform

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.6286 ◽

2021 ◽

Author(s):

Dylan M. Janssen ◽

Wayne Pullan ◽

Alan Wee‐Chung Liew

Keyword(s):

Genetic Algorithm ◽

Graphics Processing Unit ◽

Island Model ◽

Processing Unit ◽

CUDA Programming ◽

Graphics Processing

Real-time, High-resolution Depth Upsampling on Embedded Accelerators

ACM Transactions on Embedded Computing Systems ◽

10.1145/3436878 ◽

2021 ◽

Vol 20 (3) ◽

pp. 1-22

Author(s):

David Langerman ◽

Alan George

Keyword(s):

High Resolution ◽

Low Power ◽

Real Time ◽

Mixed Reality ◽

Graphics Processing Unit ◽

Processing Unit ◽

Reconfigurable Logic ◽

Depth Sensors ◽

Time Requirements ◽

Graphics Processing

High-resolution, low-latency apps in computer vision are ubiquitous in today’s world of mixed-reality devices. These innovations provide a platform that can leverage the improving technology of depth sensors and embedded accelerators to enable higher-resolution, lower-latency processing for 3D scenes using depth-upsampling algorithms. This research demonstrates that filter-based upsampling algorithms are feasible for mixed-reality apps using low-power hardware accelerators. The authors parallelized and evaluated a depth-upsampling algorithm on two different devices: a reconfigurable-logic FPGA embedded within a low-power SoC and a fixed-logic embedded graphics processing unit. The results demonstrate that both accelerators can meet the real-time requirement of 11 ms latency for mixed-reality apps.
