Accelerating 3-D GPU-based Motion Tracking for Ultrasound Strain Elastography Using Sum-Tables: Analysis and Initial Results

Bo Peng; Shasha Luo; Zhengqiu Xu; Jingfeng Jiang

doi:10.3390/app9101991

Applied Sciences ◽

10.3390/app9101991 ◽

2019 ◽

Vol 9 (10) ◽

pp. 1991 ◽

Cited By ~ 3

Author(s):

Bo Peng ◽

Shasha Luo ◽

Zhengqiu Xu ◽

Jingfeng Jiang

Keyword(s):

Correlation Coefficient ◽

Computational Efficiency ◽

Graphics Processing Units ◽

Motion Tracking ◽

Performance Comparison ◽

Tracking Accuracy ◽

Computing Environment ◽

Phantom Experiment ◽

Strain Elastography ◽

Computationally Intensive

With 3-D ultrasound data now available, considerable research effort is being devoted to developing 3-D ultrasound strain elastography (USE) systems. Because 3-D motion tracking, a core component of any 3-D USE system, is computationally intensive, considerable effort is also under way to accelerate it. In the literature, the Sum-Table concept has been used in a serial computing environment to reduce the burden of computing signal correlation, which is the single most computationally intensive component of 3-D motion tracking. In this study, parallel programming using graphics processing units (GPUs) is used in conjunction with sum-tables to improve the computational efficiency of 3-D motion tracking. To our knowledge, sum-tables have not been used in a GPU environment for 3-D motion tracking. Our main objective here is to investigate the feasibility of using the sum-table-based normalized correlation coefficient (ST-NCC) method for GPU-accelerated 3-D USE. More specifically, two different implementations of the ST-NCC method, proposed by Lewis et al. and by Luo and Konofagou, are compared against each other. For the performance comparison, the conventional method for calculating the normalized correlation coefficient (NCC) was used as the baseline. All three methods were implemented using compute unified device architecture (CUDA; Version 9.0, Nvidia Inc., CA, USA) and tested on a professional GeForce GTX TITAN X card (Nvidia Inc., CA, USA). Using 3-D ultrasound data acquired during a tissue-mimicking phantom experiment, both displacement tracking accuracy and computational efficiency were evaluated for the three methods. Based on the data investigated, we found that on the GPU platform the Luo-Konofagou method can still improve computational efficiency (17–46%) compared with the classic NCC method implemented on the same GPU platform. However, the Lewis method either does not improve computational efficiency in some configurations or improves it at a lower rate (7–23%) in the GPU parallel computing environment. Comparable displacement tracking accuracy was obtained by both methods.
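The sum-table idea mentioned in the abstract can be illustrated with a minimal serial sketch in pure Python. This is not the authors' CUDA implementation, and it is neither the Lewis nor the Luo-Konofagou variant. It only shows the underlying trick: once a 3-D table of running sums has been built, the sum over any box-shaped window (for example the signal energy that appears in the NCC denominator) costs eight table lookups, whatever the window size. The function names `build_sum_table`, `window_sum`, and `direct_sum`, and the toy volume, are assumptions made for illustration.

```python
def build_sum_table(vol):
    """Return a zero-bordered 3-D sum-table S for a nested-list volume.

    S[i][j][k] holds the sum of vol[a][b][c] for a < i, b < j, c < k.
    """
    nx, ny, nz = len(vol), len(vol[0]), len(vol[0][0])
    S = [[[0] * (nz + 1) for _ in range(ny + 1)] for _ in range(nx + 1)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                # 3-D inclusion-exclusion over the already-filled neighbours.
                S[i + 1][j + 1][k + 1] = (
                    vol[i][j][k]
                    + S[i][j + 1][k + 1] + S[i + 1][j][k + 1] + S[i + 1][j + 1][k]
                    - S[i][j][k + 1] - S[i][j + 1][k] - S[i + 1][j][k]
                    + S[i][j][k]
                )
    return S


def window_sum(S, lo, hi):
    """Sum over the half-open box lo <= (x, y, z) < hi using 8 lookups."""
    x0, y0, z0 = lo
    x1, y1, z1 = hi
    return (
        S[x1][y1][z1]
        - S[x0][y1][z1] - S[x1][y0][z1] - S[x1][y1][z0]
        + S[x0][y0][z1] + S[x0][y1][z0] + S[x1][y0][z0]
        - S[x0][y0][z0]
    )


def direct_sum(vol, lo, hi):
    """Brute-force reference: visit every sample inside the box."""
    x0, y0, z0 = lo
    x1, y1, z1 = hi
    return sum(
        vol[i][j][k]
        for i in range(x0, x1)
        for j in range(y0, y1)
        for k in range(z0, z1)
    )


# Toy 4 x 5 x 6 integer volume standing in for echo data (illustrative only).
vol = [[[(3 * i + 5 * j + 7 * k) % 11 for k in range(6)]
        for j in range(5)] for i in range(4)]

# Sum-table of squared samples: window energies for the NCC denominator.
squared = [[[v * v for v in row] for row in plane] for plane in vol]
energy_table = build_sum_table(squared)

lo, hi = (1, 1, 2), (3, 4, 5)
print(window_sum(energy_table, lo, hi) == direct_sum(squared, lo, hi))
```

Building the table is a single pass over the volume. The saving comes from reusing it for many overlapping correlation windows, which is the workload 3-D motion tracking generates. How that trade-off plays out on a GPU is what the study above measures.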

Performance comparison of generational and steady-state asynchronous multi-objective evolutionary algorithms for computationally-intensive problems

Knowledge-Based Systems ◽

10.1016/j.knosys.2015.05.029 ◽

2015 ◽

Vol 87 ◽

pp. 47-60 ◽

Cited By ~ 20

Author(s):

Alexandru-Ciprian Zăvoianu ◽

Edwin Lughofer ◽

Werner Koppelstätter ◽

Günther Weidenholzer ◽

Wolfgang Amrhein ◽

...

Keyword(s):

Steady State ◽

Evolutionary Algorithms ◽

Performance Comparison ◽

Multi Objective ◽

Computationally Intensive

Augmenting 3D Ultrasound Strain Elastography by combining Bayesian inference with local Polynomial fitting in Region-growing-based Motion Tracking

10.1109/icip42928.2021.9506520 ◽

2021 ◽

Author(s):

Shuojie Wen ◽

Bo Peng ◽

Hao Jiang ◽

Junkai Cao ◽

Jingfeng Jiang

Keyword(s):

Bayesian Inference ◽

Motion Tracking ◽

Region Growing ◽

3D Ultrasound ◽

Local Polynomial Fitting ◽

Polynomial Fitting ◽

Strain Elastography ◽

Local Polynomial ◽

Fitting In

Extending the usage of graphics processing units on the cloud for cost savings on seismic data regularization

Brazilian Journal of Geophysics ◽

10.22564/rbgf.v38i2.2048 ◽

2021 ◽

Vol 38 (2) ◽

Author(s):

Nicholas Torres Okita ◽

Tiago A. Coimbra ◽

José Ribeiro ◽

Martin Tygel

Keyword(s):

Cloud Computing ◽

Graphics Processing Units ◽

Cost Savings ◽

Data Sets ◽

Computing Paradigm ◽

Common Reflection Surface ◽

User Demand ◽

Computationally Intensive ◽

Zero Offset ◽

Graphics Processing

ABSTRACT. Graphics processing units are already an established alternative to traditional multi-core CPU processing, running parallel tasks tens of times faster. Another new computing paradigm is the use of cloud computing as a replacement for traditional in-house clusters, enabling seemingly unlimited computation power, no maintenance costs, and cutting-edge technology, dynamically on user demand. Previously, these two tools were used to accelerate the estimation of Common Reflection Surface (CRS) traveltime parameters, in both the zero-offset and finite-offset domains, delivering very satisfactory results with large time savings from GPU devices alongside cost savings on the cloud. This work extends those results by using GPUs on the cloud to accelerate Offset Continuation Trajectory (OCT) traveltime parameter estimation. The results show that the time and cost savings from using GPU devices are even larger than those seen in the CRS results, being up to fifty times faster and sixty times cheaper. This analysis reaffirms that it is possible to save both time and money when using GPU devices on the cloud, and concludes that the larger the data sets and the more computationally intensive the traveltime operators, the larger the improvements.

Keywords: cloud computing, GPU, seismic processing.

A Study of Tracking System Using Multiple Emitter Towers

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.694-697.927 ◽

2013 ◽

Vol 694-697 ◽

pp. 927-935 ◽

Cited By ~ 1

Author(s):

Yi Sun ◽

Tao Ma ◽

Chia Yung Han ◽

Joseph Ross ◽

William Wee

Keyword(s):

Coordinate Transformation ◽

Motion Tracking ◽

Tracking System ◽

Transformation Method ◽

Tracking Accuracy ◽

Working Space ◽

Frontal Part ◽

Motion Tracking System ◽

Coordinate Transformation Method

This paper presents a simple and accurate coordinate transformation method for extending the tracking space of the Intersense IS-900 spatial and motion tracking system. The method uses multiple pre-configured emitter towers to form the emitter constellation, without resorting to a surveyor machine. The proposed approach uses the differences between the positional coordinate readings from each emitter tower, taken over a set of commonly viewed spatial points, to calculate the parameters needed to define the coordinate transformation. With this method, tracking with the entire emitter constellation achieves an error of less than 0.5 inches in most of the working space, and as low as 0.2 inches in the frontal part of the working space.

The use of a cascaded Kinect and electromyography gesture decoding algorithm in an initial robot-aided hand neurorehabilitation

Advances in Mechanical Engineering ◽

10.1177/1687814017751967 ◽

2018 ◽

Vol 10 (1) ◽

pp. 168781401775196 ◽

Cited By ~ 2

Author(s):

Ping Wang ◽

Yabo Wang ◽

He Huang ◽

Feng Ru ◽

Quan Pan

Keyword(s):

Control Algorithm ◽

Motion Tracking ◽

Simulation Experiment ◽

Tracking Error ◽

Torque Control ◽

Tracking Accuracy ◽

Rehabilitation Robots ◽

Machine Interface ◽

Repetitive Activities ◽

Active Assistance

To improve neurological recovery in hand neurorehabilitation, target-oriented, intensive, repetitive activities of daily living are used, such as training with recognition of hand gestures during robot-aided exercise. In this article, a cascade control algorithm integrating electromyography bio-feedback into hand gesture recognition is proposed. The outer loop is trajectory motion tracking with a Kinect-based gesture decoding classifier, and the inner loop is torque control with electromyography bio-feedback in real time. The proposed method improves the tracking accuracy: the tracking error is reduced from 70.56 to 28.07 in the simulation experiment. The initial test shows that the proposed method with additional torque control allows active assistance on the human–machine interface of other rehabilitation robots in the future.

Performance comparison of prediction filters for respiratory motion tracking in radiotherapy

Medical Physics ◽

10.1002/mp.13929 ◽

2019 ◽

Vol 47 (2) ◽

pp. 643-650 ◽

Cited By ~ 2

Author(s):

Alexander Jöhl ◽

Stefanie Ehrbar ◽

Matthias Guckenberger ◽

Stephan Klöck ◽

Mirko Meboldt ◽

...

Keyword(s):

Motion Tracking ◽

Respiratory Motion ◽

Performance Comparison

Performance comparison of software and FPGA implementation of computationally intensive algorithms

Proceedings of the International Conference and Workshop on Emerging Trends in Technology - ICWET '10 ◽

10.1145/1741906.1742166 ◽

2010 ◽

Author(s):

Laxmikant Bordekar ◽

Gajanan S. Gawde

Keyword(s):

Performance Comparison ◽

Fpga Implementation ◽

Computationally Intensive

Performance comparison of cache invalidation techniques in mobile computing environment

2015 1st International Conference on Next Generation Computing Technologies (NGCT) ◽

10.1109/ngct.2015.7375091 ◽

2015 ◽

Author(s):

Rajeev Tiwari ◽

Neeraj Kumar

Keyword(s):

Mobile Computing ◽

Performance Comparison ◽

Computing Environment ◽

Cache Invalidation ◽

Mobile Computing Environment

A Distributed Product Realization Environment for Design and Manufacturing

Journal of Computing and Information Science in Engineering ◽

10.1115/1.1412230 ◽

2001 ◽

Vol 1 (3) ◽

pp. 235-244 ◽

Cited By ~ 20

Author(s):

Jonathan F. Gerhard ◽

David Rosen ◽

Janet K. Allen ◽

Farrokh Mistree

Keyword(s):

Computing Environment ◽

Global Marketplace ◽

Design And Manufacturing ◽

Geographically Distributed ◽

Computationally Intensive ◽

Manufacturing Software ◽

Event Based ◽

Realization Process ◽

Product Realization ◽

Information Requests

Geographically distributed engineers must collaboratively develop, build and test solutions to design-manufacture problems to be competitive in the global marketplace. Engineers operate in a distributed system in which separate entities communicate cooperatively: ideas and information requests are generated anywhere within the system, rapid turn-around is essential, and multiple projects must be handled simultaneously. In this paper we present a prototype platform-independent framework to integrate distributed and heterogeneous software resources to support the computationally intensive activities in the product realization process. This framework, PRE-RMI, is based on an experimental event-based communications model; it has been coded in Java and uses the RMI messaging system. We describe its usage in a distributed product realization environment, the Rapid Tooling TestBed. PRE-RMI is compared to a previous environment, called P2, that was based on Java Servlet technology. PRE-RMI is adaptable to different design processes, is modular and extensible, is robust to network and computing failures, and is far preferable to P2. Further, we demonstrate the successful integration of CAD, CAE, design, and manufacturing software tools and resources in this flexible distributed computing environment.

A GPU based multidimensional amplitude analysis to search for tetraquark candidates

10.21203/rs.3.rs-51185/v3 ◽

2020 ◽

Author(s):

Nairit Sur ◽

Leonardo Cristella ◽

Adriano Di Florio ◽

Vincenzo Mastrapasqua

Keyword(s):

Graphics Processing Units ◽

High Energy Physics ◽

High Energy ◽

Amplitude Analysis ◽

Hadron Spectroscopy ◽

Multiple Cores ◽

Analysis Strategies ◽

Computationally Intensive ◽

Computational Resources ◽

Graphics Processing

The demand for computational resources is steadily increasing in experimental high energy physics as the current collider experiments continue to accumulate huge amounts of data and physicists pursue more complex and ambitious analysis strategies. This is especially true in the fields of hadron spectroscopy and flavour physics, where the analyses often depend on complex multidimensional unbinned maximum-likelihood fits, with several dozen free parameters, that aim to study the internal structure of hadrons. Graphics processing units (GPUs) are one of the most sophisticated and versatile parallel computing architectures, and they are becoming a popular tool for high energy physicists to meet their computational demands. GooFit is an upcoming open-source tool interfacing ROOT/RooFit to the CUDA platform on NVIDIA GPUs. It acts as a bridge between the MINUIT minimization algorithm and a parallel processor, allowing probability density functions to be estimated on multiple cores simultaneously. In this article, a full-fledged amplitude analysis framework developed using GooFit is tested for its speed and reliability. The four-dimensional fitter framework, one of the first of its kind to be built on GooFit, is geared towards the search for exotic tetraquark states in the [[EQUATION]] decays and can also be seamlessly adapted for other similar analyses. The GooFit fitter, running on GPUs, shows a remarkable improvement in computing speed compared to a ROOT/RooFit implementation of the same analysis running on multi-core CPU clusters. Furthermore, it shows sensitivity to components with small contributions to the overall fit. It has the potential to be a powerful tool for sensitive and computationally intensive physics analyses.
