Accelerating 3D Fourier migration with graphics processing units

Geophysics ◽  
2009 ◽  
Vol 74 (6) ◽  
pp. WCA129-WCA139 ◽  
Author(s):  
Jin-Hai Zhang ◽  
Shu-Qin Wang ◽  
Zhen-Xing Yao

Computational cost is a major factor that inhibits the practical application of 3D depth migration. We have developed a fast parallel scheme to speed up 3D wave-equation depth migration on a parallel computing device, i.e., on graphics processing units (GPUs). The third-order optimized generalized-screen propagator is used to take advantage of the built-in software implementation of the fast Fourier transform. The propagator is coded as a sequence of kernels that are called from the host for each frequency component. Moving the wavefield extrapolation for each depth level to the GPU allows a large 3D velocity model to be handled, but this scheme gains only a limited speedup over the CPU implementation because of the low-bandwidth data transfer between host and device. We achieve further speedup by minimizing this low-bandwidth transfer: the 3D velocity model and imaged data are kept in device memory, and their memory demand is halved by storing them as integer arrays rather than float arrays. Incorporating a 2D tapered function, the time-shift propagator, and the scaling of the inverse Fourier transform into a single compact kernel further reduces the computation time. Three-dimensional impulse responses and synthetic data examples demonstrate that the GPU-based Fourier migration typically runs 25 to 40 times faster than the CPU-based implementation. This allows complex media to be imaged with 3D depth migration at little computational cost, so the macrovelocity model can be built in a much shorter turnaround time.
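The memory-halving device described above is straightforward to sketch. The following CUDA fragment is a minimal illustration, assuming a simple linear min/max quantization (the paper does not spell out its exact compression scheme): velocities live in device memory as 16-bit integers and are decompressed on the fly inside the kernel, halving storage relative to a float array.

```cuda
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Decompress one 16-bit velocity sample on the fly, so the full-precision
// float model never has to reside in device memory.
__global__ void scale_by_velocity(const uint16_t *vq, float vmin, float step,
                                  float *field, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        field[i] *= vmin + step * (float)vq[i];   // reconstruct v and apply it
}

int main()
{
    const int n = 1 << 20;                        // one slice of a 3D model
    const float vmin = 1500.0f, vmax = 5500.0f;   // assumed velocity range (m/s)
    const float step = (vmax - vmin) / 65535.0f;  // quantization step

    std::vector<uint16_t> h_vq(n, 20000);         // quantized velocities
    std::vector<float> h_field(n, 1.0f);          // wavefield slice

    uint16_t *d_vq; float *d_field;
    cudaMalloc((void **)&d_vq, n * sizeof(uint16_t));  // half of float storage
    cudaMalloc((void **)&d_field, n * sizeof(float));
    cudaMemcpy(d_vq, h_vq.data(), n * sizeof(uint16_t), cudaMemcpyHostToDevice);
    cudaMemcpy(d_field, h_field.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    scale_by_velocity<<<(n + 255) / 256, 256>>>(d_vq, vmin, step, d_field, n);

    cudaMemcpy(h_field.data(), d_field, n * sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("field[0] scaled by v = %.1f m/s -> %.3f\n",
                vmin + step * 20000, h_field[0]);
    cudaFree(d_vq); cudaFree(d_field);
    return 0;
}
```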

Geophysics ◽  
2014 ◽  
Vol 79 (3) ◽  
pp. S105-S111 ◽  
Author(s):  
Sheng Xu ◽  
Feng Chen ◽  
Bing Tang ◽  
Gilles Lambare

When using seismic data to image complex structures, the reverse time migration (RTM) algorithm generally provides the best results when the velocity model is accurate. With an inexact model, moveouts appear in common image gathers (CIGs), whether in the surface-offset domain or in the subsurface-angle domain, and the stacked image is not well focused. In extended image gathers, the strongest energy of a seismic event may occur at nonzero lag in time-shift or offset-shift gathers. In RTM images produced with the time-shift imaging condition, the non-zero-lag time-shift images exhibit a spatial shift; we propose to correct them with a second pass of migration, similar to zero-offset depth migration, based on a local poststack depth-migration assumption. After this second-pass migration, the time-shift CIGs appear flat and can be stacked. The stack enhances the energy of seismic events that are defocused at zero time lag by model inaccuracy, although the newly focused events remain at their previous positions, which may deviate from the true reflection positions. The stack also attenuates long-wavelength RTM artifacts. For tilted transversely isotropic migration, we propose a scheme to defocus coherent noise, such as migration artifacts from residual multiples, by applying the original migration velocity model along the symmetry axis but with different anisotropic parameters in the second pass of migration. We demonstrate on two synthetic data sets and one real data set from the Gulf of Mexico that our approach effectively attenuates coherent noise in subsalt areas.
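The time-shift gathers discussed above arise from the extended imaging condition I(x, τ) = Σ_t s(x, t − τ) r(x, t + τ). The following CUDA kernel is a minimal sketch of that correlation at a single image point, with illustrative trace lengths; the paper's second-pass migration itself is not reproduced here.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Time-shift extended imaging condition at one image point:
//   I(tau) = sum_t s(t - tau) * r(t + tau)
// One thread per time lag; each thread correlates the two full traces.
__global__ void timeshift_gather(const float *src, const float *rec,
                                 float *cig, int nt, int nlag)
{
    int il = blockIdx.x * blockDim.x + threadIdx.x;   // 0 .. 2*nlag
    if (il > 2 * nlag) return;
    int tau = il - nlag;                              // signed lag in samples
    float acc = 0.0f;
    for (int t = 0; t < nt; ++t) {
        int ts = t - tau, tr = t + tau;
        if (ts >= 0 && ts < nt && tr >= 0 && tr < nt)
            acc += src[ts] * rec[tr];
    }
    cig[il] = acc;
}

int main()
{
    const int nt = 1000, nlag = 50, nout = 2 * nlag + 1;
    std::vector<float> s(nt, 0.0f), r(nt, 0.0f);
    s[500] = 1.0f; r[520] = 1.0f;          // event defocused by 20 samples

    float *d_s, *d_r, *d_c;
    cudaMalloc((void **)&d_s, nt * sizeof(float));
    cudaMalloc((void **)&d_r, nt * sizeof(float));
    cudaMalloc((void **)&d_c, nout * sizeof(float));
    cudaMemcpy(d_s, s.data(), nt * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_r, r.data(), nt * sizeof(float), cudaMemcpyHostToDevice);

    timeshift_gather<<<1, 128>>>(d_s, d_r, d_c, nt, nlag);

    std::vector<float> cig(nout);
    cudaMemcpy(cig.data(), d_c, nout * sizeof(float), cudaMemcpyDeviceToHost);
    for (int il = 0; il < nout; ++il)
        if (cig[il] != 0.0f)
            std::printf("energy focuses at lag %d samples\n", il - nlag);
    cudaFree(d_s); cudaFree(d_r); cudaFree(d_c);
    return 0;
}
```

For the synthetic pair above, the energy focuses at τ = 10 samples rather than at zero lag, which is exactly the kind of defocusing the second-pass migration is designed to move back to zero.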


Geophysics ◽  
1998 ◽  
Vol 63 (1) ◽  
pp. 25-38 ◽  
Author(s):  
Xianhuai Zhu ◽  
Burke G. Angstman ◽  
David P. Sixta

Through the use of iterative turning-ray tomography followed by wave-equation datuming (or tomo-datuming) and prestack depth migration, we generate accurate prestack images of seismic data in overthrust areas containing both highly variable near-surface velocities and rough topography. In tomo-datuming, we downward continue shot records from the topography to a horizontal datum using velocities estimated from tomography. Turning-ray tomography often provides a more accurate near-surface velocity model than refraction statics. The main advantage of tomo-datuming over tomo-statics (tomography plus static corrections) or refraction statics is that instead of applying a vertical time shift to the data, tomo-datuming propagates the recorded wavefield to the new datum. We find that tomo-datuming better reconstructs diffractions and reflections, subsequently providing better images after migration. In the datuming process, we use a recursive finite-difference (FD) scheme to extrapolate the wavefield without applying an imaging condition, so that lateral velocity variations are handled properly and the traveltime approximations associated with raypath distortions near the surface are avoided. We follow the downward continuation step with a conventional Kirchhoff prestack depth migration. This yields better images than migrating from the topography with the conventional Kirchhoff method, which must compute traveltimes through the complicated near surface. Since FD datuming is applied only to the shallow part of the section, its cost is much less than that of whole-volume FD migration. This is attractive because (1) prestack depth migration usually is used iteratively to build a velocity model, so both efficiency and accuracy are important factors to consider; and (2) tomo-datuming can improve the signal-to-noise (S/N) ratio of prestack gathers, leading to more accurate migration velocity analysis and better images after depth migration. Case studies with synthetic and field data examples show that tomo-datuming is especially helpful when strong lateral velocity variations are present below the topography.
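Although the paper uses a recursive finite-difference scheme for datuming, the idea of one downward-continuation recursion is easy to illustrate with a constant-velocity phase-shift operator in the frequency-wavenumber domain. The CUDA sketch below applies one such step to a single frequency slice; the surrounding spatial FFTs (e.g., via cuFFT) are omitted, and all values are illustrative.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuComplex.h>
#include <cuda_runtime.h>

// One downward-continuation recursion for a single frequency omega, applied
// in the horizontal-wavenumber domain: propagating components get a phase
// shift exp(i*kz*dz); evanescent components are muted.
__global__ void phase_shift(cuFloatComplex *wf, int nkx, float dkx,
                            float omega, float v, float dz)
{
    int ik = blockIdx.x * blockDim.x + threadIdx.x;
    if (ik >= nkx) return;
    float kx = (ik <= nkx / 2 ? ik : ik - nkx) * dkx;  // FFT wavenumber order
    float kz2 = (omega / v) * (omega / v) - kx * kx;
    if (kz2 > 0.0f) {
        float kz = sqrtf(kz2);
        cuFloatComplex ph = make_cuFloatComplex(cosf(kz * dz), sinf(kz * dz));
        wf[ik] = cuCmulf(wf[ik], ph);
    } else {
        wf[ik] = make_cuFloatComplex(0.0f, 0.0f);      // kill evanescent energy
    }
}

int main()
{
    const int nkx = 512;
    std::vector<cuFloatComplex> h(nkx, make_cuFloatComplex(1.0f, 0.0f));
    cuFloatComplex *d;
    cudaMalloc((void **)&d, nkx * sizeof(cuFloatComplex));
    cudaMemcpy(d, h.data(), nkx * sizeof(cuFloatComplex), cudaMemcpyHostToDevice);

    // omega = 2*pi*20 Hz, v = 2000 m/s, dz = 10 m (illustrative values)
    phase_shift<<<(nkx + 127) / 128, 128>>>(d, nkx, 0.005f, 125.6f, 2000.0f, 10.0f);

    cudaMemcpy(h.data(), d, nkx * sizeof(cuFloatComplex), cudaMemcpyDeviceToHost);
    std::printf("wf[0] after one step: (%.3f, %.3f)\n", h[0].x, h[0].y);
    cudaFree(d);
    return 0;
}
```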


2013 ◽  
Vol 02 (01) ◽  
pp. 1350008 ◽  
Author(s):  
A. MAGRO ◽  
J. HICKISH ◽  
K. Z. ADAMI

Radio transient discovery with next-generation radio telescopes will pose several digital signal processing and data transfer challenges, requiring specialized high-performance backends. Several accelerator technologies are being considered as prototyping platforms, including Graphics Processing Units (GPUs). In this paper we present a real-time pipeline prototype capable of processing multiple beams concurrently: it performs Radio Frequency Interference (RFI) rejection through thresholding, corrects for the frequency-dependent delay in signal arrival times across the band using brute-force dedispersion, detects and clusters events, and finally filters candidates, with the capability of persisting data buffers containing interesting signals to disk. This setup was deployed at the BEST-2 SKA pathfinder in Medicina, Italy, where several benchmarks and test observations of astrophysical transients were conducted. These tests show that on the deployed hardware eight 20 MHz beams can be processed simultaneously for ~640 Dispersion Measure (DM) values. Furthermore, the clustering and candidate filtering algorithms employed prove to be good candidates for online event detection. The number of beams that can be processed scales with the number of servers and GPUs deployed, making this a viable architecture for current and future radio telescopes.
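Brute-force dedispersion is the computational core of such a pipeline: for each trial DM, every frequency channel is shifted by the cold-plasma dispersion delay Δt ≈ 4.149 ms × DM × (ν⁻² − ν_hi⁻²) (ν in GHz) and the channels are summed. The CUDA sketch below assumes a channel-major data layout and band parameters loosely modeled on the 20 MHz beams mentioned above; it is not the deployed pipeline's code.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Brute-force dedispersion: one thread per (DM trial, output sample). Each
// thread sums all frequency channels after undoing the dispersive delay.
__global__ void dedisperse(const float *in, float *out, const int *shift,
                           int nchan, int nsamp, int ndm)
{
    int t  = blockIdx.x * blockDim.x + threadIdx.x;   // output time sample
    int dm = blockIdx.y;                              // DM trial
    if (t >= nsamp || dm >= ndm) return;
    float acc = 0.0f;
    for (int c = 0; c < nchan; ++c) {
        int ts = t + shift[dm * nchan + c];           // delayed sample
        if (ts < nsamp) acc += in[c * nsamp + ts];
    }
    out[dm * nsamp + t] = acc;
}

int main()
{
    const int nchan = 256, nsamp = 4096, ndm = 64;
    const float f_hi = 0.418f, bw = 0.020f, tsamp = 1e-3f; // GHz, GHz, s

    // Delay relative to the top channel (cold-plasma dispersion law),
    // converted to a per-channel, per-DM sample shift.
    std::vector<int> shift(ndm * nchan);
    for (int dm = 0; dm < ndm; ++dm)
        for (int c = 0; c < nchan; ++c) {
            float f  = f_hi - bw * c / nchan;
            float dt = 4.149e-3f * (10.0f * dm)
                     * (1.0f / (f * f) - 1.0f / (f_hi * f_hi));
            shift[dm * nchan + c] = (int)roundf(dt / tsamp);
        }

    std::vector<float> h_in(nchan * nsamp, 0.0f);     // dummy filterbank data
    float *d_in, *d_out; int *d_shift;
    cudaMalloc((void **)&d_in, h_in.size() * sizeof(float));
    cudaMalloc((void **)&d_out, (size_t)ndm * nsamp * sizeof(float));
    cudaMalloc((void **)&d_shift, shift.size() * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_shift, shift.data(), shift.size() * sizeof(int),
               cudaMemcpyHostToDevice);

    dim3 grid((nsamp + 255) / 256, ndm);
    dedisperse<<<grid, 256>>>(d_in, d_out, d_shift, nchan, nsamp, ndm);
    cudaDeviceSynchronize();
    std::printf("dedispersed %d DM trials\n", ndm);
    cudaFree(d_in); cudaFree(d_out); cudaFree(d_shift);
    return 0;
}
```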


Author(s):  
Liam Dunn ◽  
Patrick Clearwater ◽  
Andrew Melatos ◽  
Karl Wette

Abstract The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours with the new implementation, compared to 1092 core-hours with the existing implementation.
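The reason the resampling algorithm maps well to GPUs can be sketched in a few lines: once the detector time series is resampled onto the solar-system barycentre time grid, a single FFT evaluates all frequency bins at once. The fragment below is a heavily simplified illustration (nearest-neighbour gather with placeholder indices; the production LALSuite code interpolates far more carefully), compiled with nvcc and linked against cuFFT (-lcufft).

```cuda
#include <cstdio>
#include <vector>
#include <cuComplex.h>
#include <cuda_runtime.h>
#include <cufft.h>

// Gather detector samples onto an evenly spaced barycentric time grid.
// Nearest-neighbour selection is the crudest form of "resampling".
__global__ void resample(const cufftComplex *det, cufftComplex *bary,
                         const int *idx, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) bary[i] = det[idx[i]];
}

int main()
{
    const int n = 1 << 16;
    std::vector<cufftComplex> h(n, make_cuFloatComplex(1.0f, 0.0f));
    std::vector<int> idx(n);
    for (int i = 0; i < n; ++i) idx[i] = i;   // placeholder: real indices come
                                              // from the barycentric correction
    cufftComplex *d_det, *d_bary; int *d_idx;
    cudaMalloc((void **)&d_det, n * sizeof(cufftComplex));
    cudaMalloc((void **)&d_bary, n * sizeof(cufftComplex));
    cudaMalloc((void **)&d_idx, n * sizeof(int));
    cudaMemcpy(d_det, h.data(), n * sizeof(cufftComplex), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    resample<<<(n + 255) / 256, 256>>>(d_det, d_bary, d_idx, n);

    // One FFT of the resampled series evaluates every frequency bin at once;
    // the F-statistic is then formed from the per-bin powers.
    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);
    cufftExecC2C(plan, d_bary, d_bary, CUFFT_FORWARD);
    cudaDeviceSynchronize();

    cudaMemcpy(h.data(), d_bary, n * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
    std::printf("DC bin power: %.1f\n", h[0].x * h[0].x + h[0].y * h[0].y);
    cufftDestroy(plan);
    cudaFree(d_det); cudaFree(d_bary); cudaFree(d_idx);
    return 0;
}
```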


2020 ◽  
Author(s):  
Jingcheng Shen ◽  
Jie Mei ◽  
Marcus Walldén ◽  
Fumihiko Ino

Abstract FreeSurfer is among the most widely used software suites for the study of cortical and subcortical brain anatomy. However, analysis with FreeSurfer can be time-consuming, and it lacks support for graphics processing units (GPUs) since the core development team stopped maintaining GPU-accelerated versions because of the significant programming cost. As FreeSurfer is a large project with millions of source lines, in this work we introduce and examine the use of a directive-based framework, OpenACC, for GPU acceleration of FreeSurfer, and we find that the OpenACC-based approach significantly reduces programming cost. Moreover, because the overhead incurred by CPU-to-GPU data transfer is the major challenge in delivering high-performance GPU-based codes, we compare two schemes, copy-and-transfer and overlapped-fully-transfer, for reducing this overhead. Experimental results show that the target function accelerated with the overlapped-fully-transfer scheme ran 2.3 times as fast as the original CPU-based function, and the GPU-accelerated program achieved an average speedup of 1.2 over the original CPU-based program. These results demonstrate the usefulness and potential of the proposed OpenACC-based approach for integrating GPU support into FreeSurfer; it can easily be extended to other computationally expensive functions and modules of FreeSurfer to achieve further speedup.
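The paper expresses the overlap with OpenACC directives; the same copy/compute overlap pattern can be written with CUDA streams, shown below as a hedged stand-in (the kernel is a dummy and the chunk sizes are illustrative). Pinned host memory is what lets the asynchronous copies actually overlap with kernel execution.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Dummy stand-in for the accelerated function.
__global__ void process(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

int main()
{
    const int nchunks = 4, chunk = 1 << 20, n = nchunks * chunk;
    float *h, *d;
    cudaMallocHost((void **)&h, n * sizeof(float));   // pinned host memory
    cudaMalloc((void **)&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t s[nchunks];
    for (int c = 0; c < nchunks; ++c) cudaStreamCreate(&s[c]);

    // Each chunk's H2D copy, kernel, and D2H copy are queued on its own
    // stream, so chunk c+1's transfers overlap chunk c's computation.
    for (int c = 0; c < nchunks; ++c) {
        float *hp = h + c * chunk, *dp = d + c * chunk;
        cudaMemcpyAsync(dp, hp, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, s[c]);
        process<<<(chunk + 255) / 256, 256, 0, s[c]>>>(dp, chunk);
        cudaMemcpyAsync(hp, dp, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, s[c]);
    }
    cudaDeviceSynchronize();
    std::printf("h[0] = %.1f (expect 3.0)\n", h[0]);

    for (int c = 0; c < nchunks; ++c) cudaStreamDestroy(s[c]);
    cudaFree(d); cudaFreeHost(h);
    return 0;
}
```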


2018 ◽  
Vol 29 (01) ◽  
pp. 63-90 ◽  
Author(s):  
Safia Kedad-Sidhoum ◽  
Florence Monna ◽  
Grégory Mounié ◽  
Denis Trystram

More and more parallel computing platforms are built upon hybrid architectures combining multi-core processors (CPUs) and hardware accelerators such as General Purpose Graphics Processing Units (GPGPUs). We present in this paper a new method for efficiently scheduling parallel applications on [Formula: see text] CPUs and [Formula: see text] GPGPUs, where each task of the application can be processed either on a regular core (CPU) or on a GPGPU. We consider the problem of scheduling [Formula: see text] independent tasks with the objective of minimizing the time for completing the whole application (the makespan). This problem is NP-hard; we therefore present two families of approximation algorithms that achieve approximation ratios of [Formula: see text] or [Formula: see text] for any integer [Formula: see text] when only one GPGPU is considered, and [Formula: see text] or [Formula: see text] for [Formula: see text] GPGPUs, where [Formula: see text] is an arbitrarily small value corresponding to the target accuracy of a binary search. The proposed method is based on a dual approximation scheme that uses a dynamic programming algorithm. The associated computational cost is, for the first (resp. second) family, in [Formula: see text] (resp. [Formula: see text]) per step of the dual approximation. The greater the value of the parameter [Formula: see text], the better the approximation, but the higher the computational cost. Finally, we propose a relaxed version of the algorithm that achieves a running time in [Formula: see text] with a constant approximation bound of [Formula: see text]. This last result is compared with the state-of-the-art algorithm HEFT. The proposed method is the first general-purpose algorithm for scheduling on hybrid machines with a theoretical performance guarantee that can be used in practice.
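Because the abstract's formulas are elided, only the overall structure of a dual approximation scheme can be sketched: binary-search a makespan guess, and at each step run an oracle that either builds a schedule within the guess (times an approximation factor) or rejects the guess. The host-side C++ sketch below (compilable with nvcc) substitutes a simple greedy feasibility test for the paper's dynamic-programming oracle.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>

struct Task { double cpu_time, gpu_time; };

// Try to place every task so no CPU or GPU exceeds the makespan guess.
// A real oracle would either certify a schedule of length <= rho * guess
// or prove that no schedule of length <= guess exists; this greedy rule
// is only an illustrative stand-in.
bool feasible(const std::vector<Task> &tasks, int m, int k, double guess)
{
    std::vector<double> cpu(m, 0.0), gpu(k, 0.0);
    for (const Task &t : tasks) {
        auto c = std::min_element(cpu.begin(), cpu.end());
        auto g = std::min_element(gpu.begin(), gpu.end());
        bool cok = *c + t.cpu_time <= guess;
        bool gok = *g + t.gpu_time <= guess;
        if (cok && (!gok || *c + t.cpu_time <= *g + t.gpu_time))
            *c += t.cpu_time;          // place on least-loaded CPU
        else if (gok)
            *g += t.gpu_time;          // place on least-loaded GPU
        else
            return false;              // guess rejected
    }
    return true;
}

int main()
{
    std::vector<Task> tasks = {{4, 1}, {3, 2}, {2, 2}, {6, 1}, {1, 1}};
    int m = 2, k = 1;                  // 2 CPUs, 1 GPU (illustrative)
    double lo = 0.0, hi = 20.0, eps = 1e-3;
    while (hi - lo > eps) {            // binary search drives the guess down;
        double mid = 0.5 * (lo + hi);  // eps is the target accuracy
        if (feasible(tasks, m, k, mid)) hi = mid; else lo = mid;
    }
    std::printf("smallest accepted makespan guess: %.3f\n", hi);
    return 0;
}
```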


Sensors ◽  
2019 ◽  
Vol 19 (19) ◽  
pp. 4234 ◽  
Author(s):  
Yu-Chi Lai ◽  
Jin-Yang Lin ◽  
Chih-Yuan Yao ◽  
Dong-Yuan Lyu ◽  
Shyh-Yuan Lee ◽  
...  

Digital dental reconstruction can be a more efficient and effective mechanism for artificial crown construction and periodic inspection. However, optical methods cannot reconstruct the portions under the gums, and X-ray-based methods involve radiation doses high enough to limit how often they can be applied. Optical coherence tomography (OCT) can harmlessly penetrate gums using low-coherence infrared rays, and thus this work designs an OCT-based framework for dental reconstruction that uses optical rectification, the fast Fourier transform, volumetric boundary detection, and Poisson surface reconstruction to overcome noisy imaging. Additionally, to operate inside a patient's mouth, the injector must have a small caliber, which limits its penetration depth and effective operating range; reconstruction therefore requires multiple scans from various directions along with proper alignment. However, flat regions, such as the mesial side of the front teeth, may not have enough features for alignment. As a result, we design a scanning order for each type of tooth that starts from an area of abundant features for easier alignment, while using gyros to track scanning postures for better initial orientations. Immediate feedback for each scan is important, so we accelerate the entire signal processing, boundary detection, and point-cloud alignment using Graphics Processing Units (GPUs) while streamlining the data transfers and GPU computations. Our framework successfully reconstructs three isolated teeth and one side of a living tooth with precision comparable to the state-of-the-art method. Moreover, a user study verifies the effectiveness of our interactive feedback for efficient and fast clinical scanning.
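Volumetric boundary detection parallelizes naturally on a GPU because each A-scan (depth profile) can be processed independently. The CUDA sketch below picks, for every A-scan, the first depth sample exceeding an intensity threshold; this is a crude stand-in for the paper's detection stage, with all sizes and thresholds illustrative.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per A-scan: walk down the depth profile and record the first
// sample whose intensity exceeds the threshold -- a crude boundary pick
// that a later surface-reconstruction stage would smooth and mesh.
__global__ void first_crossing(const float *vol, int nscan, int ndepth,
                               float thresh, int *boundary)
{
    int a = blockIdx.x * blockDim.x + threadIdx.x;
    if (a >= nscan) return;
    int hit = -1;
    for (int z = 0; z < ndepth; ++z)
        if (vol[a * ndepth + z] > thresh) { hit = z; break; }
    boundary[a] = hit;               // -1: no boundary found in this A-scan
}

int main()
{
    const int nscan = 512, ndepth = 1024;
    std::vector<float> vol(nscan * ndepth, 0.1f);     // noisy background
    for (int a = 0; a < nscan; ++a)
        vol[a * ndepth + 300 + a % 5] = 1.0f;         // synthetic surface

    float *d_vol; int *d_b;
    cudaMalloc((void **)&d_vol, vol.size() * sizeof(float));
    cudaMalloc((void **)&d_b, nscan * sizeof(int));
    cudaMemcpy(d_vol, vol.data(), vol.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    first_crossing<<<(nscan + 127) / 128, 128>>>(d_vol, nscan, ndepth, 0.5f, d_b);

    std::vector<int> b(nscan);
    cudaMemcpy(b.data(), d_b, nscan * sizeof(int), cudaMemcpyDeviceToHost);
    std::printf("boundary depth of A-scan 0: sample %d\n", b[0]);
    cudaFree(d_vol); cudaFree(d_b);
    return 0;
}
```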


2021 ◽  
Author(s):  
Alexander Bauer ◽  
Benjamin Schwarz ◽  
Dirk Gajewski

Most established methods for the estimation of subsurface velocity models rely on the measurements of reflected or diving waves and therefore require data with sufficiently large source-receiver offsets. For seismic data that lacks these offsets, such as vintage data, low-fold academic data or near zero-offset P-Cable data, these methods fail. Building on recent studies, we apply a workflow that exploits the diffracted wavefield for depth-velocity-model building. This workflow consists of three principal steps: (1) revealing the diffracted wavefield by modeling and adaptively subtracting reflections from the raw data, (2) characterizing the diffractions with physically meaningful wavefront attributes, (3) estimating depth-velocity models with wavefront tomography. We propose a hybrid 2D/3D approach, in which we apply the well-established and automated 2D workflow to numerous inlines of a high-resolution 3D P-Cable dataset acquired near Ritter Island, a small volcanic island located north-east of New Guinea known for a catastrophic flank collapse in 1888. We use the obtained set of parallel 2D velocity models to interpolate a 3D velocity model for the whole data cube, thus overcoming possible issues such as varying data quality in inline and crossline direction and the high computational cost of 3D data analysis. Even though the 2D workflow may suffer from out-of-plane effects, we obtain a smooth 3D velocity model that is consistent with the data.
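The hybrid 2D/3D step amounts to interpolating a dense cube from a set of parallel 2D models. The sketch below (host-only C++, compilable with nvcc, with a dummy model) uses simple linear interpolation along the crossline axis between every fifth modeled section; the study does not specify its interpolation scheme, so this is purely illustrative.

```cuda
#include <cstdio>
#include <vector>

// Fill a 3D velocity cube from 2D models available on every 'stride'-th
// section by linear interpolation between neighbouring modeled sections.
int main()
{
    const int ny = 21, nx = 100, nz = 200, stride = 5;
    std::vector<float> v(ny * nx * nz, 0.0f);

    // Pretend the 2D workflow produced models on sections 0, 5, 10, 15, 20.
    for (int y = 0; y < ny; y += stride)
        for (int i = 0; i < nx * nz; ++i)
            v[y * nx * nz + i] = 1500.0f + 10.0f * y;   // dummy 2D model

    for (int y = 0; y < ny; ++y) {
        if (y % stride == 0) continue;                  // already modeled
        int y0 = (y / stride) * stride, y1 = y0 + stride;
        float w = float(y - y0) / stride;               // crossline weight
        for (int i = 0; i < nx * nz; ++i)
            v[y * nx * nz + i] = (1.0f - w) * v[y0 * nx * nz + i]
                               +          w * v[y1 * nx * nz + i];
    }
    std::printf("v at section 7, sample 0: %.1f m/s\n", v[7 * nx * nz]);
    return 0;
}
```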

