Accelerating 3D Fourier migration with graphics processing units

Geophysics ◽  
2009 ◽  
Vol 74 (6) ◽  
pp. WCA129-WCA139 ◽  
Author(s):  
Jin-Hai Zhang ◽  
Shu-Qin Wang ◽  
Zhen-Xing Yao

Computational cost is a major factor that inhibits the practical application of 3D depth migration. We have developed a fast parallel scheme to speed up 3D wave-equation depth migration on a parallel computing device, i.e., on graphics processing units (GPUs). The third-order optimized generalized-screen propagator is used to take advantage of the built-in software implementation of the fast Fourier transform. The propagator is coded as a sequence of kernels that are called from the host for each frequency component. Moving the wavefield extrapolation for each depth level to the GPU allows a large 3D velocity model to be handled, but this scheme gains only a limited speedup over the CPU implementation because of the low-bandwidth data transfer between host and device. We achieve further speedup by minimizing this low-bandwidth transfer: the 3D velocity model and imaged data are kept in device memory, and their memory demand is halved by storing them as integer arrays rather than float arrays. Incorporating a 2D tapered function, the time-shift propagator, and the scaling of the inverse Fourier transform into a single compact kernel further reduces the computation time. Three-dimensional impulse responses and synthetic data examples demonstrate that the GPU-based Fourier migration typically runs 25 to 40 times faster than the CPU-based implementation. This allows complex media to be imaged with 3D depth migration at little computational cost, so the macrovelocity model can be built in a much shorter turnaround time.
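The memory-halving device described above is straightforward to sketch. The following CUDA fragment is a minimal illustration, assuming a simple linear min/max quantization (the paper does not spell out its exact compression scheme): velocities live in device memory as 16-bit integers and are decompressed on the fly inside the kernel, halving storage relative to a float array.

```cuda
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Decompress one 16-bit velocity sample on the fly, so the full-precision
// float model never has to reside in device memory.
__global__ void scale_by_velocity(const uint16_t *vq, float vmin, float step,
                                  float *field, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        field[i] *= vmin + step * (float)vq[i];   // reconstruct v and apply it
}

int main()
{
    const int n = 1 << 20;                        // one slice of a 3D model
    const float vmin = 1500.0f, vmax = 5500.0f;   // assumed velocity range (m/s)
    const float step = (vmax - vmin) / 65535.0f;  // quantization step

    std::vector<uint16_t> h_vq(n, 20000);         // quantized velocities
    std::vector<float> h_field(n, 1.0f);          // wavefield slice

    uint16_t *d_vq; float *d_field;
    cudaMalloc((void **)&d_vq, n * sizeof(uint16_t));  // half of float storage
    cudaMalloc((void **)&d_field, n * sizeof(float));
    cudaMemcpy(d_vq, h_vq.data(), n * sizeof(uint16_t), cudaMemcpyHostToDevice);
    cudaMemcpy(d_field, h_field.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    scale_by_velocity<<<(n + 255) / 256, 256>>>(d_vq, vmin, step, d_field, n);

    cudaMemcpy(h_field.data(), d_field, n * sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("field[0] scaled by v = %.1f m/s -> %.3f\n",
                vmin + step * 20000, h_field[0]);
    cudaFree(d_vq); cudaFree(d_field);
    return 0;
}
```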

Geophysics ◽  
2014 ◽  
Vol 79 (3) ◽  
pp. S105-S111 ◽  
Author(s):  
Sheng Xu ◽  
Feng Chen ◽  
Bing Tang ◽  
Gilles Lambare

When using seismic data to image complex structures, the reverse time migration (RTM) algorithm generally provides the best results when the velocity model is accurate. With an inexact model, moveouts appear in common image gathers (CIGs), whether in the surface-offset domain or in the subsurface-angle domain, and the stacked image is not well focused. In extended image gathers, the strongest energy of a seismic event may occur at nonzero lag in time-shift or offset-shift gathers. In RTM images produced with the time-shift imaging condition, the non-zero-lag time-shift images exhibit a spatial shift; we propose to correct them with a second pass of migration, similar to zero-offset depth migration, based on a local poststack depth-migration assumption. After this second-pass migration, the time-shift CIGs appear flat and can be stacked. The stack enhances the energy of seismic events that are defocused at zero time lag by model inaccuracy, although the newly focused events remain at their previous positions, which may deviate from the true reflection positions. The stack also attenuates long-wavelength RTM artifacts. For tilted transversely isotropic migration, we propose a scheme to defocus coherent noise, such as migration artifacts from residual multiples, by applying the original migration velocity model along the symmetry axis but with different anisotropic parameters in the second pass of migration. We demonstrate on two synthetic data sets and one real data set from the Gulf of Mexico that our approach effectively attenuates coherent noise in subsalt areas.
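The time-shift gathers discussed above arise from the extended imaging condition I(x, τ) = Σ_t s(x, t − τ) r(x, t + τ). The following CUDA kernel is a minimal sketch of that correlation at a single image point, with illustrative trace lengths; the paper's second-pass migration itself is not reproduced here.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Time-shift extended imaging condition at one image point:
//   I(tau) = sum_t s(t - tau) * r(t + tau)
// One thread per time lag; each thread correlates the two full traces.
__global__ void timeshift_gather(const float *src, const float *rec,
                                 float *cig, int nt, int nlag)
{
    int il = blockIdx.x * blockDim.x + threadIdx.x;   // 0 .. 2*nlag
    if (il > 2 * nlag) return;
    int tau = il - nlag;                              // signed lag in samples
    float acc = 0.0f;
    for (int t = 0; t < nt; ++t) {
        int ts = t - tau, tr = t + tau;
        if (ts >= 0 && ts < nt && tr >= 0 && tr < nt)
            acc += src[ts] * rec[tr];
    }
    cig[il] = acc;
}

int main()
{
    const int nt = 1000, nlag = 50, nout = 2 * nlag + 1;
    std::vector<float> s(nt, 0.0f), r(nt, 0.0f);
    s[500] = 1.0f; r[520] = 1.0f;          // event defocused by 20 samples

    float *d_s, *d_r, *d_c;
    cudaMalloc((void **)&d_s, nt * sizeof(float));
    cudaMalloc((void **)&d_r, nt * sizeof(float));
    cudaMalloc((void **)&d_c, nout * sizeof(float));
    cudaMemcpy(d_s, s.data(), nt * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_r, r.data(), nt * sizeof(float), cudaMemcpyHostToDevice);

    timeshift_gather<<<1, 128>>>(d_s, d_r, d_c, nt, nlag);

    std::vector<float> cig(nout);
    cudaMemcpy(cig.data(), d_c, nout * sizeof(float), cudaMemcpyDeviceToHost);
    for (int il = 0; il < nout; ++il)
        if (cig[il] != 0.0f)
            std::printf("energy focuses at lag %d samples\n", il - nlag);
    cudaFree(d_s); cudaFree(d_r); cudaFree(d_c);
    return 0;
}
```

For the synthetic pair above, the energy focuses at τ = 10 samples rather than at zero lag, which is exactly the kind of defocusing the second-pass migration is designed to move back to zero.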


Geophysics ◽  
1998 ◽  
Vol 63 (1) ◽  
pp. 25-38 ◽  
Author(s):  
Xianhuai Zhu ◽  
Burke G. Angstman ◽  
David P. Sixta

Through the use of iterative turning-ray tomography followed by wave-equation datuming (or tomo-datuming) and prestack depth migration, we generate accurate prestack images of seismic data in overthrust areas containing both highly variable near-surface velocities and rough topography. In tomo-datuming, we downward continue shot records from the topography to a horizontal datum using velocities estimated from tomography. Turning-ray tomography often provides a more accurate near-surface velocity model than refraction statics. The main advantage of tomo-datuming over tomo-statics (tomography plus static corrections) or refraction statics is that instead of applying a vertical time shift to the data, tomo-datuming propagates the recorded wavefield to the new datum. We find that tomo-datuming better reconstructs diffractions and reflections, subsequently providing better images after migration. In the datuming process, we use a recursive finite-difference (FD) scheme to extrapolate the wavefield without applying an imaging condition, so that lateral velocity variations are handled properly and the traveltime approximations associated with raypath distortions near the surface are avoided. We follow the downward continuation step with a conventional Kirchhoff prestack depth migration. This yields better images than migrating from the topography with the conventional Kirchhoff method, which must compute traveltimes through the complicated near surface. Since FD datuming is applied only to the shallow part of the section, its cost is much less than that of whole-volume FD migration. This is attractive because (1) prestack depth migration usually is used iteratively to build a velocity model, so both efficiency and accuracy are important factors to consider; and (2) tomo-datuming can improve the signal-to-noise (S/N) ratio of prestack gathers, leading to more accurate migration velocity analysis and better images after depth migration. Case studies with synthetic and field data examples show that tomo-datuming is especially helpful when strong lateral velocity variations are present below the topography.
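Although the paper uses a recursive finite-difference scheme for datuming, the idea of one downward-continuation recursion is easy to illustrate with a constant-velocity phase-shift operator in the frequency-wavenumber domain. The CUDA sketch below applies one such step to a single frequency slice; the surrounding spatial FFTs (e.g., via cuFFT) are omitted, and all values are illustrative.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuComplex.h>
#include <cuda_runtime.h>

// One downward-continuation recursion for a single frequency omega, applied
// in the horizontal-wavenumber domain: propagating components get a phase
// shift exp(i*kz*dz); evanescent components are muted.
__global__ void phase_shift(cuFloatComplex *wf, int nkx, float dkx,
                            float omega, float v, float dz)
{
    int ik = blockIdx.x * blockDim.x + threadIdx.x;
    if (ik >= nkx) return;
    float kx = (ik <= nkx / 2 ? ik : ik - nkx) * dkx;  // FFT wavenumber order
    float kz2 = (omega / v) * (omega / v) - kx * kx;
    if (kz2 > 0.0f) {
        float kz = sqrtf(kz2);
        cuFloatComplex ph = make_cuFloatComplex(cosf(kz * dz), sinf(kz * dz));
        wf[ik] = cuCmulf(wf[ik], ph);
    } else {
        wf[ik] = make_cuFloatComplex(0.0f, 0.0f);      // kill evanescent energy
    }
}

int main()
{
    const int nkx = 512;
    std::vector<cuFloatComplex> h(nkx, make_cuFloatComplex(1.0f, 0.0f));
    cuFloatComplex *d;
    cudaMalloc((void **)&d, nkx * sizeof(cuFloatComplex));
    cudaMemcpy(d, h.data(), nkx * sizeof(cuFloatComplex), cudaMemcpyHostToDevice);

    // omega = 2*pi*20 Hz, v = 2000 m/s, dz = 10 m (illustrative values)
    phase_shift<<<(nkx + 127) / 128, 128>>>(d, nkx, 0.005f, 125.6f, 2000.0f, 10.0f);

    cudaMemcpy(h.data(), d, nkx * sizeof(cuFloatComplex), cudaMemcpyDeviceToHost);
    std::printf("wf[0] after one step: (%.3f, %.3f)\n", h[0].x, h[0].y);
    cudaFree(d);
    return 0;
}
```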


2013 ◽  
Vol 02 (01) ◽  
pp. 1350008 ◽  
Author(s):  
A. MAGRO ◽  
J. HICKISH ◽  
K. Z. ADAMI

Radio transient discovery with next-generation radio telescopes will pose several digital signal processing and data transfer challenges, requiring specialized high-performance backends. Several accelerator technologies are being considered as prototyping platforms, including Graphics Processing Units (GPUs). In this paper we present a real-time pipeline prototype capable of processing multiple beams concurrently: it performs Radio Frequency Interference (RFI) rejection through thresholding, corrects for the frequency-dependent delay in signal arrival times across the band using brute-force dedispersion, detects and clusters events, and finally filters candidates, with the capability of persisting data buffers containing interesting signals to disk. This setup was deployed at the BEST-2 SKA pathfinder in Medicina, Italy, where several benchmarks and test observations of astrophysical transients were conducted. These tests show that on the deployed hardware eight 20 MHz beams can be processed simultaneously for ~640 Dispersion Measure (DM) values. Furthermore, the clustering and candidate filtering algorithms employed prove to be good candidates for online event detection. The number of beams that can be processed scales with the number of servers and GPUs deployed, making this a viable architecture for current and future radio telescopes.
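Brute-force dedispersion is the computational core of such a pipeline: for each trial DM, every frequency channel is shifted by the cold-plasma dispersion delay Δt ≈ 4.149 ms × DM × (ν⁻² − ν_hi⁻²) (ν in GHz) and the channels are summed. The CUDA sketch below assumes a channel-major data layout and band parameters loosely modeled on the 20 MHz beams mentioned above; it is not the deployed pipeline's code.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Brute-force dedispersion: one thread per (DM trial, output sample). Each
// thread sums all frequency channels after undoing the dispersive delay.
__global__ void dedisperse(const float *in, float *out, const int *shift,
                           int nchan, int nsamp, int ndm)
{
    int t  = blockIdx.x * blockDim.x + threadIdx.x;   // output time sample
    int dm = blockIdx.y;                              // DM trial
    if (t >= nsamp || dm >= ndm) return;
    float acc = 0.0f;
    for (int c = 0; c < nchan; ++c) {
        int ts = t + shift[dm * nchan + c];           // delayed sample
        if (ts < nsamp) acc += in[c * nsamp + ts];
    }
    out[dm * nsamp + t] = acc;
}

int main()
{
    const int nchan = 256, nsamp = 4096, ndm = 64;
    const float f_hi = 0.418f, bw = 0.020f, tsamp = 1e-3f; // GHz, GHz, s

    // Delay relative to the top channel (cold-plasma dispersion law),
    // converted to a per-channel, per-DM sample shift.
    std::vector<int> shift(ndm * nchan);
    for (int dm = 0; dm < ndm; ++dm)
        for (int c = 0; c < nchan; ++c) {
            float f  = f_hi - bw * c / nchan;
            float dt = 4.149e-3f * (10.0f * dm)
                     * (1.0f / (f * f) - 1.0f / (f_hi * f_hi));
            shift[dm * nchan + c] = (int)roundf(dt / tsamp);
        }

    std::vector<float> h_in(nchan * nsamp, 0.0f);     // dummy filterbank data
    float *d_in, *d_out; int *d_shift;
    cudaMalloc((void **)&d_in, h_in.size() * sizeof(float));
    cudaMalloc((void **)&d_out, (size_t)ndm * nsamp * sizeof(float));
    cudaMalloc((void **)&d_shift, shift.size() * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_shift, shift.data(), shift.size() * sizeof(int),
               cudaMemcpyHostToDevice);

    dim3 grid((nsamp + 255) / 256, ndm);
    dedisperse<<<grid, 256>>>(d_in, d_out, d_shift, nchan, nsamp, ndm);
    cudaDeviceSynchronize();
    std::printf("dedispersed %d DM trials\n", ndm);
    cudaFree(d_in); cudaFree(d_out); cudaFree(d_shift);
    return 0;
}
```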


Author(s):  
Liam Dunn ◽  
Patrick Clearwater ◽  
Andrew Melatos ◽  
Karl Wette

Abstract The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours with the new implementation, compared to 1092 core-hours with the existing implementation.
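The reason the resampling algorithm maps well to GPUs can be sketched in a few lines: once the detector time series is resampled onto the solar-system barycentre time grid, a single FFT evaluates all frequency bins at once. The fragment below is a heavily simplified illustration (nearest-neighbour gather with placeholder indices; the production LALSuite code interpolates far more carefully), compiled with nvcc and linked against cuFFT (-lcufft).

```cuda
#include <cstdio>
#include <vector>
#include <cuComplex.h>
#include <cuda_runtime.h>
#include <cufft.h>

// Gather detector samples onto an evenly spaced barycentric time grid.
// Nearest-neighbour selection is the crudest form of "resampling".
__global__ void resample(const cufftComplex *det, cufftComplex *bary,
                         const int *idx, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) bary[i] = det[idx[i]];
}

int main()
{
    const int n = 1 << 16;
    std::vector<cufftComplex> h(n, make_cuFloatComplex(1.0f, 0.0f));
    std::vector<int> idx(n);
    for (int i = 0; i < n; ++i) idx[i] = i;   // placeholder: real indices come
                                              // from the barycentric correction
    cufftComplex *d_det, *d_bary; int *d_idx;
    cudaMalloc((void **)&d_det, n * sizeof(cufftComplex));
    cudaMalloc((void **)&d_bary, n * sizeof(cufftComplex));
    cudaMalloc((void **)&d_idx, n * sizeof(int));
    cudaMemcpy(d_det, h.data(), n * sizeof(cufftComplex), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    resample<<<(n + 255) / 256, 256>>>(d_det, d_bary, d_idx, n);

    // One FFT of the resampled series evaluates every frequency bin at once;
    // the F-statistic is then formed from the per-bin powers.
    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);
    cufftExecC2C(plan, d_bary, d_bary, CUFFT_FORWARD);
    cudaDeviceSynchronize();

    cudaMemcpy(h.data(), d_bary, n * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
    std::printf("DC bin power: %.1f\n", h[0].x * h[0].x + h[0].y * h[0].y);
    cufftDestroy(plan);
    cudaFree(d_det); cudaFree(d_bary); cudaFree(d_idx);
    return 0;
}
```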


2020 ◽  
Author(s):  
Jingcheng Shen ◽  
Jie Mei ◽  
Marcus Walldén ◽  
Fumihiko Ino

Abstract FreeSurfer is among the most widely used software suites for the study of cortical and subcortical brain anatomy. However, analysis with FreeSurfer can be time-consuming, and it lacks support for graphics processing units (GPUs) since the core development team stopped maintaining GPU-accelerated versions because of the significant programming cost. As FreeSurfer is a large project with millions of source lines, in this work we introduce and examine the use of a directive-based framework, OpenACC, for GPU acceleration of FreeSurfer, and we find that the OpenACC-based approach significantly reduces programming cost. Moreover, because the overhead incurred by CPU-to-GPU data transfer is the major challenge in delivering high-performance GPU-based codes, we compare two schemes, copy-and-transfer and overlapped-fully-transfer, for reducing this overhead. Experimental results show that the target function accelerated with the overlapped-fully-transfer scheme ran 2.3 times as fast as the original CPU-based function, and the GPU-accelerated program achieved an average speedup of 1.2 over the original CPU-based program. These results demonstrate the usefulness and potential of the proposed OpenACC-based approach for integrating GPU support into FreeSurfer; it can easily be extended to other computationally expensive functions and modules of FreeSurfer to achieve further speedup.
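The paper expresses the overlap with OpenACC directives; the same copy/compute overlap pattern can be written with CUDA streams, shown below as a hedged stand-in (the kernel is a dummy and the chunk sizes are illustrative). Pinned host memory is what lets the asynchronous copies actually overlap with kernel execution.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Dummy stand-in for the accelerated function.
__global__ void process(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

int main()
{
    const int nchunks = 4, chunk = 1 << 20, n = nchunks * chunk;
    float *h, *d;
    cudaMallocHost((void **)&h, n * sizeof(float));   // pinned host memory
    cudaMalloc((void **)&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t s[nchunks];
    for (int c = 0; c < nchunks; ++c) cudaStreamCreate(&s[c]);

    // Each chunk's H2D copy, kernel, and D2H copy are queued on its own
    // stream, so chunk c+1's transfers overlap chunk c's computation.
    for (int c = 0; c < nchunks; ++c) {
        float *hp = h + c * chunk, *dp = d + c * chunk;
        cudaMemcpyAsync(dp, hp, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, s[c]);
        process<<<(chunk + 255) / 256, 256, 0, s[c]>>>(dp, chunk);
        cudaMemcpyAsync(hp, dp, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, s[c]);
    }
    cudaDeviceSynchronize();
    std::printf("h[0] = %.1f (expect 3.0)\n", h[0]);

    for (int c = 0; c < nchunks; ++c) cudaStreamDestroy(s[c]);
    cudaFree(d); cudaFreeHost(h);
    return 0;
}
```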


2018 ◽  
Vol 29 (01) ◽  
pp. 63-90 ◽  
Author(s):  
Safia Kedad-Sidhoum ◽  
Florence Monna ◽  
Grégory Mounié ◽  
Denis Trystram

More and more parallel computing platforms are built upon hybrid architectures combining multi-core processors (CPUs) and hardware accelerators such as General Purpose Graphics Processing Units (GPGPUs). We present in this paper a new method for efficiently scheduling parallel applications on [Formula: see text] CPUs and [Formula: see text] GPGPUs, where each task of the application can be processed either on a regular core (CPU) or on a GPGPU. We consider the problem of scheduling [Formula: see text] independent tasks with the objective of minimizing the time for completing the whole application (the makespan). This problem is NP-hard; we therefore present two families of approximation algorithms that achieve approximation ratios of [Formula: see text] or [Formula: see text] for any integer [Formula: see text] when only one GPGPU is considered, and [Formula: see text] or [Formula: see text] for [Formula: see text] GPGPUs, where [Formula: see text] is an arbitrarily small value corresponding to the target accuracy of a binary search. The proposed method is based on a dual approximation scheme that uses a dynamic programming algorithm. The associated computational cost is, for the first (resp. second) family, in [Formula: see text] (resp. [Formula: see text]) per step of the dual approximation. The greater the value of the parameter [Formula: see text], the better the approximation, but the higher the computational cost. Finally, we propose a relaxed version of the algorithm that achieves a running time in [Formula: see text] with a constant approximation bound of [Formula: see text]. This last result is compared with the state-of-the-art algorithm HEFT. The proposed method is the first general-purpose algorithm for scheduling on hybrid machines with a theoretical performance guarantee that can be used in practice.
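Because the abstract's formulas are elided, only the overall structure of a dual approximation scheme can be sketched: binary-search a makespan guess, and at each step run an oracle that either builds a schedule within the guess (times an approximation factor) or rejects the guess. The host-side C++ sketch below (compilable with nvcc) substitutes a simple greedy feasibility test for the paper's dynamic-programming oracle.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>

struct Task { double cpu_time, gpu_time; };

// Try to place every task so no CPU or GPU exceeds the makespan guess.
// A real oracle would either certify a schedule of length <= rho * guess
// or prove that no schedule of length <= guess exists; this greedy rule
// is only an illustrative stand-in.
bool feasible(const std::vector<Task> &tasks, int m, int k, double guess)
{
    std::vector<double> cpu(m, 0.0), gpu(k, 0.0);
    for (const Task &t : tasks) {
        auto c = std::min_element(cpu.begin(), cpu.end());
        auto g = std::min_element(gpu.begin(), gpu.end());
        bool cok = *c + t.cpu_time <= guess;
        bool gok = *g + t.gpu_time <= guess;
        if (cok && (!gok || *c + t.cpu_time <= *g + t.gpu_time))
            *c += t.cpu_time;          // place on least-loaded CPU
        else if (gok)
            *g += t.gpu_time;          // place on least-loaded GPU
        else
            return false;              // guess rejected
    }
    return true;
}

int main()
{
    std::vector<Task> tasks = {{4, 1}, {3, 2}, {2, 2}, {6, 1}, {1, 1}};
    int m = 2, k = 1;                  // 2 CPUs, 1 GPU (illustrative)
    double lo = 0.0, hi = 20.0, eps = 1e-3;
    while (hi - lo > eps) {            // binary search drives the guess down;
        double mid = 0.5 * (lo + hi);  // eps is the target accuracy
        if (feasible(tasks, m, k, mid)) hi = mid; else lo = mid;
    }
    std::printf("smallest accepted makespan guess: %.3f\n", hi);
    return 0;
}
```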


Sensors ◽  
2019 ◽  
Vol 19 (19) ◽  
pp. 4234 ◽  
Author(s):  
Yu-Chi Lai ◽  
Jin-Yang Lin ◽  
Chih-Yuan Yao ◽  
Dong-Yuan Lyu ◽  
Shyh-Yuan Lee ◽  
...  

Digital dental reconstruction can be a more efficient and effective mechanism for artificial crown construction and periodic inspection. However, optical methods cannot reconstruct the portions under the gums, and X-ray-based methods involve radiation doses high enough to limit how often they can be applied. Optical coherence tomography (OCT) can harmlessly penetrate gums using low-coherence infrared rays, and thus this work designs an OCT-based framework for dental reconstruction that uses optical rectification, the fast Fourier transform, volumetric boundary detection, and Poisson surface reconstruction to overcome noisy imaging. Additionally, to operate inside a patient's mouth, the injector must have a small caliber, which limits its penetration depth and effective operating range; reconstruction therefore requires multiple scans from various directions along with proper alignment. However, flat regions, such as the mesial side of the front teeth, may not have enough features for alignment. As a result, we design a scanning order for each type of tooth that starts from an area of abundant features for easier alignment, while using gyros to track scanning postures for better initial orientations. Immediate feedback for each scan is important, so we accelerate the entire signal processing, boundary detection, and point-cloud alignment using Graphics Processing Units (GPUs) while streamlining the data transfers and GPU computations. Our framework successfully reconstructs three isolated teeth and one side of a living tooth with precision comparable to the state-of-the-art method. Moreover, a user study verifies the effectiveness of our interactive feedback for efficient and fast clinical scanning.
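Volumetric boundary detection parallelizes naturally on a GPU because each A-scan (depth profile) can be processed independently. The CUDA sketch below picks, for every A-scan, the first depth sample exceeding an intensity threshold; this is a crude stand-in for the paper's detection stage, with all sizes and thresholds illustrative.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per A-scan: walk down the depth profile and record the first
// sample whose intensity exceeds the threshold -- a crude boundary pick
// that a later surface-reconstruction stage would smooth and mesh.
__global__ void first_crossing(const float *vol, int nscan, int ndepth,
                               float thresh, int *boundary)
{
    int a = blockIdx.x * blockDim.x + threadIdx.x;
    if (a >= nscan) return;
    int hit = -1;
    for (int z = 0; z < ndepth; ++z)
        if (vol[a * ndepth + z] > thresh) { hit = z; break; }
    boundary[a] = hit;               // -1: no boundary found in this A-scan
}

int main()
{
    const int nscan = 512, ndepth = 1024;
    std::vector<float> vol(nscan * ndepth, 0.1f);     // noisy background
    for (int a = 0; a < nscan; ++a)
        vol[a * ndepth + 300 + a % 5] = 1.0f;         // synthetic surface

    float *d_vol; int *d_b;
    cudaMalloc((void **)&d_vol, vol.size() * sizeof(float));
    cudaMalloc((void **)&d_b, nscan * sizeof(int));
    cudaMemcpy(d_vol, vol.data(), vol.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    first_crossing<<<(nscan + 127) / 128, 128>>>(d_vol, nscan, ndepth, 0.5f, d_b);

    std::vector<int> b(nscan);
    cudaMemcpy(b.data(), d_b, nscan * sizeof(int), cudaMemcpyDeviceToHost);
    std::printf("boundary depth of A-scan 0: sample %d\n", b[0]);
    cudaFree(d_vol); cudaFree(d_b);
    return 0;
}
```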


2021 ◽  
Author(s):  
Alexander Bauer ◽  
Benjamin Schwarz ◽  
Dirk Gajewski

Most established methods for the estimation of subsurface velocity models rely on the measurements of reflected or diving waves and therefore require data with sufficiently large source-receiver offsets. For seismic data that lacks these offsets, such as vintage data, low-fold academic data or near zero-offset P-Cable data, these methods fail. Building on recent studies, we apply a workflow that exploits the diffracted wavefield for depth-velocity-model building. This workflow consists of three principal steps: (1) revealing the diffracted wavefield by modeling and adaptively subtracting reflections from the raw data, (2) characterizing the diffractions with physically meaningful wavefront attributes, (3) estimating depth-velocity models with wavefront tomography. We propose a hybrid 2D/3D approach, in which we apply the well-established and automated 2D workflow to numerous inlines of a high-resolution 3D P-Cable dataset acquired near Ritter Island, a small volcanic island located north-east of New Guinea known for a catastrophic flank collapse in 1888. We use the obtained set of parallel 2D velocity models to interpolate a 3D velocity model for the whole data cube, thus overcoming possible issues such as varying data quality in inline and crossline direction and the high computational cost of 3D data analysis. Even though the 2D workflow may suffer from out-of-plane effects, we obtain a smooth 3D velocity model that is consistent with the data.
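The hybrid 2D/3D step amounts to interpolating a dense cube from a set of parallel 2D models. The sketch below (host-only C++, compilable with nvcc, with a dummy model) uses simple linear interpolation along the crossline axis between every fifth modeled section; the study does not specify its interpolation scheme, so this is purely illustrative.

```cuda
#include <cstdio>
#include <vector>

// Fill a 3D velocity cube from 2D models available on every 'stride'-th
// section by linear interpolation between neighbouring modeled sections.
int main()
{
    const int ny = 21, nx = 100, nz = 200, stride = 5;
    std::vector<float> v(ny * nx * nz, 0.0f);

    // Pretend the 2D workflow produced models on sections 0, 5, 10, 15, 20.
    for (int y = 0; y < ny; y += stride)
        for (int i = 0; i < nx * nz; ++i)
            v[y * nx * nz + i] = 1500.0f + 10.0f * y;   // dummy 2D model

    for (int y = 0; y < ny; ++y) {
        if (y % stride == 0) continue;                  // already modeled
        int y0 = (y / stride) * stride, y1 = y0 + stride;
        float w = float(y - y0) / stride;               // crossline weight
        for (int i = 0; i < nx * nz; ++i)
            v[y * nx * nz + i] = (1.0f - w) * v[y0 * nx * nz + i]
                               +          w * v[y1 * nx * nz + i];
    }
    std::printf("v at section 7, sample 0: %.1f m/s\n", v[7 * nx * nz]);
    return 0;
}
```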

