High-Performance 3D Compressive Sensing MRI Reconstruction Using Many-Core Architectures

2011 · Vol 2011 · pp. 1-11
Author(s): Daehyun Kim, Joshua Trzasko, Mikhail Smelyanskiy, Clifton Haider, Pradeep Dubey, et al.

Compressive sensing (CS) describes how sparse signals can be accurately reconstructed from many fewer samples than required by the Nyquist criterion. Since MRI scan duration is proportional to the number of acquired samples, CS has been gaining significant attention in MRI. However, the computationally intensive nature of CS reconstructions has precluded their use in routine clinical practice. In this work, we investigate how different throughput-oriented architectures can benefit one CS algorithm and what levels of acceleration are feasible on different modern platforms. We demonstrate that a CUDA-based code running on an NVIDIA Tesla C2050 GPU can reconstruct a 256 × 160 × 80 volume from an 8-channel acquisition in 19 seconds, which is in itself a significant improvement over the state of the art. We then show that Intel's Knights Ferry can perform the same 3D MRI reconstruction in only 12 seconds, bringing CS methods even closer to clinical viability.
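The GPU and Knights Ferry solvers benchmarked above are not reproduced here, but the core of a CS reconstruction can be sketched with iterative soft-thresholding (ISTA) on a toy 1D problem. All sizes, parameters, and the identity sparsifying basis below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

# Minimal ISTA sketch: recover a sparse x from y = A @ x by minimizing
# ||y - A x||^2 / 2 + lam * ||x||_1 (a common CS formulation).

def soft_threshold(v, t):
    """Proximal operator of the l1 norm: shrink each entry toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam=0.01, n_iter=500):
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, L = Lipschitz const of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # Gradient step on the data term, then soft-threshold for sparsity.
        x = soft_threshold(x + step * A.T @ (y - A @ x), step * lam)
    return x

rng = np.random.default_rng(0)
n, m, k = 256, 96, 8                         # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m) # random sensing matrix
x_hat = ista(A, A @ x_true)
```

The fixed-point iteration is dominated by dense matrix-vector products and elementwise thresholding, which is the kind of bandwidth-bound, data-parallel workload that throughput-oriented architectures accelerate well.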

Geophysics · 2017 · Vol 82 (6) · pp. O91-O104
Author(s): Georgios Pilikos, A. C. Faul

Extracting the maximum possible information from the available measurements is a challenging task but is required when sensing seismic signals in inaccessible locations. Compressive sensing (CS) is a framework that allows reconstruction of sparse signals from fewer measurements than conventional sampling rates require. In seismic CS, the use of sparse transforms has seen some success; however, defining fixed basis functions is not trivial given the plethora of possibilities. Furthermore, the assumption that every instance of a seismic signal is sparse in any acquisition domain under the same transformation is limiting. We use beta process factor analysis (BPFA) to learn sparse transforms for seismic signals in the time-slice and shot-record domains from available data, and we use them as dictionaries for CS and denoising. Algorithms that use predefined basis functions are compared against BPFA, with BPFA obtaining state-of-the-art reconstructions, illustrating the importance of decomposing seismic signals into learned features.
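BPFA itself is a Bayesian nonparametric model and is not reproduced here; the downstream step it enables — sparse coding of a signal against a dictionary — can be sketched with orthogonal matching pursuit as a simple stand-in. The dictionary, sizes, and coefficients below are illustrative, not learned from seismic data:

```python
import numpy as np

def omp(D, y, k):
    """Greedy orthogonal matching pursuit: pick k atoms of D to explain y."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x

rng = np.random.default_rng(1)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)        # unit-norm dictionary atoms
x_true = np.zeros(128)
x_true[rng.choice(128, 3, replace=False)] = [1.5, -2.0, 1.0]
x_hat = omp(D, D @ x_true, k=3)       # noiseless sparse coding
```

With a dictionary learned from data (as BPFA produces) rather than a random one, the same sparse-coding step underlies both CS reconstruction and denoising.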


2021 · Vol 14 (3) · pp. 1-33
Author(s): Enrico Reggiani, Emanuele Del Sozzo, Davide Conficconi, Giuseppe Natale, Carlo Moroni, et al.

Stencil-based algorithms are a relevant class of computational kernels in high-performance systems, as they appear in a plethora of fields, from image processing to seismic simulations, from numerical methods to physical modeling. Among the various incarnations of stencil-based computations, Iterative Stencil Loops (ISLs) and Convolutional Neural Networks (CNNs) represent two well-known examples of kernels belonging to the stencil class. Indeed, ISLs apply the same stencil several times until convergence, while CNN layers leverage stencils to extract features from an image. The computationally intensive nature of ISLs, CNNs, and stencil-based workloads in general requires solutions that produce efficient implementations in terms of throughput and power efficiency. In this context, FPGAs are ideal candidates for such workloads, as they allow designing architectures tailored to the regular computational pattern of stencils. Moreover, the ever-growing need for performance leads FPGA-based architectures to scale to multiple devices to benefit from distributed acceleration. For this reason, we propose a library of HDL components to efficiently compute ISLs and CNN inference on FPGAs, along with a scalable multi-FPGA architecture based on custom PCB interconnects. Our solution eases the design flow and guarantees both scalability and performance competitive with state-of-the-art works.
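An Iterative Stencil Loop, as described above, repeatedly applies one stencil until convergence. A minimal software sketch of that pattern (a 5-point Jacobi smoother on a toy Laplace problem; purely illustrative of the computation the FPGA library targets, not its HDL implementation):

```python
import numpy as np

def jacobi_step(u):
    """One application of the 5-point averaging stencil to interior points."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:])
    return v

def isl(u, tol=1e-4, max_iter=10_000):
    """Iterative Stencil Loop: apply the same stencil until convergence."""
    for it in range(max_iter):
        v = jacobi_step(u)
        if np.max(np.abs(v - u)) < tol:
            return v, it + 1
        u = v
    return u, max_iter

# Laplace problem on a small grid: one hot edge, cold boundary elsewhere.
u0 = np.zeros((32, 32))
u0[0, :] = 1.0
u, iters = isl(u0)
```

Every grid point reads only a fixed local neighborhood, which is the regularity that makes stencils amenable to deeply pipelined FPGA datapaths and line-buffer reuse.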


2011 · Vol 21 (02) · pp. 245-272
Author(s): Duane Merrill, Andrew Grimshaw

The need to rank and order data is pervasive, and many algorithms are fundamentally dependent upon sorting and partitioning operations. Prior to this work, GPU stream processors had been perceived as challenging targets for problems with dynamic and global data dependences such as sorting. This paper presents: (1) a family of very efficient parallel algorithms for radix sorting; and (2) our allocation-oriented algorithmic design strategies that match the strengths of GPU processor architecture to this genre of dynamic parallelism. We demonstrate multiple factors of speedup (up to 3.8x) compared with state-of-the-art GPU sorting. We also reverse the performance differentials observed between GPU and multi/many-core CPU architectures in recent comparisons in the literature, including those with 32-core CPU-based accelerators. Our average sorting rates exceed 1B 32-bit keys/sec on a single GPU microprocessor. Our sorting passes are constructed from a very efficient parallel prefix-scan "runtime" that incorporates three design features: (1) kernel fusion for locally generating and consuming prefix-scan data; (2) multi-scan for performing multiple related, concurrent prefix scans (one per partitioning bin); and (3) flexible algorithm serialization for avoiding unnecessary synchronization and communication within algorithmic phases, allowing us to construct a single implementation that scales well across all generations and configurations of programmable NVIDIA GPUs.
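Each radix-sort pass the abstract describes is built from counting and prefix-scan primitives followed by a stable scatter. A minimal sequential sketch of that structure (not the authors' parallel GPU implementation; digit width is an illustrative choice):

```python
import numpy as np

def exclusive_scan(counts):
    """Exclusive prefix sum: offsets[i] = sum of counts[:i]."""
    return np.concatenate(([0], np.cumsum(counts)[:-1]))

def radix_sort(keys, digit_bits=4):
    """LSD radix sort of 32-bit keys built from count + scan + scatter passes."""
    keys = np.asarray(keys, dtype=np.uint32)
    radix = 1 << digit_bits
    for shift in range(0, 32, digit_bits):
        # Extract the current digit of every key.
        digits = ((keys >> shift) & (radix - 1)).astype(np.int64)
        # Histogram the digits, then scan to get each bin's start offset.
        counts = np.bincount(digits, minlength=radix)
        offsets = exclusive_scan(counts)
        # Stable scatter: keys with equal digits keep their relative order.
        out = np.empty_like(keys)
        for key, d in zip(keys, digits):
            out[offsets[d]] = key
            offsets[d] += 1
        keys = out
    return keys

data = np.array([170, 45, 75, 90, 802, 24, 2, 66], dtype=np.uint32)
sorted_keys = radix_sort(data)
```

On a GPU, the histogram and scatter of one pass are parallelized per thread block, with the per-bin prefix scans (the "multi-scan" above) fused into the same kernels to avoid round-trips through global memory.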



Author(s): Wei Huang, Xiaoshu Zhou, Mingchao Dong, Huaiyu Xu

Robust, high-performance visual multi-object tracking is a major challenge in computer vision, especially in drone scenarios. In this paper, an online Multi-Object Tracking (MOT) approach for UAV systems is proposed to handle small-target detection and class-imbalance challenges; it integrates the merits of a deep high-resolution representation network and a data-association method in a unified framework. Specifically, while applying a tracking-by-detection architecture to our tracking framework, a Hierarchical Deep High-resolution Network (HDHNet) is proposed, which encourages the model to handle targets of different types and scales and to extract more effective and comprehensive features during online learning. The extracted features are then fed into different prediction networks to recognize targets of interest. In addition, an adjustable fusion loss function is proposed by combining focal loss and GIoU loss to address class imbalance and hard samples. During tracking, the detection results in each frame are passed to an improved DeepSORT MOT algorithm, which makes full use of target appearance features for one-to-one matching. Experimental results on the VisDrone2019 MOT benchmark show that the proposed UAV MOT system achieves the highest accuracy and the best robustness compared with state-of-the-art methods.
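The GIoU term in the proposed fusion loss penalizes even non-overlapping box pairs by the area of their smallest enclosing box, giving a useful gradient where plain IoU is zero. A minimal sketch of that loss for axis-aligned boxes (illustrative only; the paper combines it with focal loss in an adjustable weighting not reproduced here):

```python
def giou(box_a, box_b):
    """Generalized IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    # Smallest enclosing box C: its wasted area penalizes distant boxes.
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (c_area - union) / c_area

def giou_loss(box_a, box_b):
    """GIoU loss in [0, 2]: 0 for identical boxes, >1 for disjoint ones."""
    return 1.0 - giou(box_a, box_b)
```

Identical boxes give a loss of 0, while disjoint boxes give a loss above 1 that grows with their separation, which is what lets the regression head learn from hard, poorly localized samples.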


Nanoscale · 2021
Author(s): Chenxi Gao, Jiawei Wang, Yuan Huang, Zixuan Li, Jiyan Zhang, et al.

Zinc-ion batteries (ZIBs) have attracted significant attention owing to their high safety, high energy density, and low cost. ZIBs have been studied as a potential energy device for portable and...


2021 · Vol 8 (1)
Author(s): Mehdi Srifi, Ahmed Oussous, Ayoub Ait Lahcen, Salma Mouline

Various recommender systems (RSs) have been developed in recent years, and many of them concentrate on English content; consequently, most RSs in the literature have been compared on English content. Research on RSs for content in other languages, such as Arabic, remains minimal, and the field of Arabic RSs is still largely neglected. This study aims to fill that gap by leveraging recent advances in the English RS field. Our main goal is to investigate recent RSs in an Arabic context. To that end, we first selected five state-of-the-art RSs originally devoted to English content and then empirically evaluated their performance on Arabic content. As a result of this work, we first build four publicly available large-scale Arabic datasets for recommendation purposes. Second, various text-preprocessing techniques are provided for preparing the constructed datasets. Third, our investigation yields well-argued conclusions about the use of modern RSs in an Arabic context. The experimental results show that these systems maintain high performance when applied to Arabic content.
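The abstract does not name the five evaluated systems; as a generic illustration of why such methods transfer across languages, here is a minimal content-based recommender over term-frequency vectors. It is language-agnostic — the toy English tokens below could equally be Arabic tokens after the kind of preprocessing the paper describes:

```python
import numpy as np

def build_tf(docs, vocab):
    """Term-frequency vectors over a shared vocabulary (language-agnostic)."""
    index = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for d, doc in enumerate(docs):
        for w in doc.split():
            if w in index:
                tf[d, index[w]] += 1
    return tf

def recommend(query_id, tf, top_k=2):
    """Rank the other items by cosine similarity to the query item."""
    norms = np.linalg.norm(tf, axis=1)
    sims = tf @ tf[query_id] / (norms * norms[query_id] + 1e-12)
    sims[query_id] = -np.inf          # never recommend the query itself
    return list(np.argsort(sims)[::-1][:top_k])

docs = ["sports football match", "football league sports",
        "cooking recipe food", "food recipe dessert"]
vocab = sorted({w for d in docs for w in d.split()})
tf = build_tf(docs, vocab)
```

Because the model only sees token co-occurrence, the pipeline itself carries no language bias; what changes for Arabic is the preprocessing (normalization, stemming, stop-word handling) used to produce the tokens.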

