High-Performance 3D Compressive Sensing MRI Reconstruction Using Many-Core Architectures

2011 · Vol 2011 · pp. 1-11
Author(s): Daehyun Kim, Joshua Trzasko, Mikhail Smelyanskiy, Clifton Haider, Pradeep Dubey, et al.

Compressive sensing (CS) describes how sparse signals can be accurately reconstructed from many fewer samples than required by the Nyquist criterion. Since MRI scan duration is proportional to the number of acquired samples, CS has been gaining significant attention in MRI. However, the computationally intensive nature of CS reconstructions has precluded their use in routine clinical practice. In this work, we investigate how different throughput-oriented architectures can benefit one CS algorithm and what levels of acceleration are feasible on different modern platforms. We demonstrate that a CUDA-based code running on an NVIDIA Tesla C2050 GPU can reconstruct a 256 × 160 × 80 volume from an 8-channel acquisition in 19 seconds, which is in itself a significant improvement over the state of the art. We then show that Intel's Knights Ferry can perform the same 3D MRI reconstruction in only 12 seconds, bringing CS methods even closer to clinical viability.
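The GPU and Knights Ferry solvers benchmarked above are not reproduced here, but the core of a CS reconstruction can be sketched with iterative soft-thresholding (ISTA) on a toy 1D problem. All sizes, parameters, and the identity sparsifying basis below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

# Minimal ISTA sketch: recover a sparse x from y = A @ x by minimizing
# ||y - A x||^2 / 2 + lam * ||x||_1 (a common CS formulation).

def soft_threshold(v, t):
    """Proximal operator of the l1 norm: shrink each entry toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam=0.01, n_iter=500):
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, L = Lipschitz const of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # Gradient step on the data term, then soft-threshold for sparsity.
        x = soft_threshold(x + step * A.T @ (y - A @ x), step * lam)
    return x

rng = np.random.default_rng(0)
n, m, k = 256, 96, 8                         # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m) # random sensing matrix
x_hat = ista(A, A @ x_true)
```

The fixed-point iteration is dominated by dense matrix-vector products and elementwise thresholding, which is the kind of bandwidth-bound, data-parallel workload that throughput-oriented architectures accelerate well.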

Geophysics · 2017 · Vol 82 (6) · pp. O91-O104
Author(s): Georgios Pilikos, A. C. Faul

Extracting the maximum possible information from the available measurements is a challenging task but is required when sensing seismic signals in inaccessible locations. Compressive sensing (CS) is a framework that allows reconstruction of sparse signals from fewer measurements than conventional sampling rates require. In seismic CS, the use of sparse transforms has seen some success; however, defining fixed basis functions is not trivial given the plethora of possibilities. Furthermore, the assumption that every instance of a seismic signal is sparse in any acquisition domain under the same transformation is limiting. We use beta process factor analysis (BPFA) to learn sparse transforms for seismic signals in the time-slice and shot-record domains from available data, and we use them as dictionaries for CS and denoising. Algorithms that use predefined basis functions are compared against BPFA, with BPFA obtaining state-of-the-art reconstructions, illustrating the importance of decomposing seismic signals into learned features.
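BPFA itself is a Bayesian nonparametric model and is not reproduced here; the downstream step it enables — sparse coding of a signal against a dictionary — can be sketched with orthogonal matching pursuit as a simple stand-in. The dictionary, sizes, and coefficients below are illustrative, not learned from seismic data:

```python
import numpy as np

def omp(D, y, k):
    """Greedy orthogonal matching pursuit: pick k atoms of D to explain y."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x

rng = np.random.default_rng(1)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)        # unit-norm dictionary atoms
x_true = np.zeros(128)
x_true[rng.choice(128, 3, replace=False)] = [1.5, -2.0, 1.0]
x_hat = omp(D, D @ x_true, k=3)       # noiseless sparse coding
```

With a dictionary learned from data (as BPFA produces) rather than a random one, the same sparse-coding step underlies both CS reconstruction and denoising.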


2021 · Vol 14 (3) · pp. 1-33
Author(s): Enrico Reggiani, Emanuele Del Sozzo, Davide Conficconi, Giuseppe Natale, Carlo Moroni, et al.

Stencil-based algorithms are a relevant class of computational kernels in high-performance systems, as they appear in a plethora of fields, from image processing to seismic simulations, from numerical methods to physical modeling. Among the various incarnations of stencil-based computations, Iterative Stencil Loops (ISLs) and Convolutional Neural Networks (CNNs) represent two well-known examples of kernels belonging to the stencil class. Indeed, ISLs apply the same stencil several times until convergence, while CNN layers leverage stencils to extract features from an image. The computationally intensive nature of ISLs, CNNs, and stencil-based workloads in general requires solutions that produce efficient implementations in terms of throughput and power efficiency. In this context, FPGAs are ideal candidates for such workloads, as they allow designing architectures tailored to the regular computational pattern of stencils. Moreover, the ever-growing need for performance leads FPGA-based architectures to scale to multiple devices to benefit from distributed acceleration. For this reason, we propose a library of HDL components to efficiently compute ISLs and CNN inference on FPGAs, along with a scalable multi-FPGA architecture based on custom PCB interconnects. Our solution eases the design flow and guarantees both scalability and performance competitive with state-of-the-art works.
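An Iterative Stencil Loop, as described above, repeatedly applies one stencil until convergence. A minimal software sketch of that pattern (a 5-point Jacobi smoother on a toy Laplace problem; purely illustrative of the computation the FPGA library targets, not its HDL implementation):

```python
import numpy as np

def jacobi_step(u):
    """One application of the 5-point averaging stencil to interior points."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:])
    return v

def isl(u, tol=1e-4, max_iter=10_000):
    """Iterative Stencil Loop: apply the same stencil until convergence."""
    for it in range(max_iter):
        v = jacobi_step(u)
        if np.max(np.abs(v - u)) < tol:
            return v, it + 1
        u = v
    return u, max_iter

# Laplace problem on a small grid: one hot edge, cold boundary elsewhere.
u0 = np.zeros((32, 32))
u0[0, :] = 1.0
u, iters = isl(u0)
```

Every grid point reads only a fixed local neighborhood, which is the regularity that makes stencils amenable to deeply pipelined FPGA datapaths and line-buffer reuse.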


2011 · Vol 21 (02) · pp. 245-272
Author(s): Duane Merrill, Andrew Grimshaw

The need to rank and order data is pervasive, and many algorithms are fundamentally dependent upon sorting and partitioning operations. Prior to this work, GPU stream processors had been perceived as challenging targets for problems with dynamic and global data dependences such as sorting. This paper presents: (1) a family of very efficient parallel algorithms for radix sorting; and (2) our allocation-oriented algorithmic design strategies that match the strengths of GPU processor architecture to this genre of dynamic parallelism. We demonstrate multiple factors of speedup (up to 3.8x) compared with state-of-the-art GPU sorting. We also reverse the performance differentials observed between GPU and multi/many-core CPU architectures in recent comparisons in the literature, including those with 32-core CPU-based accelerators. Our average sorting rates exceed 1B 32-bit keys/sec on a single GPU microprocessor. Our sorting passes are constructed from a very efficient parallel prefix-scan "runtime" that incorporates three design features: (1) kernel fusion for locally generating and consuming prefix-scan data; (2) multi-scan for performing multiple related, concurrent prefix scans (one per partitioning bin); and (3) flexible algorithm serialization for avoiding unnecessary synchronization and communication within algorithmic phases, allowing us to construct a single implementation that scales well across all generations and configurations of programmable NVIDIA GPUs.
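Each radix-sort pass the abstract describes is built from counting and prefix-scan primitives followed by a stable scatter. A minimal sequential sketch of that structure (not the authors' parallel GPU implementation; digit width is an illustrative choice):

```python
import numpy as np

def exclusive_scan(counts):
    """Exclusive prefix sum: offsets[i] = sum of counts[:i]."""
    return np.concatenate(([0], np.cumsum(counts)[:-1]))

def radix_sort(keys, digit_bits=4):
    """LSD radix sort of 32-bit keys built from count + scan + scatter passes."""
    keys = np.asarray(keys, dtype=np.uint32)
    radix = 1 << digit_bits
    for shift in range(0, 32, digit_bits):
        # Extract the current digit of every key.
        digits = ((keys >> shift) & (radix - 1)).astype(np.int64)
        # Histogram the digits, then scan to get each bin's start offset.
        counts = np.bincount(digits, minlength=radix)
        offsets = exclusive_scan(counts)
        # Stable scatter: keys with equal digits keep their relative order.
        out = np.empty_like(keys)
        for key, d in zip(keys, digits):
            out[offsets[d]] = key
            offsets[d] += 1
        keys = out
    return keys

data = np.array([170, 45, 75, 90, 802, 24, 2, 66], dtype=np.uint32)
sorted_keys = radix_sort(data)
```

On a GPU, the histogram and scatter of one pass are parallelized per thread block, with the per-bin prefix scans (the "multi-scan" above) fused into the same kernels to avoid round-trips through global memory.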



Author(s): Wei Huang, Xiaoshu Zhou, Mingchao Dong, Huaiyu Xu

Robust, high-performance visual multi-object tracking is a major challenge in computer vision, especially in drone scenarios. In this paper, an online Multi-Object Tracking (MOT) approach for UAV systems is proposed to handle small-target detection and class-imbalance challenges; it integrates the merits of a deep high-resolution representation network and a data-association method in a unified framework. Specifically, while applying a tracking-by-detection architecture to our tracking framework, a Hierarchical Deep High-resolution Network (HDHNet) is proposed, which encourages the model to handle targets of different types and scales and to extract more effective and comprehensive features during online learning. The extracted features are then fed into different prediction networks to recognize targets of interest. In addition, an adjustable fusion loss function is proposed by combining focal loss and GIoU loss to address class imbalance and hard samples. During tracking, the detection results in each frame are passed to an improved DeepSORT MOT algorithm, which makes full use of target appearance features for one-to-one matching. Experimental results on the VisDrone2019 MOT benchmark show that the proposed UAV MOT system achieves the highest accuracy and the best robustness compared with state-of-the-art methods.
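The GIoU term in the proposed fusion loss penalizes even non-overlapping box pairs by the area of their smallest enclosing box, giving a useful gradient where plain IoU is zero. A minimal sketch of that loss for axis-aligned boxes (illustrative only; the paper combines it with focal loss in an adjustable weighting not reproduced here):

```python
def giou(box_a, box_b):
    """Generalized IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    # Smallest enclosing box C: its wasted area penalizes distant boxes.
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (c_area - union) / c_area

def giou_loss(box_a, box_b):
    """GIoU loss in [0, 2]: 0 for identical boxes, >1 for disjoint ones."""
    return 1.0 - giou(box_a, box_b)
```

Identical boxes give a loss of 0, while disjoint boxes give a loss above 1 that grows with their separation, which is what lets the regression head learn from hard, poorly localized samples.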


Nanoscale · 2021
Author(s): Chenxi Gao, Jiawei Wang, Yuan Huang, Zixuan Li, Jiyan Zhang, et al.

Zinc-ion batteries (ZIBs) have attracted significant attention owing to their high safety, high energy density, and low cost. ZIBs have been studied as a potential energy device for portable and...


2021 · Vol 8 (1)
Author(s): Mehdi Srifi, Ahmed Oussous, Ayoub Ait Lahcen, Salma Mouline

Various recommender systems (RSs) have been developed in recent years, and many of them concentrate on English content; consequently, most RSs in the literature have been compared on English content. Research on RSs for content in other languages, such as Arabic, remains minimal, and the field of Arabic RSs is still largely neglected. This study aims to fill that gap by leveraging recent advances in the English RS field. Our main goal is to investigate recent RSs in an Arabic context. To that end, we first selected five state-of-the-art RSs originally devoted to English content and then empirically evaluated their performance on Arabic content. As a result of this work, we first build four publicly available large-scale Arabic datasets for recommendation purposes. Second, various text-preprocessing techniques are provided for preparing the constructed datasets. Third, our investigation yields well-argued conclusions about the use of modern RSs in an Arabic context. The experimental results show that these systems maintain high performance when applied to Arabic content.
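The abstract does not name the five evaluated systems; as a generic illustration of why such methods transfer across languages, here is a minimal content-based recommender over term-frequency vectors. It is language-agnostic — the toy English tokens below could equally be Arabic tokens after the kind of preprocessing the paper describes:

```python
import numpy as np

def build_tf(docs, vocab):
    """Term-frequency vectors over a shared vocabulary (language-agnostic)."""
    index = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for d, doc in enumerate(docs):
        for w in doc.split():
            if w in index:
                tf[d, index[w]] += 1
    return tf

def recommend(query_id, tf, top_k=2):
    """Rank the other items by cosine similarity to the query item."""
    norms = np.linalg.norm(tf, axis=1)
    sims = tf @ tf[query_id] / (norms * norms[query_id] + 1e-12)
    sims[query_id] = -np.inf          # never recommend the query itself
    return list(np.argsort(sims)[::-1][:top_k])

docs = ["sports football match", "football league sports",
        "cooking recipe food", "food recipe dessert"]
vocab = sorted({w for d in docs for w in d.split()})
tf = build_tf(docs, vocab)
```

Because the model only sees token co-occurrence, the pipeline itself carries no language bias; what changes for Arabic is the preprocessing (normalization, stemming, stop-word handling) used to produce the tokens.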

