State of the Art and Development of Wearable Computer Graphics Processing Unit

继业 焦

doi:10.12677/ojcs.2015.43007

GPU-Based Embedded Intelligence Architectures and Applications

Electronics ◽

10.3390/electronics10080952 ◽

2021 ◽

Vol 10 (8) ◽

pp. 952

Author(s):

Li Minn Ang ◽

Kah Phooi Seng

Keyword(s):

State Of The Art ◽

Graphics Processing Unit ◽

Research Area ◽

Machine Learning Techniques ◽

Processing Unit ◽

Full Spectrum ◽

Comprehensive Review ◽

Learning Techniques ◽

Graphics Processing ◽

Embedded Intelligence

This paper presents contributions to the state of the art in graphics processing unit (GPU)-based embedded intelligence (EI) research for architectures and applications. It gives a comprehensive review, with representative studies, of the emerging and current paradigms for GPU-based EI, focusing on architectures, technologies and applications: (1) first, an overview and classification of GPU-based EI research is presented to give the full spectrum of this area, which also serves as a concise summary of the scope of the paper; (2) second, various architecture technologies for GPU-based deep learning techniques and applications are discussed in detail; and (3) third, various architecture technologies for machine learning techniques and applications are discussed. The paper aims to give useful insights into the research area and to motivate researchers towards the development of GPU-based EI for practical deployment and applications.


Work-Efficient Parallel Non-Maximum Suppression Kernels

The Computer Journal ◽

10.1093/comjnl/bxaa108 ◽

2020 ◽

Author(s):

David Oro ◽

Carles Fernández ◽

Xavier Martorell ◽

Javier Hernando

Keyword(s):

State Of The Art ◽

Graphics Processing Unit ◽

Sliding Window ◽

Processing Unit ◽

Single Shot ◽

True Location ◽

Speed Up ◽

Gpu Architectures ◽

Performance Results ◽

Graphics Processing

In the context of object detection, sliding-window classifiers and single-shot convolutional neural network (CNN) meta-architectures typically yield multiple overlapping candidate windows with similarly high scores around the true location of a particular object. Non-maximum suppression (NMS) is the process of selecting a single representative candidate within this cluster of detections, so as to obtain a unique detection per object appearing in a given picture. In this paper, we present a highly scalable NMS algorithm for embedded graphics processing unit (GPU) architectures that is designed from scratch to handle workloads featuring thousands of simultaneous detections in a given picture. Our kernels are directly applicable to other sequential NMS algorithms such as FeatureNMS, Soft-NMS or AdaptiveNMS that share the inner workings of the classic greedy NMS method. The obtained performance results show that our parallel NMS algorithm is capable of clustering 1024 simultaneously detected objects per frame in roughly 1 ms on both Tegra X1 and Tegra X2 on-die GPUs, while taking 2 ms on Tegra K1. Furthermore, our proposed parallel greedy NMS algorithm yields a 14–40x speedup when compared to state-of-the-art NMS methods that require learning a CNN from annotated data.
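For readers unfamiliar with the classic greedy NMS method that the abstract refers to, the following is a minimal sequential sketch in Python. It is not the authors' parallel GPU kernel; it only illustrates the baseline greedy procedure. The names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are assumptions made for this illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x2 >= x1, y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # hypothetical default, not from the paper
) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # Visit candidates in order of decreasing score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a candidate only if it does not overlap any already-kept,
        # higher-scoring box by more than the threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))
```

In this sketch the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box is disjoint and survives. The pairwise overlap checks are independent of one another, which is the kind of structure a parallel formulation can exploit.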


Procedure Crystallized: The Graphics Processing Unit and the Rise of Computer Graphics

Image Objects ◽

10.7551/mitpress/11077.003.0008 ◽

2021 ◽

Keyword(s):

Computer Graphics ◽

Graphics Processing Unit ◽

Processing Unit ◽

Graphics Processing


Fast iterative solvers for large compressed-sparse row linear systems on graphics processing unit

Pollack Periodica ◽

10.1556/pollack.10.2015.1.1 ◽

2015 ◽

Vol 10 (1) ◽

pp. 3-18 ◽

Cited By ~ 1

Author(s):

Frédéric Magoulès ◽

Abal-Kassim Cheik Ahamed ◽

Roman Putanowicz

Keyword(s):

Linear Systems ◽

Graphics Processing Unit ◽

Iterative Solvers ◽

Processing Unit ◽

Compressed Sparse Row ◽

Graphics Processing


Performance Analysis and Optimization of Graphics Processing Unit

SSRN Electronic Journal ◽

10.2139/ssrn.3350249 ◽

2019 ◽

Author(s):

Lokendra Singh Umrao ◽

Jay Prakash Pandey

Keyword(s):

Performance Analysis ◽

Graphics Processing Unit ◽

Processing Unit ◽

Graphics Processing


Implementing wide baseline matching algorithms on a graphics processing unit.

10.2172/921737 ◽

2007 ◽

Author(s):

Fredrick H. Rothganger ◽

Kurt W. Larson ◽

Antonio Ignacio Gonzales ◽

Daniel S. Myers

Keyword(s):

Graphics Processing Unit ◽

Processing Unit ◽

Wide Baseline Matching ◽

Graphics Processing


Two Decades of 4D-QSAR: A Dying Art or Staging a Comeback?

International Journal of Molecular Sciences ◽

10.3390/ijms22105212 ◽

2021 ◽

Vol 22 (10) ◽

pp. 5212

Author(s):

Andrzej Bak

Keyword(s):

Molecular Conformation ◽

Graphics Processing Unit ◽

Processing Unit ◽

Diverse Range ◽

Current State ◽

Gpu Clusters ◽

Pharmacophore Hypothesis ◽

Rising Power ◽

Graphics Processing ◽

Ligand Conformation

A key question confronting computational chemists concerns the preferable ligand geometry that fits complementarily into the receptor pocket. Typically, the postulated ‘bioactive’ 3D ligand conformation is constructed as a ‘sophisticated guess’ (unnecessarily geometry-optimized) mirroring the pharmacophore hypothesis, sometimes based on an erroneous prerequisite. Hence, the 4D-QSAR scheme and its ‘dialects’ have been practically implemented as a higher level of model abstraction that allows the examination of multiple molecular conformations, orientations and protonation representations. Nearly a quarter of a century has passed since the eminent work of Hopfinger appeared on the stage; therefore, the natural question arises whether the 4D-QSAR approach is still appealing to the scientific community. With no intention to be comprehensive, a review of the current state of the art in the field of receptor-independent (RI) and receptor-dependent (RD) 4D-QSAR methodology is provided, with a brief examination of the ‘mainstream’ algorithms. In fact, a myriad of 4D-QSAR methods have been implemented and applied practically to a diverse range of molecules. It seems that the 4D-QSAR approach has been experiencing a promising renaissance of interest that might be fuelled by the rising power of graphics processing unit (GPU) clusters applied to full-atom MD-based simulations of protein-ligand complexes.


Parallelization of Global Sequence Alignment on Graphics Processing Unit

2020 International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI) ◽

10.1109/ccci49893.2020.9256747 ◽

2020 ◽

Author(s):

Kailash W. Kalare ◽

Mohammad S. Obaidat ◽

Jitendra V. Tembhurne ◽

Chandrashekhar Meshram ◽

Kuei-Fang Hsiao

Keyword(s):

Sequence Alignment ◽

Graphics Processing Unit ◽

Processing Unit ◽

Graphics Processing


Graphics processing unit acceleration of the island model genetic algorithm using the CUDA programming platform

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.6286 ◽

2021 ◽

Author(s):

Dylan M. Janssen ◽

Wayne Pullan ◽

Alan Wee‐Chung Liew

Keyword(s):

Genetic Algorithm ◽

Graphics Processing Unit ◽

Island Model ◽

Processing Unit ◽

Cuda Programming ◽

Graphics Processing


Real-time, High-resolution Depth Upsampling on Embedded Accelerators

ACM Transactions on Embedded Computing Systems ◽

10.1145/3436878 ◽

2021 ◽

Vol 20 (3) ◽

pp. 1-22

Author(s):

David Langerman ◽

Alan George

Keyword(s):

High Resolution ◽

Low Power ◽

Real Time ◽

Mixed Reality ◽

Graphics Processing Unit ◽

Processing Unit ◽

Reconfigurable Logic ◽

Depth Sensors ◽

Time Requirements ◽

Graphics Processing

High-resolution, low-latency applications in computer vision are ubiquitous in today’s world of mixed-reality devices. These innovations provide a platform that can leverage the improving technology of depth sensors and embedded accelerators to enable higher-resolution, lower-latency processing of 3D scenes using depth-upsampling algorithms. This research demonstrates that filter-based upsampling algorithms are feasible for mixed-reality applications using low-power hardware accelerators. The authors parallelized and evaluated a depth-upsampling algorithm on two different devices: a reconfigurable-logic FPGA embedded within a low-power SoC, and a fixed-logic embedded graphics processing unit. Both accelerators are shown to meet the real-time requirement of 11 ms latency for mixed-reality applications.
