Developing a prototype of high-performance graph-processing framework for NEC SX–Aurora TSUBASA vector architecture

Numerical Methods and Programming (Vychislitel'nye Metody i Programmirovanie) ◽

10.26089/nummet.v21r325 ◽

2020 ◽

pp. 290-305

Author(s):

И.В. Афанасьев

Keyword(s):

Graph Algorithms ◽

High Performance ◽

Graph Algorithm ◽

Efficient Implementation ◽

Graph Processing ◽

Irregular Structure ◽

Vector Systems ◽

Order Of Magnitude ◽

Vector Graph ◽

Processing Framework

This article describes a prototype of the graph-processing framework VGL (Vector Graph Library), aimed at the efficient implementation of graph algorithms for the modern NEC SX–Aurora TSUBASA vector architecture. Modern vector systems can significantly speed up memory-intensive applications, of which graph algorithms are a subclass. However, approaches to the efficient implementation of graph algorithms for vector systems have so far been studied very little: because real-world graphs are highly irregular in structure, it is difficult to use the vector features of the target platforms effectively. This paper shows that implementations of graph algorithms built on the proposed VGL framework achieve performance comparable to their manually optimized counterparts, because the framework encapsulates a large number of graph-algorithm optimizations specific to vector systems. At the same time, the proposed framework significantly simplifies the development of graph algorithms for vector systems, reducing the amount of code for the implemented algorithms by an order of magnitude and hiding the programming specifics of this class of systems from the user.


VGL: a high-performance graph processing framework for the NEC SX-Aurora TSUBASA vector architecture

The Journal of Supercomputing ◽

10.1007/s11227-020-03564-9 ◽

2021 ◽

Author(s):

Ilya V. Afanasyev ◽

Vladimir V. Voevodin ◽

Kazuhiko Komatsu ◽

Hiroaki Kobayashi

Keyword(s):

High Performance ◽

Graph Processing ◽

Processing Framework


LCC-Graph: A high-performance graph-processing framework with low communication costs

2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS) ◽

10.1109/iwqos.2016.7590434 ◽

2016 ◽

Author(s):

Yongli Cheng ◽

Fang Wang ◽

Hong Jiang ◽

Yu Hua ◽

Dan Feng ◽

...

Keyword(s):

High Performance ◽

Graph Processing ◽

Communication Costs ◽

Processing Framework


GraphIn: An Online High Performance Incremental Graph Processing Framework

Euro-Par 2016: Parallel Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-43659-3_24 ◽

2016 ◽

pp. 319-333 ◽

Cited By ~ 11

Author(s):

Dipanjan Sengupta ◽

Narayanan Sundaram ◽

Xia Zhu ◽

Theodore L. Willke ◽

Jeffrey Young ◽

...

Keyword(s):

High Performance ◽

Graph Processing ◽

Processing Framework


GraphPEG

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3450440 ◽

2021 ◽

Vol 18 (3) ◽

pp. 1-24

Author(s):

Yashuai Lü ◽

Hui Guo ◽

Libo Huang ◽

Qi Yu ◽

Li Shen ◽

...

Keyword(s):

High Performance ◽

Large Scale ◽

Graph Algorithm ◽

Graph Processing ◽

Graph Traversal ◽

Fine Grain ◽

Large Scale Data ◽

Load Imbalance ◽

Work Distribution ◽

Level Parallelism

Due to their massive thread-level parallelism, GPUs have become an attractive platform for accelerating large-scale data-parallel computations, such as graph processing. However, achieving high performance for graph processing with GPUs is non-trivial. Processing graphs on GPUs introduces several problems, such as load imbalance, low utilization of hardware units, and memory divergence. Although previous work has proposed several software strategies to optimize graph processing on GPUs, some issues are beyond the capability of software techniques to address. In this article, we present GraphPEG, a graph processing engine for efficient graph processing on GPUs. Inspired by the observation that many graph algorithms share a common graph-traversal pattern, GraphPEG improves the performance of graph processing by coupling automatic edge gathering with fine-grain work distribution. GraphPEG can also adapt to various input graph datasets and simplify the software design of graph processing with hardware-assisted graph traversal. Simulation results show that, in comparison with two representative, highly efficient GPU graph processing software frameworks, Gunrock and SEP-Graph, GraphPEG improves graph processing throughput by 2.8× and 2.5× on average, and by up to 7.3× and 7.0×, for six graph algorithm benchmarks on six graph datasets, with marginal hardware cost.


A Peristaltic Micropump Based on the Fast Electrochemical Actuator: Design, Fabrication, and Preliminary Testing

Actuators ◽

10.3390/act10030062 ◽

2021 ◽

Vol 10 (3) ◽

pp. 62

Author(s):

Ilia Uvarov ◽

Pavel Shlepakov ◽

Artem Melenev ◽

Kechun Ma ◽

Vitaly Svetovoy ◽

...

Keyword(s):

High Performance ◽

Drug Delivery Systems ◽

Main Part ◽

Fabrication Technology ◽

Fabrication Procedure ◽

Order Of Magnitude ◽

Peristaltic Micropump ◽

Biomedical Field ◽

Lift Off ◽

Actuator Design

Microfluidic devices that deliver fluids accurately at required rates are of considerable interest, especially for the biomedical field. Progress is limited by the lack of micropumps that are compact, have high performance, and are compatible with standard microfabrication. This paper describes a micropump based on a new driving principle. The pump contains three membrane actuators operating peristaltically. The actuators are driven by nanobubbles of hydrogen and oxygen, which are generated in the chamber by a series of short voltage pulses of alternating polarity applied to the electrodes. This process guarantees a response time of the actuators much shorter than that of any other electrochemical device. The main part of the pump has a size of about 3 mm, which is an order of magnitude smaller than conventional micropumps. The pump is fabricated in glass and silicon wafers using standard cleanroom processes. The channels are formed in SU-8 photoresist and the membrane is made of SiNx. The channels are sealed by two processes of bonding between SU-8 and SiNx. Functionality of the channels and membranes is demonstrated. A defect of the electrodes related to the lift-off fabrication procedure did not allow a demonstration of the pumping process, although a flow rate of 1.5 µl/min and a dosage accuracy of 0.25 nl are expected. The working characteristics of the pump make it attractive for use in portable drug delivery systems, but the fabrication technology must be improved.


Highly efficient self-powered perovskite photodiode with an electron-blocking hole-transport NiOx layer

Scientific Reports ◽

10.1038/s41598-020-80640-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Amir Muhammad Afzal ◽

In-Gon Bae ◽

Yushika Aggarwal ◽

Jaewoo Park ◽

Hye-Ryeon Jeong ◽

...

Keyword(s):

Energy Harvesting ◽

Bias Voltage ◽

High Performance ◽

Hole Transport ◽

Transport Layer ◽

Hole Transport Layer ◽

Zero Bias ◽

Decay Times ◽

Order Of Magnitude ◽

Self Powered

Hybrid organic–inorganic perovskite materials provide noteworthy compact systems that could offer ground-breaking architectures for dynamic operations and advanced engineering in high-performance energy-harvesting optoelectronic devices. Here, we demonstrate a highly effective self-powered perovskite-based photodiode with an electron-blocking hole-transport layer (NiOx). A high responsivity (R = 360 mA W⁻¹) with good detectivity (D = 2.1 × 10¹¹ Jones) and external quantum efficiency (EQE = 76.5%) is achieved due to the excellent interface quality and the suppression of the dark current at zero bias voltage owing to the NiOx layer, providing outcomes one order of magnitude higher than values currently in the literature. Meanwhile, the value of R progressively increases to 428 mA W⁻¹, with D = 3.6 × 10¹¹ Jones and EQE = 77%, at a bias voltage of −1.0 V. With a diode model, we also obtained a high value of the built-in potential with the NiOx layer, which is a direct signature of the improved charge-selecting characteristics of the NiOx layer. We also observed fast rise and decay times of approximately 0.9 and 1.8 ms, respectively, at zero bias voltage. Hence, these results, based on the perovskite active layer together with the charge-selective NiOx layer, provide a platform on which to realise high-performance self-powered photodiodes as well as energy-harvesting devices in the field of optoelectronics.


BLADYG: A Graph Processing Framework for Large Dynamic Graphs

Big Data Research ◽

10.1016/j.bdr.2017.05.003 ◽

2017 ◽

Vol 9 ◽

pp. 9-17 ◽

Cited By ~ 10

Author(s):

Sabeur Aridhi ◽

Alberto Montresor ◽

Yannis Velegrakis

Keyword(s):

Graph Processing ◽

Dynamic Graphs ◽

Processing Framework


Workstation benchmark of Spark Capable Genome Analysis ToolKit 4 Variant Calling

10.1101/2020.05.17.101105 ◽

2020 ◽

Author(s):

Marcus H. Hansen ◽

Anita T. Simonsen ◽

Hans B. Ommen ◽

Charlotte G. Nyvold

Keyword(s):

Dna Sequencing ◽

Genome Analysis ◽

High Speed ◽

High Performance ◽

Variant Calling ◽

Amplicon Sequencing ◽

Targeted Sequencing ◽

Sequencing Analysis ◽

Genome Analysis Toolkit ◽

Order Of Magnitude

Background: Rapid and practical DNA-sequencing processing has become essential for modern biomedical laboratories, especially in the fields of cancer, pathology and genetics. While sequencing turnover time has been, and still is, a bottleneck in research and diagnostics, the field of bioinformatics is moving at a rapid pace, both in terms of hardware and software development. Here, we benchmarked the local performance of three of the most important Spark-enabled Genome Analysis Toolkit 4 (GATK4) tools in a targeted sequencing workflow: duplicate marking, base quality score recalibration (BQSR) and variant calling on targeted DNA sequencing, using a modest hyperthreading 12-core single CPU and a high-speed PCI Express solid-state drive. Results: Compared to the previous GATK version, the performance of Spark-enabled BQSR and HaplotypeCaller is shifted towards a more efficient usage of the available CPU cores and outperforms the earlier GATK3.8 version with an order of magnitude reduction in processing time to analysis-ready variants, whereas MarkDuplicateSpark was found to be thrice as fast. Furthermore, HaploTypeCallerSpark and BQSRPipelineSpark were significantly faster than the equivalent GATK4 standard tools, with a combined ∼86% reduction in execution time, reaching a median rate of ten million processed bases per second, and the time for duplicate marking was reduced by ∼42%. The called variants were found to be in close agreement between the Spark and non-Spark versions, with an overall concordance of 98%. In this setup, the tools were also highly efficient when compared with execution on a small 72 virtual CPU/18-node Google Cloud cluster. Conclusion: GATK4 offers practical parallelization possibilities for DNA sequence processing, and the Spark-enabled tools optimize performance and utilization of local CPUs. Spark-utilizing GATK variant calling is several times faster than previous GATK3.8 multithreading with the same multi-core, single-CPU configuration. The improved opportunities for parallel computations not only hold implications for high-performance clusters, but also for modest laboratory or research workstations for targeted sequencing analysis, such as exome, panel or amplicon sequencing.


High performance and memory efficient implementation of matrix multiplication on FPGAs

2010 International Conference on Field-Programmable Technology ◽

10.1109/fpt.2010.5681769 ◽

2010 ◽

Cited By ~ 5

Author(s):

Guiming Wu ◽

Yong Dou ◽

Miao Wang

Keyword(s):

High Performance ◽

Matrix Multiplication ◽

Efficient Implementation ◽

Memory Efficient


Gunrock: a high-performance graph processing library on the GPU

ACM SIGPLAN Notices ◽

10.1145/2858788.2688538 ◽

2015 ◽

Vol 50 (8) ◽

pp. 265-266 ◽

Cited By ~ 13

Author(s):

Yangzihao Wang ◽

Andrew Davidson ◽

Yuechao Pan ◽

Yuduo Wu ◽

Andy Riffel ◽

...

Keyword(s):

High Performance ◽

Graph Processing
