SatEC: A 5G Satellite Edge Computing Framework Based on Microservice Architecture

Lei Yan; Suzhi Cao; Yongsheng Gong; Hao Han; Junyong Wei; Yi Zhao; Shuling Yang

doi:10.3390/s19040831

SatEC: A 5G Satellite Edge Computing Framework Based on Microservice Architecture

Sensors ◽

10.3390/s19040831 ◽

2019 ◽

Vol 19 (4) ◽

pp. 831 ◽

Cited By ~ 12

Author(s):

Lei Yan ◽

Suzhi Cao ◽

Yongsheng Gong ◽

Hao Han ◽

Junyong Wei ◽

...

Keyword(s):

Graphics Processing Unit ◽

Edge Computing ◽

Satellite Network ◽

Processing Unit ◽

Central Processing ◽

5G Network ◽

Field Programmable ◽

High Bandwidth ◽

Basic Services ◽

Computing Framework

As outlined in the 3Gpp Release 16, 5G satellite access is important for 5G network development in the future. A terrestrial-satellite network integrated with 5G has the characteristics of low delay, high bandwidth, and ubiquitous coverage. A few researchers have proposed integrated schemes for such a network; however, these schemes do not consider the possibility of achieving optimization of the delay characteristic by changing the computing mode of the 5G satellite network. We propose a 5G satellite edge computing framework (5GsatEC), which aims to reduce delay and expand network coverage. This framework consists of embedded hardware platforms and edge computing microservices in satellites. To increase the flexibility of the framework in complex scenarios, we unify the resource management of the central processing unit (CPU), graphics processing unit (GPU), and field-programmable gate array (FPGA); we divide the services into three types: system services, basic services, and user services. In order to verify the performance of the framework, we carried out a series of experiments. The results show that 5GsatEC has a broader coverage than the ground 5G network. The results also show that 5GsatEC has lower delay, a lower packet loss rate, and lower bandwidth consumption than the 5G satellite network.

Download Full-text

Optimized Compression for Implementing Convolutional Neural Networks on FPGA

Electronics ◽

10.3390/electronics8030295 ◽

2019 ◽

Vol 8 (3) ◽

pp. 295 ◽

Cited By ~ 15

Author(s):

Min Zhang ◽

Linpeng Li ◽

Hai Wang ◽

Yan Liu ◽

Hongbo Qin ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Short Term Memory ◽

Graphics Processing Unit ◽

Processing Unit ◽

Central Processing ◽

Pruning Strategy ◽

Field Programmable ◽

Storage Format ◽

Evaluation Board

Field programmable gate array (FPGA) is widely considered as a promising platform for convolutional neural network (CNN) acceleration. However, the large numbers of parameters of CNNs cause heavy computing and memory burdens for FPGA-based CNN implementation. To solve this problem, this paper proposes an optimized compression strategy, and realizes an accelerator based on FPGA for CNNs. Firstly, a reversed-pruning strategy is proposed which reduces the number of parameters of AlexNet by a factor of 13× without accuracy loss on the ImageNet dataset. Peak-pruning is further introduced to achieve better compressibility. Moreover, quantization gives another 4× with negligible loss of accuracy. Secondly, an efficient storage technique, which aims for the reduction of the whole overhead cache of the convolutional layer and the fully connected layer, is presented respectively. Finally, the effectiveness of the proposed strategy is verified by an accelerator implemented on a Xilinx ZCU104 evaluation board. By improving existing pruning techniques and the storage format of sparse data, we significantly reduce the size of AlexNet by 28×, from 243 MB to 8.7 MB. In addition, the overall performance of our accelerator achieves 9.73 fps for the compressed AlexNet. Compared with the central processing unit (CPU) and graphics processing unit (GPU) platforms, our implementation achieves 182.3× and 1.1× improvements in latency and throughput, respectively, on the convolutional (CONV) layers of AlexNet, with an 822.0× and 15.8× improvement for energy efficiency, separately. This novel compression strategy provides a reference for other neural network applications, including CNNs, long short-term memory (LSTM), and recurrent neural networks (RNNs).

Download Full-text

Recent Results on the Implementation of a Burst Error and Burst Erasure Channel Emulator Using an FPGA Architecture

Journal of Communications Software and Systems ◽

10.24138/jcomss.v16i1.766 ◽

2020 ◽

Vol 16 (1) ◽

pp. 19-29

Author(s):

Caterina Travan ◽

Francesca Vatta ◽

Fulvio Babich

Keyword(s):

Graphics Processing Unit ◽

General Purpose ◽

Processing Unit ◽

Transmission Channel ◽

Central Processing ◽

Fpga Architecture ◽

Burst Error ◽

Field Programmable ◽

Erasure Channel ◽

Burst Erasure

The behaviour of a transmission channel may be simulated using the performance abilities of current generation multiprocessing hardware, namely, a multicore Central Processing Unit (CPU), a general purpose Graphics Processing Unit (GPU), or a Field Programmable Gate Array (FPGA). These were investigated by Cullinan et al. in a recent paper (published in 2012) where these three devices capabilities were compared to determine which device would be best suited towards which specific task. In particular, it was shown that, for the application which is objective of our work (i.e., for a transmission channel simulation), the FPGA is 26.67 times faster than the GPU and 10.76 times faster than the CPU. Motivated by these results, in this paper we propose and present a direct hardware emulation. In particular, a Cyclone II FPGA architecture is implemented to simulate a burst error channel behaviour, in which errors are clustered together, and a burst erasure channel behaviour, in which the erasures are clustered together. The results presented in the paper are valid for any FPGA architecture that may be considered for this scope.

Download Full-text

A TensorFlow Extension Framework for Optimized Generation of Hardware CNN Inference Engines

Technologies ◽

10.3390/technologies8010006 ◽

2020 ◽

Vol 8 (1) ◽

pp. 6 ◽

Cited By ~ 1

Author(s):

Vasileios Leon ◽

Spyridon Mouselinos ◽

Konstantina Koliogeorgi ◽

Sotirios Xydis ◽

Dimitrios Soudris ◽

...

Keyword(s):

Integrated Circuit ◽

High Performance ◽

Graphics Processing Unit ◽

Inference Engine ◽

Efficient Solutions ◽

Processing Unit ◽

Central Processing ◽

Inference Engines ◽

Field Programmable ◽

Hardware Description

The workloads of Convolutional Neural Networks (CNNs) exhibit a streaming nature that makes them attractive for reconfigurable architectures such as the Field-Programmable Gate Arrays (FPGAs), while their increased need for low-power and speed has established Application-Specific Integrated Circuit (ASIC)-based accelerators as alternative efficient solutions. During the last five years, the development of Hardware Description Language (HDL)-based CNN accelerators, either for FPGA or ASIC, has seen huge academic interest due to their high-performance and room for optimizations. Towards this direction, we propose a library-based framework, which extends TensorFlow, the well-established machine learning framework, and automatically generates high-throughput CNN inference engines for FPGAs and ASICs. The framework allows software developers to exploit the benefits of FPGA/ASIC acceleration without requiring any expertise on HDL development and low-level design. Moreover, it provides a set of optimization knobs concerning the model architecture and the inference engine generation, allowing the developer to tune the accelerator according to the requirements of the respective use case. Our framework is evaluated by optimizing the LeNet CNN model on the MNIST dataset, and implementing FPGA- and ASIC-based accelerators using the generated inference engine. The optimal FPGA-based accelerator on Zynq-7000 delivers 93% less memory footprint and 54% less Look-Up Table (LUT) utilization, and up to 10× speedup on the inference execution vs. different Graphics Processing Unit (GPU) and Central Processing Unit (CPU) implementations of the same model, in exchange for a negligible accuracy loss, i.e., 0.89%. For the same accuracy drop, the 45 nm standard-cell-based ASIC accelerator provides an implementation which operates at 520 MHz and occupies an area of 0.059 mm 2 , while the power consumption is ∼7.5 mW.

Download Full-text

Numerical simulation of flattened heat pipe with double heat sources for CPU and GPU cooling application in laptop computers

Journal of Computational Design and Engineering ◽

10.1093/jcde/qwaa091 ◽

2020 ◽

Author(s):

Wisoot Sanhan ◽

Kambiz Vafai ◽

Niti Kammuang-Lue ◽

Pradit Terdtoon ◽

Phrut Sakulchangsatjatai

Keyword(s):

Experimental Data ◽

Heat Pipe ◽

Graphics Processing Unit ◽

Processing Unit ◽

Heat Sources ◽

Final Thickness ◽

Laptop Computers ◽

Central Processing ◽

Graphics Processing ◽

Good Agreement

Abstract An investigation of the effect of the thermal performance of the flattened heat pipe on its double heat sources acting as central processing unit and graphics processing unit in laptop computers is presented in this work. A finite element method is used for predicting the flattening effect of the heat pipe. The cylindrical heat pipe with a diameter of 6 mm and the total length of 200 mm is flattened into three final thicknesses of 2, 3, and 4 mm. The heat pipe is placed under a horizontal configuration and heated with heater 1 and heater 2, 40 W in combination. The numerical model shows good agreement compared with the experimental data with the standard deviation of 1.85%. The results also show that flattening the cylindrical heat pipe to 66.7 and 41.7% of its original diameter could reduce its normalized thermal resistance by 5.2%. The optimized final thickness or the best design final thickness for the heat pipe is found to be 2.5 mm.

Download Full-text

A Parallel-Computing Approach for Vector Road-Network Matching Using GPU Architecture

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi7120472 ◽

2018 ◽

Vol 7 (12) ◽

pp. 472 ◽

Cited By ~ 1

Author(s):

Bo Wan ◽

Lin Yang ◽

Shunping Zhou ◽

Run Wang ◽

Dezhi Wang ◽

...

Keyword(s):

Road Network ◽

Large Scale ◽

Graphics Processing Unit ◽

Road Networks ◽

Processing Unit ◽

Data Partition ◽

Matching Method ◽

The Road ◽

Central Processing ◽

Relaxation Matching

The road-network matching method is an effective tool for map integration, fusion, and update. Due to the complexity of road networks in the real world, matching methods often contain a series of complicated processes to identify homonymous roads and deal with their intricate relationship. However, traditional road-network matching algorithms, which are mainly central processing unit (CPU)-based approaches, may have performance bottleneck problems when facing big data. We developed a particle-swarm optimization (PSO)-based parallel road-network matching method on graphics-processing unit (GPU). Based on the characteristics of the two main stages (similarity computation and matching-relationship identification), data-partition and task-partition strategies were utilized, respectively, to fully use GPU threads. Experiments were conducted on datasets with 14 different scales. Results indicate that the parallel PSO-based matching algorithm (PSOM) could correctly identify most matching relationships with an average accuracy of 84.44%, which was at the same level as the accuracy of a benchmark—the probability-relaxation-matching (PRM) method. The PSOM approach significantly reduced the road-network matching time in dealing with large amounts of data in comparison with the PRM method. This paper provides a common parallel algorithm framework for road-network matching algorithms and contributes to integration and update of large-scale road-networks.

Download Full-text

Embedded GPU Implementation for High-Performance Ultrasound Imaging

Electronics ◽

10.3390/electronics10080884 ◽

2021 ◽

Vol 10 (8) ◽

pp. 884

Author(s):

Stefano Rossi ◽

Enrico Boni

Keyword(s):

High Performance ◽

Graphics Processing Unit ◽

Digital Signal ◽

Processing Unit ◽

Embedded Computing ◽

Field Programmable ◽

Peripheral Component Interconnect ◽

Programmable Gate Arrays ◽

Graphics Processing ◽

Signal Processors

Methods of increasing complexity are currently being proposed for ultrasound (US) echographic signal processing. Graphics Processing Unit (GPU) resources allowing massive exploitation of parallel computing are ideal candidates for these tasks. Many high-performance US instruments, including open scanners like ULA-OP 256, have an architecture based only on Field-Programmable Gate Arrays (FPGAs) and/or Digital Signal Processors (DSPs). This paper proposes the implementation of the embedded NVIDIA Jetson Xavier AGX module on board ULA-OP 256. The system architecture was revised to allow the introduction of a new Peripheral Component Interconnect Express (PCIe) communication channel, while maintaining backward compatibility with all other embedded computing resources already on board. Moreover, the Input/Output (I/O) peripherals of the module make the ultrasound system independent, freeing the user from the need to use an external controlling PC.

Download Full-text

Implementación de redes neuronales en FPGAs usando tipos de datos de punto fijo

Jornada de Jóvenes Investigadores del I3A ◽

10.26754/jjii3a.4849 ◽

2020 ◽

Vol 8 ◽

Author(s):

Daniel Enériz Orta ◽

Nicolás Medrano Marqués ◽

Belén Calvo López

Keyword(s):

Field Programmable Gate Array ◽

A Priori ◽

Central Processing Unit ◽

Processing Unit ◽

Central Processing ◽

Field Programmable ◽

Gate Array

La capacidad de estimar funciones no lineales hace que las redes neuronales sean una de las herramientas más usadas para aplicar fusión sensorial, permitiendo combinar la salida de diferentes sensores para obtener información de la que a priori no se dispone. Por otra parte, la capacidad de procesamiento paralelo de las FPGAs (Field-Programmable Gate Array) las hace idóneas para implementar redes neuronales ubicuas, permitiendo inferir resultados más rápido que una CPU (Central Processing Unit) sin necesidad de una conexión activa a internet. De esta forma, en este artículo se propone un flujo de trabajo para diseñar, entrenar e implementar una red neuronal en una FPGA Xilinx PYNQ Z2 que use tipos de dato de punto fijo para hacer fusión sensorial. Dicho flujo de trabajo es probado mediante el desarrollo de una red neuronal que combine las salidas de una nariz artificial de 16 sensores para obtener una estimación de las concentraciones de CH4 y C2H4.

Download Full-text

Lightweight Architecture for Real-Time Hand Pose Estimation with Deep Supervision

Symmetry ◽

10.3390/sym11040585 ◽

2019 ◽

Vol 11 (4) ◽

pp. 585

Author(s):

Yufei Wu ◽

Xiaofei Ruan ◽

Yu Zhang ◽

Huang Zhou ◽

Shengyu Du ◽

...

Keyword(s):

Real Time ◽

Pose Estimation ◽

Graphics Processing Unit ◽

Parallel Execution ◽

Processing Unit ◽

Network Efficiency ◽

Hand Pose Estimation ◽

Central Processing ◽

Deployment Optimization ◽

Hand Pose

The high demand for computational resources severely hinders the deployment of deep learning applications in resource-limited devices. In this work, we investigate the under-studied but practically important network efficiency problem and present a new, lightweight architecture for hand pose estimation. Our architecture is essentially a deeply-supervised pruned network in which less important layers and branches are removed to achieve a higher real-time inference target on resource-constrained devices without much accuracy compromise. We further make deployment optimization to facilitate the parallel execution capability of central processing units (CPUs). We conduct experiments on NYU and ICVL datasets and develop a demo1 using the RealSense camera. Experimental results show our lightweight network achieves an average running time of 32 ms (31.3 FPS, the original is 22.7 FPS) before deployment optimization. Meanwhile, the model is only about half parameters size of the original one with 11.9 mm mean joint error. After the further optimization with OpenVINO, the optimized model can run at 56 FPS on CPUs in contrast to 44 FPS running on a graphics processing unit (GPU) (Tensorflow) and it can achieve the real-time goal.

Download Full-text

A GPU-based 2D shallow water quality model

Journal of Hydroinformatics ◽

10.2166/hydro.2020.030 ◽

2020 ◽

Vol 22 (5) ◽

pp. 1182-1197

Author(s):

Geovanny Gordillo ◽

Mario Morales-Hernández ◽

I. Echeverribar ◽

Javier Fernández-Pato ◽

Pilar García-Navarro

Keyword(s):

Water Quality ◽

Shallow Water ◽

Graphics Processing Unit ◽

Temporal Distribution ◽

Water Quality Model ◽

Quality Analysis ◽

Test Case ◽

Processing Unit ◽

Quality Model ◽

Central Processing

Abstract In this study, a 2D shallow water flow solver integrated with a water quality model is presented. The interaction between the main water quality constituents included is based on the Water Quality Analysis Simulation Program. Efficiency is achieved by computing with a combination of a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU) device. This technique is intended to provide robust and accurate simulations with high computation speedups with respect to a single-core CPU in real events. The proposed numerical model is evaluated in cases that include the transport and reaction of water quality components over irregular bed topography and dry–wet fronts, verifying that the numerical solution in these situations conserves the required properties (C-property and positivity). The model can operate in any steady or unsteady form allowing an efficient assessment of the environmental impact of water flows. The field data from an unsteady river reach test case are used to show that the model is capable of predicting the measured temporal distribution of dissolved oxygen and water temperature, proving the robustness and computational efficiency of the model, even in the presence of noisy signals such as wind speed.

Download Full-text

Accelerating Event Detection with DGCNN and FPGAs

Electronics ◽

10.3390/electronics9101666 ◽

2020 ◽

Vol 9 (10) ◽

pp. 1666

Author(s):

Zhe Han ◽

Jingfei Jiang ◽

Linbo Qiao ◽

Yong Dou ◽

Jinwei Xu ◽

...

Keyword(s):

Language Processing ◽

Event Detection ◽

Graphics Processing Unit ◽

Network Size ◽

Processing Unit ◽

Sigmoid Function ◽

Pipelined Architecture ◽

Proposed Model ◽

Field Programmable ◽

Graphics Processing

Recently, Deep Neural Networks (DNNs) have been widely used in natural language processing. However, DNNs are often computation-intensive and memory-expensive. Therefore, deploying DNNs in the real world is very difficult. In order to solve this problem, we proposed a network model based on the dilate gated convolutional neural network, which is very hardware-friendly. We further expanded the word representations and depth of the network to improve the performance of the model. We replaced the Sigmoid function to make it more friendly for hardware computation without loss, and we quantized the network weights and activations to compress the network size. We then proposed the first FPGA (Field Programmable Gate Array)-based event detection accelerator based on the proposed model. The accelerator significantly reduced the latency with the fully pipelined architecture. We implemented the accelerator on the Xilinx XCKU115 FPGA. The experimental results show that our model obtains the highest F1-score of 84.6% in the ACE 2005 corpus. Meanwhile, the accelerator achieved 95.2 giga operations (GOP)/s and 13.4 GOPS/W in performance and energy efficiency, which is 17/158 times higher than the Graphics Processing Unit (GPU).

Download Full-text