Exploiting Heterogeneous Parallelism on Hybrid Metaheuristics for Vector Autoregression Models

Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1781
Author(s):  
Javier Cuenca ◽  
José-Matías Cutillas-Lozano ◽  
Domingo Giménez ◽  
Alberto Pérez-Bernabeu ◽  
José J. López-Espín

In recent years, the huge amount of data available in many disciplines has made mathematical modeling, and more concretely econometric models, a very important technique for explaining those data. One of the most widely used econometric techniques is the Vector Autoregression Model (VAR), a multi-equation model that linearly describes the interactions and behavior of a group of variables by using their past values. Traditionally, Ordinary Least Squares and Maximum Likelihood estimators have been used to estimate VAR models. These techniques are consistent and asymptotically efficient under ideal conditions of the data and the identification problem; otherwise, they yield inconsistent parameter estimates. This paper considers the estimation of a VAR model by minimizing the difference between the dependent variables at a certain time and the expression of their own past and the exogenous variables of the model (in this case denoted as a VARX model). The solution of this optimization problem is approached through hybrid metaheuristics. The high computational cost due to the huge amount of data makes it necessary to exploit High-Performance Computing to accelerate the methods that obtain the models. The parameterized, parallel implementation of the metaheuristics and the matrix formulation ease the simultaneous exploitation of parallelism for groups of hybrid metaheuristics. Multilevel and heterogeneous parallelism are exploited in multicore CPU plus multi-GPU nodes, with the optimal combination of the parallelism parameters depending on the particular metaheuristic and the problem to which it is applied.
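A minimal sketch of the objective being minimized may help fix ideas; it is not the authors' code, and the names (p, A, B) are illustrative assumptions. A metaheuristic would perturb candidate coefficient sets (A, B) and keep those that lower this sum of squared residuals.

```python
# Sum-of-squared-residuals objective for a VARX(p) model: y_t is explained
# by p lags of itself (matrices A) plus exogenous regressors x_t (matrix B).
import numpy as np

def varx_residual(Y, X, A, B):
    """Y: (T, k) endogenous series, X: (T, m) exogenous series,
    A: (p, k, k) lag coefficient matrices, B: (k, m) exogenous coefficients."""
    p, T = A.shape[0], Y.shape[0]
    sse = 0.0
    for t in range(p, T):
        # Prediction from the series' own past plus the exogenous variables.
        y_hat = sum(A[i] @ Y[t - 1 - i] for i in range(p)) + B @ X[t]
        r = Y[t] - y_hat
        sse += r @ r
    return sse
```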

Author(s):  
Alfonso L. Castano ◽  
Javier Cuenca ◽  
Jose Matias Cutillas Lozano ◽  
Domingo Gimenez ◽  
Jose J. Lopez-Espin ◽  
...  

Author(s):  
Breno A. de Melo Menezes ◽  
Nina Herrmann ◽  
Herbert Kuchen ◽  
Fernando Buarque de Lima Neto

Parallel implementations of swarm intelligence algorithms such as ant colony optimization (ACO) have been widely used to shorten the execution time when solving complex optimization problems. When targeting a GPU environment, developing efficient parallel versions of such algorithms using CUDA can be a difficult and error-prone task even for experienced programmers. To overcome this issue, the parallel programming model of algorithmic skeletons simplifies parallel programs by abstracting from low-level features. This is realized by defining common programming patterns (e.g., map, fold, and zip) that are later converted into efficient parallel code. In this paper, we show how algorithmic skeletons formulated in the domain-specific language Musket can cope with the development of a parallel implementation of ACO and how it compares to a low-level implementation. Our experimental results show that Musket suits the development of ACO. Besides making it easier for the programmer to deal with the parallelization aspects, Musket generates high-performance code with execution times similar to those of low-level implementations.
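The skeleton idea can be mirrored in plain Python (Musket itself is a DSL that compiles such patterns to parallel code; this sketch only shows the structure). For ACO, per-ant tour construction and evaluation is a map over the colony, and selecting the best tour is a fold (reduction); the pheromone-guided choice rules are elided here.

```python
# Map/fold skeleton structure of one ACO iteration (toy stand-in, not Musket).
from functools import reduce
import random

def construct_tour(n_cities, rng):
    # Toy stand-in for pheromone-guided tour construction.
    cities = list(range(n_cities))
    rng.shuffle(cities)
    return cities

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def aco_iteration(n_ants, n_cities, dist, seed=0):
    rng = random.Random(seed)
    # map: build and evaluate one tour per ant (the parallelizable pattern)
    evaluated = [(tour_length(t, dist), t)
                 for t in (construct_tour(n_cities, rng) for _ in range(n_ants))]
    # fold: reduce the evaluated population to the best tour
    return reduce(lambda a, b: a if a[0] <= b[0] else b, evaluated)
```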


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 627
Author(s):  
David Marquez-Viloria ◽  
Luis Castano-Londono ◽  
Neil Guerrero-Gonzalez

A methodology for the scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated on the AWS-FPGA platform. This paper presents a parallel implementation of a KNN algorithm focused on m-QAM demodulators, using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design shows the successful implementation of the KNN algorithm for interchannel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the rule of 8-connected clusters used in image processing. The real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with results obtained in previous work using the same data from the same experimental setup but with offline DSP in Matlab. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN shows a reduction in operations of 43% to 75%, depending on the symbol's position in the constellation, achieving a 47.25% reduction in total computation time for 100 K input symbols processed on 20 parallel cores compared with the original KNN algorithm.
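The 8-connected idea can be sketched as follows (an assumption-laden illustration in Python, not the paper's HLS code): instead of comparing a received symbol against all 16 reference clusters, first snap it to the nearest point of the square 16-QAM grid, then search only that point's 8-connected neighbors, as in image-processing connectivity.

```python
# Candidate-pruning step of a modified KNN demodulator for square 16-QAM.
import numpy as np

LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])  # ideal 16-QAM I/Q levels

def grid_index(v):
    # Index of the nearest ideal level for one I or Q component.
    return int(np.argmin(np.abs(LEVELS - v)))

def candidate_clusters(sym):
    """Return the 8-connected candidate grid cells around a received symbol."""
    i, q = grid_index(sym.real), grid_index(sym.imag)
    cells = []
    for di in (-1, 0, 1):
        for dq in (-1, 0, 1):
            ii, qq = i + di, q + dq
            if 0 <= ii < 4 and 0 <= qq < 4:
                cells.append((ii, qq))
    return cells  # at most 9 cells instead of all 16 clusters

def demodulate(sym, centroids):
    # centroids: dict {(i, q): complex centroid learned from training symbols}
    return min(candidate_clusters(sym), key=lambda c: abs(sym - centroids[c]))
```

Symbols in the corners and edges of the constellation have fewer in-bounds neighbors, which is consistent with the reported operation savings varying with the symbol's position.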


2019 ◽  
Vol 38 (9) ◽  
pp. 4014-4039 ◽  
Author(s):  
Matheus F. Torquato ◽  
Marcelo A. C. Fernandes

1997 ◽  
Vol 6 (1) ◽  
pp. 127-152
Author(s):  
Eric De Sturler ◽  
Volker Strumpen

Recently, the first commercial High Performance Fortran (HPF) subset compilers have appeared. This article reports on our experiences with the xHPF compiler of Applied Parallel Research, version 1.2, for the Intel Paragon. At this stage, we do not expect very high performance from our HPF programs, even though performance will eventually be of paramount importance for the acceptance of HPF. Instead, our primary objective is to study how to convert large Fortran 77 (F77) programs to HPF such that the compiler generates reasonably efficient parallel code. We report on a case study that identifies several problems when parallelizing code with HPF; most of these problems affect current HPF compiler technology in general, although some are specific to the xHPF compiler. We discuss our solutions from the perspective of the scientific programmer and present timing results on the Intel Paragon. The case study comprises three programs of different complexity with respect to parallelization. We use the dense matrix-matrix product to show that the distribution of arrays and the order of nested loops significantly influence the performance of the parallel program. We use Gaussian elimination with partial pivoting to study the parallelization strategy of the compiler. There are various ways to structure this algorithm for a particular data distribution, and this example shows how much effort may be demanded of the programmer to support the compiler in generating an efficient parallel implementation. Finally, we use a small application to show that the more complicated structure of a larger program may introduce problems for parallelization, even though all subroutines of the application are easy to parallelize by themselves. The application consists of a finite volume discretization on a structured grid and a nested iterative solver. Our case study shows that it is possible to obtain reasonably efficient parallel programs with xHPF, although the compiler needs substantial support from the programmer.
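The loop-order point generalizes beyond HPF; a toy Python illustration (not the article's Fortran) shows it. Both orderings compute the same C = A @ B, but the ikj form streams along contiguous rows of B and C instead of striding down columns, the kind of access-pattern change that determines whether a distributed version communicates and caches well.

```python
# Same matrix product, two loop orders with different memory access patterns.
import numpy as np

def matmul_ijk(A, B):
    n, m, p = A.shape[0], A.shape[1], B.shape[1]
    C = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            for k in range(m):      # inner dot product: strided walk down B[:, j]
                C[i, j] += A[i, k] * B[k, j]
    return C

def matmul_ikj(A, B):
    n, m, p = A.shape[0], A.shape[1], B.shape[1]
    C = np.zeros((n, p))
    for i in range(n):
        for k in range(m):
            aik = A[i, k]
            for j in range(p):      # row-wise walk along B[k, :] and C[i, :]
                C[i, j] += aik * B[k, j]
    return C
```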


Author(s):  
Семен Евгеньевич Попов ◽  
Вадим Петрович Потапов ◽  
Роман Юрьевич Замараев

This article describes a software implementation of a fast algorithm that finds distributed scatterers for the problem of plotting displacement velocities of the earth's surface, built on the Apache Spark platform. The Persistent Scatterer (PS) method is widely used for estimating the displacement rates of the earth's surface. It consists of identifying coherent radar targets (interferogram pixels) that demonstrate high phase stability during the entire observation period. The most advanced algorithm for solving this identification problem is the SqueeSAR algorithm, which searches for and processes Distributed Scatterers (DS), specific reflectors that are integrated into the general scheme for calculating displacement velocities with the PS method. The proposed algorithm is integrated into that scheme after the stage of co-registering, with subpixel accuracy, the image stack of a Sentinel-1 radar time series; it is not iterative and fits the paradigm of parallel computation. A careful analysis of the SqueeSAR algorithm identified the areas critical to its performance. The whole algorithm is based on an enumeration of the initial data, with nontrivial transformations performed at each step. The stages of searching for adjacent points in the design window, with multiple passes over the entire area of the image, and of solving the maximization problem when estimating the real values of the interferometric phases turned out to be noticeably costly. To speed up the processing of images, the Apache Spark massively parallel computing platform is used. Its specialized primitive for recurrent in-memory processing, the Resilient Distributed Dataset, provides repeated access to the radar data loaded into memory on each cluster node and allows the snapshot stack to be divided logically into subareas, so that calculations run independently in massively parallel mode. Based on the SqueeSAR mathematical model, it is assumed that the radar image data and the calculated geophysical parameters are common to each statistically homogeneous sample of nearby pixels. In accordance with this assumption, the homogeneity of the pixels is estimated within a given window. The search for distributed scatterers proceeds independently over a sequence of window shifts covering the entire area of the image: the window is shifted along the width and height of the image with a step equal to the width and height of the window. Pairs of samples in the window are composed of vectors of complex pixel values in each of the N images, and the Kolmogorov-Smirnov criterion is checked for each pair.
To estimate the phase values of the homogeneous pixels, a maximization problem is solved by maximum likelihood estimation (MLE). The correct MLE form is constructed by analyzing the statistical properties of the coherence matrix of all images using the complex Wishart distribution. The Apache Spark platform permits processing of distributed radar data stacks (from 60 images) in memory on a large number of physical nodes in a network environment. The average search time for distributed scatterers turned out to be up to 10 times shorter than with a uniprocessor implementation of the algorithm; comparative test results on a demonstration cluster are given. The algorithm is implemented in the Python programming language, with a detailed description of its objects and methods. The proposed algorithm and its parallel implementation allow the developed approaches to be applied to other problems and types of satellite data for remote sensing of the earth from space.
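A minimal PySpark sketch of the windowed search described above follows; the function names, window size, and the random stand-in data are illustrative assumptions, not the authors' code. Each non-overlapping window of the co-registered stack is tested independently: pixel pairs inside the window are compared with a two-sample Kolmogorov-Smirnov test on their amplitude series across the N images.

```python
# Distributed-scatterer candidate search: one map task per window patch.
import numpy as np
from scipy.stats import ks_2samp
from pyspark import SparkContext

def homogeneous_pairs(window, alpha=0.05):
    """window: (win_h, win_w, N) amplitude patch -> list of homogeneous pairs."""
    h, w, _ = window.shape
    pixels = [(r, c) for r in range(h) for c in range(w)]
    pairs = []
    for a in range(len(pixels)):
        for b in range(a + 1, len(pixels)):
            s1, s2 = window[pixels[a]], window[pixels[b]]
            # Keep the pair if the KS test does not reject homogeneity.
            if ks_2samp(s1, s2).pvalue > alpha:
                pairs.append((pixels[a], pixels[b]))
    return pairs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for the real co-registered Sentinel-1 stack: random 5x5xN
    # Rayleigh amplitude patches (N = 60 images, as in the article).
    patches = [rng.rayleigh(size=(5, 5, 60)) for _ in range(100)]
    sc = SparkContext(appName="ds-search-sketch")
    ds_candidates = sc.parallelize(patches).map(homogeneous_pairs).collect()
    sc.stop()
```

Because the windows tile the image without overlap, the map tasks share no state, which is what lets Spark scale the search across nodes.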


Geophysics ◽  
2021 ◽  
pp. 1-71
Author(s):  
Hongwei Liu ◽  
Yi Luo

The finite-difference solution of the second-order acoustic wave equation is a fundamental algorithm in seismic exploration for seismic forward modeling, imaging, and inversion. Unlike the standard explicit finite-difference (EFD) methods, which usually suffer from the so-called "saturation effect", implicit FD methods can obtain much higher accuracy with relatively short operator lengths. Unfortunately, these implicit methods are not widely used because band matrices need to be solved implicitly, which is not suitable for most high-performance computer architectures. We introduce an explicit method to overcome this limitation by applying explicit causal and anti-causal integrations. We prove, both analytically and numerically, that the explicit solution is equivalent to the traditional implicit LU-decomposition method. In addition, we compare the accuracy of the new method with traditional EFD methods up to 32nd order, and the numerical results indicate that the new method is more accurate. In terms of computational cost, the newly proposed method is a standard 8th-order EFD plus two causal and anti-causal integrations, which can be applied recursively, with no extra memory needed. In summary, compared to the standard EFD methods, the new method has spectral-like accuracy; compared to the traditional LU-decomposition implicit methods, the new method is explicit. It is therefore more suitable for high-performance computing without losing any accuracy.
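To make the causal/anti-causal idea concrete, here is a generic forward/backward recursive-filter pair of the kind such methods compose; the coefficient r is illustrative, and the actual recursions come from the paper's factorization of the implicit operator, which is not reproduced here. The structure mirrors LU forward elimination and back substitution for a constant-coefficient banded operator, but each sweep is an explicit recursion.

```python
# Causal (forward) then anti-causal (backward) first-order recursions,
# the explicit counterpart of a tridiagonal forward/back substitution.
import numpy as np

def causal_anticausal_sweep(f, r):
    n = len(f)
    u = np.empty(n)
    u[0] = f[0]
    for i in range(1, n):               # causal integration, left to right
        u[i] = f[i] + r * u[i - 1]
    v = np.empty(n)
    v[-1] = u[-1]
    for i in range(n - 2, -1, -1):      # anti-causal integration, right to left
        v[i] = u[i] + r * v[i + 1]
    return v
```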


Author(s):  
Hui Huang ◽  
Jian Chen ◽  
Blair Carlson ◽  
Hui-Ping Wang ◽  
Paul Crooker ◽  
...  

Due to the enormous computational cost, current residual stress simulations of multipass girth welds are mostly performed using two-dimensional (2D) axisymmetric models. The 2D model can only provide a limited estimation of the residual stresses, since it assumes an axisymmetric distribution. In this study, a highly efficient thermal-mechanical finite element code for three-dimensional (3D) models has been developed based on high-performance Graphics Processing Unit (GPU) computers. Our code is further accelerated by considering the unique physics associated with welding processes, which are characterized by a steep temperature gradient and a moving arc heat source. It is capable of modeling large-scale welding problems that cannot be easily handled by existing commercial simulation tools. To demonstrate its accuracy and efficiency, our code was compared with commercial software by simulating a 3D multipass girth weld model with over 1 million elements. Our code achieved comparable solution accuracy with respect to the commercial one, with an over 100-fold saving in computational cost. Moreover, the three-dimensional analysis demonstrated a more realistic stress distribution that is not axisymmetric in the hoop direction.
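One welding-specific structure such a code can exploit is that a moving arc deposits heat only near the current torch position, so per-step thermal work can be restricted to that neighborhood. The sketch below illustrates this with a Gaussian surface flux; the flux model, parameter values, and function names are assumptions for illustration, as the abstract does not specify the heat source model used.

```python
# Restrict per-step heat input to elements near the moving torch.
import numpy as np

def active_heat_flux(centroids, torch_xyz, Q=1500.0, radius=0.004):
    """centroids: (n_elem, 3) element centers; returns (indices, flux)."""
    d2 = np.sum((centroids - torch_xyz) ** 2, axis=1)
    near = np.where(d2 < (3.0 * radius) ** 2)[0]   # skip far-away elements
    flux = Q * np.exp(-d2[near] / radius ** 2)     # Gaussian heat distribution
    return near, flux
```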

