On relative errors of floating-point operations: Optimal bounds and applications

2017 ◽  
Vol 87 (310) ◽  
pp. 803-819 ◽  
Author(s):  
Claude-Pierre Jeannerod ◽  
Siegfried M. Rump
Author(s):  
Mak Andrlon ◽  
Peter Schachte ◽  
Harald Sondergaard ◽  
Peter J. Stuckey

2013 ◽  
Vol 23 (04) ◽  
pp. 1340008 ◽  
Author(s):  
LAURA CARRINGTON ◽  
MICHAEL LAURENZANO ◽  
ANANTA TIWARI

The analysis and understanding of large-scale application behavior is critical for effectively utilizing existing HPC resources and making design decisions for upcoming systems. In this work we use information about the behavior of an MPI application at a series of smaller core counts to characterize its behavior at a much larger core count. Our methodology first captures the application's behavior via a set of features that are important for both performance and energy (cache hit rates, floating-point intensity, ILP, etc.). We then find the best statistical fit, from among a set of canonical functions, for how each of these features changes across a series of small core counts. The models for a given feature can then be used to generate an extrapolated trace of the application at scale. The accuracy of the extrapolated traces is evaluated by calculating the error of the extrapolated trace relative to an actual trace for two large-scale applications, UH3D and SPECFEM3D. The accuracy of the fully extrapolated traces is further evaluated by comparing the performance models built from the extrapolated trace with those built from an actual trace when predicting application performance. For these two full-scale HPC applications, performance models built using the extrapolated traces predicted the runtime with absolute relative errors of less than 5%.
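
The fitting step described in this abstract can be illustrated with a minimal sketch. The abstract does not list the canonical functions, so the three forms below (linear, logarithmic, inverse), the helper names, and the synthetic feature values are all assumptions made for illustration; this is not the authors' implementation.

```python
import math

# Hypothetical set of canonical forms: each maps a core count p to a basis
# value f(p), and the fitted model is value = a + b * f(p).
CANONICAL_FORMS = {
    "linear": lambda p: float(p),
    "logarithmic": lambda p: math.log(p),
    "inverse": lambda p: 1.0 / p,
}

def fit_form(basis, cores, values):
    """Ordinary least squares for value = a + b * basis(p); returns (a, b, sse)."""
    xs = [basis(p) for p in cores]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((a + b * x - y) ** 2 for x, y in zip(xs, values))
    return a, b, sse

def best_fit(cores, values):
    """Pick the canonical form with the smallest sum of squared errors."""
    fits = {name: fit_form(f, cores, values) for name, f in CANONICAL_FORMS.items()}
    name = min(fits, key=lambda k: fits[k][2])
    a, b, _ = fits[name]
    return name, a, b

def extrapolate(name, a, b, target_cores):
    """Evaluate the selected model at a larger core count."""
    return a + b * CANONICAL_FORMS[name](target_cores)

# Synthetic (made-up) feature values measured at small core counts.
cores = [64, 128, 256, 512]
feature = [0.80 + 0.01 * math.log2(p) for p in cores]

form, a, b = best_fit(cores, feature)
predicted = extrapolate(form, a, b, 4096)
print(form, round(predicted, 4))
```

In a real workflow one model would be fitted per feature, and the extrapolated feature values would then be assembled into a trace for the target core count.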


2019 ◽  
pp. 461-470
Author(s):  
Oleh Horyachyy ◽  
Leonid Moroz ◽  
Viktor Otenko

The purpose of this paper is to introduce a modification of the Fast Inverse Square Root (FISR) approximation algorithm with reduced relative errors. The original algorithm applies a magic-constant trick to the bit pattern of the input floating-point number to obtain an initial approximation and then refines it with the classical Newton-Raphson iteration. It was first used in the computer game Quake III Arena, causing widespread discussion among scientists and programmers, and it is now frequently found in scientific applications, although it has some drawbacks. The proposed algorithm chooses the parameters of the modified inverse square root algorithm so as to minimize the relative error, and it uses two magic constants in order to avoid one floating-point multiplication. In addition, we use the fused multiply-add function and higher-order iterative methods in the second iteration to improve the accuracy. Such algorithms do not require storage of large tables for the initial approximation and can be used effectively on field-programmable gate arrays (FPGAs) and other platforms without hardware support for this function.
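
For orientation, the original FISR scheme that this abstract modifies can be sketched as follows. This is the widely circulated classical version with the constant 0x5F3759DF, not the modified algorithm proposed in the paper, whose constants are not given in the abstract. The function name is hypothetical, and because Python floats are doubles, the Newton-Raphson step here runs in double precision rather than single precision.

```python
import struct

def fast_inv_sqrt(x: float, iterations: int = 1) -> float:
    """Classical fast inverse square root for a positive, normal float32 value."""
    # Reinterpret the single-precision bit pattern of x as an unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Magic-constant trick: subtract half the bit pattern from the constant.
    seed_bits = 0x5F3759DF - (bits >> 1)
    # Reinterpret the result as a float: this is the initial approximation.
    y = struct.unpack("<f", struct.pack("<I", seed_bits))[0]
    # Newton-Raphson refinement for f(y) = 1/y^2 - x.
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

approx = fast_inv_sqrt(4.0)
print(approx, abs(approx - 0.5) / 0.5)
```

The integer subtraction works because the bit pattern of a positive float is roughly a scaled and shifted logarithm of its value, so halving and negating it approximates taking the power of -1/2.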


Computation ◽  
2019 ◽  
Vol 7 (3) ◽  
pp. 41 ◽  
Author(s):  
Cezary J. Walczyk ◽  
Leonid V. Moroz ◽  
Jan L. Cieśliński

We present a new algorithm for the approximate evaluation of the inverse square root for single-precision floating-point numbers. This is a modification of the famous fast inverse square root code. We use the same “magic constant” to compute the seed solution, but we then apply Newton–Raphson corrections with modified coefficients. Compared to the original fast inverse square root code, the new algorithm is twice as accurate in the case of one Newton–Raphson correction and almost seven times more accurate in the case of two corrections. We discuss relative errors within our analytical approach and perform numerical tests of our algorithm for all numbers of the type float.
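
The kind of numerical test mentioned in this abstract can be sketched for the classical code as a baseline. The sketch below measures the maximum relative error of the unmodified algorithm (constant 0x5F3759DF, standard Newton-Raphson coefficients) with one and two corrections; it does not reproduce the paper's modified coefficients, which the abstract does not state. It samples the interval [1, 4) instead of testing every float, on the assumption that the error pattern repeats when a normal input is scaled by 4. Function names are hypothetical, and the corrections run in double precision.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fisr(x: float, corrections: int) -> float:
    """Classical fast inverse square root with a chosen number of corrections."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    seed_bits = 0x5F3759DF - (bits >> 1)
    y = struct.unpack("<f", struct.pack("<I", seed_bits))[0]
    for _ in range(corrections):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

def max_relative_error(corrections: int, samples: int = 20000) -> float:
    """Largest relative error over evenly spaced float32 inputs in [1, 4)."""
    worst = 0.0
    for k in range(samples):
        x = to_float32(1.0 + 3.0 * k / samples)
        exact = x ** -0.5
        worst = max(worst, abs(fisr(x, corrections) - exact) / exact)
    return worst

err_one = max_relative_error(1)
err_two = max_relative_error(2)
print(err_one, err_two)
```

A modified variant can be compared against this baseline by swapping in different coefficients for the correction step and rerunning the same sweep.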


2019 ◽  
pp. 9-13
Author(s):  
V.Ya. Mendeleyev ◽  
V.A. Petrov ◽  
A.V. Yashin ◽  
A.I. Vangonen ◽  
O.K. Taganov

The determination of the surface temperature of materials with unknown emissivity is studied. A method is proposed for determining the surface temperature using a standard sample of average spectral normal emissivity in the wavelength range of 1.65–1.80 μm and an industrially produced Metis M322 pyrometer operating in the same wavelength range. The surface temperature of the studied samples of the composite material and platinum was determined experimentally from the temperature of a standard sample located on the studied surfaces. The relative error in determining the surface temperature of the studied materials, introduced by the proposed method, was calculated taking into account the temperatures of the platinum and the composite material, determined from the temperature of the standard sample located on the studied surfaces and from the temperature of the studied surfaces in the absence of the standard sample. The relative errors thus obtained did not exceed 1.7% for the composite material and 0.5% for the platinum at surface temperatures of about 973 K. It was also found that an inaccuracy of the a priori data on the emissivity of the standard sample in the range (-0.01, 0.01) relative to the average emissivity increases the relative error in determining the temperature of the composite material by 0.68%, and that the installation of a standard sample on the studied materials leads to temperature changes on the periphery of the surface not exceeding 0.47% for the composite material and 0.05% for the platinum.


2020 ◽  
Vol 33 (109) ◽  
pp. 21-31
Author(s):  
І. Ya. Zeleneva ◽  
Т. V. Golub ◽  
T. S. Diachuk ◽  
А. Ye. Didenko

The purpose of these studies is to develop an effective structure and internal functional blocks for a digital computing device, an adder, that performs addition and subtraction on floating-point numbers represented in the IEEE Std 754-2008 format. To improve the characteristics of the adder, the circuit uses pipelining, that is, division into stages, each of which performs a specific action on the numbers. This allows addition and subtraction operations to be performed on several numbers at the same time, which increases computational performance and also makes the adder suitable for use in modern synchronous circuits. Each block of the pipelined adder structure on the FPGA is synthesized as a separate project of a digital functional unit; thus, the overall task is divided into separate subtasks, which facilitates experimental testing and phased debugging of the entire device. Experimental studies were performed using the EDA tool Quartus II. The developed circuit was modeled on FPGAs of the Stratix III and Cyclone III families. A functionally similar device from Altera served as the reference for comparison. A comparative analysis is made, and reasoned conclusions are drawn that the performance improvement is achieved due to the pipelined structure of the adder. Implementing floating-point arithmetic on programmable logic integrated circuits, in particular on FPGAs, has such advantages as flexibility of use and low production costs, and it also makes it possible to solve problems for which there are no ready-made solutions in the form of standard devices on the market. The developed adder has a wide scope, since most modern computing devices need to process floating-point numbers.
The proposed pipelined model of the adder is quite simple to implement on an FPGA and can be an alternative to using built-in multipliers and processor cores in cases where the complex functionality of these devices is redundant for a specific task.


2012 ◽  
Vol 1 (6) ◽  
pp. 67-68
Author(s):  
M. Somasekhar