On relative errors of floating-point operations: Optimal bounds and applications

Claude-Pierre Jeannerod; Siegfried M. Rump

doi:10.1090/mcom/3234

Optimal Bounds for Floating-Point Addition in Constant Time

2019 IEEE 26th Symposium on Computer Arithmetic (ARITH) ◽

10.1109/arith.2019.00038 ◽

2019 ◽

Author(s):

Mak Andrlon ◽

Peter Schachte ◽

Harald Sondergaard ◽

Peter J. Stuckey

Keyword(s):

Constant Time ◽

Floating Point ◽

Optimal Bounds

CHARACTERIZING LARGE-SCALE HPC APPLICATIONS THROUGH TRACE EXTRAPOLATION

Parallel Processing Letters ◽

10.1142/s0129626413400082 ◽

2013 ◽

Vol 23 (04) ◽

pp. 1340008 ◽

Cited By ~ 8

Author(s):

LAURA CARRINGTON ◽

MICHAEL LAURENZANO ◽

ANANTA TIWARI

Keyword(s):

Large Scale ◽

Full Scale ◽

Floating Point ◽

Building Performance ◽

Performance Models ◽

Design Decisions ◽

Application Performance ◽

Small Core ◽

Application Behavior ◽

Relative Errors

The analysis and understanding of large-scale application behavior is critical for effectively utilizing existing HPC resources and making design decisions for upcoming systems. In this work we use information about the behavior of an MPI application at a series of smaller core counts to characterize its behavior at a much larger core count. Our methodology first captures the application's behavior via a set of features that are important for both performance and energy (cache hit rates, floating-point intensity, ILP, etc.). We then find, from among a set of canonical functions, the best statistical fit for how each feature changes across a series of small core counts. The models for a given feature can then be used to generate an extrapolated trace of the application at scale. The accuracy of the extrapolated traces is evaluated by calculating the error of the extrapolated trace relative to an actual trace for two large-scale applications, UH3D and SPECFEM3D. The accuracy of the fully extrapolated traces is further evaluated by building performance models from both the extrapolated trace and an actual trace and comparing their predictions of application performance. For these two full-scale HPC applications, performance models built using the extrapolated traces predicted the runtime with absolute relative errors of less than 5%.
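
The fit-and-extrapolate step described in this abstract can be sketched as follows. This is only an illustration, not the authors' tool. The candidate function shapes, the helper names (`fit`, `extrapolate`, `relative_error`) and the sample cache-hit-rate data are all assumptions, since the abstract does not list the canonical functions it uses.

```python
import math

# Hypothetical candidate shapes y = a + b * g(p), where p is the core count.
# The abstract does not name its canonical functions; these are assumptions.
CANDIDATES = {
    "constant": lambda p: 0.0,
    "linear": lambda p: float(p),
    "logarithmic": lambda p: math.log(p),
    "inverse": lambda p: 1.0 / p,
}


def fit(g, cores, values):
    """Least-squares fit of values ~ a + b * g(cores); returns (a, b, sse)."""
    xs = [g(p) for p in cores]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    b = 0.0 if sxx == 0.0 else sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((a + b * x - y) ** 2 for x, y in zip(xs, values))
    return a, b, sse


def extrapolate(cores, values, target):
    """Pick the candidate with the smallest squared error and evaluate it at target."""
    name, (a, b, _) = min(
        ((name, fit(g, cores, values)) for name, g in CANDIDATES.items()),
        key=lambda item: item[1][2],
    )
    return name, a + b * CANDIDATES[name](target)


def relative_error(predicted, actual):
    """Absolute relative error of a prediction against a measured value."""
    return abs(predicted - actual) / abs(actual)


# Made-up feature values measured at small core counts (illustrative only).
small_cores = [64, 128, 256, 512]
hit_rates = [0.60 + 0.04 * math.log(p) for p in small_cores]
best, predicted = extrapolate(small_cores, hit_rates, 8192)
print(best, predicted)
```

Each candidate is fitted with ordinary least squares, and the one with the smallest sum of squared residuals is used to predict the feature value at the larger core count. A real tool would repeat this per feature and would likely guard against overfitting with so few points.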

SIMPLE EFFECTIVE FAST INVERSE SQUARE ROOT ALGORITHM WITH TWO MAGIC CONSTANTS

International Journal of Computing ◽

10.47839/ijc.18.4.1616 ◽

2019 ◽

pp. 461-470

Author(s):

Oleh Horyachyy ◽

Leonid Moroz ◽

Viktor Otenko

Keyword(s):

Computer Game ◽

Initial Approximation ◽

Floating Point ◽

Square Root ◽

Original Algorithm ◽

Gate Arrays ◽

Field Programmable ◽

Floating Point Number ◽

Programmable Gate Arrays ◽

Relative Errors

The purpose of this paper is to introduce a modification of the Fast Inverse Square Root (FISR) approximation algorithm with reduced relative errors. The original algorithm applies a magic-constant trick to the input floating-point number to obtain a clever initial approximation and then uses the classical iterative Newton-Raphson formula. It was first used in the computer game Quake III Arena, causing widespread discussion among scientists and programmers, and it can now be found in many scientific applications, although it has some drawbacks. The proposed algorithm chooses the parameters of the modified inverse square root algorithm so as to minimize the relative error, and it includes two magic constants in order to avoid one floating-point multiplication. In addition, we use the fused multiply-add function and higher-order iterative methods in the second iteration to improve the accuracy. Such algorithms do not require storage of large tables for the initial approximation and can be used effectively on field-programmable gate arrays (FPGAs) and other platforms without hardware support for this function.
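
For orientation, the original algorithm that this abstract refers to can be sketched as below. This shows only the classic single-constant version (magic constant plus standard Newton-Raphson steps), not the two-constant modification proposed in the paper, whose parameters are not given in the abstract. The constant `0x5F3759DF` is the one from the widely circulated Quake III Arena code. The helper names and the sampling grid are assumptions, and the Newton-Raphson steps run in Python's double precision rather than in single precision.

```python
import struct

MAGIC = 0x5F3759DF  # constant from the widely circulated Quake III Arena code


def float_to_bits(x):
    """Bit pattern of x rounded to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(i):
    """Single-precision value whose bit pattern is the 32-bit integer i."""
    return struct.unpack("<f", struct.pack("<I", i))[0]


def fisr(x, iterations=1):
    """Classic fast inverse square root for a positive, normal float32 x."""
    # Initial approximation from integer arithmetic on the bit pattern.
    y = bits_to_float(MAGIC - (float_to_bits(x) >> 1))
    for _ in range(iterations):
        # Standard Newton-Raphson step (evaluated here in double precision).
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def max_relative_error(iterations, samples=10000):
    """Largest observed relative error on a grid over [1, 4)."""
    worst = 0.0
    for k in range(samples):
        # Round the sample to float32 so the reference uses the same input.
        x = bits_to_float(float_to_bits(1.0 + 3.0 * k / samples))
        exact = x ** -0.5
        worst = max(worst, abs(fisr(x, iterations) - exact) / exact)
    return worst


for n in (0, 1, 2):
    print(n, "iteration(s):", max_relative_error(n))
```

The grid covers [1, 4) because the error pattern of the bit trick repeats when the input is multiplied by 4. Scanning every float, as some papers do, would give the exact maximum, while this sample only approximates it.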

A Modification of the Fast Inverse Square Root Algorithm

Computation ◽

10.3390/computation7030041 ◽

2019 ◽

Vol 7 (3) ◽

pp. 41 ◽

Cited By ~ 1

Author(s):

Cezary J. Walczyk ◽

Leonid V. Moroz ◽

Jan L. Cieśliński

Keyword(s):

Analytical Approach ◽

Floating Point ◽

Square Root ◽

Approximate Evaluation ◽

Single Precision ◽

Seed Solution ◽

Numerical Tests ◽

Newton Raphson ◽

Relative Errors ◽

Magic Constant

We present a new algorithm for the approximate evaluation of the inverse square root for single-precision floating-point numbers. This is a modification of the famous fast inverse square root code. We use the same “magic constant” to compute the seed solution, but then we apply Newton–Raphson corrections with modified coefficients. Compared to the original fast inverse square root code, the new algorithm is twice as accurate in the case of one Newton–Raphson correction and almost seven times as accurate in the case of two corrections. We discuss relative errors within our analytical approach and perform numerical tests of our algorithm for all numbers of the type float.
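
As background for the Newton–Raphson corrections mentioned in this abstract, the standard (unmodified) correction and its effect on the relative error can be worked out in a few lines. The symbols $y_n$ (current approximation) and $\delta_n$ (its relative error) are notation introduced here, not taken from the paper, and the paper's modified coefficients are not reproduced.

```latex
% Newton-Raphson for f(y) = y^{-2} - x, whose positive root is y = 1/\sqrt{x}
\begin{align*}
y_{n+1} &= y_n - \frac{f(y_n)}{f'(y_n)}
         = y_n\left(\tfrac{3}{2} - \tfrac{1}{2}\,x\,y_n^{2}\right),\\
y_n &= \frac{1+\delta_n}{\sqrt{x}}
      \;\Longrightarrow\; x\,y_n^{2} = (1+\delta_n)^{2},\\
\delta_{n+1} &= (1+\delta_n)\left(1 - \delta_n - \tfrac{1}{2}\,\delta_n^{2}\right) - 1
              = -\tfrac{3}{2}\,\delta_n^{2} - \tfrac{1}{2}\,\delta_n^{3}.
\end{align*}
```

For any $\delta_n > -3$ the result satisfies $\delta_{n+1} \le 0$, so a standard correction never overestimates. One plausible reading, offered here as an assumption rather than as the authors' stated argument, is that modified coefficients can rebalance this one-sided error around zero and so reduce its maximum magnitude.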

Extension of floating-point filters to absolute and relative errors for numerical computation

Journal of Physics Conference Series ◽

10.1088/1742-6596/1218/1/012011 ◽

2019 ◽

Vol 1218 ◽

pp. 012011

Author(s):

Yuki Ohta ◽

Katsuhisa Ozaki

Keyword(s):

Numerical Computation ◽

Floating Point ◽

Relative Errors

Java floating-point

JavaTech, an Introduction to Scientific and Technical Computing with Java ◽

10.1017/cbo9780511615948.028 ◽

2005 ◽

pp. 693-696

Keyword(s):

Floating Point

New floating point impedance simulation using operational amplifiers

IEE Proceedings G Circuits Devices and Systems ◽

10.1049/ip-g-2.1989.0027 ◽

1989 ◽

Vol 136 (3) ◽

pp. 155 ◽

Cited By ~ 1

Author(s):

T.S. Rathore ◽

V. Singh

Keyword(s):

Operational Amplifiers ◽

Floating Point

Application of pyrometer and standard sample to determine surface temperature of studied materials

Izmeritel`naya Tekhnika ◽

10.32446/0368-1025it.2019-12-9-13 ◽

2019 ◽

pp. 9-13

Author(s):

V.Ya. Mendeleyev ◽

V.A. Petrov ◽

A.V. Yashin ◽

A.I. Vangonen ◽

O.K. Taganov

Keyword(s):

Composite Material ◽

Surface Temperature ◽

Relative Error ◽

Wavelength Range ◽

Standard Sample ◽

A Priori ◽

Temperature Changes ◽

Surface Temperatures ◽

Relative Errors ◽

Normal Emissivity

Determining the surface temperature of materials with unknown emissivity is studied. A method is proposed for determining the surface temperature using a standard sample of average spectral normal emissivity in the wavelength range of 1.65–1.80 μm and an industrially produced Metis M322 pyrometer operating in the same wavelength range. The surface temperature of the studied samples of the composite material and platinum was determined experimentally from the temperature of a standard sample located on the studied surfaces. The relative error in determining the surface temperature of the studied materials, introduced by the proposed method, was calculated taking into account the temperatures of the platinum and the composite material, determined from the temperature of the standard sample located on the studied surfaces, and from the temperature of the studied surfaces in the absence of the standard sample. The relative errors thus obtained did not exceed 1.7% for the composite material and 0.5% for the platinum at surface temperatures of about 973 K. It was also found that the inaccuracy of a priori data on the emissivity of the standard sample in the range (−0.01; 0.01) relative to the average emissivity increases the relative error in determining the temperature of the composite material by 0.68%, and that the installation of a standard sample on the studied materials leads to temperature changes on the periphery of the surface not exceeding 0.47% for the composite material and 0.05% for platinum.

CONVEYOR MODEL AND IMPLEMENTATION OF THE REAL NUMBERS ADDER ON FPGA

ELECTRICAL AND COMPUTER SYSTEMS ◽

10.15276/eltecs.33.109.2020.3 ◽

2020 ◽

Vol 33 (109) ◽

pp. 21-31

Author(s):

І. Ya. Zeleneva ◽

Т. V. Golub ◽

T. S. Diachuk ◽

А. Ye. Didenko

Keyword(s):

Performance Improvement ◽

Experimental Studies ◽

Production Costs ◽

Floating Point ◽

Computing Device ◽

Quartus II ◽

Functional Blocks ◽

Experimental Testing ◽

Processor Cores ◽

Floating Point Numbers

The purpose of these studies is to develop an effective structure and internal functional blocks for a digital computing device, an adder, that performs addition and subtraction on floating-point numbers in the IEEE Std 754-2008 format. To improve the characteristics of the adder, the circuit uses conveying (pipelining), that is, division into levels, each of which performs a specific action on the numbers. This allows addition and subtraction to be performed on several numbers at the same time, which increases the performance of calculations and also makes the adder suitable for use in modern synchronous circuits. Each block of the conveyor structure of the adder on the FPGA is synthesized as a separate project of a digital functional unit, so the overall task is divided into separate subtasks, which facilitates experimental testing and phased debugging of the entire device. Experimental studies were performed using the EDA tool Quartus II. The developed circuit was modeled on FPGAs of the Stratix III and Cyclone III families. A functionally similar device from Altera served as the analogue for comparison. A comparative analysis is made, and reasoned conclusions are drawn that the performance improvement is achieved due to the conveyor structure of the adder. Implementing floating-point arithmetic on programmable logic integrated circuits, in particular on FPGAs, has such advantages as flexibility of use and low production costs, and it also provides the opportunity to solve problems for which there are no ready-made solutions in the form of standard devices on the market. The developed adder has a wide scope, since most modern computing devices need to process floating-point numbers.
The proposed conveyor model of the adder is quite simple to implement on an FPGA and can be an alternative to using built-in multipliers and processor cores in cases where the complex functionality of these devices is redundant for a specific task.
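
The throughput benefit of splitting an adder into levels can be illustrated with a small software model. This is a toy sketch under stated assumptions, not the circuit from the paper: the three stages (`align`, `add`, `normalize`) are hypothetical, operands are exact pairs `(m, e)` meaning `m * 2**e` with integer `m`, and there is no rounding, so it is not IEEE 754 arithmetic.

```python
EMPTY = object()  # marks a pipeline slot that holds no operation


def align(op):
    """Stage 1: bring both operands to the smaller exponent (exact shift)."""
    (m1, e1), (m2, e2) = op
    e = min(e1, e2)
    return m1 << (e1 - e), m2 << (e2 - e), e


def add(aligned):
    """Stage 2: add the aligned significands."""
    m1, m2, e = aligned
    return m1 + m2, e


def normalize(value):
    """Stage 3: strip trailing zero bits from the significand."""
    m, e = value
    while m != 0 and m % 2 == 0:
        m //= 2
        e += 1
    return m, e


def run_pipeline(stages, inputs):
    """Feed one input per clock cycle; return (results, cycles used)."""
    slots = [EMPTY] * len(stages)  # slots[i] waits at the input of stage i
    pending = list(inputs)
    results = []
    cycles = 0
    while pending or any(s is not EMPTY for s in slots):
        if pending:
            slots[0] = pending.pop(0)
        next_slots = [EMPTY] * len(stages)
        for i, stage in enumerate(stages):
            if slots[i] is EMPTY:
                continue
            out = stage(slots[i])
            if i == len(stages) - 1:
                results.append(out)      # operation leaves the pipeline
            else:
                next_slots[i + 1] = out  # hand over to the next stage
        slots = next_slots
        cycles += 1
    return results, cycles


stages = [align, add, normalize]
operands = [((3, 0), (1, 2)), ((1, 1), (1, 1)), ((5, 3), (3, 0)), ((1, 0), (1, 0))]
results, cycles = run_pipeline(stages, operands)
print(results, cycles, "cycles, versus", len(stages) * len(operands), "without overlap")
```

In this model, n operations through k stages finish in n + k - 1 cycles, because every stage works on a different operation in the same cycle. Running each operation to completion before starting the next would take n * k cycles.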

Floating Point Operations in PipeRench CGRA

International Journal of Scientific Research ◽

10.15373/22778179/nov2012/24 ◽

2012 ◽

Vol 1 (6) ◽

pp. 67-68

Author(s):

M. Somasekhar

Keyword(s):

Floating Point
