Conversion of Mersenne Twister to double-precision floating-point numbers

We study the strange floating-point behavior of a function proposed by Nicholas Higham, which takes the square root of its argument repeatedly and then squares the result the same number of times, and we find its fixpoints. For IEEE standard double-precision floating-point numbers the fixpoints have the form \[ x \in \left\{\left( 1+k\mathrm{eps}\right) ^{\frac{1}{\mathrm{eps}}},\quad k=\left[ -745:\frac{1}{2}:-\frac{1}{2},0:709\right]\right\} \cup \{0\} , \] where \mathrm{eps} is the machine epsilon.
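
The behavior described above can be reproduced in a few lines. The following is a minimal Python sketch, not the authors' code: the function name `higham` and the default of 52 iterations are assumptions (52 is inferred from the exponent 1/eps = 2^52 in the fixpoint formula for double precision).

```python
import math


def higham(x: float, n: int = 52) -> float:
    """Take the square root of x n times, then square the result n times.

    In exact arithmetic this is the identity for x >= 0. In double
    precision the repeated square roots push x towards 1 and discard
    most of its information, so squaring cannot restore it.
    n = 52 is an assumed default, not a value stated in the abstract.
    """
    for _ in range(n):
        x = math.sqrt(x)
    for _ in range(n):
        x = x * x
    return x


if __name__ == "__main__":
    for value in (0.0, 0.5, 1.0, 2.0, 10.0, 100.0):
        print(value, "->", higham(value))
```

For arguments of at least 1, only a value of the form 1 + k·eps survives the square roots, so the result lands near e^k instead of the original argument; `higham(2.0)`, for instance, does not return 2.0.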

Scaling Up and Down of 3-D Floating-point Data in Quantum Computation

10.21203/rs.3.rs-757209/v1 ◽

2021 ◽

Author(s):

Meiyu Xu ◽

Dayong Lu ◽

Xiaoyun Sun

Keyword(s):

Quantum Computation ◽

Scaling Up ◽

Floating Point ◽

Geometric Transformation ◽

Double Precision ◽

Quantum Image ◽

Image Scaling ◽

Trilinear Interpolation ◽

Point Data ◽

Floating Point Numbers

Abstract: In the past few decades, quantum computation has become increasingly attractive due to its remarkable performance. Quantum image scaling is a common geometric transformation in quantum image processing; however, no version of it exists for quantum floating-point data. Is there a corresponding scaling for 2-D and 3-D floating-point data? The answer is yes. In this paper, we present a quantum scaling up and down scheme for floating-point data using the trilinear interpolation method in 3-D space. This scheme offers better performance (in terms of the precision of floating-point numbers) for realizing quantum floating-point algorithms than previous classical approaches. The Converter module we propose converts fixed-point numbers to floating-point numbers for data of arbitrary size with p + q qubits based on the IEEE-754 format, rather than being limited to 32-bit single precision, 64-bit double precision, or 128-bit extended precision. Quantum image scaling algorithms usually rely on nearest-neighbor interpolation and bilinear interpolation, which are not applicable in high-dimensional space. This paper proposes trilinear interpolation of floating-point numbers in 3-D space to achieve quantum algorithms for scaling 3-D floating-point data up and down. Finally, the circuits of quantum scaling up and down for 3-D floating-point data are designed.
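
For orientation, the classical (non-quantum) counterpart of trilinear scaling can be sketched as follows. This is an illustrative Python sketch, not the quantum circuit from the paper: the names `trilinear` and `rescale`, the nested-list layout indexed `[x][y][z]`, and the endpoint-aligned sampling grid are all assumptions.

```python
def _cell(coord, size):
    """Return (base index, fractional offset) of coord along one axis.

    The base index is clamped so that base + 1 stays inside the grid.
    """
    base = min(max(int(coord), 0), size - 2)
    return base, coord - base


def _lerp(a, b, t):
    """Linear interpolation between a and b with weight t in [0, 1]."""
    return a + (b - a) * t


def trilinear(vol, x, y, z):
    """Interpolate vol (nested lists, at least 2x2x2) at a real-valued point."""
    x0, tx = _cell(x, len(vol))
    y0, ty = _cell(y, len(vol[0]))
    z0, tz = _cell(z, len(vol[0][0]))
    # Interpolate along x on the four edges of the enclosing cell.
    c00 = _lerp(vol[x0][y0][z0], vol[x0 + 1][y0][z0], tx)
    c10 = _lerp(vol[x0][y0 + 1][z0], vol[x0 + 1][y0 + 1][z0], tx)
    c01 = _lerp(vol[x0][y0][z0 + 1], vol[x0 + 1][y0][z0 + 1], tx)
    c11 = _lerp(vol[x0][y0 + 1][z0 + 1], vol[x0 + 1][y0 + 1][z0 + 1], tx)
    # Then along y, and finally along z.
    c0 = _lerp(c00, c10, ty)
    c1 = _lerp(c01, c11, ty)
    return _lerp(c0, c1, tz)


def rescale(vol, new_shape):
    """Scale vol up or down to new_shape (every dimension >= 2)."""
    old_shape = (len(vol), len(vol[0]), len(vol[0][0]))
    # Map the first and last samples of the new grid onto the old ones.
    step = [(o - 1) / (n - 1) for o, n in zip(old_shape, new_shape)]
    return [[[trilinear(vol, i * step[0], j * step[1], k * step[2])
              for k in range(new_shape[2])]
             for j in range(new_shape[1])]
            for i in range(new_shape[0])]


if __name__ == "__main__":
    # A 2x2x2 volume sampled from the linear function i + 2j + 4k.
    vol = [[[i + 2 * j + 4 * k for k in range(2)] for j in range(2)]
           for i in range(2)]
    up = rescale(vol, (3, 3, 3))
    print(up[1][1][1])
```

The same `rescale` call handles both directions: a larger `new_shape` scales up, a smaller one scales down.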

A Latency-Effective Pipelined Divider for Double-Precision Floating-Point Numbers

IEEE Access ◽

10.1109/access.2020.3022657 ◽

2020 ◽

Vol 8 ◽

pp. 165740-165747

Author(s):

Juwon Yun ◽

Jinyoung Lee ◽

Woo-Nam Chung ◽

Cheong Ghil Kim ◽

Woo-Chan Park

Keyword(s):

Floating Point ◽

Double Precision ◽

Floating Point Numbers

An IEEE 754 double-precision floating-point multiplier for denormalized and normalized floating-point numbers

2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP) ◽

10.1109/asap.2015.7245706 ◽

2015 ◽

Cited By ~ 1

Author(s):

Ross Thompson ◽

James E. Stine

Keyword(s):

Floating Point ◽

Double Precision ◽

Floating Point Numbers

A PRNG Specialized in Double Precision Floating Point Numbers Using an Affine Transition

Monte Carlo and Quasi-Monte Carlo Methods 2008 ◽

10.1007/978-3-642-04107-5_38 ◽

2009 ◽

pp. 589-602 ◽

Cited By ~ 11

Author(s):

Mutsuo Saito ◽

Makoto Matsumoto

Keyword(s):

Floating Point ◽

Double Precision ◽

Floating Point Numbers

CONVEYOR MODEL AND IMPLEMENTATION OF THE REAL NUMBERS ADDER ON FPGA

ELECTRICAL AND COMPUTER SYSTEMS ◽

10.15276/eltecs.33.109.2020.3 ◽

2020 ◽

Vol 33 (109) ◽

pp. 21-31

Author(s):

І. Ya. Zeleneva ◽

Т. V. Golub ◽

T. S. Diachuk ◽

А. Ye. Didenko

Keyword(s):

Performance Improvement ◽

Experimental Studies ◽

Production Costs ◽

Floating Point ◽

Computing Device ◽

Quartus II ◽

Functional Blocks ◽

Experimental Testing ◽

Processor Cores ◽

Floating Point Numbers

The purpose of these studies is to develop an effective structure and internal functional blocks for a digital computing device, an adder that performs addition and subtraction on floating-point numbers in IEEE Std 754-2008 format. To improve the characteristics of the adder, the circuit uses pipelining (a conveyor structure), that is, division into stages, each of which performs a specific action on the numbers. This allows addition and subtraction operations to be performed on several numbers at the same time, which increases computational performance and also makes the adder suitable for use in modern synchronous circuits. Each block of the conveyor structure of the adder on FPGA is synthesized as a separate project of a digital functional unit; the overall task is thus divided into separate subtasks, which facilitates experimental testing and phased debugging of the entire device. Experimental studies were performed using the EDA tool Quartus II. The developed circuit was modeled on FPGAs of the Stratix III and Cyclone III families. A functionally similar device from Altera served as the reference for comparison. A comparative analysis is made, and reasoned conclusions are drawn that the performance improvement is achieved due to the conveyor structure of the adder. Implementing floating-point arithmetic on programmable logic integrated circuits, in particular on FPGAs, has advantages such as flexibility of use and low production costs, and also provides the opportunity to solve problems for which there are no ready-made solutions in the form of standard devices on the market. The developed adder has a wide scope, since most modern computing devices need to process floating-point numbers. The proposed conveyor model of the adder is quite simple to implement on the FPGA and can be an alternative to using built-in multipliers and processor cores in cases where the complex functionality of these devices is redundant for a specific task.
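
The throughput benefit of a conveyor (pipelined) structure can be illustrated with a small software model. This is a hedged Python sketch, not the circuit from the paper: the three stages `align`, `add` and `normalize`, the toy operand format (integer mantissa, exponent) and the function `run_pipeline` are all assumptions, and the arithmetic is exact rather than IEEE 754 rounded.

```python
_EMPTY = object()  # marks a pipeline slot that holds no operation


def align(pair):
    """Stage 1: shift both mantissas to the smaller exponent."""
    (ma, ea), (mb, eb) = pair
    e = min(ea, eb)
    return ma << (ea - e), mb << (eb - e), e


def add(aligned):
    """Stage 2: add the aligned mantissas."""
    ma, mb, e = aligned
    return ma + mb, e


def normalize(result):
    """Stage 3: strip trailing zero bits from the mantissa."""
    m, e = result
    while m != 0 and m % 2 == 0:
        m //= 2
        e += 1
    return m, e


def run_pipeline(stages, inputs):
    """Clock a linear pipeline; return (outputs, number of cycles)."""
    slots = [_EMPTY] * len(stages)  # slots[i]: item awaiting stage i
    pending = list(inputs)
    outputs, cycles = [], 0
    while pending or any(s is not _EMPTY for s in slots):
        if pending:
            slots[0] = pending.pop(0)  # accept one new operation per cycle
        # All stages work in the same cycle, each on a different operation.
        results = [f(s) if s is not _EMPTY else _EMPTY
                   for f, s in zip(stages, slots)]
        if results[-1] is not _EMPTY:
            outputs.append(results[-1])
        slots = [_EMPTY] + results[:-1]  # pass results to the next stage
        cycles += 1
    return outputs, cycles


if __name__ == "__main__":
    # Operands are (mantissa, exponent) pairs with value mantissa * 2**exponent.
    ops = [((3, 2), (1, 0)), ((1, 1), (1, 1))]
    print(run_pipeline([align, add, normalize], ops))
```

With S stages and N operations the model finishes in S + N - 1 cycles, whereas processing each operation to completion before starting the next would take S × N cycles.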

Functions for manipulating floating-point numbers

ACM SIGNUM Newsletter ◽

10.1145/1057514.1057515 ◽

1979 ◽

Vol 14 (4) ◽

pp. 11-13 ◽

Cited By ~ 4

Author(s):

John Reid

Keyword(s):

Floating Point ◽

Floating Point Numbers

Ultralow-Latency VLSI Architecture Based on a Linear Approximation Method for Computing Nth Roots of Floating-Point Numbers

IEEE Transactions on Circuits and Systems I Regular Papers ◽

10.1109/tcsi.2020.3038417 ◽

2020 ◽

pp. 1-13

Author(s):

Fei Lyu ◽

Xiaoqi Xu ◽

Yu Wang ◽

Yuanyong Luo ◽

Yuxuan Wang ◽

...

Keyword(s):

Linear Approximation ◽

Approximation Method ◽

VLSI Architecture ◽

Floating Point ◽

Linear Approximation Method ◽

Floating Point Numbers
