II  Floating-Point Arithmetic

According to the IEEE floating-point standard [IEEE08], each of the elementary arithmetic operations, including addition, multiplication, division, square root extraction, and fused multiply-add,

"... shall be performed as if it first produced an intermediate result correct to infinite precision and then rounded that result according to one of the [supported] modes ..."
Since the operands (or operand, in the case of square root) and final result are represented as bit vectors, the relationship between inputs and outputs is as diagrammed in Figure 3.1.

Figure 3.1: IEEE Specification [diagram not reproduced; surviving label: "Rounded Result"]

That is, the pure mathematical operation to be implemented is applied to the decoded values of the inputs, and the result of this operation is rounded and encoded as the output. Of course, an implementation is not actually required to generate the output by such a procedure, and in fact, in most cases an explicit representation of the precise unrounded result is infeasible. In Chapters 4 and 5, we describe the common schemes for encoding real numbers as bit vectors. Chapter 6 addresses the problem of rounding.
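The decode, operate, round, encode pipeline described above can be imitated directly for operations whose exact result is rational. The following is a minimal sketch, not a definitive implementation. It makes three assumptions: the bit vectors are 64-bit IEEE double-precision patterns, the rounding mode is round-to-nearest-even, and the interpreter is CPython, where converting a `Fraction` to `float` is correctly rounded. The helper names `decode`, `encode`, and `spec_op` are hypothetical and are not taken from the text. Infinities, NaNs, and the sign of zero are not handled.

```python
import operator
import struct
from fractions import Fraction


def decode(bits: int) -> Fraction:
    """Decode a 64-bit pattern into the exact rational it represents.

    Assumption: the pattern encodes a finite double (no infinity or NaN).
    """
    value = struct.unpack(">d", bits.to_bytes(8, "big"))[0]
    return Fraction(value)  # exact: every finite double is a rational number


def encode(value: float) -> int:
    """Encode a Python float (an IEEE double) as a 64-bit pattern."""
    return int.from_bytes(struct.pack(">d", value), "big")


def spec_op(op, a_bits: int, b_bits: int) -> int:
    """Apply `op` as the specification describes: exact result, then rounding."""
    exact = op(decode(a_bits), decode(b_bits))  # the "infinitely precise" result
    rounded = float(exact)  # assumed: round to nearest, ties to even (CPython)
    return encode(rounded)


# Compare the specification-style result with the built-in float arithmetic.
for op, x, y in ((operator.add, 0.1, 0.2),
                 (operator.mul, 0.1, 0.3),
                 (operator.truediv, 1.0, 3.0)):
    agrees = spec_op(op, encode(x), encode(y)) == encode(op(x, y))
    print(op.__name__, agrees)
```

Here the exact intermediate value is held as an arbitrary-precision rational, which is practical in software for addition, multiplication, and division. Square root is omitted because its exact result is generally irrational, so it has no such finite representation.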



David Russinoff 2017-08-01