13  FMA-Based Division

The subject of this chapter is a multiplicative technique for floating-point division that was developed for IBM RISC processors in the late 1980's and remains in widespread use today. The method begins with an initial approximation of the reciprocal of the divisor, derived from tables in read-only memory. A sequence of Newton-Raphson refinements of this approximation is interleaved with a sequence of refinements of the quotient. Central to this process is a hardware fused multiply-accumulate (FMA) operation, which is assumed to be implemented in support of the standard FMA machine instructions. Of course, a critical feature of this operation is that it has the effect of performing two arithmetic operations with a single rounding.

Implementations of this method are generally slower than those based on subtractive methods of division such as those discussed in Chapter 14, but have the advantage of lower hardware requirements. In fact, the computations are typically performed by either microcode or software.

We shall present proofs of correctness of two representative algorithms based on this approach, which operate on single- and double-precision floating-point numbers, respectively. The inputs to each are a pair of operands, $ a$ and $ b$, and a rounding mode, $ {\cal R}$. The operands are $ p$-exact real numbers in the interval $ [1,2)$, where $ p=24$ or 53. $ {\cal R}$ may be any of the IEEE rounding modes, although since $ {\cal R}$ is applied here only to positive arguments, there is no distinction between $ \mathit{RTZ}$ and RDN. The returned value, as specified by IEEE 754 (see Figure 3.1), is the rounded quotient $ {\cal R}(\frac{a}{b}, p)$.

The algorithms are based on two primitive functions:

  1. A function rcp24, which computes a 24-exact approximation of the reciprocal of a 24-exact number in the interval $ [1,2)$ with relative error bounded by $ 2^{-23}$. The definition of rcp24, which is based on table reference and interpolation, is not presented here. Its accuracy, which has been verified by exhaustive computation, is stated without proof as Lemma 13.3.1.

  2. An atomic FMA operation, which computes the rounded value $ {\cal R}(xy + z, p)$ for $ p$-exact operands $ x$, $ y$, and $ z$ and rounding mode $ {\cal R}$. No restriction is imposed on the exponents of the operands.

The relevant theory, which is largely based on the work of Markstein [MAR90] and Harrison [HAR00], is developed in Sections 13.1 and 13.2. The algorithms are presented in Section 13.3 along with proofs of their correctness.

David Russinoff 2017-08-01