13.3  Examples

The single-precision algorithm is given by the following sequence of operations. The spacing of the steps is intended to denote groups of operations that may be executed in parallel. Note that the rounding mode $ \mathit{RNE}$ is used for all intermediate steps and the input mode $ {\cal R}$ is used for the final step.

Definition 13.3.1   (divsp) $ q = \mathit{divsp}(a, b, {\cal R})$ is the result of the following sequence of computations:
$\displaystyle y_0$ $\displaystyle =$ $\displaystyle \bigskip\mathit{rcp24}(b)$  
$\displaystyle q_0$ $\displaystyle =$ $\displaystyle \mathit{RNE}(ay_0, 24)$  
$\displaystyle e_0$ $\displaystyle =$ $\displaystyle \mathit{RNE}(1 - by_0, 24)$  
$\displaystyle y_1$ $\displaystyle =$ $\displaystyle \mathit{RNE}(y_0 + e_0y_0, 24)$  
$\displaystyle r_0$ $\displaystyle =$ $\displaystyle \mathit{RNE}(a - bq_0, 24)$  
$\displaystyle q_1$ $\displaystyle =$ $\displaystyle \mathit{RNE}(q_0 + r_0y_1, 24)$  
$\displaystyle r_1$ $\displaystyle =$ $\displaystyle \mathit{RNE}(a - bq_1, 24)$  
$\displaystyle q$ $\displaystyle =$ $\displaystyle {\cal R}(q_1 + r_1y_1, 24)$  

The following properties of the function rcp24 have been verified by exhaustive computation.

Lemma 13.3.1   (rcp24-spec) If $ b$ is 24-exact, $ 1 \leq b < 2$, and $ y_0 = \mathit{rcp24}(b)$, then $ \frac{1}{2} < y_0 \leq 1$ and

$\displaystyle \vert 1 - by_0\vert < 2^{-23}.
$

The next lemma, which has been similarly verified, will be used in conjunction with Lemma 13.2.2 in the proof of Theorem 13.1.

Lemma 13.3.2   For $ k = 1,\ldots,7$, let $ b = 2 - 2^{-23}k$. If $ y_1$ is computed as in Definition 13.3.1, then $ \vert 1 - by_1\vert < 2^{-24}$.

  (divsp-correct) Let $ a$ and $ b$ be 24-exact with $ 1 \leq a < 2$ and $ 1 \leq b < 2$. If $ {\cal R}$ is an IEEE rounding mode and $ q = \mathit{divsp}(a, b, {\cal R})$, then

$\displaystyle q = {\cal R}\left(\frac{a}{b}, 24\right).
$

PROOF: According to Lemma 13.1.2, we need only establish the inequalities (ii) $ \vert 1 - by_1\vert < 2^{-24}$ and (i) $ \vert\frac{a}{b} - q_1\vert < 2^{e-23}$, where $ e$ is defined as in the lemma.

Let

$\displaystyle \epsilon_0$ $\displaystyle =$ $\displaystyle 2^{-23},$  
$\displaystyle \epsilon'_1$ $\displaystyle =$ $\displaystyle \epsilon_0(\epsilon_0 + 2^{-24}(1 + \epsilon_0)),$  
$\displaystyle \epsilon_1$ $\displaystyle =$ $\displaystyle \epsilon'_1 + 2^{-24}(1 + \epsilon'_1),$  
$\displaystyle y'_1$ $\displaystyle =$ $\displaystyle y_0 + e_0y_0,$  
and     $\displaystyle d$ $\displaystyle =$ $\displaystyle \lceil 2^{48}\epsilon'_1 \rceil.$  

By Lemma 13.3.1, $ \vert 1 - by_0\vert \leq \epsilon_0$. By Lemma 13.2.1 (under the substitutions of $ y_0$ for both $ y_1$ and $ y_2$, $ \epsilon_0$ for both $ \epsilon_1$ and $ \epsilon_2$, and $ y'_1$ for $ y'_3$), $ \vert 1 - by'_1\vert \leq \epsilon'_1$ and $ \vert 1 - by_1\vert \leq \epsilon_1$. It is easily verified by direct computation that $ \epsilon'_1 < 2^{-25}$ and $ d = 6$. The required inequality (ii) then follows from Lemmas 13.2.2 and 13.3.2.

Let $ \delta_0 = \epsilon_0 + 2^{-24}(1 + \epsilon_0)$. By Lemma 13.1.1,

$\displaystyle \left\vert 1 - \frac{b}{a}q_0\right\vert \leq \delta_0.
$

Since $ \delta_0\epsilon_1 + 2^{-24}\delta_0(1 + \epsilon_1) < 2^{-25}$ (by direct computation), we may apply Lemma 13.1.3 (substituting $ y_1$, $ q_0$, $ \epsilon_1$, and $ \delta_0$ for $ y$, $ q_2$, $ \epsilon$, and $ \delta$, respectively) to conclude that (i) holds as well. 


The double-precision algorithm follows:

Definition 13.3.2   (divdp) $ q = \mathit{divdp}(a, b, {\cal R}$ is the result of the following sequence of computations:
$\displaystyle y_0$ $\displaystyle =$ $\displaystyle \mathit{rcp24}(\mathit{RTZ}(b, 24))$  
$\displaystyle q_0$ $\displaystyle =$ $\displaystyle \mathit{RNE}(ay_0, 53)$  
$\displaystyle e_0$ $\displaystyle =$ $\displaystyle \mathit{RNE}(1 - by_0, 53)$  
$\displaystyle r_0$ $\displaystyle =$ $\displaystyle \mathit{RNE}(a - bq_0, 53)$  
$\displaystyle y_1$ $\displaystyle =$ $\displaystyle \mathit{RNE}(y_0 + e_0y_0, 53)$  
$\displaystyle e_1$ $\displaystyle =$ $\displaystyle \mathit{RNE}(1 - by_1, 53)$  
$\displaystyle y_2$ $\displaystyle =$ $\displaystyle \mathit{RNE}(y_0 + e_0y_1, 53)$  
$\displaystyle q_1$ $\displaystyle =$ $\displaystyle \mathit{RNE}(q_0 + r_0y_1, 53)$  
$\displaystyle y_3$ $\displaystyle =$ $\displaystyle \mathit{RNE}(y_1 + e_1y_2, 53)$  
$\displaystyle r_1$ $\displaystyle =$ $\displaystyle \mathit{RNE}(a - bq_1, 53)$  
$\displaystyle q$ $\displaystyle =$ $\displaystyle {\cal R}(q_1 + r_1y_3, 53)$  

Note that in this case, the initial approximation is based on a truncation of the denominator, which increases its relative error:13.2

  (rcp24-rtz-error) If $ b$ be 53-exact, $ 1 \leq b < 2$, and $ y_0 = \mathit{rcp24}(\mathit{RTZ}(b, 24))$, then

$\displaystyle \vert 1 - by_0\vert \leq 2^{-22}.
$

PROOF: Let $ b_0 = \mathit{RTZ}(b, 24)$. Then $ b_0 \leq b < b_0 + 2^{-23}$ and by Lemma 13.3.3,

$\displaystyle \vert 1 - by_0\vert \leq \vert 1 - b_0y_0\vert + \vert y_0(b - b_0)\vert \leq 2^{-23} + 2^{-23} = 2^{-22}. $

The relative error of the final reciprocal approximation must be computed explicitly in 1027 cases:

Lemma 13.3.4   For $ k = 1,\ldots,1027$, let $ b = 2 - 2^{-52}k$. If $ y_1$ is computed as in Definition 13.3.2, then $ \vert 1 - by_1\vert < 2^{-53}$.

  (divdp-correct) Let $ a$ and $ b$ be 53-exact with $ 1 \leq a < 2$ and $ 1 \leq b < 2$. If $ {\cal R}$ is an IEEE rounding mode and $ q = \mathit{divdp}(a, b, {\cal R})$, then

$\displaystyle q = {\cal R}\left(\frac{a}{b}, 53\right).
$

PROOF: Applying Lemma 13.1.2 once again, we need only establish the two inequalities (ii)  $ \vert 1 - by_1\vert < 2^{-53}$ and (i)  $ \vert\frac{a}{b} - q_1\vert < 2^{e-52}$.

Let

$\displaystyle \epsilon_0$ $\displaystyle =$ $\displaystyle 2^{-22},$  
$\displaystyle \epsilon'_1$ $\displaystyle =$ $\displaystyle \epsilon_0(\epsilon_0 + 2^{-53}(1 + \epsilon_0)),$  
$\displaystyle \epsilon_1$ $\displaystyle =$ $\displaystyle \epsilon'_1 + 2^{-53}(1 + \epsilon'_1),$  
$\displaystyle \epsilon'_2$ $\displaystyle =$ $\displaystyle \epsilon_0(\epsilon_1 + 2^{-53}(1 + \epsilon_1)),$  
$\displaystyle \epsilon_2$ $\displaystyle =$ $\displaystyle \epsilon'_2 + 2^{-53}(1 + \epsilon'_2),$  
$\displaystyle \epsilon'_3$ $\displaystyle =$ $\displaystyle \epsilon_1(\epsilon_2 + 2^{-53}(1 + \epsilon_2)),$  
$\displaystyle y'_3$ $\displaystyle =$ $\displaystyle y_1 + e_1y_2,$  
and     $\displaystyle d$ $\displaystyle =$ $\displaystyle \lceil 2^{106}\epsilon'_3 \rceil.$  

Let $ b_0 = \mathit{RTZ}(b, 24)$. Then $ b_0 \leq b < b_0 + 2^{-23}$ and by Lemma 13.3.3,

$\displaystyle \vert 1 - by_0\vert \leq \vert 1 - b_0y_0\vert + \vert y_0(b - b_0)\vert \leq 2^{-23} + 2^{-23} = \epsilon_0.
$

By Lemma 13.3.3, $ \vert 1 - by_0\vert \leq \epsilon_0$. By repeated applications of Lemma 13.2.1, $ \vert 1 - by_1\vert \leq \epsilon_1$, $ \vert 1 - by_2\vert \leq \epsilon_2$, and $ \vert 1 - by'_3\vert \leq \epsilon'_3$. It is easily verified by direct computation that $ \epsilon'_3 < 2^{-54}$ and $ d = 1027$, and (ii) follows from Lemmas 13.2.2 and 13.3.4.

Let $ \delta_0 = \epsilon_0 + 2^{-53}(1 + \epsilon_0)$. Direct computation yields $ \delta_0\epsilon_1 + 2^{-53}\delta_0(1 + \epsilon_1) < 2^{-54}$, and (i) again follows from Lemma 13.1.3

David Russinoff 2017-08-01