6.5  IEEE Rounding

The IEEE standard prescribes four rounding modes: “round to nearest” (RNE), “round toward 0” ( $ \mathit{RTZ}$), “round toward $ +\infty$', and “round toward $ -\infty$”. The last two are formalized here by the functions RUP and RDN.

Definition 6.5.1   (rup) For all $ x \in \mathbb{R}$ and $ n \in \mathbb{N}$,

$\displaystyle \mathit{RUP}(x,n) = \left\{\begin{array}{ll}
RAZ(x,n) & \mbox{if $x \geq 0$}\\
\mathit{RTZ}(x,n) & \mbox{if $x < 0$.}\end{array} \right.$

Definition 6.5.2   (rdn) For all $ x \in \mathbb{R}$ and $ n \in \mathbb{N}$,

$\displaystyle \mathit{RDN}(x,n) = \left\{\begin{array}{ll}
\mathit{RTZ}(x,n) & \mbox{if $x \geq 0$}\\
RAZ(x,n) & \mbox{if $x < 0$.}
\end{array} \right.$

The essential properties of these modes are given by the following two results.

  (rup-lower-bound) For all $ x \in \mathbb{R}$ and $ n \in \mathbb{N}$,

$\displaystyle \mathit{RUP}(x,n) \geq x.$

PROOF: If $ x \geq 0$, then

$\displaystyle \mathit{RUP}(x,n) = RAZ(x,n) = \vert RAZ(x,n)\vert \geq \vert x\vert = x;$

if $ x<0$, then

$\displaystyle \mathit{RUP}(x,n) = \mathit{RTZ}(x,n) = -\vert\mathit{RTZ}(x,n)\vert \geq -\vert x\vert = x.$

  (rdn-upper-bound) For all $ x \in \mathbb{R}$ and $ n \in \mathbb{N}$,

$\displaystyle \mathit{RDN}(x,n) \leq x.$

PROOF: If $ x \geq 0$, then

$\displaystyle \mathit{RDN}(x,n) = \mathit{RTZ}(x,n) = \vert\mathit{RTZ}(x,n)\vert \leq \vert x\vert = x;$

if $ x<0$, then

$\displaystyle \mathit{RUP}(x,n) = RAZ(x,n) = -\vert RAZ(x,n)\vert \leq -\vert x\vert = x.$

In this section, we collect a set of general results that pertain to all of the IEEE rounding modes, most of which are essentially restatements of lemmas that are proved in earlier sections. Since these results also hold for two of the other modes that we have discussed, RAZ and $ RNA$, we shall state them as generally as possible. For this purpose, we define a common rounding mode to be any of the six, i.e., $ \mathit{RTZ}$, RAZ, RNE, $ RNA$, RUP, or RDN.

  (rnd-choice) Let $ {\cal R}$ be a common rounding mode. For all $ x \in \mathbb{R}$ and $ n \in \mathbb{N}$, either

$\displaystyle {\cal R}(x,n) = \mathit{RTZ}(x,n)$

or

$\displaystyle {\cal R}(x,n) = RAZ(x,n).$

PROOF: This is an immediate consequence of Definitions 6.3.1, 6.3.2, 6.5.1, and 6.5.2


Most of following results are consequences of Lemma 6.5.3 and the corresponding lemmas pertaining to $ \mathit{RTZ}$ and RAZ.

  (sgn-rnd) Let $ x \in \mathbb{R}$ and $ n \in \mathbb{N}$. Let $ {\cal R}$ be a common rounding mode. If $ n > 0$, then

$\displaystyle sgn({\cal R}(x,n)) = sgn(x).$

PROOF: See Lemmas 6.1.2 and 6.2.1


Note that RUP and RDN do not share the oddness property held by the other modes. A generalization is given by the following lemma.

  (rnd-minus) Let $ {\cal R}$ be a common rounding mode and let

$\displaystyle \hat{\cal R} = \left\{\begin{array}{ll}
\mathit{RDN} & \mbox{if ...
... ${\cal R} = \mathit{RDN}$}\\
{\cal R} & \mbox{otherwise.}\end{array} \right.$

For all $ x \in \mathbb{R}$ and $ n \in \mathbb{N}$,

$\displaystyle {\cal R}(-x,n) = -\hat{\cal R}(x,n).$

PROOF: Suppose, for example, that $ {\cal R} = \mathit{RUP}$ and $ x>0$. Then since $ -x < 0$,

$\displaystyle {\cal R}(-x,n) = \mathit{RUP}(-x,n) = \mathit{RTZ}(-x,n) = -\mathit{RTZ}(x,n) = -\hat{\cal R}(x,n).$

The other cases are handled similarly. 

  (rnd-exactp-a) Let $ {\cal R}$ be a common rounding mode. For all $ x \in \mathbb{R}$ and $ n \in \mathbb{N}$, $ {\cal R}(x,n)$ is $ n$-exact.

PROOF: See Lemmas 6.1.9 and 6.2.11

  (rnd-exactp-b) Let $ x \in \mathbb{R}$ and $ n \in \mathbb{N}$ and let $ {\cal R}$ be a common rounding mode. If $ n > 0$ and x is $ n$-exact, then

$\displaystyle {\cal R}(x,n) = x.$

PROOF: See Lemmas 6.1.10 and 6.2.13

  (rnd-exactp-c,rnd-exactp-d) Let $ x \in \mathbb{R}$, $ a \in \mathbb{R}$, and $ n \in \mathbb{N}$, $ n > 0$, and let $ {\cal R}$ be a common rounding mode. Suppose $ a$ is $ n$-exact.

(a) If $ a \geq x$, then $ a \geq {\cal R}(x,n)$.
(b) If $ a \leq x$, then $ a \leq {\cal R}(x,n)$.

PROOF: See Lemmas 6.1.11 and 6.2.14

  (rnd-diff) Let $ x \in \mathbb{R}$, $ n \in \mathbb{N}$, $ n > 0$, and let $ {\cal R}$ be a common rounding mode. Then

$\displaystyle \vert x - {\cal R}(x,n)\vert < 2^{expo(x)-n+1}.$

PROOF: See Lemmas 6.1.8 and 6.2.7

  (expo-rnd) Let $ {\cal R}$ be a common rounding mode. For all $ x \in \mathbb{R}$ and $ n \in \mathbb{N}$, if $ \vert{\cal R}(x,n)\vert \neq 2^{expo(x)+1}$, then

$\displaystyle expo({\cal R}(x,n)) = expo(x).$

PROOF: See Lemmas 6.1.6 and 6.2.10

  (rnd-monotone) Let $ x \in \mathbb{R}$, $ y \in \mathbb{R}$, and $ n \in \mathbb{N}$. Let $ {\cal R}$ be a common rounding mode. If $ x \leq y$ and $ n > 0$, then

$\displaystyle {\cal R}(x,n) \leq {\cal R}(y,n).$

PROOF: The modes $ \mathit{RTZ}$, RAZ, RNE, and $ RNA$ are covered by Lemmas 6.1.13, 6.2.17, 6.3.17, and 6.3.36. For the modes RUP and RDN, we may assume, using Lemma 6.5.4, that $ x$ and $ y$ are either both positive or both negative. It follows that either

$\displaystyle {\cal R}(x,n) = \mathit{RTZ}(x,n) \leq \mathit{RTZ}(y,n) = {\cal R}(y,n)$

or

$\displaystyle {\cal R}(x,n) = RAZ(x,n) \leq RAZ(y,n) = {\cal R}(y,n).$

  (rnd-shift) Let $ {\cal R}$ be a common rounding mode. For all $ x \in \mathbb{R}$, $ n \in \mathbb{N}$, and $ k \in \mathbb{Z}$,

$\displaystyle {\cal R}(2^kx,n) = 2^k {\cal R}(x,n).$

PROOF: The modes $ \mathit{RTZ}$, RAZ, RNE, and $ RNA$ are covered by Lemmas 6.1.4, 6.2.3, 6.3.8, and 6.3.29. For the modes RUP and RDN, we may assume, using Lemma 6.5.5, that $ x>0$, in which case

$\displaystyle \mathit{RUP}(2^kx,n) = RAZ(2^kx,n) = 2^kRAZ(x,n) = 2^k\mathit{RUP}(x,n)$

and

$\displaystyle \mathit{RDN}(2^kx,n) = \mathit{RTZ}(2^kx,n) = 2^k\mathit{RTZ}(x,n) = 2^k\mathit{RDN}(x,n).$

  (plus-rnd) Let $ {\cal R}$ be a common rounding mode, $ x \in \mathbb{R}$, $ y \in \mathbb{R}$, and $ k \in \mathbb{Z}$ with $ x \geq 0$ and $ y \geq 0$. Let $ k' = k+expo(x)-expo(y)$ and $ k” = k+expo(x+y)-expo(y)$. If $ x$ is $ (k'-1)$-exact, then

$\displaystyle x+{\cal R}(y,k) = {\cal R}(x+y,k+expo(x+y)-expo(y)).$

PROOF: See Lemmas 6.1.16 and 6.2.21

  (rnd<=raz, rnd>=rtz) Let $ x \in \mathbb{R}$ and $ n \in \mathbb{N}$ and let $ {\cal R}$ be a common rounding mode. If $ x \geq 0$, then

$\displaystyle \mathit{RTZ}(x,n) \leq {\cal R}(x,n) \leq RAZ(x,n).$

PROOF: This is an immediate consequence of Lemmas 6.1.5 and 6.2.5:

$\displaystyle \mathit{RTZ}(x,n) = \vert\mathit{RTZ}(x,n)\vert \leq \vert x\vert \leq \vert RAZ(x,n)\vert = RAZ(x,n).$

If $ x$ is not $ (n+1)$-exact and $ y$ is sufficiently close to $ x$, then $ x$ and $ y$ round to the same value under any common rounding mode:

  (rnd<equal, rnd>equal) Let $ x \in \mathbb{R}$, $ y \in \mathbb{R}$, and $ n \in \mathbb{N}$ with $ x>0$ and $ n > 0$. Let $ {\cal R}$ be a common rounding mode. Assume that $ x$ is not $ (n+1)$-exact. If either

(a) $ \mathit{RTZ}(x, n+1) < y < x$ or

(b) $ x < y < \mathit{RAZ}(x, n+1)$,

then $ {\cal R}(x, n) = {\cal R}(y, n)$.

PROOF: We show the proof of (a); (b) is similar.

By Lemma 6.5.11, we need only show that $ {\cal R}(x,n) \leq {\cal R}(y,n)$. We also note that by Lemmas 6.1.15 and 6.1.5,

$\displaystyle \mathit{RTZ}(x, n) = \mathit{RTZ}(\mathit{RTZ}(x, n+1),n) \leq \mathit{RTZ}(x, n+1) < y.
$

Case 1: $ {\cal R} = \mathit{RTZ}$ or $ {\cal R} = \mathit{RDN}$

by Lemma 6.1.9, $ \mathit{RTZ}(x,n)$ is $ n$-exact. By Lemma 6.1.11 (with $ \mathit{RTZ}(x,n)$ and $ y$ substituted for $ a$ and $ x$) and Definition 6.5.2,

$\displaystyle {\cal R}(x, n) = \mathit{RTZ}(x, n) \leq \mathit{RTZ}(y, n) = {\cal R}(y, n).
$

Case 2: $ {\cal R} = \mathit{RAZ}$ or $ {\cal R} = \mathit{RUP}$

By Lemmas 6.2.20 and 6.2.18,

$\displaystyle \mathit{RTZ}(x, n) < y < \mathit{RAZ}(x, n+1) \leq \mathit{RAZ}(\mathit{RAZ}(x, n+1),n) = \mathit{RAZ}(x, n) = fp^+(\mathit{RTZ}(x, n)),
$

and by Lemma 6.2.15 and Definition 6.5.1,

$\displaystyle {\cal R}(y, n) = \mathit{RAZ}(y, n) = fp^+(\mathit{RTZ}(x, n)) = \mathit{RAZ}(x, n) = {\cal R}(x, n).
$

Case 3: $ {\cal R} = \mathit{RNE}$ or $ {\cal R} = \mathit{RNA}$

By Lemmas 6.2.5 and 6.2.18,

$\displaystyle \mathit{RTZ}(x, n+1) < y < x < \mathit{RAZ}(x, n+1) = fp^+(\mathit{RTZ}(x, n+1)).
$

The claim follows from Lemmas 6.3.19 and 6.3.38

  (rnd-near-equal) Let $ x \in \mathbb{R}$ and $ n \in \mathbb{N}$ with $ x>0$ and $ n > 0$. Assume that $ x$ is not $ (n+1)$-exact. There exists $ \epsilon \in \mathbb{R}$, $ \epsilon > 0$, such that for all $ y \in \mathbb{R}$ and every common rounding mode $ {\cal R}$, if $ \vert x-y\vert < \epsilon$, then $ {\cal R}(x, n) = {\cal R}(y, n)$.

PROOF: According to Lemma 6.5.15, this holds for $ \epsilon = \mathit{min}(x - \mathit{RTZ}(x,n+1), \mathit{RAZ}(x,n+1)-x)$

  (rnd-rto) Let $ {\cal R}$ be a common rounding mode, $ m \in \mathbb{N}$, $ n \in \mathbb{N}$, and $ x \in \mathbb{R}$. If $ n \geq m+2$ and $ m>0$, then

$\displaystyle {\cal R}(x,m) = {\cal R}(RTO(x,n),m).$

PROOF: The modes $ \mathit{RTZ}$ and RAZ are handled by Lemma 6.4.9; RNE and $ RNA$ are covered by Lemma 6.4.10; RUP and RDN are reduced to $ \mathit{RTZ}$ and RAZ by Lemma 6.4.1


  (rnd-up) Let $ {\cal R}$ be a common rounding mode, $ x \in \mathbb{R}$, $ k \in \mathbb{N}$, $ m \in \mathbb{N}$, and $ n \in \mathbb{N}$, with $ 0 < m < n$ and $ \vert x\vert < 2^k$. If $ \vert\mathit{rnd}(x, {\cal R}, n)\vert = 2^k$, then $ \vert\mathit{rnd}(x, {\cal R}, m)\vert = 2^k$.

PROOF: This is a consequence of Lemmas 6.2.16, 6.3.12, 6.3.33, and 6.1.5

For the sake of simplicity, our main result pertaining to constant injection is formulated in the context of bit vector rounding.

  (rnd-const-thm) Let $ {\cal R}$ be a common rounding mode, $ n \in \mathbb{N}$, $ n>1$, and $ x \in \mathbb{N}$ with $ expo(x) \geq n$. Then

$\displaystyle {\cal R}(x,n) = \mathit{RTZ}(x+{\cal C},\nu),$

where

$\displaystyle {\cal C} = \left\{\begin{array}{ll}
2^{expo(x)-n} & \mbox{if ${\...
...{if ${\cal M} = \mathit{RTZ}$ or ${\cal M} = \mathit{RDN}$}\end{array} \right.$

and

$\displaystyle \nu = \left\{\begin{array}{ll}
n-1 & \mbox{if ${\cal R} = RNE$ ...
...s $(n+1)$-exact but not $n$-exact}\\
n & \mbox{otherwise.}\end{array} \right.$

PROOF: For the modes RAZ and RUP, the identity follows from Lemma 6.2.24, with $ m=expo(x)+1$, and Lemma 4.2.7. For RNE and RNA, it reduces to Lemmas 6.3.42 and 6.3.42. For $ \mathit{RTZ}$ and RDN, the lemma is trivial. 


Another common implementation of rounding is provided by the following result. Suppose our objective is a correct $ n$-bit rounding (with respect to some common rounding mode) of a precise result $ z>0$, and that we have a bit-vector representation of $ x = \lfloor z \rfloor$. Then the desired result may be derived by rounding $ x$ in the direction determined as follows:

  (roundup-pos-thm) Let $ {\cal R}$ be a common rounding mode, $ z \in \mathbb{R}$, and $ n \in \mathbb{N}$. Assume that $ 0<n$ and $ 2^n \leq z$. Let $ x = \lfloor z \rfloor$ and $ e = \mathit{expo}(x)$.

(a) $ z$ is $ n$-exact iff $ x[e-n:0] = 0$ and $ z \in \mathbb{Z}$.

(b) If any of the following conditions holds, then $ {\cal R}(z,n) = \mathit{fp}^+(\mathit{RTZ}(x,n), n)$, and otherwise $ {\cal R}(z,n) = \mathit{RTZ}(x,n)$:

PROOF: It is clear that $ \mathit{expo}(z) = e$. Definition 6.1.1,

$\displaystyle \mathit{RTZ}(z, e+1) = \lfloor 2^{(e+1)-1}\mathit{sig}(z)\rfloor2^{e-(e+1)+1} = \lfloor 2^{e}\mathit{sig}(z)\rfloor = \lfloor z \rfloor = x,
$

and hence, by Lemma 6.1.15, $ \mathit{RTZ}(z, n) = \mathit{RTZ}(x, n)$.


(a) According to Lemmas 6.1.9 and 6.1.10, it suffices to show that $ z \neq \mathit{RTZ}(z, n)$ iff either $ x[e-n:0] \neq 0$ or $ x \neq z$. By Lemma 6.1.5,

$\displaystyle \mathit{RTZ}(x, n) \leq x = \mathit{RTZ}(z, k) \leq z,
$

and it is clear that $ z \neq \mathit{RTZ}(z, n)$ iff either $ x - \mathit{RTZ}(x, n) > 0$ or $ x \neq z$. Now since $ x$ is $ k$-exact, Lemma 4.2.5 implies that $ x$ is $ (e+1)$-exact, and therefore, by Lemmas 6.1.9 and 6.1.20,

$\displaystyle x - \mathit{RTZ}(x, n) = \mathit{RTZ}(x, e+1) - \mathit{RTZ}(x, n) = x[e-n:0].
$

(b) Since $ RDN(z,n) = \mathit{RTZ}(z, n)$ and $ RUP(z, n) = RAZ(z, n)$, we need only consider RAZ, RNE, and RNA.

Suppose $ {\cal R} = RAZ$. By Lemmas 6.2.18, 6.1.9, and 6.2.11,

$\displaystyle \mathit{RAZ}(z, n) = \left\{\begin{array}{ll}
\mathit{RTZ}(x,n) ...
...mathit{RTZ}(x,n),n) & \mbox{if $z \neq \mathit{RTZ}(z, n)$.}\end{array}\right.
$

Thus, $ \mathit{RAZ}(z, n) = \mathit{RTZ}(x,n)$ iff $ z = \mathit{RTZ}(z, n)$. But we have shown that this holds iff $ x[e-n:0] = 0$ and $ z \in \mathbb{Z}$, as stated by the lemma.

For the remaining cases, RNE, and RNA, we refer directly to Definitions 6.3.1 and 6.3.2. Let

$\displaystyle y = \lfloor 2^{n-1}sig(z) \rfloor = \lfloor 2^{n-1-e}z \rfloor = \lfloor x/2^{e+1-n} \rfloor$

and

$\displaystyle f = 2^{n-1}sig(z) - y = 2^{n-1-e}z - y.$

By Definition 2.0.1, $ y = x[e:1+e-n]$. By Definition 6.1.1, $ \mathit{RTZ}(z, n) = 2^{1+e-n}y$ and consequently $ 2^{1+e-n}f = z - \mathit{RTZ}(z, n)$ and $ f = 2^{n-e-1}(z - \mathit{RTZ}(z, n))$. By Lemma 6.1.20,

$\displaystyle z - \mathit{RTZ}(z, n) = (\mathit{RTZ}(x, n+1) - \mathit{RTZ}(x, n)) + (z - \mathit{RTZ}(z, n+1)) = 2^{e-n}x[e-n] + (z - \mathit{RTZ}(z, n+1)).
$

Thus,

$\displaystyle f = 2^{n-e-1}(z - \mathit{RTZ}(z, n)) = \frac{1}{2} x[e-n] + 2^{n-e-1} (z - \mathit{RTZ}(z, n+1)).
$

By Lemma 6.1.8,

$\displaystyle 2^{n-e-1} (z - \mathit{RTZ}(z, n+1)) < 2^{n-e-1}2^{e-n} = \frac{1}{2}.
$

This leads to the following observations:
  1. $ f \geq \frac{1}{2}$ iff $ x[e-n] = 1$.
  2. $ f > \frac{1}{2}$ iff $ x[e-n] = 1$ and $ z - \mathit{RTZ}(z, n+1) > 0$, but by Lemma 6.1.20,
    $\displaystyle z - \mathit{RTZ}(z, n+1)$ $\displaystyle =$ $\displaystyle (z - x) + (x - \mathit{RTZ}(z, n+1))$  
      $\displaystyle =$ $\displaystyle (z - x) + (\mathit{RTZ}(x, e+1) - \mathit{RTZ}(x, n+1))$  
      $\displaystyle =$ $\displaystyle (z - x) + x[e-1-n:0],$  

    and hence $ f > \frac{1}{2}$ iff $ x[e-n] = 1$ and either $ x[e-1-n:0] \neq 0$ or $ z \neq x$.
Referring to Definition 6.3.2, we see that (1) is sufficient to complete the proof for the case $ {\cal R} = RNA$.

For the case $ {\cal R} = RNE$, it is clear from Definition 6.3.1 and (1) and (2) above that we need only show that $ y$ is even iff $ x[1+e-n] = 0$. But this is a consequence of the above equation $ z = x[e:1+e-n]$


The final result of this section is a variation of Lemma 6.5.20 that allows us to compute the absolute value of a rounded result when the unrounded value $ z$ is negative, given a signed integer (twos' complement) encoding $ x$ of $ \lfloor z \rfloor$, i.e., $ x = \lfloor z \rfloor + 2^k$, where $ z \geq -2^k$.

  (roundup-neg-thm) Let $ {\cal R}$ be a common rounding mode, $ z \in \mathbb{R}$, $ n \in \mathbb{N}$, and $ k \in \mathbb{N}$. Assume that $ 0 < n < k$ and $ -2^k \leq z < -2^n$. Let $ x = \lfloor z \rfloor + 2^k$, $ \tilde{x} = 2^k - x - 1$, and $ e = \mathit{expo}(\tilde{x})$.

(a) If $ \mathit{expo}(z) \neq e$, then $ z = -2^{e+1}$.

(b) $ z$ is $ n$-exact iff $ x[e-n:0] = 0$ and $ z \in \mathbb{Z}$.

(c) If any of the following conditions holds, then $ \vert{\cal R}(z,n)\vert = \mathit{fp}^+(\mathit{RTZ}(\tilde{x},n), n)$, and otherwise $ \vert{\cal R}(z,n)\vert = \mathit{RTZ}(\tilde{x},n)$:

PROOF: Let $ f = z - \lfloor z \rfloor$. Then $ x = (z - f) + 2^k$,

$\displaystyle \tilde{x} = 2^k - x - 1 = 2^k - (z - f + 2^k) - 1 = -z - (1 - f) = \vert z\vert - (1 - f),
$

and $ \vert z\vert = \tilde{x} + (1 - f)$.

To prove (a), note that if $ \mathit{expo}(z) \neq e$, then since $ \tilde{x} \leq 2^{e+1}-1$, we must have $ \tilde{x} = 2^{e+1}-1$, $ f=0$, and $ \vert z\vert = 2^{e+1}$.

For the proof of (b) and (c), by Lemmas 2.2.22 and 6.1.18, we have

$\displaystyle \vert z\vert$ $\displaystyle =$ $\displaystyle \tilde{x} + (1 - f)$  
  $\displaystyle =$ $\displaystyle 2^{e+1-n}\tilde{x}[e:e+1-n] + \tilde{x}[e-n:0] + (1 - f)$  
  $\displaystyle =$ $\displaystyle \mathit{RTZ}(\tilde{x}, n) + \tilde{x}[e-n:0] + (1 - f).$  

Suppose first that $ f = x[e-n:0] = 0$. Then

$\displaystyle \vert z\vert = \mathit{RTZ}(\tilde{x}, n) + (2^{e+1-n} - 1) + 1 = \mathit{fp}^+(\mathit{RTZ}(\tilde{x},n), n) \in \mathbb{Z}
$

and the lemma claims that (b) $ z$ is $ n$-exact and (c)  $ \vert{\cal R}(z,n)\vert = \mathit{fp}^+(\mathit{RTZ}(\tilde{x},n), n)$ for all $ {\cal R}$. Both claims follow trivially from Lemmas 4.2.15, 6.1.9, and 6.5.7.

In the remaining case, $ \mathit{RTZ}(\tilde{x}, n) < \vert z\vert < \mathit{fp}^+(\mathit{RTZ}(\tilde{x},n), n)$. By Lemma 4.2.16, $ z$ is not $ n$-exact, and (b) follows. The claim (c) will be derived from Lemma 6.5.20. Note that if $ \hat{\cal R}$ is defined as in Lemma 6.5.5, then

$\displaystyle \vert{\cal R}(z,n)\vert = -{\cal R}(z,n) = \hat{\cal R}(-z,n) = \hat{\cal R}(\vert z\vert,n),
$

and (c) may be restated as follows:

$ (c')$ If any of the following conditions holds, then $ \hat{\cal R}(\vert z\vert,n) = \mathit{fp}^+(\mathit{RTZ}(\tilde{x},n), n)$, and otherwise $ \hat{\cal R}(\vert z\vert,n) = \mathit{RTZ}(\tilde{x},n)$:

We invoke Lemma 6.5.20 with $ \vert z\vert$ and $ \hat{\cal R}$ substituted for $ z$ and $ {\cal R}$, respectively. This yields the following:

$ (c”)$ If any of the following conditions holds, then $ \hat{\cal R}(\vert z\vert,n) = \mathit{fp}^+(\mathit{RTZ}(\lfloor \vert z\vert \rfloor,n), n)$, and otherwise $ \hat{\cal R}(\vert z\vert,n) = \mathit{RTZ}(\lfloor \vert z\vert \rfloor,n)$:

We must show that $ (c”)$ implies $ (c')$.

Suppose $ f>0$. Then $ z \notin \mathbb{Z}$ and by Lemma 1.1.7,

$\displaystyle \tilde{x} = 2^k - x - 1 = -\lfloor z \rfloor - 1 = \lfloor \vert z\vert \rfloor
$

and we may replace $ \tilde{x}$ with $ \lfloor \vert z\vert \rfloor$ in $ (c')$. Furthermore, $ \vert z\vert[e-n] = \tilde{x}[e-n]$, which implies $ \vert z\vert[e-n] \neq x[e-n]$, and the claim $ (c')$ follows trivially from $ (c”)$.

In the final case, $ f=0$ and $ x[e-n:0] \neq 0$. Thus, $ \vert z\vert = \tilde{x} + 1$ and $ \tilde{x}[e-n:0] < 2^{e+1-n}-1$. Since

$\displaystyle \vert z\vert$ $\displaystyle =$ $\displaystyle \tilde{x} + 1$  
  $\displaystyle =$ $\displaystyle 2^{e+1-n}\tilde{x}[e:e+1-n] + \tilde{x}[e-n:0] + 1$  
  $\displaystyle \leq$ $\displaystyle 2^{e+1-n}(2^n - 1) + (2^{e+1-n} - 2) + 1$  
  $\displaystyle =$ $\displaystyle 2^{e+1} - 1,$  

$ \mathit{expo}(z) = e$. Since

$\displaystyle \vert z\vert[e:e+1-n] = \left\lfloor \frac{\vert z\vert}{2^{e+1-n}} \right\rfloor = \tilde{x}[e:e+1-n],
$

$ \vert z\vert[e+1-n] = \tilde{x}[e+1-n]$, which implies $ \vert z\vert[e+1-n] \neq x[e+1-n]$. Lemma 6.1.18 implies $ \mathit{RTZ}(\vert z\vert, n) = \mathit{RTZ}(\tilde{x}, n)$, and again we may replace $ \tilde{x}$ with $ \lfloor \vert z\vert \rfloor$ in $ (c')$. Furthermore,

$\displaystyle \vert z\vert[e-n:0] = \vert z\vert \bmod 2^{e+1-n} = \tilde{x}[e-n:0] + 1 = 2^{e+1-n} - x[e-n:0].
$

If $ x[e-n-1:0] = 0$, then $ x[e-n] = 1$and

$\displaystyle \vert z\vert[e-n:0] = 2^{e+1-n} - x[e-n:0] = 2^{e+1-n} - 2^{e-n} = 2^{e-n},
$

which implies $ \vert z\vert[e-n] = 1$ and $ \vert z\vert[e-n-1:0] = 0$. On the other hand, if $ x[e-n-1:0] \neq 0$, then $ \tilde{x}[e-n-1:0] < 2^{e-n}-1$, which implies

$\displaystyle \vert z\vert[e-n] = \tilde{x}[e-n] \neq x[e-n]
$

and

$\displaystyle \vert z\vert[e-n-1:0] = \tilde{x}[e-n-1] + 1 \neq 0.
$

In either case, $ (c')$ again follows easily from $ (c”)$

David Russinoff 2017-08-01