4.2  Exactness

In order for a significand, which has the form

$\displaystyle (1.\beta_1\beta_2 \cdots)_2,$

to be represented by an $ n$-bit field, we must have $ \beta_k = 0$ for all $ k \geq n$, or equivalently, a right shift of the radix point by $ n-1$ places must result in an integer. This motivates our definition of exactness.

Definition 4.2.1   (exactp) Let $ x \in \mathbb{R}$ and $ n \in \mathbb{Z}$. Then

$\displaystyle exactp(x,n) \Leftrightarrow sig(x)2^{n-1} \in \mathbb{Z}.$

If $ exactp(x,n)$, then we shall say that $ x$ is $ n$-exact.

The definition may be restated as follows.

Lemma 4.2.1   (exactp2) For all $ x \in \mathbb{R}$ and $ n \in \mathbb{Z}$, $ x$ is $ n$-exact if and only if $ x2^{n-1-expo(x)} \in \mathbb{Z}$.

The next three lemmas are consequences of the observation that $ n$-exactness of $ x$ is a property of $ sig(x)$.

Lemma 4.2.2   (exactp-sig) For all $ x \in \mathbb{R}$ and $ n \in \mathbb{Z}$, $ x$ is $ n$-exact if and only if $ sig(x)$ is $ n$-exact.

  (minus-exactp, exactp-neg) For all $ x \in \mathbb{R}$ and $ n \in \mathbb{Z}$, $ x$ is $ n$-exact if and only if $ -x$ is $ n$-exact.

PROOF: By Lemma 4.1.13, $ sig(-x) = sig(x)$

  (exactp-shift) For all $ x \in \mathbb{R}$, $ n \in \mathbb{Z}$, and $ k \in \mathbb{Z}$, $ x$ is $ n$-exact if and only if $ 2^kx$ is $ n$-exact.

PROOF: By Lemma 4.1.14, $ sig(2^kx) = sig(x)$


It is clear that is $ sig(x)$ is representable in a given field of bits, then it is also representable in any wider field.

  (exactp-<=) For all $ x \in \mathbb{R}$, $ n \in \mathbb{Z}$, and $ m \in \mathbb{Z}$, if $ m < n$ and $ x$ is $ m$-exact, then $ x$ is $ n$-exact.

PROOF: Since $ 2^{n-m} \in \mathbb{Z}$ and $ x\cdot 2^{m-1-expo(x)} \in \mathbb{Z}$,

$\displaystyle x\cdot 2^{n-1-expo(x)} = 2^{n-m} \cdot x\cdot 2^{m-1-expo(x)} \in \mathbb{Z}.$

Any power of 2 has a 1-bit significand.

  (exactp-2**n) Let $ n \in \mathbb{Z}$ and $ m \in \mathbb{Z}$. If $ m>0$, then $ 2^n$ is $ m$-exact.

PROOF: By Lemma 4.1.4,

$\displaystyle 2^n\cdot 2^{m-1-expo(2^n)} = 2^n\cdot 2^{m-1-n} = 2^{m-1} \in \mathbb{Z}.$

A bit vector is exact with respect to its width.

  (bvecp-exactp) Let $ x \in \mathbb{R}$ and $ n \in \mathbb{Z}$. If $ x$ is a bit vector of width $ n$, then $ x$ is $ n$-exact.

PROOF: Since 0 is $ n$-exact for all $ n \in \mathbb{Z}$, we may assume $ x>0$, and therefore $ n > 0$. Now since $ 0 < x < 2^n$, Lemma 4.1.3 implies $ expo(x) \leq n-1$, and hence $ x\cdot 2^{n-1-expo(x)}\in \mathbb{Z}$


We have the following formula for the exactness of a product.

  (exactp-prod) Let $ x \in \mathbb{R}$, $ y \in \mathbb{R}$, $ m \in \mathbb{Z}$, and $ n \in \mathbb{Z}$. If $ x$ is $ m$-exact and $ y$ is $ n$-exact, then $ xy$ is $ (m+n)$-exact.

PROOF: If $ x2^{m-1-expo(x)}$ and $ y2^{n-1-expo(y)}$ are integers, then by Lemma 4.1.15, so is

$\displaystyle x2^{m-1-expo(x)}y2^{n-1-expo(y)}2^{expo(x)+expo(y)+1-expo(xy)} =
xy2^{m+n-1-expo(xy)}.$

In particular, if $ x$ is $ n$-exact, then $ x^2$ is $ 2n$-exact. The converse also holds.

  (exactp-x2) Let $ x \in \mathbb{R}$ and $ n \in \mathbb{Z}$. If $ x^2$ is $ 2n$-exact, then $ x$ is $ n$-exact.

PROOF: We shall require an elementary fact of arithmetic:

If $ r \in \mathbb{R}$ and $ 2r^2 \in \mathbb{Z}$, then $ r \in \mathbb{Z}$.
In order to establish this result, let $ r = a/b$, where $ a$ and $ b$ are integers with no common prime factor, and assume that $ 2r^2 = 2a^2/b^2 \in
\mathbb{Z}$. If $ b$ were even, say $ b = 2c$, then $ 2a^2/b^2 =
a^2/2c^2 \in \mathbb{Z}$, implying that $ a$ is even, which is impossible. Thus, $ b$ cannot have a prime factor that does not also divide $ a$, and hence $ b = 1$.

Now to show that $ x$ is $ n$-exact, i.e., $ x2^{n-1-expo(x)} \in \mathbb{Z}$, it will suffice to show that

$\displaystyle 2\left(x2^{n-1-expo(x)}\right)^2 = x^22^{2n-1-2 expo(x)} \in \mathbb{Z}.$

But since $ x^2$ is $ 2n$-exact, we have $ x^22^{2n-1-expo(x^2)} \in \mathbb{Z}$, and since Lemma 4.1.15 implies $ expo(x^2) \geq 2expo(x)$,

$\displaystyle x^22^{2n-1-2 expo(x)} = x^22^{2n-1-expo(x^2)}2^{expo(x^2)-2expo(x)} \in \mathbb{Z}.$

  (exactp-factors) Let $ n \in \mathbb{N}$, $ x \in \mathbb{R}$, and $ y \in \mathbb{R}$. Assume that $ x$ and $ y$ are non-zero and $ k$-exact for some $ k \in \mathbb{N}$. If $ xy$ is $ n$-exact, then so are $ x$ and $ y$.

PROOF: Let $ p$ and $ q$ be the smallest integers such that $ x' = 2^p*x$ and $ y' = 2^q*y$ are integers. Thus, $ x'$ and $ y'$ are odd. By Lemma 4.2.4, $ x$, $ y$, or $ xy$, respectively, is $ n$-exact iff $ x'$, $ y'$, or $ x'y'$ is $ n$-exact. Consequently,, we may replace $ x$ and $ y$ with $ x'$ and $ y'$. That is, we may assume without loss of generality that $ x$ and $ y$ are odd integers.

By Lemma 4.2.1, an odd integer $ z$ is $ n$-exact iff $ expo(z) \leq n-1$, or equivalently, acording to Lemma 4.1.3, $ \vert z\vert < 2^n$. Thus, since $ xy$ is $ n$-exact, $ xy < 2^n$, which implies $ x < 2^n$ and $ y < 2^n$, and hence $ x$ and $ y$ are $ n$-exact. 


Exactness of a bit vector may be formulated in various ways.

  (exact-bits-1, exact-bits-2, exact-bits-3) Let $ x \in \mathbb{N}$, $ k \in \mathbb{N}$, and $ n \in \mathbb{N}$. If $ expo(x) = n-1$ and $ k<n$, then the following are equivalent:

(a) $ x$ is $ (n-k)$-exact
(b) $ x/2^k \in \mathbb{Z}$
(c) $ x[n-1:k] = x/2^k$
(d) $ x[k-1:0] = 0$.

PROOF: The equivalence between (a) and (b) follows from Lemma 4.2.1, since

$\displaystyle x2^{(n-k)-1-expo(x)} = x2^{n-k-1-(n-1)} = x/2^k.$

Now by Lemmas 2.2.5 and 2.2.22,

$\displaystyle x = x[n-1:0] = 2^kx[n-1:k] + x[k-1:0],$

and hence

$\displaystyle x/2^k = x[n-1:k] + x[k-1:0]/2^k.$

But by Lemma 2.2.1, $ 0 \leq x[k-1:0]/2^k < 1$, which implies that (b), (c), and (d) are all equivalent. 


Corollary 4.2.12   (exact-k+1) Let $ x \in \mathbb{N}$, $ k \in \mathbb{N}$, and $ n \in \mathbb{N}$ such that $ expo(x) = n-1$ and $ k<n$, Assume that $ x$ is $ (n-k)$-exact. Then $ x$ is $ (n-k-1)$-exact if and only if $ x[k] = 0$.

The next lemma gives a formula for exactness of a difference.

  (exactp-diff) Let $ x \in \mathbb{R}$, $ y \in \mathbb{R}$, $ n \in \mathbb{Z}$, and $ k \in \mathbb{Z}$. Assume that $ n > 0$ and $ n>k$. If x and y are both n-exact and $ expo(x-y)+k \leq min(expo(x),expo(y))$, then $ x-y$ is $ (n-k)$-exact.

PROOF: Since $ x$ is $ n$-exact and $ expo(x-y)+k \leq expo(x)$,

$\displaystyle x2^{n-1-(expo(x-y)+k)} = x2^{n-1-expo(x)}2^{expo(x)-(expo(x-y)+k}\in \mathbb{Z}$

by Lemma 4.2.1.. Similarly, $ y2^{n-1-(expo(x-y)+k)} \in \mathbb{Z}$. Thus,

$\displaystyle (x-y)2^{(n-k)-1-expo(x-y)} = x2^{n-1-(expo(x-y)+k)} -
y2^{n-1-(expo(x-y)+k)} \in \mathbb{Z}.$

Lemma 4.2.13 is often applied with $ k=0$, in the following weaker form.

Corollary 4.2.14   (exactp-diff-cor) Let $ x \in \mathbb{R}$, $ y \in \mathbb{R}$, and $ n \in \mathbb{Z}$ with $ n > 0$. If x and y are both n-exact and $ \vert x-y\vert \leq min(\vert x\vert,\vert y\vert)$, then $ x-y$ is $ n$-exact.

If $ x$ is positive and $ n$-exact, then the least $ n$-exact number exceeding $ x$ is computed as follows.

Definition 4.2.2   (fp+) Let $ x \in \mathbb{R}$, $ x>0$, and $ n \in \mathbb{N}$, $ n > 0$. Then

$\displaystyle \mathit{fp}^+(x,n) = x + 2^{expo(x)+1-n}.$

  (fp+1) Let $ x \in \mathbb{R}$, $ x>0$, and $ n \in \mathbb{N}$, $ n > 0$. If $ x$ is $ n$-exact, then $ \mathit{fp}^+(x,n)$ is $ n$-exact.

PROOF: Let $ e = expo(x)$ and $ x^+ = \mathit{fp}^+(x,n) = x + 2^{e+1-n}$. Since $ x$ is $ n$-exact, $ x\cdot 2^{n-1-e} \in \mathbb{Z}$, and by Lemma 4.1.3, $ x < 2^{e+1}$. Thus,

$\displaystyle x\cdot 2^{n-1-e} < 2^{e+1}\cdot 2^{n-1-e} = 2^n,$

which implies $ x\cdot 2^{n-1-e} \leq 2^n-1$. Therefore, $ x \leq 2^{e+1}-2^{e+1-n}$ and $ x^+ = x + 2^{e+1-n} \leq 2^{e+1}$. If $ x^+ = 2^{e+1}$, then $ x^+$ is $ n$-exact by Lemma 4.2.6. Otherwise, $ x^+ < 2^{e+1}$, $ expo(x^+) = e$, and

$\displaystyle x^+ \cdot 2^{n-1-expo(x^+)} = (x + 2^{e+1-n})2^{n-1-e} = x\cdot 2^{e+1-n}+1 \in \mathbb{Z}.$

  (fp+2) Let $ x \in \mathbb{R}$, $ y \in \mathbb{R}$, $ y>x>0$, and $ n \in \mathbb{N}$, $ n > 0$. If $ x$ and $ y$ are $ n$-exact, then $ y \geq \mathit{fp}^+(x,n)$.

PROOF: Let $ e = expo(x)$ and $ x^+ = \mathit{fp}^+(x,n) = x + 2^{e+1-n}$. Since $ x$ is $ n$-exact, $ x\cdot 2^{n-1-e} \in \mathbb{Z}$. Similar, since $ y$ is $ n$-exact, $ y\cdot 2^{n-1-expo(y)} \in \mathbb{Z}$. But $ expo(y) \geq e$ by Corollary 4.1.5, and hence

$\displaystyle y\cdot 2^{n-1-e} = y\cdot 2^{n-1-expo(y)}\cdot 2^{expo(y)-e} \in \mathbb{Z}$

as well. Now since $ y > x$, $ y\cdot 2^{n-1-e} > x\cdot 2^{n-1-e}$, which implies

$\displaystyle y\cdot 2^{n-1-e} \geq x\cdot 2^{n-1-e} + 1.$

Thus,

$\displaystyle y \geq (x\cdot 2^{n-1-e} + 1)/2^{n-1-e} = x + 2^{e+1-n} = x^+.$

  (fp+expo) Let $ x \in \mathbb{R}$, $ x>0$, and $ n \in \mathbb{N}$, $ n > 0$. If $ x$ is $ n$-exact and $ expo(fp^+(x,n)) \neq expo(x)$, then $ fp^+(x,n) = 2^{expo(x)+1}$.

PROOF: Since $ fp^+(x,n) > x$, Lemma 4.1.5 implies $ expo(fp^+(x,n)) \geq expo(x)$. Therefore, we have $ expo(fp^+(x,n)) > expo(x)$, which, according to Lemma 4.1.3, implies $ fp^+(x,n) \geq 2^{expo(x)+1}$. On the other hand, since $ 2^{expo(x)+1}$ is $ n$-exact by Lemma 4.2.6, Lemma 4.2.16 implies $ 2^{expo(x)+1} \geq fp^+(x,n)$

  (expo-diff-min) Let $ x \in \mathbb{R}$, $ y \in \mathbb{R}$, and $ n \in \mathbb{N}$ with $ n > 0$. If x and y are n-exact and $ x \neq y$, then

$\displaystyle expo(x-y) \geq min(expo(x),expo(y)) + 1 - n$

PROOF: Since the case $ xy<0$ is trivial, we shall assume that $ xy>0$ and, without loss of generality, that $ y>x>0$. But then by Lemma 4.2.16,

$\displaystyle \mathit{expo}(y-x) \geq \mathit{expo}(2^{expo(x)+1-n}) = \mathit{expo}(x)+1-n. \Box
$

Similarly, the following definition computes the greatest $ n$-exact number that is less than a given $ n$-exact number.

Definition 4.2.3   (fp-) Let $ x \in \mathbb{R}$, $ x>0$, and $ n \in \mathbb{N}$, $ n > 0$. Then

$\displaystyle \mathit{fp}^-(x,n) = \left\{\begin{array}{ll}
x - 2^{expo(x)-n} ...
...}\\
x - 2^{expo(x)+1-n} & \mbox{if $x \neq 2^{expo(x)}$.}
\end{array}\right.$

.

  (exactp-fp-) Let $ x \in \mathbb{R}$, $ x>0$, and $ n \in \mathbb{N}$, $ n > 0$. If $ x$ is $ n$-exact, then $ \mathit{fp}^-(x,n)$ is $ n$-exact.

PROOF: Let $ e = expo(x)$ and $ x^- = \mathit{fp}^-(x,n)$.

Suppose first that $ x = 2^e$. Then $ x^- = x-2^{e-n}$, and since

$\displaystyle 2^e > x^- \geq 2^e - 2^{e-1} = 2^{e-1},$

Definition 4.1.1 implies $ expo(x^-) = e-1$. Therefore,

$\displaystyle x^-2^{n-1-expo(x^-)} = x^-2^{n-e} = (x-2^{e-n})2^{n-e} = (2^e-2^{e-n})2^{n-e} = 2^n-1 \in \mathbb{Z},$

and $ x^-$ is $ n$-exact by Lemma 4.2.1.

Now suppose $ x \neq 2^e$. Then by Lemma 4.2.15,

$\displaystyle x \geq \mathit{fp}^+(2^e,n) = 2^e + 2^{e+1-n}.$

Now $ x^- = x - 2^{e+1-n} \geq 2^e$, and hence $ expo(x^-) = e$. Thus,

$\displaystyle x^- 2^{n-1-expo(x^-)} = x^-2^{n-1-e} = (x-2^{e+1-n})2^{n-1-e} = x2^{n-1-e} - 1 \in \mathbb{Z},$

which implies $ x^-$ is $ n$-exact. 

  (fp+-, fp-+) Let $ x \in \mathbb{R}$, $ x>0$, and $ n \in \mathbb{N}$, $ n > 0$. If $ x$ is $ n$-exact, then

$\displaystyle \mathit{fp}^+(\mathit{fp}^-(x,n),n) = \mathit{fp}^-(\mathit{fp}^+(x,n),n) = x.$

PROOF: Let $ e = expo(x)$ $ x^- = \mathit{fp}^-(x,n)$, and $ x^+ = \mathit{fp}^+(x,n)$. We shall prove first that $ \mathit{fp}^+(x^-,n) = x$.

As noted in the proof of Lemma 4.2.15,

$\displaystyle expo(x^-) = \left\{\begin{array}{ll}
e-1 & \mbox{if $x = 2^e$}\\
e & \mbox{if $x > 2^e$.}
\end{array}\right.$

If $ x = 2^e$, then

$\displaystyle \mathit{fp}^+(x^-,n) = x^- + 2^{(e-1)+1-n} = x - 2^{e-n} + 2^{e-n} = x.$

But if $ x > 2^e$, then

$\displaystyle \mathit{fp}^+(x^-,n) = x^- + 2^{e+1-n} = x - 2^{e+1-n} + 2^{e+1-n} = x.$

Next, we prove that $ \mathit{fp}^-(x^+,n) = x$. Note that $ x^+ = x + 2^{e+1-n}$. If $ x^+ < 2^{e+1}$, then $ expo(x^+) = e$, and since $ x^+ \neq 2^e$,

$\displaystyle \mathit{fp}^-(x^+,n) = x^+ - 2^{e+1-n} = x + 2^{e+1-n} - 2^{e+1-n} = x.$

We may assume, therefore, that $ x^+ \geq 2^{e+1}$. By Lemma 4.2.15, since $ 2^{e+1}$ is $ n$-exact, $ 2^{e+1} \geq x^+$, and hence, $ x^+ = 2^{e+1}$. Thus,

$\displaystyle \mathit{fp}^-(x^+,n) = x^+ - 2^{e+1-n} = x + 2^{e+1-n} - 2^{e+1-n} = x.$

  (fp-2) Let $ x \in \mathbb{R}$, $ y \in \mathbb{R}$, $ x>y>0$, and $ n \in \mathbb{N}$, $ n > 0$. If $ x$ and $ y$ are $ n$-exact, then $ y \leq \mathit{fp}^-(x,n)$.

PROOF: Suppose $ y > \mathit{fp}^-(x,n)$. Then since $ \mathit{fp}^-(x,n)$ and $ y$ are both $ n$-exact, Lemma 4.2.15 implies $ y \geq \mathit{fp}^+(\mathit{fp}^-(x,n),n) = x$, a contradiction. 

David Russinoff 2017-08-01