next up previous contents
Next: Unbiased Rounding Up: Rounding Previous: Truncation   Contents


Rounding Away from Zero

The dual of truncation is defined similarly, using the ceiling instead of the floor.

Definition 5.2.1   (away) For all $ x \in \mathbb{Q}$ and $ n \in \mathbb{Z}$ ,

$\displaystyle away(x,n) = sgn(x)\lceil 2^{n-1}sig(x) \rceil 2^{expo(x)-n+1}.$

Example: Let $ x = 45/8 = \verb!b!101.101$ and $ n = 5$ . Then $ sgn(x) = 1$ , $ expo(x) = 2$ , $ sig(x) = \verb!b!1.01101$ ,

$\displaystyle \lceil 2^{n-1}sig(x) \rceil 2^{1-n} = \lceil \verb!b!10110.1 \rceil 2^{-4}
= \verb!b!10111 \cdot 2^{-4} = \verb!b!1.0111,$

and

$\displaystyle away(x,n) =\lceil 2^{n-1}sig(x) \rceil 2^{1-n}2^{expo(x)}
= \verb!b!1.0111 \cdot 2^{2} = \verb!b!101.11.$

Note that this value is the smallest 5-exact number not exceeded by $ x$ .

The negative-precision case is less than intuitive.

  (away-to-0) For all $ x \in \mathbb{Q}$ and $ n \in \mathbb{Z}$ , if $ n \leq 0$ , then

$\displaystyle away(x,n) = sgn(x)2^{expo(x)+1-n}.$

PROOF: By Lemma 4.1.7,

$\displaystyle 0 < 2^{n-1}sig(x) \leq sig(x)/2 < 1,$

and hence, by Lemma 1.1.8,

$\displaystyle away(x,n) = sgn(x)\lceil 2^{n-1}sig(x) \rceil 2^{expo(x)+1-n} = sgn(x)2^{expo(x)+1-n}.$

The next three lemmas list simple properties of away that it shares with trunc.

Lemma 5.2.2   (sgn-away) Let $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ . If $ n>0$ , then

$\displaystyle sgn(away(x,n)) = sgn(x).$

Lemma 5.2.3   (away-minus) For all $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ ,

$\displaystyle away(-x,n) = -away(x,n).$

  (away-shift) For all $ x \in \mathbb{Q}$ , $ n \in \mathbb{N}$ , and $ k \in \mathbb{Z}$ ,

$\displaystyle away(2^kx,n) = 2^k away(x,n).$

PROOF: By Lemma 4.1.12,

$\displaystyle away(2^kx,n)$ $\displaystyle =$ $\displaystyle sgn(2^kx)\lceil 2^{n-1}sig(2^kx) \rceil 2^{expo(2^kx)-n+1}$  
  $\displaystyle =$ $\displaystyle sgn(x)\lceil 2^{n-1}sig(x) \rceil 2^{expo(x)+k-n+1}$  
  $\displaystyle =$ $\displaystyle 2^k away(x,n).$  

We have the following bounds on $ away(x,n)$ .

  (away-lower-bound) For all $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ ,

$\displaystyle \vert away(x,n)\vert \geq \vert x\vert.$

PROOF: By Lemmas 1.1.8 and 4.1.1,

$\displaystyle \vert away(x,n)\vert \geq 2^{n-1}sig(x) 2^{expo(x)-n+1} = sig(x)2^{expo(x)} = \vert x\vert.$

  (away-upper-bound,away-upper-2)
If $ x \in \mathbb{Q}$ , $ x \neq 0$ , and $ n \in \mathbb{N}$ , then

$\displaystyle \vert away(x,n)\vert < \vert x\vert + 2^{expo(x)-n+1} \leq \vert x\vert(1+2^{1-n}).$

PROOF: By Definitions 5.2.1 and 1.1.1 and Lemma 4.1.1,

$\displaystyle \vert away(x,n)\vert < (2^{n-1}sig(x)+1)2^{expo(x)-n+1} = \vert x\vert + 2^{expo(x)-n+1}.$

The second inequality follows from Definition 4.1.1

  (away-diff) For all $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ ,

$\displaystyle \vert away(x,n)-x\vert < 2^{expo(x)-n+1}.$

PROOF: By Lemmas 5.2.2 and 5.2.5,

$\displaystyle \vert away(x,n)-x\vert = \vert\vert away(x,n)\vert-\vert x\vert\vert = \vert away(x,n)\vert-\vert x\vert.$

The corollary now follows from Lemma 5.2.6


Unlike trunc, away is not guaranteed to preserve the exponent of its argument, but the only exception is the case in which a number is rounded up to a power of 2.

  (away-expo-upper) For all $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ ,

$\displaystyle \vert away(x,n)\vert \leq 2^{expo(x) + 1}.$

PROOF: By Lemma 4.1.7,

$\displaystyle \vert away(x,n)\vert$ $\displaystyle =$ $\displaystyle \lceil 2^{n-1}sig(x) \rceil 2^{expo(x)-n+1}$  
  $\displaystyle \leq$ $\displaystyle \lceil 2^{n} \rceil 2^{expo(x)-n+1}$  
  $\displaystyle =$ $\displaystyle 2^{n} 2^{expo(x)-n+1}$  
  $\displaystyle =$ $\displaystyle 2^{expo(x) + 1}.$  

  (expo-away-lower-bound,expo-away-upper-bound)
For all $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ ,

$\displaystyle expo(x) \leq expo(away(x,n)) \leq expo(x)+1.$

PROOF: The first inequality follows from Lemmas 5.2.5 and 4.1.4, and the second from Lemmas 5.2.8 and 4.1.2

Corollary 5.2.10   (expo-away) For all $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ , if $ \vert away(x,n)\vert \neq 2^{expo(x)+1}$ , then

$\displaystyle expo(away(x,n)) = expo(x).$

The standard rounding mode properties may now be derived.

  (away-exactp-a) For all $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ , $ away(x,n)$ is $ n$ -exact.

PROOF: By Corollary 5.2.10 and Lemma 4.2.6, we may assume that

$\displaystyle expo(away(x,n)) = expo(x).$

Consequently, it suffices to observe that

$\displaystyle away(x,n)2^{n-1-expo(x)} = sgn(x)\lceil 2^{n-1}sig(x) \rceil \in \mathbb{Z}.$

  (away-diff-expo) Let $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ . If $ n>0$ and x is not $ n$ -exact, then

$\displaystyle expo(away(x,n)-x) \leq expo(x)-n.$

PROOF: According to Lemma 5.2.11, the hypothesis implies that $ x-away(x,n) \neq 0$ , and hence Lemma 4.1.2 applies. 

  (away-exactp-b) Let $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ . If $ n>0$ and x is $ n$ -exact, then

$\displaystyle away(x,n) = x.$

PROOF: By Definition 4.2.1 and Lemma 1.1.9,

$\displaystyle \lceil 2^{n-1}sig(x) \rceil = 2^{n-1}sig(x),$

and hence by Definition 5.2.1 and Lemma 4.1.1,
$\displaystyle away(x,n)$ $\displaystyle =$ $\displaystyle sgn(x)\lceil 2^{n-1}sig(x) \rceil 2^{expo(x)-n+1}$  
  $\displaystyle =$ $\displaystyle sgn(x)2^{n-1}sig(x)2^{expo(x)-n+1}$  
  $\displaystyle =$ $\displaystyle sgn(x)sig(x)2^{expo(x)}$  
  $\displaystyle =$ $\displaystyle x.$  

  (away-exactp-c) Let $ x \in \mathbb{Q}$ , $ a \in \mathbb{Q}$ , and $ n \in \mathbb{N}$ , $ n>0$ . If $ a$ is $ n$ -exact and $ a \geq x$ , then $ a \geq away(x,n)$ .

PROOF: If $ x<0$ , then

$\displaystyle x = -\vert x\vert \geq -\vert away(x,n)\vert = away(x,n)$

by Lemmas 5.2.2 and 5.2.5. Therefore, we may assume that $ x \geq 0$ . Suppose $ a < away(x,n)$ . Then by Lemmas 4.2.15 and 4.1.4,

$\displaystyle away(x,n) \geq a + 2^{expo(a)-n+1} \geq x + 2^{expo(x)-n+1},$

contradicting Lemma 5.2.6

  (away-monotone) Let $ x \in \mathbb{Q}$ , $ y \in \mathbb{Q}$ , and $ n \in \mathbb{N}$ . If $ x \leq y$ , then $ away(x,n) \leq trunc(y,n)$ .

PROOF: First suppose $ x>0$ . By Lemma 5.2.5, $ away(y,n) \geq y \geq x$ . Since $ away(x,n)$ is $ n$ -exact by Lemma 5.2.11, Lemma 5.2.14 implies

$\displaystyle away(y,n) \geq away(x,n).$

Now suppose $ x \leq 0$ . By Lemma 5.2.2, we may assume that $ x \leq y < 0$ . Thus, since $ 0 < -y \leq -x$ , we have $ away(-y,n) \leq away(-x,n)$ and by Lemma 5.2.3,

$\displaystyle away(x,n) = -away(-x,n) \leq -away(-y,n) = away(y,n).$

If $ x$ is not $ n$ -exact, then $ trunc(x,n)$ and $ away(x,n)$ are successive $ n$ -exact numbers.

  (trunc-away) Let $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ . If $ x>0$ , $ n>0$ , and $ x$ is not $ n$ -exact, then

$\displaystyle away(x,n) = trunc(x,n) + 2^{expo(x)+1-n}.$

PROOF: Let $ a = trunc(x,n) + 2^{expo(x)+1-n}$ . Since $ expo(trunc(x,n)) = expo(x)$ , $ a =$   fp$ ^+(trunc(x,n),n)$ . Since $ trunc(x,n)$ and $ away(x,n)$ are both $ n$ -exact and $ trunc(x,n) <
away(x,n)$ , Lemmas 4.2.14 and 4.2.15 imply that $ a$ is $ n$ -exact and $ away(x,n) \geq a$ . But by Lemma 5.1.7, $ a > x$ , and therefore, by Lemma 5.2.14, $ a \geq away(x,n)$


The next four results correspond to Lemmas 5.1.14, 5.1.15 5.1.16, and 5.1.17 of the preceding section.

  (away-midpoint) Let $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ . If $ x$ is $ (n+1)$ -exact but not $ n$ -exact, then

$\displaystyle away(x,n) = x + sgn(x)2^{expo(x)-n}.$

PROOF: For the case $ n=0$ , we have $ sig(x) = 1$ by Definition 4.2.1, and by Lemmas 4.1.1 and 5.2.1,

$\displaystyle x + sgn(x)2^{expo(x)-n}$ $\displaystyle =$ $\displaystyle sgn(x)2^{expo(x)} + sgn(x)2^{expo(x)}$  
  $\displaystyle =$ $\displaystyle sgn(x)2^{expo(x)+1}$  
  $\displaystyle =$ $\displaystyle away(x,n).$  

Thus, we may assume $ n>0$ , and by Lemma 5.2.3, we may also assume $ x>0$ . Let $ a = x-2^{expo(x)-n}$ and $ b = x+2^{expo(x)-n}$ . As noted in the proof of Lemma 5.1.14, $ a$ and $ b$ are both $ n$ -exact and $ b =$   fp$ ^+(a,n)$ . Now by Lemma 5.2.14, $ b \geq away(x,n)$ , but if $ b =$   fp$ ^+(a,n) > away(x,n)$ , then since $ away(x,n)$ is $ n$ -exact, Lemma 4.2.15 would imply $ a \geq away(x,n)$ , contradicting $ a<x$ . Therefore, $ a = away(x,n)$

  (away-away) Let $ x \in \mathbb{Q}$ , $ m \in \mathbb{N}$ , and $ n \in \mathbb{N}$ . If $ m \leq n$ , then

$\displaystyle away(away(x,n),m) = away(x,m).$

PROOF: We may assume $ x>0$ . Consider first the case

$\displaystyle away(x,n) = 2^{expo(x)+1}.$

In this case, $ away(x,n)$ is $ m$ -exact, so that

$\displaystyle away(away(x,n),m) = away(x,n) = 2^{expo(x)+1}.$

By Lemma 5.2.8, we need only show that $ away(x,m) \geq
2^{expo(x)+1}$ . But since $ m \leq n$ , $ away(x,m)$ is $ n$ -exact, and since $ away(x,m) \geq x$ , $ away(x,m) \geq away(x,n)$ by Lemma 5.2.14.

Thus, we may assume $ away(x,n) < 2^{expo(x)+1}$ . By Corollary 5.2.10,

$\displaystyle expo(away(x,n)) = expo(x),$

and hence by Lemma 1.1.13,
$\displaystyle {away(away(x,n),m)}$
  $\displaystyle =$ $\displaystyle \lceil 2^{m-1-expo(x)} (\lceil 2^{n-1-expo(x)}x\rceil 2^{expo(x)+1-n})\rceil 2^{expo(x)+1-m}$  
  $\displaystyle =$ $\displaystyle \lceil \lceil 2^{n-1-expo(x)}x\rceil/2^{n-m} \rceil 2^{expo(x)+1-m}$  
  $\displaystyle =$ $\displaystyle \lceil 2^{n-1-expo(x)}x/2^{n-m} \rceil 2^{expo(x)+1-m}$  
  $\displaystyle =$ $\displaystyle \lceil 2^{m-1-expo(x)}x\rceil 2^{expo(x)+1-m}$  
  $\displaystyle =$ $\displaystyle away(x,m).$  

  (plus-away) Let $ x \in \mathbb{Q}$ , $ y \in \mathbb{Q}$ , and $ k \in \mathbb{Z}$ . If $ x \geq 0$ , $ y \geq 0$ , and $ x$ is $ (k+expo(x)-expo(y))$ -exact, then

$\displaystyle x+away(y,k) = away(x+y,k+expo(x+y)-expo(y)).$

PROOF: Let $ n = k+expo(x)-expo(y)$ . Since $ x$ is $ n$ -exact,

$\displaystyle x2^{k-1-expo(y)} = x2^{n-1-expo(x)} \in \mathbb{Z}.$

Let $ k' = k+expo(x+y)-expo(y)$ . Then by Lemma 1.1.12,
$\displaystyle x+away(y,k)$ $\displaystyle =$ $\displaystyle x+\lceil 2^{k-1-expo(y)}y \rceil 2^{expo(y)+1-k}$  
  $\displaystyle =$ $\displaystyle (x2^{k-1-expo(y)}+\lceil 2^{k-1-expo(y)}y \rceil) 2^{expo(y)+1-k}$  
  $\displaystyle =$ $\displaystyle \lceil 2^{k-1-expo(y)}(x+y) \rceil 2^{expo(y)+1-k}$  
  $\displaystyle =$ $\displaystyle \lceil 2^{k'-1-expo(x+y)}(x+y) \rceil 2^{expo(x+y)+1-k'}$  
  $\displaystyle =$ $\displaystyle away(x+y,k').$  

  (minus-trunc-2) Let $ x \in \mathbb{Q}$ , $ y \in \mathbb{Q}$ , and $ k \in \mathbb{Z}$ . If $ x>y>0$ , $ k + expo(x-y) - expo(y) > 0$ , and $ x$ is $ (k+expo(x)-expo(y))$ -exact, then

$\displaystyle x-trunc(y,k) = away(x-y,k+expo(x-y)-expo(y)).$

PROOF: Let $ n = k+expo(x)-expo(y)$ . Since $ x$ is $ n$ -exact,

$\displaystyle x2^{k-1-expo(y)} = x2^{n-1-expo(x)} \in \mathbb{Z}.$

Let $ k' = k+expo(x-y)-expo(y)$ . Then by Lemma 1.1.6,
$\displaystyle x-trunc(y,k)$ $\displaystyle =$ $\displaystyle x-\lfloor 2^{k-1-expo(y)}y \rfloor 2^{expo(y)+1-k}$  
  $\displaystyle =$ $\displaystyle -(\lfloor 2^{k-1-expo(y)}y \rfloor - x2^{k-1-expo(y)}) 2^{expo(y)+1-k}$  
  $\displaystyle =$ $\displaystyle -\lfloor 2^{k-1-expo(y)}(y-x) \rfloor 2^{expo(y)+1-k}$  
  $\displaystyle =$ $\displaystyle -\lfloor 2^{k'-1-expo(y-x)}(y-x) \rfloor 2^{expo(y-x)+1-k'}$  
  $\displaystyle =$ $\displaystyle -trunc(y-x,k')$  
  $\displaystyle =$ $\displaystyle trunc(x-y,k').$  

The following result combines Lemmas 5.1.17 and 5.2.20.

  (trunc-plus-minus) Let $ x \in \mathbb{Q}$ and $ y \in \mathbb{Q}$ such that $ x \neq 0$ , $ y \neq 0$ , and $ x+y \neq 0$ . Let $ k \in \mathbb{Z}$ ,

$\displaystyle k' = k+expo(x)-expo(y),$

and

$\displaystyle k” = k+expo(x+y)-expo(y).$

If $ k”>0$ , and $ x$ is $ k'$ -exact, then

$\displaystyle x+trunc(y,k) = \left\{\begin{array}{ll}
trunc(x+y,k”) & \mbox{if...
...gn(y)$}\\
away(x+y,k”) & \mbox{if $sgn(x+y)\neq sgn(y)$}. \end{array} \right.$

PROOF: By Lemmas 5.1.3 and 5.2.3, we may assume that $ x>0$ . The case $ y > 0$ is handled by Lemma 5.1.16. For the case $ y<0$ , Lemmas 5.1.17 and 5.2.20 cover the subcases $ -y>x$ and $ -y<x$ , respectively. 


We turn now to the problem of bit-level implementation of rounding. Truncation, according to Lemma 5.1.18, is equivalent to a bit slice operation, which may be implemented as a logical operation using Corollary 5.1.19. Other rouding modes may be reduced to the case of truncation by a method known as constant injection. Let $ x$ be $ m$ -exact with $ expo(x) = e$ , say

$\displaystyle x = \verb!b!1.\beta_1\beta_2\cdots\beta_{n-1}\cdot 2^e,$

to be rounded to $ n$ bits, where $ n \leq m$ , according to a rounding mode $ {\cal R}$ . Our goal is to construct a rounding constant $ {\cal C}$ , depending on $ m$ , $ n$ , $ e$ , and $ {\cal R}$ , such that

$\displaystyle {\cal R}(x,n) = trunc(x+{\cal C},n).$

The appropriate constant for the case $ {\cal R} = away$ is

$\displaystyle {\cal C} = 2^e(2^{-(n-1)} - 2^{-(m-1)}) = 2^{e+1}(2^{-n}-2^{-m}),$

which consists of a string of 1's at the bit positions corresponding to the least significant $ m-n$ bits of $ x$ , as illustrated below.


\begin{picture}(120,55)(-25,20)\setlength{\unitlength}{2mm}
\par
\put(10,10){...
...\put(33.4,8){$\times \:\:\:2^e$}
\par
\put(10,6.8){\line(1,0){27}}
\end{picture}

As suggested by the diagram, the addition $ x+{\cal C}$ generates a carry into the position of $ \beta_{n-1}$ unless $ \beta_n = \cdots =
\beta_{m-1} = 0$ , i.e., unless $ x$ is $ n$ -exact, and $ x$ is rounded up accordingly. This observation is formalized by the following lemma.

  (away-imp) Let $ x \in \mathbb{Q}$ , $ m \in \mathbb{N}$ , and $ n \in \mathbb{N}$ . If $ x$ is $ m$ -exact, $ x>0$ , and $ m \geq n > 0$ , then

$\displaystyle away(x,n) = trunc(x+2^{expo(x)+1}(2^{-n}-2^{-m}),n).$

PROOF: Let $ a = trunc(x+2^{expo(x)+1}(2^{-n}-2^{-m}),n)$ . Since $ a$ and $ away(x,n$ are both $ n$ -exact and

$\displaystyle a < x+2^{expo(x)+1-n} \leq away(x,n) + 2^{expo(away(x,n))+1-n},$

$ a \leq away(x,n)$ by Lemma 4.2.15.

If $ x$ is $ n$ -exact, then $ a \geq trunc(x,n) = x = away(x,n)$ , and hence $ a = away(x,n)$ . Thus, we may assume $ x$ is not $ n$ -exact. But then since $ x>trunc(x,n)$ and $ x$ is $ m$ -exact,

$\displaystyle x \geq trunc(x,n) + 2^{expo(x)+1-m}$

and hence

$\displaystyle x+2^{expo(x)+1}(2^{-n}-2^{-m}) \geq trunc(x,n) + 2^{expo(x)+1-n} = away(x,n),$

which implies $ a \geq away(x,n)$



next up previous contents
Next: Unbiased Rounding Up: Rounding Previous: Truncation   Contents
David Russinoff 2007-01-02