next up previous contents
Next: Rounding Away from Zero Up: Rounding Previous: Rounding   Contents


Truncation

The most basic rounding mode, which the IEEE standard calls “round toward 0”, may be described as a three-step operation on the significand of its first argument. Thus, $ trunc(x,n)$ has the same sign and exponent as $ x$ , while its significand is computed from $ sig(x)$ as follows:

(1) Shift the binary point of $ sig(x)$ by $ n-1$ bits to the right.

(2) Extract the floor of the result.

(3) Shift the binary point by $ n-1$ bits to the left.

In other words, $ trunc(x,n)$ is the result of replacing $ sig(x)$ by $ \lfloor 2^{n-1}sig(x) \rfloor 2^{1-n}$ . This is the content of the following definition.

Definition 5.1.1   (trunc) For all $ x \in \mathbb{Q}$ and $ n \in \mathbb{Z}$ ,

$\displaystyle trunc(x,n) = sgn(x)\lfloor 2^{n-1}sig(x) \rfloor 2^{expo(x)-n+1}.$

Example: Let $ x = 45/8 = \verb!b!101.101$ and $ n = 5$ . Then $ sgn(x) = 1$ , $ expo(x) = 2$ , $ sig(x) = \verb!b!1.01101$ ,

$\displaystyle \lfloor 2^{n-1}sig(x) \rfloor 2^{1-n} = \lfloor \verb!b!10110.1 \rfloor 2^{-4}
= \verb!b!10110 \cdot 2^{-4} = \verb!b!1.011,$

and

$\displaystyle trunc(x,n) =\lfloor 2^{n-1}sig(x) \rfloor 2^{1-n}2^{expo(x)}
= \verb!b!1.011 \cdot 2^{2} = \verb!b!101.1.$

Note that this value is the largest 5-exact number that does not exceed $ x$ .

Although the second argument of trunc is normally positive, the following lemma is occasionally useful.

  (trunc-to-0) For all $ x \in \mathbb{Q}$ and $ n \in \mathbb{Z}$ , if $ n \leq 0$ , then

$\displaystyle trunc(x,n) = 0.$

PROOF: By Lemma 4.1.7,

$\displaystyle 0 < 2^{n-1}sig(x) \leq sig(x)/2 < 1,$

which implies $ \lfloor 2^{n-1}sig(x) \rfloor = 0$


Like all rounding modes, trunc preserves sign.

Lemma 5.1.2   (sgn-trunc) Let $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ . If $ n>0$ , then

$\displaystyle sgn(trunc(x,n)) = sgn(x).$

This particular mode also has the property of oddness, which, as we shall see, is not common to all IEEE modes.

Lemma 5.1.3   (trunc-minus) For all $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ ,

$\displaystyle trunc(-x,n) = -trunc(x,n).$

All of the modes that we shall consider commute with the shift operation.

  (trunc-shift) For all $ x \in \mathbb{Q}$ , $ n \in \mathbb{N}$ , and $ k \in \mathbb{Z}$ ,

$\displaystyle trunc(2^kx,n) = 2^k trunc(x,n).$

PROOF: By Lemma 4.1.12,

$\displaystyle trunc(2^kx,n)$ $\displaystyle =$ $\displaystyle sgn(2^kx)\lfloor 2^{n-1}sig(2^kx) \rfloor 2^{expo(2^kx)-n+1}$  
  $\displaystyle =$ $\displaystyle sgn(x)\lfloor 2^{n-1}sig(x) \rfloor 2^{expo(x)+k-n+1}$  
  $\displaystyle =$ $\displaystyle 2^k trunc(x,n).$  

The inequality stated in the next lemma is the defining characteristic of trunc as a rounding mode.

  (trunc-upper-bound) For all $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ ,

$\displaystyle \vert trunc(x,n)\vert \leq \vert x\vert.$

PROOF: By Definition 1.1.1 and Lemma 4.1.1,

$\displaystyle \vert trunc(x,n)\vert \leq 2^{n-1}sig(x) 2^{expo(x)-n+1} = sig(x)2^{expo(x)} = \vert x\vert.$

As we have noted, trunc also preserves exponent. This may be proved as a consequence of Lemma 5.1.5.

  (expo-trunc) Let $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ . If $ n>0$ , then

$\displaystyle expo(trunc(x,n)) = expo(x).$

PROOF: Since $ \vert trunc(x,n)\vert \leq \vert x\vert$ , $ expo(trunc(x,n)) \leq expo(x)$ by Lemma 4.1.4. But by Lemmas 4.1.7 and 1.1.5,

$\displaystyle \vert trunc(x,n)\vert = \lfloor 2^{n-1}sig(x) \rfloor 2^{expo(x)-n+1}
\geq 2^{n-1} 2^{expo(x)-n+1} = 2^{expo(x)},$

and hence, by Lemma 4.1.2,

$\displaystyle expo(trunc(x,n)) \geq expo(x).$

The following inequality complements Lemma 5.1.5, confining $ trunc(x,n)$ to an interval.

  (trunc-lower-bound,trunc-lower-2)
If $ x \in \mathbb{Q}$ , $ x \neq 0$ , and $ n \in \mathbb{N}$ , then

$\displaystyle \vert trunc(x,n)\vert > \vert x\vert - 2^{expo(x)-n+1} \geq \vert x\vert(1-2^{1-n}).$

PROOF: By Definitions 5.1.1 and 1.1.1 and Lemma 4.1.1,

$\displaystyle \vert trunc(x,n)\vert > (2^{n-1}sig(x)-1)2^{expo(x)-n+1} = \vert x\vert - 2^{expo(x)-n+1}.$

The second inequality follows from Definition 4.1.1.. 

  (trunc-diff) For all $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ ,

$\displaystyle \vert x-trunc(x,n)\vert < 2^{expo(x)-n+1}.$

PROOF: By Lemmas 5.1.2 and 5.1.5,

$\displaystyle \vert x-trunc(x,n)\vert = \vert\vert x\vert-\vert trunc(x,n)\vert\vert = \vert x\vert-\vert trunc(x,n)\vert.$

The corollary now follows from Lemma 5.1.7


With the next four lemmas, we establish trunc as a true rounding mode, as defined by the axioms listed at the beginning of this chapter.

  (trunc-exactp-a) For all $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ , $ trunc(x,n)$ is $ n$ -exact.

PROOF: Since $ expo(trunc(x,n)) = expo(x)$ , it suffices to observe that

$\displaystyle trunc(x,n)2^{n-1-expo(x)} = sgn(x)\lfloor 2^{n-1}sig(x) \rfloor \in \mathbb{Z}.$

  (trunc-diff-expo) Let $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ . If $ n>0$ and x is not $ n$ -exact, then

$\displaystyle expo(x-trunc(x,n)) \leq expo(x)-n.$

PROOF: According to Lemma 5.1.9, the hypothesis implies that $ x-trunc(x,n) \neq 0$ , and hence Lemma 4.1.2 applies. 

  (trunc-exactp-b) Let $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ . If $ n>0$ and x is $ n$ -exact, then

$\displaystyle trunc(x,n) = x.$

PROOF: By Definition 4.2.1 and Lemma 1.1.1,

$\displaystyle \lfloor 2^{n-1}sig(x) \rfloor = 2^{n-1}sig(x),$

and hence by Definition 5.1.1 and Lemma 4.1.1,
$\displaystyle trunc(x,n)$ $\displaystyle =$ $\displaystyle sgn(x)\lfloor 2^{n-1}sig(x) \rfloor 2^{expo(x)-n+1}$  
  $\displaystyle =$ $\displaystyle sgn(x)2^{n-1}sig(x)2^{expo(x)-n+1}$  
  $\displaystyle =$ $\displaystyle sgn(x)sig(x)2^{expo(x)}$  
  $\displaystyle =$ $\displaystyle x.$  

  (trunc-exactp-c) Let $ x \in \mathbb{Q}$ , $ a \in \mathbb{Q}$ , and $ n \in \mathbb{N}$ . If $ a$ is $ n$ -exact and $ a \leq x$ , then $ a \leq trunc(x,n)$ .

PROOF: If $ x<0$ , then $ x \leq trunc(x,n)$ by Lemma 5.1.5. Therefore, we may assume that $ x \geq 0$ . Suppose $ a > trunc(x,n)$ . Then by Lemma 4.2.15,

$\displaystyle trunc(x,n) \leq a - 2^{expo(trunc(x,n))-n+1} = a - 2^{expo(x)-n+1} \leq x - 2^{expo(x)-n+1},$

contradicting Lemma 5.1.7

  (trunc-monotone) Let $ x \in \mathbb{Q}$ , $ y \in \mathbb{Q}$ , and $ n \in \mathbb{N}$ . If $ x \leq y$ , then $ trunc(x,n) \leq trunc(y,n)$ .

PROOF: First suppose $ x>0$ . By Lemma 5.1.5, $ trunc(x,n) \leq x \leq y$ . Since $ trunc(x,n)$ is $ n$ -exact by Lemma 5.1.9, Lemma 5.1.12 implies

$\displaystyle trunc(x,n) \leq trunc(y,n).$

Now suppose $ x \leq 0$ . By Lemma 5.1.2, we may assume that $ x \leq y < 0$ . Thus, since $ 0 < -y \leq -x$ , we have $ trunc(-y,n) \leq trunc(-x,n)$ and by Lemma 5.1.3,

$\displaystyle trunc(x,n) = -trunc(-x,n) \leq -trunc(-y,n) = trunc(y,n).$

If $ x$ is $ (n+1)$ -exact but not $ n$ -exact, then $ x$ is equidistant from two successive $ n$ -exact numbers. In this case, we have an explicit formula for $ trunc(x,n)$ .

  (trunc-midpoint) Let $ x \in \mathbb{Q}$ and $ n \in \mathbb{N}$ . If $ x$ is $ (n+1)$ -exact but not $ n$ -exact, then

$\displaystyle trunc(x,n) = x - sgn(x)2^{expo(x)-n}.$

PROOF: For the case $ n=0$ , we have $ sig(x) = 1$ by Definition 4.2.1, and by Lemmas 4.1.1 and 5.1.1,

$\displaystyle x - sgn(x)2^{expo(x)-n} = sgn(x)2^{expo(x)} - sgn(x)2^{expo(x)} = 0 = trunc(x,n).$

Thus, we may assume $ n>0$ , and by Lemma 5.1.3, we may also assume $ x>0$ .

Let $ a = x-2^{expo(x)-n}$ and $ b = x+2^{expo(x)-n}$ . Since $ x>2^{expo(x)}$ , $ x \geq 2^{expo(x)}+2^{expo(x)+1-n}$ by Lemma 4.2.15, and hence $ a \geq 2^{expo(x)}$ and $ expo(a) = expo(x)$ . It follows that $ b =$   fp$ ^+(a,n)$ .

By hypothesis, $ x2^{n-expo(x)} \in \mathbb{Z}$ but $ x2^{n-expo(x)}/2 = x2^{n-1-expo(x)} \notin \mathbb{Z}$ , and therefore, $ x2^{n-expo(x)}$ is odd. Let $ x2^{n-expo(x)} = 2k+1$ . Then

$\displaystyle a2^{n-1-expo(a)} = (x-2^{expo(x)-n})2^{n-1-expo(x)} = (2k+1)/2 - 1/2 = k \in \mathbb{Z}.$

Thus, $ a$ is $ n$ -exact, and by Lemma 4.2.14, so is $ a+2^{expo(a)+1-n}=b$ . Now by Lemma 5.1.12, $ a \leq trunc(x,n)$ , but if $ a < trunc(x,n)$ , then since $ trunc(x,n)$ is $ n$ -exact, Lemma 4.2.15 would imply $ b \leq trunc(x,n)$ , contradicting $ x<b$ . Therefore, $ a = trunc(x,n)$


The following lemma states an intuitively obvious property of truncation, which may be derived directly from Definition 5.1.1.

  (trunc-trunc) Let $ x \in \mathbb{Q}$ , $ m \in \mathbb{N}$ , and $ n \in \mathbb{N}$ . If $ m \leq n$ , then

$\displaystyle trunc(trunc(x,n),m) = trunc(x,m).$

PROOF: We assume $ x \geq 0$ ; the case $ x<0$ follows easily from Lemma 5.1.3. Applying Lemma 1.1.7, we have

$\displaystyle {trunc(trunc(x,n),m)}$
  $\displaystyle =$ $\displaystyle \lfloor 2^{m-1-expo(x)} (\lfloor 2^{n-1-expo(x)}x\rfloor 2^{expo(x)+1-n})\rfloor 2^{expo(x)+1-m}$  
  $\displaystyle =$ $\displaystyle \lfloor \lfloor 2^{n-1-expo(x)}x\rfloor/2^{n-m} \rfloor 2^{expo(x)+1-m}$  
  $\displaystyle =$ $\displaystyle \lfloor 2^{n-1-expo(x)}x/2^{n-m} \rfloor 2^{expo(x)+1-m}$  
  $\displaystyle =$ $\displaystyle \lfloor 2^{m-1-expo(x)}x\rfloor 2^{expo(x)+1-m}$  
  $\displaystyle =$  

Figure 5.1 is provided as a visual aid in understanding the following lemma, which formulates the precise conditions under which a truncated sum may be computed by truncating one of the summands in advance of the addition.

Figure 5.1: Lemma 5.1.16
\begin{figure}\begin{picture}(120,115)(-58,-2)\setlength{\unitlength}{2mm}
\p...
...                          }}_{k+expo(x+y)-expo(y)}$}}
\end{picture}
\end{figure}

  (plus-trunc) Let $ x \in \mathbb{Q}$ , $ y \in \mathbb{Q}$ , and $ k \in \mathbb{Z}$ . If $ x \geq 0$ , $ y \geq 0$ , and $ x$ is $ (k+expo(x)-expo(y))$ -exact, then

$\displaystyle x+trunc(y,k) = trunc(x+y,k+expo(x+y)-expo(y)).$

PROOF: Let $ n = k+expo(x)-expo(y)$ . Since $ x$ is $ n$ -exact,

$\displaystyle x2^{k-1-expo(y)} = x2^{n-1-expo(x)} \in \mathbb{Z}.$

Let $ k' = k+expo(x+y)-expo(y)$ . Then by Lemma 1.1.6,
$\displaystyle x+trunc(y,k)$ $\displaystyle =$ $\displaystyle x+\lfloor 2^{k-1-expo(y)}y \rfloor 2^{expo(y)+1-k}$  
  $\displaystyle =$ $\displaystyle (x2^{k-1-expo(y)}+\lfloor 2^{k-1-expo(y)}y \rfloor) 2^{expo(y)+1-k}$  
  $\displaystyle =$ $\displaystyle \lfloor 2^{k-1-expo(y)}(x+y) \rfloor 2^{expo(y)+1-k}$  
  $\displaystyle =$ $\displaystyle \lfloor 2^{k'-1-expo(x+y)}(x+y) \rfloor 2^{expo(x+y)+1-k'}$  
  $\displaystyle =$ $\displaystyle trunc(x+y,k').$  

Lemma 5.1.16 holds for subtraction as well, but only if the summands are properly ordered.

  (minus-trunc-1) Let $ x \in \mathbb{Q}$ , $ y \in \mathbb{Q}$ , and $ k \in \mathbb{Z}$ . If $ y>x>0$ , $ k + expo(x-y) - expo(y) > 0$ , and $ x$ is $ (k+expo(x)-expo(y))$ -exact, then

$\displaystyle x-trunc(y,k) = trunc(x-y,k+expo(x-y)-expo(y)).$

PROOF: Let $ n = k+expo(x)-expo(y)$ . Since $ x$ is $ n$ -exact,

$\displaystyle x2^{k-1-expo(y)} = x2^{n-1-expo(x)} \in \mathbb{Z}.$

Let $ k' = k+expo(x-y)-expo(y)$ . Then by Lemma 1.1.6,
$\displaystyle x-trunc(y,k)$ $\displaystyle =$ $\displaystyle x-\lfloor 2^{k-1-expo(y)}y \rfloor 2^{expo(y)+1-k}$  
  $\displaystyle =$ $\displaystyle -(\lfloor 2^{k-1-expo(y)}y \rfloor - x2^{k-1-expo(y)}) 2^{expo(y)+1-k}$  
  $\displaystyle =$ $\displaystyle -\lfloor 2^{k-1-expo(y)}(y-x) \rfloor 2^{expo(y)+1-k}$  
  $\displaystyle =$ $\displaystyle -\lfloor 2^{k'-1-expo(y-x)}(y-x) \rfloor 2^{expo(y-x)+1-k'}$  
  $\displaystyle =$ $\displaystyle -trunc(y-x,k')$  
  $\displaystyle =$ $\displaystyle trunc(x-y,k').$  

Truncation of a bit vector may be described in terms of the bit slice operator.

  (bits-trunc) Let $ x \in \mathbb{N}$ and $ k \in \mathbb{N}$ , and let $ n = expo(x)+1$ . If $ 0<k \leq n$ , then

$\displaystyle trunc(x,k) = 2^{n-k}x[n-1:n-k].$

PROOF: By Lemmas 2.2.11 and 2.2.13,

$\displaystyle trunc(x,k)$ $\displaystyle =$ $\displaystyle \lfloor 2^{k-1-expo(x)}x \rfloor 2^{expo(x)+1-k}$  
  $\displaystyle =$ $\displaystyle \lfloor x/2^{n-k} \rfloor 2^{n-k}$  
  $\displaystyle =$ $\displaystyle 2^{n-k}\lfloor x/2^{n-k} \rfloor [k-1:0]$  
  $\displaystyle =$ $\displaystyle 2^{n-k}x[n-1:n-k].$  

  (trunc-land) Let $ x \in \mathbb{N}$ , $ m \in \mathbb{N}$ , and $ k \in \mathbb{N}$ , and let $ n = expo(x)+1$ . If $ 0<k<n\leq m$ , then

$\displaystyle trunc(x,k) = x[n-1:0] \;\verb!&!\; (2^{m} - 2^{n-k})[n-1:0].$

PROOF: By Lemmas 5.1.18 and 3.2.18,

$\displaystyle trunc(x,k) = 2^{n-k}x[n-1:n-k] = x[n-1:0] \;\verb!&!\; (2^{n}-2^{n-k})[n-1:0].$

But since

$\displaystyle 2^m - 2^{n-k} = 2^n(2^{m-n}-1) + 2^n - 2^{n-k},$


$\displaystyle (2^{m}-2^{n-k})[n-1:0]$ $\displaystyle =$ $\displaystyle mod(2^{m}-2^{n-k}, 2^n)$  
  $\displaystyle =$ $\displaystyle 2^n - 2^{n-k}$  
  $\displaystyle =$ $\displaystyle (2^{n}-2^{n-k})[n-1:0].$  

  (trunc-split) Let $ x \in \mathbb{N}$ , $ m \in \mathbb{N}$ , and $ k \in \mathbb{N}$ , and let $ n = expo(x)+1$ . If $ x \geq 0$ and $ 0<k < m \leq n$ , then

$\displaystyle trunc(x,m) = trunc(x,k) + 2^{n-m} x[n-k-1:n-m].$

PROOF: By Lemmas 5.1.18 and2.2.19,

$\displaystyle trunc(x,m)$ $\displaystyle =$ $\displaystyle 2^{n-m} x[n-1:n-m]$  
  $\displaystyle =$ $\displaystyle 2^{n-m}(2^{m-k}x[n-1:n-k] + x[n-k-1:n-m])$  
  $\displaystyle =$ $\displaystyle 2^{n-k}x[n-1:n-k] + 2^{n-m}x[n-k-1:n-m]$  
  $\displaystyle =$ $\displaystyle trunc(x,k) + 2^{n-m} x[n-k-1:n-m].$  


next up previous contents
Next: Rounding Away from Zero Up: Rounding Previous: Rounding   Contents
David Russinoff 2007-01-02