
Denormal Representations

Of the two values of the exponent field that lie outside of the range of normal encodings, the upper extreme $ 2^q-1$ is reserved for the encoding of infinities and other non-numerical entities, which will not be discussed here, while an exponent field of 0 is used to encode numerical values that lie below the normal range.

If the exponent and significand fields of an encoding with respect to a format with implicit MSB are both 0, then the encoded value itself is 0. If the exponent field is 0 and the significand field is not, then the encoding is said to be denormal.

Definition 5.3.1   (dencodingp) Let $ p \in
\mathbb{N}$ and $ q \in \mathbb{N}$ with $ p>0$ and $ q>0$ , and let $ x$ be a $ (p+q)$ -bit vector. Then

$\displaystyle dencodingp(x,p,q) \Leftrightarrow \mathit{iexpof}(x,p,q) = 0$    and $\displaystyle \mathit{isigf}(x,p) \neq 0.$

There are two differences between the decoding formulas for denormal and normal representations:

  1. For a denormal encoding, the implicit MSB is taken to be 0 rather than 1, so that the value represented by the significand field is $ \mathit{isigf}(x,p)/2^{p-1}$ .

  2. The power of 2 represented by the zero exponent field of a denormal encoding, which might be expected to be $ 2^{-bias(q)}$ , is instead the same as the minimum value of the normal range, $ 2^{1-bias(q)}$ .

Definition 5.3.2   (ddecode) Let $ x \in \mathbb{N}$ , $ p \in
\mathbb{N}$ , and $ q \in \mathbb{N}$ with $ p>0$ and $ q>0$ . If $ dencodingp(x,p,q)$ , then

$\displaystyle ddecode(x,p,q) = (-1)^{\mbox{\scriptsize {\it isgnf}}(x,p,q)}\mathit{isigf}(x,p)2^{2\mbox{-}p\mbox{-}bias(q)}.$
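Definitions 5.3.1 and 5.3.2 can be illustrated by a short Python sketch that computes decoded values exactly with `fractions.Fraction`. The field layout (sign at bit $ p+q-1$ , exponent at bits $ p+q-2$ down to $ p-1$ , significand at bits $ p-2$ down to 0) and the formula $ bias(q) = 2^{q-1}-1$ are taken from the way these functions are used in this section. The Python bodies are assumptions made for illustration, not the formal definitions.

```python
from fractions import Fraction

def bias(q):
    # Exponent bias, assumed to be 2^(q-1) - 1 as used in this section.
    return 2 ** (q - 1) - 1

def isgnf(x, p, q):
    # Sign field: the single bit x[p+q-1].
    return (x >> (p + q - 1)) & 1

def iexpof(x, p, q):
    # Exponent field: the q bits x[p+q-2 : p-1].
    return (x >> (p - 1)) & ((1 << q) - 1)

def isigf(x, p):
    # Significand field of an implicit-MSB format: the p-1 bits x[p-2 : 0].
    return x & ((1 << (p - 1)) - 1)

def dencodingp(x, p, q):
    # Definition 5.3.1: zero exponent field and nonzero significand field.
    is_bit_vector = 0 <= x < (1 << (p + q))
    return is_bit_vector and iexpof(x, p, q) == 0 and isigf(x, p) != 0

def ddecode(x, p, q):
    # Definition 5.3.2: (-1)^sign * significand field * 2^(2 - p - bias(q)).
    if not dencodingp(x, p, q):
        raise ValueError("not a denormal encoding")
    scale = Fraction(2) ** (2 - p - bias(q))
    return (-1) ** isgnf(x, p, q) * isigf(x, p) * scale

# Toy format with p = 4 and q = 3 (7-bit encodings, bias 3).
print(ddecode(0b0_000_101, 4, 3))   # 5/32
print(ddecode(0b1_000_001, 4, 3))   # -1/32
```

With $ p = 53$ and $ q = 11$ , which matches the field widths of IEEE double precision, `ddecode(1, 53, 11)` evaluates to $ 2^{-1074}$ .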

Lemma (sgn-ddecode, expo-ddecode, sig-ddecode) Let $ x \in \mathbb{N}$ , $ p \in
\mathbb{N}$ , and $ q \in \mathbb{N}$ with $ p>1$ and $ q>0$ . Assume that $ dencodingp(x,p,q)$ and let $ \hat{x} = ddecode(x,p,q)$ .

(a) $ sgn(\hat{x}) = (-1)^{\mbox{\scriptsize {\it isgnf}}(x,p,q)}$
(b) $ expo(\hat{x}) = expo(\mathit{isigf}(x,p))-bias(q) + 2 - p$
(c) $ sig(\hat{x}) = sig(\mathit{isigf}(x,p))$ .

PROOF: (a) is trivial; (b) and (c) follow from Lemmas 4.1.11 and 4.1.12.


The class of reals that are representable as denormal encodings is recognized by the following predicate.

Definition 5.3.3   (drepp) Let $ r \in \mathbb{Q}$ , $ p \in
\mathbb{N}$ , and $ q \in \mathbb{N}$ with $ p>1$ and $ q>0$ . Then $ drepp(r,p,q)$ if and only if all of the following are true:

(a) $ r \neq 0$ ,
(b) $ 2-p \leq expo(r)+bias(q) \leq 0$ , and
(c) $ r$ is $ (p-2+2^{q-1}+expo(r))$ -exact.
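The three conditions of Definition 5.3.3 can be checked mechanically on exact rationals. The sketch below is only illustrative. The helpers `expo`, `sig`, and `exactp` are assumptions standing in for notions defined in an earlier chapter: $ expo(r)$ is taken to be the integer $ e$ with $ 2^e \leq \vert r\vert < 2^{e+1}$ , $ sig(r) = \vert r\vert 2^{-expo(r)}$ , and $ r$ is taken to be $ n$ -exact when $ 2^{n-1}sig(r)$ is an integer, which agrees with how these notions are used in the proofs of this section.

```python
from fractions import Fraction

def bias(q):
    # Exponent bias, assumed to be 2^(q-1) - 1.
    return 2 ** (q - 1) - 1

def expo(r):
    # Integer e with 2^e <= |r| < 2^(e+1), for nonzero rational r.
    r = abs(Fraction(r))
    e = r.numerator.bit_length() - r.denominator.bit_length()
    return e if Fraction(2) ** e <= r else e - 1

def sig(r):
    # Significand |r| * 2^(-expo(r)), a rational in [1, 2).
    return abs(Fraction(r)) / Fraction(2) ** expo(r)

def exactp(r, n):
    # r is n-exact when sig(r) * 2^(n-1) is an integer.
    return (sig(r) * Fraction(2) ** (n - 1)).denominator == 1

def drepp(r, p, q):
    r = Fraction(r)
    if r == 0:                                    # condition (a)
        return False
    e = expo(r)
    if not (2 - p <= e + bias(q) <= 0):           # condition (b)
        return False
    return exactp(r, p - 2 + 2 ** (q - 1) + e)    # condition (c)

# Toy format with p = 4 and q = 3.
for r in [Fraction(5, 32), Fraction(1, 32), Fraction(3, 64), Fraction(1, 4)]:
    print(r, drepp(r, 4, 3))
```

In this toy format, 5/32 and 1/32 are accepted, 3/64 fails the exactness condition (c), and 1/4 fails the upper bound of condition (b) because it lies in the normal range.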

If a number is so representable, then its encoding is constructed as follows.

Definition 5.3.4   (dencode) Let $ r \in \mathbb{Q}$ , $ p \in
\mathbb{N}$ , and $ q \in \mathbb{N}$ with $ p>0$ and $ q>0$ . If $ drepp(r,p,q)$ , then

$\displaystyle dencode(r,p,q) = \{1\verb!'!neg(r), q\verb!'b!0, (2^{p\mbox{-}2+expo(r)+bias(q)}sig(r))[p\mbox{-}2:0]\},$

where

$\displaystyle neg(r) = \left\{\begin{array}{ll}
0 & \mbox{if $r > 0$}\\
1 & \mbox{if $r < 0$.}
\end{array}\right.$
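As an informal companion to Definition 5.3.4, the following sketch builds the encoding by concatenating the sign bit, $ q$ zero exponent bits, and the low $ p-1$ bits of $ 2^{p-2+expo(r)+bias(q)}sig(r)$ . The helper functions and the packing of the fields into a Python integer are assumptions for illustration, using the same conventions for $ expo$ , $ sig$ , and $ bias$ as the rest of this section.

```python
from fractions import Fraction

def bias(q):
    # Exponent bias, assumed to be 2^(q-1) - 1.
    return 2 ** (q - 1) - 1

def expo(r):
    # Integer e with 2^e <= |r| < 2^(e+1), for nonzero rational r.
    r = abs(r)
    e = r.numerator.bit_length() - r.denominator.bit_length()
    return e if Fraction(2) ** e <= r else e - 1

def sig(r):
    # Significand |r| * 2^(-expo(r)).
    return abs(r) / Fraction(2) ** expo(r)

def drepp(r, p, q):
    # Definition 5.3.3, conditions (a), (b), and (c).
    if r == 0:
        return False
    e = expo(r)
    n = p - 2 + 2 ** (q - 1) + e
    exact = (sig(r) * Fraction(2) ** (n - 1)).denominator == 1
    return 2 - p <= e + bias(q) <= 0 and exact

def dencode(r, p, q):
    r = Fraction(r)
    if not drepp(r, p, q):
        raise ValueError("r has no denormal representation")
    neg = 1 if r < 0 else 0
    # An integer whenever drepp holds, by the exactness condition.
    scaled = Fraction(2) ** (p - 2 + expo(r) + bias(q)) * sig(r)
    sig_field = scaled.numerator & ((1 << (p - 1)) - 1)   # bit slice [p-2:0]
    # {sign bit, q zero exponent bits, p-1 significand bits}
    return (neg << (p + q - 1)) | sig_field

print(bin(dencode(Fraction(5, 32), 4, 3)))    # 0b101
print(bin(dencode(Fraction(-1, 32), 4, 3)))   # 0b1000001
```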

Next, we examine the relationship between the decoding and encoding functions.

Lemma (drepp-ddecode, dencode-ddecode) Let $ x \in \mathbb{N}$ , $ p \in
\mathbb{N}$ , and $ q \in \mathbb{N}$ with $ p>1$ and $ q>0$ . If $ dencodingp(x,p,q)$ , then

$\displaystyle drepp(ddecode(x,p,q),p,q)$

and

$\displaystyle dencode(ddecode(x,p,q),p,q) = x.$

PROOF: Let $ \hat{x} = ddecode(x,p,q)$ . Since $ 1 \leq \mathit{isigf}(x,p) < 2^{p-1}$ ,

$\displaystyle 2^{2-p-bias(q)} \leq \vert\hat{x}\vert = \mathit{isigf}(x,p)2^{2\mbox{-}p\mbox{-}bias(q)} < 2^{1-bias(q)},$

and by Lemma 4.1.2,

$\displaystyle 2-p-bias(q) \leq expo(\hat{x}) < 1-bias(q),$

which is equivalent to Definition 5.3.3(b). In order to prove (c), we must show, according to Definition 4.2.1, that

$\displaystyle 2^{p-2+2^{q-1}+expo(\hat{x})-1}sig(\hat{x}) = 2^{p-2+bias(q)+expo(\hat{x})}sig(\hat{x}) \in \mathbb{Z}.$

But
$\displaystyle 2^{p-2+bias(q)+expo(\hat{x})}sig(\hat{x})$ $\displaystyle =$ $\displaystyle 2^{p-2+bias(q)+expo(\hat{x})}\vert\hat{x}\vert 2^{-expo(\hat{x})}$  
  $\displaystyle =$ $\displaystyle 2^{p-2+bias(q)}\vert\hat{x}\vert$  
  $\displaystyle =$ $\displaystyle 2^{p-2+bias(q)}\mathit{isigf}(x,p)2^{2\mbox{-}p\mbox{-}bias(q)}$  
  $\displaystyle =$ $\displaystyle \mathit{isigf}(x,p) \in \mathbb{Z}.$  

This establishes $ drepp(\hat{x},p,q)$ .

Now by Definition 5.3.2,

$\displaystyle \mathit{isgnf}(x,p,q) = \left\{\begin{array}{ll}
0 & \mbox{if $\hat{x} > 0$}\\
1 & \mbox{if $\hat{x} < 0$.}
\end{array}\right.$

Therefore, by Definitions 5.3.1, 5.3.4, and 5.2.1 and Lemmas 2.4.9 and 2.2.5,
$\displaystyle {dencode(\hat{x},p,q)}$
  $\displaystyle =$ $\displaystyle \{1\verb!'!\mathit{isgnf}(x,p,q), q\verb!'b!0, (2^{p\mbox{-}2+expo(\hat{x})+bias(q)}sig(\hat{x}))[p\mbox{-}2:0]\}$  
  $\displaystyle =$ $\displaystyle \{1\verb!'!\mathit{isgnf}(x,p,q), \mathit{iexpof}(x,p,q)[q-1:0], \mathit{isigf}(x,p)[p-2:0]\}$  
  $\displaystyle =$ $\displaystyle \{x[p+q-1], x[p+q-2:p-1], x[p-2:0]\}$  
  $\displaystyle =$ $\displaystyle x[p+q-1:0]$  
  $\displaystyle =$ $\displaystyle x.$  
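The two claims just proved can also be checked exhaustively for small formats. The sketch below is a sanity check under stated assumptions, not a substitute for the proof: it enumerates every denormal encoding for a range of small $ p$ and $ q$ , decodes it, and confirms that the result satisfies $ drepp$ and encodes back to the same bit vector. All helper bodies are illustrative assumptions that follow the definitions of this section.

```python
from fractions import Fraction

def bias(q):
    return 2 ** (q - 1) - 1

def expo(r):
    # Integer e with 2^e <= |r| < 2^(e+1), for nonzero rational r.
    r = abs(r)
    e = r.numerator.bit_length() - r.denominator.bit_length()
    return e if Fraction(2) ** e <= r else e - 1

def sig(r):
    return abs(r) / Fraction(2) ** expo(r)

def drepp(r, p, q):
    if r == 0:
        return False
    e = expo(r)
    n = p - 2 + 2 ** (q - 1) + e
    exact = (sig(r) * Fraction(2) ** (n - 1)).denominator == 1
    return 2 - p <= e + bias(q) <= 0 and exact

def ddecode(x, p, q):
    sign = (x >> (p + q - 1)) & 1
    sigf = x & ((1 << (p - 1)) - 1)
    return (-1) ** sign * sigf * Fraction(2) ** (2 - p - bias(q))

def dencode(r, p, q):
    scaled = Fraction(2) ** (p - 2 + expo(r) + bias(q)) * sig(r)
    sig_field = scaled.numerator & ((1 << (p - 1)) - 1)
    return ((1 if r < 0 else 0) << (p + q - 1)) | sig_field

def denormal_encodings(p, q):
    # All (p+q)-bit vectors with zero exponent and nonzero significand field.
    for sign in (0, 1):
        for sigf in range(1, 1 << (p - 1)):
            yield (sign << (p + q - 1)) | sigf

def round_trip_holds(p, q):
    return all(
        drepp(ddecode(x, p, q), p, q) and dencode(ddecode(x, p, q), p, q) == x
        for x in denormal_encodings(p, q)
    )

print(all(round_trip_holds(p, q) for p in range(2, 9) for q in range(1, 6)))
```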

Lemma (dencodingp-dencode, ddecode-dencode) Let $ r \in \mathbb{Q}$ , $ p \in
\mathbb{N}$ , and $ q \in \mathbb{N}$ with $ p>1$ and $ q>0$ . If $ drepp(r,p,q)$ , then

$\displaystyle dencodingp(dencode(r,p,q),p,q)$

and

$\displaystyle ddecode(dencode(r,p,q),p,q) = r.$

PROOF: Let $ x = dencode(r,p,q)$ . By Lemma 2.4.1, $ x$ is a $ (p+q)$ -bit vector and by Lemma 2.4.7,

$\displaystyle \mathit{isgnf}(x,p,q) = x[p+q-1] = \left\{\begin{array}{ll}
0 & \mbox{if $r > 0$}\\
1 & \mbox{if $r < 0$,}
\end{array}\right.$

$\displaystyle \mathit{iexpof}(x,p,q) = x[p+q-2:p-1] = 0,$

and

$\displaystyle \mathit{isigf}(x,p) = x[p-2:0] = (2^{p-2+expo(r)+bias(q)}sig(r))[p-2:0].$

Since $ r$ is $ (p-2+2^{q-1}+expo(r))$ -exact,

$\displaystyle 2^{p-2+expo(r)+bias(q)}sig(r) = 2^{(p-2+2^{q-1}+expo(r))-1}sig(r) \in \mathbb{Z}$

and since $ expo(r) + bias(q) \leq 0$ ,

$\displaystyle 2^{p-2+expo(r)+bias(q)}sig(r) < 2^{p-2}\cdot 2 = 2^{p-1}$

by Lemma 4.1.7, which implies

$\displaystyle (2^{p-2+expo(r)+bias(q)}sig(r))[p-2:0] = 2^{p-2+expo(r)+bias(q)}sig(r).$

Finally, according to Definition 5.3.2,
$\displaystyle ddecode(x,p,q)$ $\displaystyle =$ $\displaystyle (-1)^{\mbox{\scriptsize {\it isgnf}}(x,p,q)}\mathit{isigf}(x,p)2^{2\mbox{-}p\mbox{-}bias(q)}$  
  $\displaystyle =$ $\displaystyle sgn(r)2^{p-2+expo(r)+bias(q)}sig(r)2^{2\mbox{-}p\mbox{-}bias(q)}$  
  $\displaystyle =$ $\displaystyle sgn(r)sig(r)2^{expo(r)}$  
  $\displaystyle =$ $\displaystyle r.$  
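This direction of the round trip can be compared against hardware floating-point numbers. The sketch below assumes that IEEE double precision corresponds to $ p = 53$ and $ q = 11$ , and it uses the identity $ 2^{p-2+expo(r)+bias(q)}sig(r) = 2^{p-2+bias(q)}\vert r\vert$ to compute the significand field without calling $ expo$ or $ sig$ . The representability test in `dencode` is a simplified stand-in for $ drepp$ , and `double_bits` is a hypothetical helper introduced here for illustration.

```python
import math
import struct
from fractions import Fraction

P, Q = 53, 11   # assumed parameters for IEEE double precision

def bias(q):
    return 2 ** (q - 1) - 1

def dencode(r, p, q):
    # Significand field computed as |r| * 2^(p - 2 + bias(q)).
    m = abs(r) * Fraction(2) ** (p - 2 + bias(q))
    # Simplified stand-in for the drepp hypothesis (an assumption).
    if m.denominator != 1 or not (1 <= m < 2 ** (p - 1)):
        raise ValueError("r has no denormal representation")
    return ((1 if r < 0 else 0) << (p + q - 1)) | m.numerator

def ddecode(x, p, q):
    sign = (x >> (p + q - 1)) & 1
    sigf = x & ((1 << (p - 1)) - 1)
    return (-1) ** sign * sigf * Fraction(2) ** (2 - p - bias(q))

def double_bits(f):
    # Reinterpret a Python float as its 64-bit pattern.
    return struct.unpack("<Q", struct.pack("<d", f))[0]

# Three denormal doubles: each lies below 2^-1022 in magnitude.
for f in (5e-324, -1.5e-310, math.ldexp(1.0, -1030)):
    r = Fraction(f)              # exact rational value of the double
    x = dencode(r, P, Q)
    print(x == double_bits(f), ddecode(x, P, Q) == r)
```

Each line of output should read `True True`: the computed encoding agrees with the bit pattern of the double, and decoding it recovers the original value.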

The smallest positive denormal is computed by the following function:

Definition 5.3.5   (spd) For all $ p \in
\mathbb{N}$ and $ q \in \mathbb{N}$ , $ spd(p,q) = 2^{2-bias(q)-p}$ .
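For concreteness, Definition 5.3.5 can be evaluated for some familiar parameter choices. The sketch below assumes $ bias(q) = 2^{q-1}-1$ and assumes that IEEE double and single precision correspond to $ (p,q) = (53,11)$ and $ (24,8)$ , respectively.

```python
from fractions import Fraction

def bias(q):
    # Exponent bias, assumed to be 2^(q-1) - 1.
    return 2 ** (q - 1) - 1

def spd(p, q):
    # Definition 5.3.5: smallest positive denormal, 2^(2 - bias(q) - p).
    return Fraction(2) ** (2 - bias(q) - p)

# Compare with the smallest positive double, written as a hex float literal.
print(spd(53, 11) == Fraction(float.fromhex("0x1p-1074")))   # True
print(spd(24, 8) == Fraction(1, 2 ** 149))                   # True
print(spd(4, 3))                                             # 1/32
```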

Lemma (positive-spd, drepp-spd, spd-smallest)
Let $ p \in
\mathbb{N}$ and $ q \in \mathbb{N}$ with $ p>1$ and $ q>0$ .

(a) $ spd(p,q) > 0$
(b) $ drepp(spd(p,q),p,q)$
(c) If $ r \in \mathbb{Q}$ , $ r>0$ , and $ drepp(r,p,q)$ , then $ r \geq spd(p,q)$ .

PROOF: It is clear that $ spd(p,q)$ is positive. To show that $ spd(p,q)$ is $ (p-2+2^{q-1}+expo(spd(p,q)))$ -exact, we need only observe that

$\displaystyle p-2+2^{q-1}+expo(spd(p,q)) = p-2+2^{q-1}+2-(2^{q-1}-1)-p = 1.$

Finally, since

$\displaystyle expo(spd(p,q)) + bias(q) = 2-p \leq 0,$

$ drepp(spd(p,q),p,q)$ holds and moreover, $ spd(p,q)$ is the smallest positive $ r$ that satisfies $ 2-p \leq expo(r)+bias(q)$ .


Every number with a denormal representation is a multiple of the smallest positive denormal.

Lemma (spd-mult) Let $ r \in \mathbb{Q}$ , $ p \in
\mathbb{N}$ , and $ q \in \mathbb{N}$ with $ r>0$ , $ p>1$ , and $ q>0$ . Then $ drepp(r,p,q)$ if and only if $ r = m \cdot spd(p,q)$ for some $ m \in \mathbb{N}$ , $ 1 \leq m < 2^{p-1}$ .

PROOF: For $ 1 \leq m \leq 2^{p-1}$ , let $ a_m = m \cdot spd(p,q)$ . Then $ a_1 = spd(p,q)$ and

$\displaystyle a_{2^{p-1}} = 2^{p-1}spd(p,q) = 2^{p-1}2^{2-bias(q)-p} = 2^{1-bias(q)} = spn(q).$

We shall show, by induction on $ m$ , that $ drepp(a_m,p,q)$ for $ 1 \leq m<2^{p-1}$ . First note that for all such $ m$ ,
$\displaystyle {fp^+(a_m,p+expo(a_m)-expo(spn(q)))}$
  $\displaystyle =$ $\displaystyle a_m + 2^{expo(a_m)+1-(p+expo(a_m)-expo(spn(q)))}$  
  $\displaystyle =$ $\displaystyle a_m + 2^{expo(spn(q))-(p-1)}$  
  $\displaystyle =$ $\displaystyle a_m + spd(p,q)$  
  $\displaystyle =$ $\displaystyle a_{m+1}.$  

Suppose that $ drepp(a_{m-1},p,q)$ for some $ m$ , $ 1 < m < 2^{p-1}$ . Then $ a_{m-1}$ is $ (p+expo(a_{m-1})-expo(spn(q)))$ -exact, and by Lemma 4.2.16, so is $ a_m$ . But since $ expo(a_m) \geq expo(a_{m-1})$ , it follows from Lemma 4.2.5 that $ a_m$ is also $ (p+expo(a_m)-expo(spn(q)))$ -exact. Since

$\displaystyle a_m < a_{2^{p-1}} = spn(q) = 2^{1-bias(q)},$

$ expo(a_m) < 1-bias(q)$ , i.e., $ expo(a_m) + bias(q) \leq 0$ , and hence, $ drepp(a_m,p,q)$ .

Now suppose that $ z \in \mathbb{Q}$ and $ drepp(z,p,q)$ . Let $ m = \lfloor z/a_1 \rfloor$ . Clearly, $ 1 \leq m < 2^{p-1}$ , and $ a_m \leq z < a_{m+1}$ . It follows from Lemma 4.2.17 that $ expo(z) = expo(a_m)$ , and consequently, $ z$ is $ (p+expo(a_m)-expo(spn(q)))$ -exact. Thus, by Lemma 4.2.16, $ z = a_m$ .
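The characterization of denormal values as multiples of $ spd(p,q)$ can be tested numerically for small formats. The following sketch is a sanity check under stated assumptions rather than a proof: it walks a grid of positive rationals with step $ spd(p,q)/4$ , reaching into the normal range, and compares $ drepp$ with the multiple-of-$ spd$ condition at every grid point. The helper bodies and the names `is_spd_multiple` and `check` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

def bias(q):
    return 2 ** (q - 1) - 1

def expo(r):
    # Integer e with 2^e <= |r| < 2^(e+1), for nonzero rational r.
    r = abs(r)
    e = r.numerator.bit_length() - r.denominator.bit_length()
    return e if Fraction(2) ** e <= r else e - 1

def sig(r):
    return abs(r) / Fraction(2) ** expo(r)

def drepp(r, p, q):
    # Definition 5.3.3, conditions (a), (b), and (c).
    if r == 0:
        return False
    e = expo(r)
    n = p - 2 + 2 ** (q - 1) + e
    exact = (sig(r) * Fraction(2) ** (n - 1)).denominator == 1
    return 2 - p <= e + bias(q) <= 0 and exact

def spd(p, q):
    return Fraction(2) ** (2 - bias(q) - p)

def is_spd_multiple(r, p, q):
    # r = m * spd(p, q) for some natural m with 1 <= m < 2^(p-1).
    m = r / spd(p, q)
    return m.denominator == 1 and 1 <= m < 2 ** (p - 1)

def check(p, q):
    # Grid points k * spd/4 for k = 1 .. 2^(p+2), i.e. up to 2^p * spd.
    step = spd(p, q) / 4
    return all(
        drepp(k * step, p, q) == is_spd_multiple(k * step, p, q)
        for k in range(1, 2 ** (p + 2) + 1)
    )

print(all(check(p, q) for p in range(2, 8) for q in range(1, 5)))
```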



david.m.russinoff 2015-04-17