5.3  Denormals and Zeroes

An exponent field of 0 is used to encode numerical values that lie below the normal range. If the exponent and significand fields of an encoding are both 0, then the encoded value itself is 0 and the encoding is said to be a zero. If the exponent field is 0 and the significand field is not, then the encoding is either denormal or pseudo-denormal:

Definition 5.3.1   (zerp, denormp, pseudop) Let $ x$ be an encoding for a format $ F$ with $ \mathit{expf}(x, F) = 0$.

(a) If $ \mathit{sigf}(x,F) = 0$, then $ x$ is a zero encoding for F.

(b) If $ \mathit{sigf}(x,F) \neq 0$ and either $ F$ is implicit or $ x[\mathit{prec}(F)-1] = 0$, then $ x$ is a denormal encoding for F.

(c) If $ F$ is explicit and $ x[\mathit{prec}(F)-1] = 1$, then $ x$ is a pseudo-denormal encoding for F.
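
The case analysis of Definition 5.3.1 can be sketched in Python. The `Format` tuple (an explicit-integer-bit flag, the precision, and the exponent width) and the field extractors `sigw`, `expf`, and `sigf` are assumptions made for this illustration, modeled on the format parameters used in this section; they are not definitions taken from the text.

```python
from typing import NamedTuple

class Format(NamedTuple):
    """Hypothetical format descriptor (an assumption of this sketch)."""
    explicit: bool  # True if the integer bit is stored in the encoding
    prec: int       # prec(F)
    expw: int       # expw(F)

def sigw(F):
    # Assumed significand field width: prec(F) if explicit, else prec(F) - 1.
    return F.prec if F.explicit else F.prec - 1

def expf(x, F):
    # Exponent field: the expw(F) bits just above the significand field.
    return (x >> sigw(F)) & ((1 << F.expw) - 1)

def sigf(x, F):
    # Significand field: the low sigw(F) bits.
    return x & ((1 << sigw(F)) - 1)

def classify(x, F):
    """Classify an encoding whose exponent field is 0 (Definition 5.3.1)."""
    if expf(x, F) != 0:
        raise ValueError("exponent field must be 0")
    if sigf(x, F) == 0:
        return "zero"
    integer_bit = (x >> (F.prec - 1)) & 1  # x[prec(F)-1]
    if not F.explicit or integer_bit == 0:
        return "denormal"
    return "pseudo-denormal"

DP = Format(explicit=False, prec=53, expw=11)  # IEEE double precision layout
EP = Format(explicit=True, prec=64, expw=15)   # x87 double extended layout

print(classify(1 << 63, DP))  # only the sign bit is set
print(classify(1, DP))        # lowest significand bit set
print(classify(1 << 63, EP))  # integer bit set, exponent field 0
```

With these layouts, the three calls report a zero, a denormal, and a pseudo-denormal, respectively.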

Note that a zero can have either sign:

Definition 5.3.2   (zencode) Let $ F$ be a format and let $ s \in \{0, 1\}$. Then

$\displaystyle \mathit{zencode}(s, F) = \{1\verb!'!s, (\mathit{expw}(F)+\mathit{sigw}(F))\verb!'!0\}.
$
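
As a small illustration, the two zero encodings can be built by placing the sign bit above an all-zero exponent and significand. The `Format` tuple and `sigw` helper are assumptions of this sketch, and the comparison with the machine's negative zero holds only for the double-precision layout assumed here.

```python
import struct
from typing import NamedTuple

class Format(NamedTuple):
    """Hypothetical format descriptor (an assumption of this sketch)."""
    explicit: bool  # True if the integer bit is stored in the encoding
    prec: int       # prec(F)
    expw: int       # expw(F)

def sigw(F):
    # Assumed significand field width: prec(F) if explicit, else prec(F) - 1.
    return F.prec if F.explicit else F.prec - 1

def zencode(s, F):
    """{1's, (expw(F)+sigw(F))'0}: the sign bit followed by all-zero fields."""
    if s not in (0, 1):
        raise ValueError("s must be 0 or 1")
    return s << (F.expw + sigw(F))

DP = Format(explicit=False, prec=53, expw=11)  # IEEE double precision layout

# Cross-check against the bit pattern of the machine's -0.0.
neg_zero_bits = int.from_bytes(struct.pack(">d", -0.0), "big")
print(zencode(1, DP) == neg_zero_bits)
```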

There are two differences between the decoding formulas for denormal and normal encodings:

  1. For a denormal encoding for an implicit format, the integer bit is taken to be 0 rather than 1.

  2. The power of 2 represented by the zero exponent field of a denormal or pseudo-denormal encoding is $ 2^{1-\mathit{bias}(F)}$ rather than $ 2^{-\mathit{bias}(F)}$.

Definition 5.3.3   (ddecode) If $ x$ is an encoding for a format $ F$ with $ \mathit{expf}(x, F) = 0$, then
$\displaystyle \mathit{ddecode}(x,F)$ $\displaystyle =$ $\displaystyle (-1)^{\mathit{sgnf}(x,F)}\left(2^{1-\mathit{prec}(F)}\mathit{sigf}(x,F)\right)2^{1-\mathit{bias}(F)}$  
  $\displaystyle =$ $\displaystyle (-1)^{\mathit{sgnf}(x,F)}\mathit{sigf}(x,F)2^{2-\mathit{prec}(F)-\mathit{bias}(F)}$  
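
The second form of Definition 5.3.3 translates directly into exact rational arithmetic. The following is a minimal sketch, assuming a hypothetical `Format` tuple, the usual bias $2^{\mathit{expw}(F)-1}-1$, and field extractors that read the sign, exponent, and significand from the top of the encoding down; none of these helper names come from the text.

```python
import struct
from fractions import Fraction
from typing import NamedTuple

class Format(NamedTuple):
    """Hypothetical format descriptor (an assumption of this sketch)."""
    explicit: bool
    prec: int
    expw: int

def sigw(F): return F.prec if F.explicit else F.prec - 1
def bias(F): return 2 ** (F.expw - 1) - 1
def sgnf(x, F): return (x >> (F.expw + sigw(F))) & 1
def expf(x, F): return (x >> sigw(F)) & ((1 << F.expw) - 1)
def sigf(x, F): return x & ((1 << sigw(F)) - 1)

def ddecode(x, F):
    """(-1)^sgnf * sigf * 2^(2 - prec - bias), as an exact Fraction."""
    if expf(x, F) != 0:
        raise ValueError("exponent field must be 0")
    sign = -1 if sgnf(x, F) else 1
    return sign * sigf(x, F) * Fraction(2) ** (2 - F.prec - bias(F))

SP = Format(explicit=False, prec=24, expw=8)  # IEEE single precision layout

print(ddecode(1, SP))              # smallest positive single-precision denormal
print(ddecode((1 << 31) | 3, SP))  # a negative denormal

# Cross-check one encoding against the hardware interpretation of the same bits.
bits = 5
hw = struct.unpack(">f", bits.to_bytes(4, "big"))[0]
print(Fraction(hw) == ddecode(bits, SP))
```

Using `Fraction` avoids any rounding in the sketch itself, so the comparison with the hardware value is exact.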

We also define a general decoding function:

Definition 5.3.4   (decode) Let $ x$ be an encoding for a format $ F$. If $ \mathit{expf}(x, F) \neq 2^{\mathit{expw}(F)}-1$, then

$\displaystyle \mathit{decode}(x,F) = \left\{\begin{array}{ll}
\mathit{ndecode}(x,F) & \mbox{if $\mathit{expf}(x,F) \neq 0$}\\
\mathit{ddecode}(x,F) & \mbox{if $\mathit{expf}(x,F) = 0$.}\end{array}\right.
$

  (sgn-ddecode, expo-ddecode, sig-ddecode) Let $ x$ be a denormal encoding for a format $ F$ and let $ \hat{x} = \mathit{ddecode}(x,F)$.

(a) $ \mathit{sgn}(\hat{x}) = (-1)^{\mathit{sgnf}(x,F)}$.

(b) $ \mathit{expo}(\hat{x}) = \mathit{expo}(\mathit{sigf}(x,F))-\mathit{bias}(F) + 2 - \mathit{prec}(F)$.

(c) $ \mathit{sig}(\hat{x}) = \mathit{sig}(\mathit{sigf}(x,F))$.

PROOF: (a) is trivial; (b) and (c) follow from Lemmas 4.1.13 and 4.1.14.


The class of numbers that are representable as denormal encodings is recognized by the following predicate.

Definition 5.3.5   (drepp) Let $ F$ be a format and let $ r \in \mathbb{Q}$. Then $ r$ is representable as a denormal in $ F$ iff the following conditions hold:

(a) $ r \neq 0$;

(b) $ 2-\mathit{prec}(F) \leq \mathit{expo}(r)+\mathit{bias}(F) \leq 0$;

(c) $ r$ is $ (\mathit{prec}(F)-2+2^{\mathit{expw}(F)-1}+\mathit{expo}(r))$-exact.
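
The three conditions of Definition 5.3.5 can be checked mechanically. The sketch below assumes a hypothetical `Format` tuple, the usual bias $2^{\mathit{expw}(F)-1}-1$, and the reading of $n$-exactness as $\mathit{sig}(r)2^{n-1} \in \mathbb{Z}$; the helper names `expo`, `sig`, and `exactp` are chosen for illustration.

```python
from fractions import Fraction
from typing import NamedTuple

class Format(NamedTuple):
    """Hypothetical format descriptor (an assumption of this sketch)."""
    explicit: bool
    prec: int
    expw: int

def bias(F): return 2 ** (F.expw - 1) - 1

def expo(r):
    """The integer e with 2**e <= |r| < 2**(e+1), for r != 0."""
    r = abs(Fraction(r))
    e = r.numerator.bit_length() - r.denominator.bit_length()
    return e - 1 if r < Fraction(2) ** e else e

def sig(r):
    """|r| scaled into the interval [1, 2)."""
    return abs(Fraction(r)) / Fraction(2) ** expo(r)

def exactp(r, n):
    """Assumed meaning of n-exact: sig(r) * 2**(n-1) is an integer."""
    return (sig(r) * Fraction(2) ** (n - 1)).denominator == 1

def drepp(r, F):
    """Conditions (a), (b), (c) of Definition 5.3.5."""
    r = Fraction(r)
    if r == 0:                                    # (a)
        return False
    e = expo(r)
    if not (2 - F.prec <= e + bias(F) <= 0):      # (b)
        return False
    return exactp(r, F.prec - 2 + 2 ** (F.expw - 1) + e)  # (c)

TINY = Format(explicit=False, prec=4, expw=3)  # toy format, bias 3

print(drepp(Fraction(5, 32), TINY))  # inside the denormal range, exact
print(drepp(Fraction(3, 64), TINY))  # inside the range, but not exact enough
print(drepp(Fraction(1, 4), TINY))   # exponent too large for a denormal
```

In the toy format, only the first of the three values satisfies all conditions.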

If a number is so representable, then its encoding is constructed as follows.

Definition 5.3.6   (dencode) If $ r$ is representable as a denormal in $ F$, then

$\displaystyle \mathit{dencode}(r,F) = \{1\verb!'!s, \mathit{expw}(F)\verb!'!0, \mathit{sigw}(F)\verb!'!2^{\mathit{prec}(F)-2+\mathit{expo}(r)+\mathit{bias}(F)}\mathit{sig}(r)\},
$

where $ s = \left\{\begin{array}{ll}
0 & \mbox{if $r > 0$}\\
1 & \mbox{if $r < 0$.}
\end{array}\right.$
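
A minimal sketch of this construction follows. It assumes that the last field of the concatenation is $\mathit{sigw}(F)$ bits wide, with $\mathit{sigw}(F) = \mathit{prec}(F)$ for an explicit format and $\mathit{prec}(F)-1$ otherwise; the `Format` tuple and the helpers `expo` and `sig` are likewise assumptions of the illustration.

```python
from fractions import Fraction
from typing import NamedTuple

class Format(NamedTuple):
    """Hypothetical format descriptor (an assumption of this sketch)."""
    explicit: bool
    prec: int
    expw: int

def sigw(F): return F.prec if F.explicit else F.prec - 1
def bias(F): return 2 ** (F.expw - 1) - 1

def expo(r):
    """The integer e with 2**e <= |r| < 2**(e+1), for r != 0."""
    r = abs(Fraction(r))
    e = r.numerator.bit_length() - r.denominator.bit_length()
    return e - 1 if r < Fraction(2) ** e else e

def sig(r):
    """|r| scaled into the interval [1, 2)."""
    return abs(Fraction(r)) / Fraction(2) ** expo(r)

def dencode(r, F):
    """Sign bit, zero exponent field, then 2^(prec-2+expo(r)+bias) * sig(r)."""
    r = Fraction(r)
    s = 0 if r > 0 else 1
    field = sig(r) * Fraction(2) ** (F.prec - 2 + expo(r) + bias(F))
    if field.denominator != 1:
        # Cannot happen when r is representable as a denormal in F.
        raise ValueError("r is not representable as a denormal")
    return (s << (F.expw + sigw(F))) | int(field)

SP = Format(explicit=False, prec=24, expw=8)  # IEEE single precision layout

print(hex(dencode(Fraction(1, 2 ** 149), SP)))
print(hex(dencode(Fraction(-3, 2 ** 149), SP)))
```

The exponent field never needs to be written explicitly, since it is 0 for every denormal encoding.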

Next, we examine the relationship between the decoding and encoding functions.

  (drepp-ddecode, dencode-ddecode) If $ x$ is a denormal encoding for a format $ F$, then $ \mathit{ddecode}(x,F)$ is representable as a denormal in $ F$ and

$\displaystyle \mathit{dencode}(\mathit{ddecode}(x,F),F) = x.$

PROOF: Let $ p = \mathit{prec}(F)$, $ q = \mathit{expw}(F)$, $ B = \mathit{bias}(F)$, and $ \hat{x} = \mathit{ddecode}(x,F)$. Since $ 1 \leq \mathit{sigf}(x,F) < 2^{p-1}$,

$\displaystyle 2^{2-p-B} \leq \vert\hat{x}\vert = \mathit{sigf}(x,F)2^{2-p-B} < 2^{1-B},
$

and by Lemma 4.1.3,

$\displaystyle 2-p-B \leq expo(\hat{x}) < 1-B,$

which is equivalent to Definition 5.3.5(b). In order to prove (c), we must show, according to Definition 4.2.1, that

$\displaystyle 2^{p-2+2^{q-1}+expo(\hat{x})-1}sig(\hat{x}) = 2^{p-2+bias(F)+expo(\hat{x})}sig(\hat{x}) \in \mathbb{Z}.$

But
$\displaystyle 2^{p-2+bias(F)+expo(\hat{x})}sig(\hat{x})$ $\displaystyle =$ $\displaystyle 2^{p-2+B+expo(\hat{x})}\vert\hat{x}\vert 2^{-expo(\hat{x})}$  
  $\displaystyle =$ $\displaystyle 2^{p-2+B}\vert\hat{x}\vert$  
  $\displaystyle =$ $\displaystyle 2^{p-2+B}\mathit{sigf}(x,F)2^{2\mbox{-}p\mbox{-}B}$  
  $\displaystyle =$ $\displaystyle \mathit{sigf}(x,F) \in \mathbb{Z}.$  

This establishes that $ \hat{x}$ is representable as a denormal.

Now by Definition 5.3.3, $ \mathit{sgnf}(x,F) = \left\{\begin{array}{ll}
0 & \mbox{if $\hat{x} > 0$}\\
1 & \mbox{if $\hat{x} < 0$.}
\end{array}\right.$
Therefore, by Definitions 5.3.1, 5.3.6, and 5.1.4 and Lemmas 2.4.9 and 2.2.5,

$\displaystyle dencode(\hat{x},F)$ $\displaystyle =$ $\displaystyle \{1\verb!'!\mathit{sgnf}(x,F), q\verb!'!0, (2^{p\mbox{-}2+expo(\hat{x})+bias(F)}sig(\hat{x}))[p\mbox{-}2:0]\}$  
  $\displaystyle =$ $\displaystyle \{1\verb!'!\mathit{sgnf}(x,F), q\verb!'!\mathit{expf}(x,F), \mathit{sigw}(F)\verb!'!\mathit{sigf}(x,F)\}$  
  $\displaystyle =$ $\displaystyle x.$  

  (denormp-dencode, ddecode-dencode) If $ r$ is representable as a denormal in $ F$, then $ \mathit{dencode}(r,F)$ is a denormal encoding for $ F$ and

$\displaystyle \mathit{ddecode}(\mathit{dencode}(r,F),F) = r.$

PROOF: Let $ p = \mathit{prec}(F)$, $ q = \mathit{expw}(F)$, $ B = \mathit{bias}(F)$, and $ x = dencode(r,F)$. By Lemma 2.4.1, $ x$ is a $ (p+q)$-bit vector and by Lemma 2.4.7,

$\displaystyle \mathit{sgnf}(x,F) = x[p+q-1] = \left\{\begin{array}{ll}
0 & \mbox{if $r > 0$}\\
1 & \mbox{if $r < 0$,}
\end{array}\right.$

$\displaystyle \mathit{expf}(x,F) = x[p+q-2:p-1] = 0,$

and

$\displaystyle \mathit{sigf}(x,F) = x[p-2:0] = (2^{p-2+\mathit{expo}(r)+\mathit{bias}(F)}\mathit{sig}(r))[p-2:0].$

Since $ r$ is $ (p-2+2^{q-1}+\mathit{expo}(r))$-exact,

$\displaystyle 2^{p-2+\mathit{expo}(r)+B}\mathit{sig}(r) = 2^{(p-2+2^{q-1}+\mathit{expo}(r))-1}\mathit{sig}(r) \in \mathbb{Z},$

and since $ \mathit{expo}(r) + B \leq 0$,

$\displaystyle 2^{p-2+\mathit{expo}(r)+B}\mathit{sig}(r) < 2^{p-2}\cdot 2 = 2^{p-1}$

by Lemma 4.1.8, which implies

$\displaystyle (2^{p-2+\mathit{expo}(r)+B}sig(r))[p-2:0] = 2^{p-2+\mathit{expo}(r)+B}\mathit{sig}(r).$

Finally, according to Definition 5.3.3,
$\displaystyle \mathit{ddecode}(x,F)$ $\displaystyle =$ $\displaystyle (-1)^{\mathit{sgnf}(x,F)}\mathit{sigf}(x,F)2^{2-p-B}$  
  $\displaystyle =$ $\displaystyle \mathit{sgn}(r)2^{p-2+\mathit{expo}(r)+B}\mathit{sig}(r)2^{2-p-B}$  
  $\displaystyle =$ $\displaystyle \mathit{sgn}(r)\mathit{sig}(r)2^{\mathit{expo}(r)}$  
  $\displaystyle =$ $\displaystyle r.$  
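
The two round-trip lemmas can be sanity-checked by brute force on a toy format. The sketch below is only a finite check, not a substitute for the proofs; the `Format` tuple, the field layout, and the $\mathit{sigw}(F)$-bit width of the last field in the encoding are assumptions of the illustration.

```python
from fractions import Fraction
from typing import NamedTuple

class Format(NamedTuple):
    """Hypothetical format descriptor (an assumption of this sketch)."""
    explicit: bool
    prec: int
    expw: int

def sigw(F): return F.prec if F.explicit else F.prec - 1
def bias(F): return 2 ** (F.expw - 1) - 1
def sgnf(x, F): return (x >> (F.expw + sigw(F))) & 1
def sigf(x, F): return x & ((1 << sigw(F)) - 1)

def expo(r):
    """The integer e with 2**e <= |r| < 2**(e+1), for r != 0."""
    r = abs(Fraction(r))
    e = r.numerator.bit_length() - r.denominator.bit_length()
    return e - 1 if r < Fraction(2) ** e else e

def sig(r):
    return abs(Fraction(r)) / Fraction(2) ** expo(r)

def ddecode(x, F):
    sign = -1 if sgnf(x, F) else 1
    return sign * sigf(x, F) * Fraction(2) ** (2 - F.prec - bias(F))

def dencode(r, F):
    s = 0 if r > 0 else 1
    field = sig(r) * Fraction(2) ** (F.prec - 2 + expo(r) + bias(F))
    return (s << (F.expw + sigw(F))) | int(field)

def check_round_trip(F):
    """Check both round trips on every denormal encoding of F; return the count."""
    count = 0
    for s in (0, 1):
        for m in range(1, 2 ** (F.prec - 1)):  # nonzero sigf with integer bit 0
            x = (s << (F.expw + sigw(F))) | m  # exponent field is 0
            r = ddecode(x, F)
            assert dencode(r, F) == x              # dencode(ddecode(x)) = x
            assert ddecode(dencode(r, F), F) == r  # ddecode(dencode(r)) = r
            count += 1
    return count

print(check_round_trip(Format(explicit=False, prec=4, expw=3)))
print(check_round_trip(Format(explicit=True, prec=4, expw=3)))
```

Each call visits every signed denormal encoding of the toy format and raises an `AssertionError` if either identity fails.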

The smallest positive denormal is computed by the following function:

Definition 5.3.7   (spd) For any format $ F$, $ \mathit{spd}(F) = 2^{2-\mathit{bias}(F)-\mathit{prec}(F)} = 2^{3-2^{\mathit{expw}(F)-1}-\mathit{prec}(F)}$.
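
As a worked instance, the formula $2^{2-\mathit{bias}(F)-\mathit{prec}(F)}$ can be evaluated for the IEEE single and double precision layouts. The sketch assumes a hypothetical `Format` tuple and the usual bias $2^{\mathit{expw}(F)-1}-1$ (127 and 1023 for these two layouts), which gives exponents $2-127-24 = -149$ and $2-1023-53 = -1074$.

```python
import math
from fractions import Fraction
from typing import NamedTuple

class Format(NamedTuple):
    """Hypothetical format descriptor (an assumption of this sketch)."""
    explicit: bool
    prec: int
    expw: int

def bias(F): return 2 ** (F.expw - 1) - 1

def spd(F):
    """Smallest positive denormal: 2^(2 - bias(F) - prec(F))."""
    return Fraction(2) ** (2 - bias(F) - F.prec)

SP = Format(explicit=False, prec=24, expw=8)   # IEEE single precision layout
DP = Format(explicit=False, prec=53, expw=11)  # IEEE double precision layout

print(spd(SP) == Fraction(1, 2 ** 149))
# math.ldexp(1.0, -1074) is the smallest positive double on IEEE hardware.
print(spd(DP) == Fraction(math.ldexp(1.0, -1074)))
```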

  (positive-spd, drepp-spd, spd-smallest)
For any format $ F$,

(a) $ \mathit{spd}(F) > 0$;

(b) $ \mathit{spd}(F)$ is representable as a denormal in $ F$;

(c) If $ r$ is representable as a denormal in $ F$, then $ \vert r\vert \geq \mathit{spd}(F)$.

PROOF: Let $ p = \mathit{prec}(F)$, $ q = \mathit{expw}(F)$, and $ B = \mathit{bias}(F)$. It is clear that $ \mathit{spd}(F)$ is positive. To show that $ \mathit{spd}(F)$ is $ (p-2+2^{q-1}+\mathit{expo}(\mathit{spd}(F)))$-exact, we need only observe that

$\displaystyle p-2+2^{q-1}+\mathit{expo}(\mathit{spd}(F)) = p-2+2^{q-1}+2-(2^{q-1}-1)-p = 1.$

Finally, since

$\displaystyle \mathit{expo}(\mathit{spd}(F))+B = 2-p < 0,$

$ \mathit{spd}(F)$ is representable as a denormal in $ F$ and, moreover, $ \mathit{spd}(F)$ is the smallest positive $ r$ that satisfies $ 2-p \leq \mathit{expo}(r)+B$.


Every number with a denormal representation is a multiple of the smallest positive denormal.

  (spd-mult) Let $ r \in \mathbb{Q}$, $ r > 0$, and let $ F$ be a format. Then $ r$ is representable as a denormal in $ F$ iff $ r = m \cdot \mathit{spd}(F)$ for some $ m \in \mathbb{N}$, $ 1 \leq m < 2^{\mathit{prec}(F)-1}$.

PROOF: Let $ p = \mathit{prec}(F)$ and $ B = \mathit{bias}(F)$. For $ 1 \leq m \leq 2^{p-1}$, let $ a_m = m \cdot \mathit{spd}(F)$. Then $ a_1 = \mathit{spd}(F)$ and

$\displaystyle a_{2^{p-1}} = 2^{p-1}\mathit{spd}(F) = 2^{p-1}2^{2-B-p} = 2^{1-B} = \mathit{spn}(F).$

We shall show, by induction on $ m$, that $ a_m$ is representable as a denormal for $ 1 \leq m<2^{p-1}$. First note that for all such $ m$,
$\displaystyle \mathit{fp}^+(a_m,p+\mathit{expo}(a_m)-\mathit{expo}(\mathit{spn}(F)))$ $\displaystyle =$ $\displaystyle a_m + 2^{\mathit{expo}(a_m)+1-(p+\mathit{expo}(a_m)-\mathit{expo}(\mathit{spn}(F)))}$  
  $\displaystyle =$ $\displaystyle a_m + 2^{\mathit{expo}(\mathit{spn}(F))-(p-1)}$  
  $\displaystyle =$ $\displaystyle a_m + \mathit{spd}(F)$  
  $\displaystyle =$ $\displaystyle a_{m+1}.$  

Suppose that $ a_{m-1}$ is representable as a denormal for some $ m$, $ 1 < m < 2^{p-1}$. Then $ a_{m-1}$ is $ (p+\mathit{expo}(a_{m-1})-\mathit{expo}(\mathit{spn}(F)))$-exact, and by Lemma 4.2.16, so is $ a_m$. But since $ \mathit{expo}(a_m) \geq \mathit{expo}(a_{m-1})$, it follows from Lemma 4.2.5 that $ a_m$ is also $ (p+\mathit{expo}(a_m)-\mathit{expo}(\mathit{spn}(F)))$-exact. Since

$\displaystyle a_m < a_{2^{p-1}} = \mathit{spn}(F) = 2^{1-B},$

$ \mathit{expo}(a_m) < 1-B$, i.e., $ \mathit{expo}(a_m) +B \leq 0$, and hence, $ a_m$ is representable as a denormal.

Now suppose that $ z > 0$ is representable as a denormal. Let $ m = \lfloor z/a_1 \rfloor$. Clearly, $ 1 \leq m < 2^{p-1}$, and $ a_m \leq z < a_{m+1}$. It follows from Lemma 4.2.17 that $ \mathit{expo}(z) = \mathit{expo}(a_m)$, and consequently, $ z$ is $ (p+\mathit{expo}(a_m)-\mathit{expo}(\mathit{spn}(F)))$-exact. Thus, by Lemma 4.2.16, $ z = a_m$.
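
The characterization of positive denormals as multiples of $\mathit{spd}(F)$ can be checked numerically on a toy format. The sketch below tests a grid of positive rationals spaced at a quarter of $\mathit{spd}(F)$; the `Format` tuple, the helper names, and the reading of $n$-exactness as $\mathit{sig}(r)2^{n-1} \in \mathbb{Z}$ are assumptions of the illustration, and a finite grid is of course not a proof.

```python
from fractions import Fraction
from typing import NamedTuple

class Format(NamedTuple):
    """Hypothetical format descriptor (an assumption of this sketch)."""
    explicit: bool
    prec: int
    expw: int

def bias(F): return 2 ** (F.expw - 1) - 1
def spd(F): return Fraction(2) ** (2 - bias(F) - F.prec)

def expo(r):
    """The integer e with 2**e <= |r| < 2**(e+1), for r != 0."""
    r = abs(Fraction(r))
    e = r.numerator.bit_length() - r.denominator.bit_length()
    return e - 1 if r < Fraction(2) ** e else e

def sig(r):
    return abs(Fraction(r)) / Fraction(2) ** expo(r)

def drepp(r, F):
    """Representable as a denormal in F (Definition 5.3.5)."""
    if r == 0:
        return False
    e = expo(r)
    if not (2 - F.prec <= e + bias(F) <= 0):
        return False
    n = F.prec - 2 + 2 ** (F.expw - 1) + e
    return (sig(r) * Fraction(2) ** (n - 1)).denominator == 1

def is_spd_multiple(r, F):
    """r = m * spd(F) for some integer m with 1 <= m < 2^(prec(F)-1)."""
    m = r / spd(F)
    return m.denominator == 1 and 1 <= m < 2 ** (F.prec - 1)

TINY = Format(explicit=False, prec=4, expw=3)  # toy format, spd = 1/32

# Quarter-steps of spd, from spd/4 up to just below 2^prec * spd.
grid = [Fraction(k, 4) * spd(TINY) for k in range(1, 4 * 2 ** TINY.prec)]
print(all(drepp(r, TINY) == is_spd_multiple(r, TINY) for r in grid))
print([str(r) for r in grid if drepp(r, TINY)])
```

The second line lists the grid points accepted by `drepp`, which should be exactly the multiples $m \cdot \mathit{spd}(F)$ with $1 \leq m < 2^{p-1}$.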


David Russinoff 2017-08-01