The rounding of numbers that lie below the normal range of a floating-point format is conveniently defined by the following function. Note that its arguments include the format itself, both parameters of which are required to compute the precision of the result.
It is not true in general that , since a non-zero denormal may be rounded to 0. However, we do have the following analog of Lemma 6.5.5.
PROOF: This follows from Lemmas 6.5.5 and 4.1.13.
A denormal is always rounded to a representable number, which may be denormal, 0, or the smallest representable normal.
(c) is representable as a denormal in .
PROOF: We may assume , as the general result follows easily from this case.
Let . Suppose first that . Then by Lemmas 6.5.3, 6.1.1, and 6.2.4, either
PROOF: Let . By Definition 5.3.5, is -exact. Therefore, by Lemma 6.5.7,
PROOF: Suppose . Since , Lemma 6.6.2 guarantees that is representable as a denormal, which implies that is -exact.
On the other hand, if is -exact, then is representable by Definition 5.3.5 and by Lemma 6.6.3.
(a) If , then .
(b) If , then .
PROOF: By Definition 5.3.5, is -exact. The result follows from Lemma 6.5.8.
The defining characteristics of the directed rounding modes are inherited by denormal rounding.
PROOF: This is a consequence of Lemmas 6.1.5, 6.2.5, 6.5.2, and 6.5.1.
Denormal rounding error is bounded by the distance between successive representable numbers (see Lemma 5.3.5).
. By Lemma 6.5.9,
Naturally, unbiased denormal rounding returns the representable number that is closest to its argument.
PROOF: By Definition 5.3.5, is -exact. The result follows from Lemmas 6.3.14 and 6.3.35.
PROOF: See Lemma 6.5.17.
The next lemma, which pertains to the detection of floating-point underflow, warrants some motivation. Let be the precise numerical result of an arithmetic operation, to be rounded according to a mode and encoded in a format with precision . Most implementations first compute the value , using an internal format with a sufficient wide exponent field to accommodate this result. According to the x86 architectural definition, underflow occurs when . If this occurs and the underflow mask is set, then the value is computed and returned, and the underflow flag is set iff .
There are, however, implementations that compute directly in the event that , without computing . The requirement of correctly setting the underflow flag then presents a problem, since may lie below the normal range when does not. Thus, for such an implementation, if , then extra logic is required to determine whether . On the other hand, there is no such ambiguity requiring extra logic for the case , since the following lemma guarantees that as well.
PROOF: Let . By Lemma 6.5.10, , and hence
The next lemma pertains to the setting of the precision flag in the event of underflow and is relevant to the formal architectural specifications discussed in Part III: if is exact, then so is :
PROOF Let . By lemma6.6.2, is representable as a denormal in , which implies the is -exactp. Since , , and by Lemma 4.2.5, is -exact, and the claim follows from Lemma 6.5.7.
We have the following alternative formulation of drnd.
PROOF: Let . We first consider the case and apply Lemma 6.5.13, substituting for , for , and for . Thus,
Lemma 6.6.12 is used in the proof of the final lemma of this chapter, which pertains to the rounding of numbers smaller than the smallest representable denormal.
(a) If , then
(b) If , then
(c) If , then
PROOF: Let , , and
By Lemma 6.2.5,
First note that by Lemma 4.2.20,
By Lemmas 6.3.1 and 6.3.22, is either or , and hence is either 0 or , respectively.
As a consequence of Lemma 6.6.13 (a), for any given rounding mode, two sufficiently small numbers produce the same rounded result.
David Russinoff 2017-08-01