The IEEE standard prescribes four rounding modes: “round to nearest” (RNE), “round toward 0” ( ), “round toward ', and “round toward ”. The last two are formalized here by the functions RUP and RDN.
The essential properties of these modes are given by the following two results.
PROOF: If , then
PROOF: If , then
In this section, we collect a set of general results that pertain to all of the IEEE rounding modes, most of which are essentially restatements of lemmas that are proved in earlier sections. Since these results also hold for two of the other modes that we have discussed, RAZ and , we shall state them as generally as possible. For this purpose, we define a common rounding mode to be any of the six, i.e., , RAZ, RNE, , RUP, or RDN.
PROOF: This is an immediate consequence of Definitions 6.3.1, 6.3.2, 6.5.1, and 6.5.2.
Most of following results are consequences of Lemma 6.5.3 and the corresponding lemmas pertaining to and RAZ.
PROOF: See Lemmas 6.1.2 and 6.2.1.
Note that RUP and RDN do not share the oddness property held by the other modes. A generalization is given by the following lemma.
PROOF: Suppose, for example, that and . Then since ,
PROOF: See Lemmas 6.1.9 and 6.2.11.
PROOF: See Lemmas 6.1.10 and 6.2.13.
(a) If , then . |
(b) If , then . |
PROOF: See Lemmas 6.1.11 and 6.2.14.
PROOF: See Lemmas 6.1.8 and 6.2.7.
PROOF: See Lemmas 6.1.6 and 6.2.10.
PROOF: The modes , RAZ, RNE, and are covered by Lemmas 6.1.13, 6.2.17, 6.3.17, and 6.3.36. For the modes RUP and RDN, we may assume, using Lemma 6.5.4, that and are either both positive or both negative. It follows that either
PROOF: The modes , RAZ, RNE, and are covered by Lemmas 6.1.4, 6.2.3, 6.3.8, and 6.3.29. For the modes RUP and RDN, we may assume, using Lemma 6.5.5, that , in which case
PROOF: See Lemmas 6.1.16 and 6.2.21.
PROOF: This is an immediate consequence of Lemmas 6.1.5 and 6.2.5:
If is not -exact and is sufficiently close to , then and round to the same value under any common rounding mode:
(a) or
(b) ,
then .
PROOF: We show the proof of (a); (b) is similar.
By Lemma 6.5.11, we need only show that . We also note that by Lemmas 6.1.15 and 6.1.5,
by Lemma 6.1.9, is -exact. By Lemma 6.1.11 (with and substituted for and ) and Definition 6.5.2,
PROOF: According to Lemma 6.5.15, this holds for .
PROOF: The modes and RAZ are handled by Lemma 6.4.9; RNE and are covered by Lemma 6.4.10; RUP and RDN are reduced to and RAZ by Lemma 6.4.1.
PROOF: This is a consequence of Lemmas 6.2.16, 6.3.12, 6.3.33, and 6.1.5.
For the sake of simplicity, our main result pertaining to constant injection is formulated in the context of bit vector rounding.
PROOF: For the modes RAZ and RUP, the identity follows from Lemma 6.2.24, with , and Lemma 4.2.7. For RNE and RNA, it reduces to Lemmas 6.3.42 and 6.3.42. For and RDN, the lemma is trivial.
Another common implementation of rounding is provided by the following result. Suppose our objective is a correct -bit rounding (with respect to some common rounding mode) of a precise result , and that we have a bit-vector representation of . Then the desired result may be derived by rounding in the direction determined as follows:
(a) is -exact iff and .
(b) If any of the following conditions holds, then , and otherwise :
PROOF: It is clear that . Definition 6.1.1,
(a) According to Lemmas 6.1.9 and 6.1.10, it suffices to show that iff either or . By Lemma 6.1.5,
Suppose . By Lemmas 6.2.18, 6.1.9, and 6.2.11,
For the remaining cases, RNE, and RNA, we refer directly to Definitions 6.3.1 and 6.3.2. Let
For the case , it is clear from Definition 6.3.1 and (1) and (2) above that we need only show that is even iff . But this is a consequence of the above equation .
The final result of this section is a variation of Lemma 6.5.20 that allows us to compute the absolute value of a rounded result when the unrounded value is negative, given a signed integer (twos' complement) encoding of , i.e., , where .
(a) If , then .
(b) is -exact iff and .
(c) If any of the following conditions holds, then , and otherwise :
PROOF: Let . Then ,
To prove (a), note that if , then since , we must have , , and .
For the proof of (b) and (c), by Lemmas 2.2.22 and 6.1.18, we have
Suppose first that . Then
In the remaining case, . By Lemma 4.2.16, is not -exact, and (b) follows. The claim (c) will be derived from Lemma 6.5.20. Note that if is defined as in Lemma 6.5.5, then
If any of the following conditions holds, then , and otherwise :
We invoke Lemma 6.5.20 with and substituted for and , respectively. This yields the following:
If any of the following conditions holds, then , and otherwise :
Suppose . Then and by Lemma 1.1.7,
In the final case, and
. Thus,
and
. Since
David Russinoff 2017-08-01