6.4 Odd Rounding

It is often possible to produce the rounded result of a computation without explicitly
generating every bit of the precise result. For example, a single-precision multiplication
returns a 24-bit approximation to a 48-bit product, and one would like to avoid computing
all 48 of those bits. Naturally, the information required depends on the rounding mode.
For *RTZ*, the most significant 24 bits of the result are sufficient;
for *RNA*, 25 are needed. For *RAZ* or *RNE* rounding, no result bits may
be ignored, but we shall show that for any of the modes of interest, a correctly rounded
$n$-bit result may always be recovered from an $(n+1)$-bit truncation together with an
appended *sticky* bit, which simply indicates whether any accuracy was lost in the
truncation. Although the operation of computing this $(n+2)$-bit intermediate result is not
conventionally viewed as a rounding mode itself, it satisfies the axioms at the beginning of
this chapter and shares some other important properties with the modes discussed in the
preceding sections, and therefore, we find this view to be useful.

The following function, which might be called “round to odd”, computes an $n$-bit rounded result.
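As a concrete illustration, the following is a minimal Python sketch of round to odd on exact rationals. It assumes the usual formulation: $x$ is returned unchanged when it is $(n-1)$-exact, and is otherwise truncated to $n-1$ bits with a 1 forced into bit position $n$. The helper names `expo`, `ulp`, `exactp`, `rtz`, and `rto` are chosen here for illustration only.

```python
from fractions import Fraction
import math

def expo(x):
    """Return e such that 2**e <= |x| < 2**(e+1), for nonzero rational x."""
    x = abs(Fraction(x))
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e if Fraction(2) ** e <= x else e - 1

def ulp(x, n):
    """Weight of the least significant of the n leading bits of x."""
    return Fraction(2) ** (expo(x) + 1 - n)

def exactp(x, n):
    """True if x is n-exact, i.e. representable with n significant bits."""
    return x == 0 or (Fraction(x) / ulp(x, n)).denominator == 1

def rtz(x, n):
    """Round toward zero (truncate) to n significant bits."""
    x = Fraction(x)
    return x if x == 0 else math.trunc(x / ulp(x, n)) * ulp(x, n)

def rto(x, n):
    """Round to odd (assumed form): keep x if it is (n-1)-exact,
    otherwise truncate to n-1 bits and set bit n."""
    x = Fraction(x)
    if exactp(x, n - 1):
        return x
    sgn = 1 if x > 0 else -1
    return rtz(x, n - 1) + sgn * ulp(x, n)

x = Fraction(45, 8)            # binary 101.101
print(rto(x, 3))               # binary 101
print(rto(x, 4))               # binary 101.1
print(rto(Fraction(6), 3))     # binary 110 is 2-exact, so it is unchanged
```

Whenever precision is lost, the last retained bit of the result is 1, which is the sense in which the result is "odd".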

PROOF: This is an obvious consequence of Lemma 6.1.2.

PROOF: If is -exact, then so is , and

PROOF: If is -exact, then so is , and

PROOF: By Lemmas 6.4.2 and 6.4.1, we may assume
that . If is -exact, the claim is trivial. Suppose is not
-exact. By Lemma 6.1.6,
.
Since
and
are both -exact,
Lemma 4.2.16 implies

and it follows that .

PROOF: If is -exact, then . Suppose is not -exact. By Lemma 6.1.9, is -exact, i.e.,

PROOF: We may assume that is not -exact, and hence Lemma 6.1.14 yields

PROOF: Using Lemmas 6.4.2 and 6.4.1, we may
assume that . Suppose first that is -exact. If is also
-exact, then the claim is trivial, and if not, then by Lemmas 6.1.10
and 6.1.13,

Similarly, if neither nor is -exact, then Lemmas 6.1.13 and 4.1.5 imply

In the remaining case, is -exact and is not. Now by Lemmas 6.1.9 and 4.2.5, and are both -exact, and hence, by Lemmas 4.2.16 and 6.1.6,

The following property, which is not shared by any of the other modes that we have considered, is the basis of this mode's utility.

PROOF: According to Lemmas 6.4.2 and 6.4.1, we may
assume that . But clearly we may also assume that is not -exact,
and hence, by Lemma 4.2.5, is not -exact. On the other hand,
is -exact, and since

Lemma 4.2.16 implies that is not -exact. Applying Lemma 4.2.5 again, we conclude that is not -exact.

For directed rounding, an $n$-bit rounded result may be derived from an $(n+1)$-bit odd rounding.
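The claim can be checked exhaustively on small operands. The sketch below is restricted to positive integers and assumes the result takes the form $\mathcal{R}(\mathit{RTO}(x,m),n) = \mathcal{R}(x,n)$ for $\mathcal{R}$ either *RTZ* or *RAZ* and any $m > n$. The bit-level functions `rtz`, `raz`, `rto`, and `holds` are illustrative names, not definitions taken from the text.

```python
def rtz(x, n):
    """Truncate the positive integer x to its n leading bits."""
    s = max(x.bit_length() - n, 0)
    return (x >> s) << s

def raz(x, n):
    """Round the positive integer x away from zero to n leading bits."""
    s = max(x.bit_length() - n, 0)
    return -((-x >> s) << s)          # ceiling to a multiple of 2**s

def rto(x, n):
    """Round to odd: truncate to n bits and OR a sticky bit into the last kept bit."""
    s = max(x.bit_length() - n, 0)
    t = (x >> s) << s
    return x if t == x else t | (1 << s)

def holds(limit=1 << 10):
    """Compare directed rounding of x with directed rounding of rto(x, m), m > n."""
    for x in range(1, limit):
        for n in range(1, 7):
            for m in range(n + 1, n + 4):
                y = rto(x, m)
                if rtz(y, n) != rtz(x, n) or raz(y, n) != raz(x, n):
                    return False
    return True

print(holds())
# With m == n the odd rounding already discards too much: compare the two values.
print(rtz(rto(5, 2), 2), rtz(5, 2))
```

The final line shows why the extra bit matters: an $n$-bit odd rounding followed by an $n$-bit truncation need not agree with truncating the original value.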

PROOF: We may assume that and is not -exact; the other cases follow trivially.
First, note that by Lemmas 6.4.5 and 6.4.8, is -exact but not -exact,
and therefore, according to Lemmas 6.1.14 and 6.4.4,

Thus, by Lemma 6.1.15, for any ,

For unbiased rounding, one extra bit is required.

PROOF: The second equation follows easily from Lemmas 6.3.43 and 6.4.9:

To prove the first equation using Lemma 6.4.9, it will suffice to show that if and , then . Without loss of generality, we may assume . Suppose . Then by Lemma 6.3.18, for some -exact , . But this implies , for otherwise . Similarly, , for otherwise . Thus, , a contradiction.
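A small exhaustive check illustrates why one more bit is needed here than for directed rounding. The sketch assumes the result has the form $\mathit{RNE}(\mathit{RTO}(x,m),n) = \mathit{RNE}(x,n)$, and likewise for *RNA*, whenever $m \geq n+2$. It works on positive integers only, and the names `rnd_near`, `rne`, `rna`, `rto`, and `holds` are hypothetical helpers introduced for the illustration.

```python
def rto(x, n):
    """Round to odd: truncate to n bits and OR a sticky bit into the last kept bit."""
    s = max(x.bit_length() - n, 0)
    t = (x >> s) << s
    return x if t == x else t | (1 << s)

def rnd_near(x, n, ties_even):
    """Round the positive integer x to the nearest value with n leading bits."""
    s = x.bit_length() - n
    if s <= 0:
        return x
    q, rem = x >> s, x & ((1 << s) - 1)
    half = 1 << (s - 1)
    # On a tie, round up always (ties away) or only when q is odd (ties to even).
    up = rem > half or (rem == half and (not ties_even or (q & 1) == 1))
    return (q + int(up)) << s

def rne(x, n):
    return rnd_near(x, n, True)

def rna(x, n):
    return rnd_near(x, n, False)

def holds(extra, limit=1 << 10):
    """Compare rne/rna of x with rne/rna of an (n + extra)-bit odd rounding of x."""
    for x in range(1, limit):
        for n in range(1, 7):
            y = rto(x, n + extra)
            if rne(y, n) != rne(x, n) or rna(y, n) != rna(x, n):
                return False
    return True

print(holds(2), holds(1))
print(rne(rto(13, 3), 2), rne(13, 2))   # an (n+1)-bit odd rounding creates a false tie
```

With only $n+1$ bits, the sticky bit occupies the position of the rounding bit, so an inexact value below the midpoint can be mistaken for an exact tie.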

The following important property is essential for floating-point addition, as it allows a rounded sum or difference of unaligned numbers to be derived without computing the full result explicitly.
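The following Python sketch illustrates the kind of identity involved. It assumes the property has the form $x + \mathit{RTO}(y,k) = \mathit{RTO}(x+y,k'')$, where $k' = k + \mathit{expo}(x) - \mathit{expo}(y)$ and $k'' = k + \mathit{expo}(x+y) - \mathit{expo}(y)$, under the hypotheses that $k$, $k'$, and $k''$ all exceed 1 and that $x$ is $(k'-1)$-exact. Both this formulation and the helper names (`expo`, `exactp`, `rto`, `sum_via_rto`) are assumptions made for the example.

```python
from fractions import Fraction
import math

def expo(x):
    """Return e such that 2**e <= |x| < 2**(e+1), for nonzero rational x."""
    x = abs(Fraction(x))
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e if Fraction(2) ** e <= x else e - 1

def exactp(x, n):
    """True if the nonzero rational x is n-exact."""
    return (Fraction(x) / Fraction(2) ** (expo(x) + 1 - n)).denominator == 1

def rto(x, n):
    """Round to odd (assumed form), written in terms of u, the weight of bit n-1."""
    x = Fraction(x)
    u = Fraction(2) ** (expo(x) + 2 - n)
    if (x / u).denominator == 1:          # x is (n-1)-exact
        return x
    sgn = 1 if x > 0 else -1
    return math.trunc(x / u) * u + sgn * u / 2

def sum_via_rto(x, y, k):
    """Return (x + rto(y, k), rto(x + y, k2)) after checking the assumed hypotheses."""
    x, y = Fraction(x), Fraction(y)
    k1 = k + expo(x) - expo(y)
    k2 = k + expo(x + y) - expo(y)
    if not (k > 1 and k1 > 1 and k2 > 1 and exactp(x, k1 - 1)):
        raise ValueError("hypotheses not met")
    return x + rto(y, k), rto(x + y, k2)

# x = 1100 (binary) is aligned above the bits of y that get rounded away.
lhs, rhs = sum_via_rto(12, Fraction(-37, 16), 4)
print(lhs == rhs)
```

The exactness hypothesis says that $x$ has no bits below the last position retained in the rounding of $y$, so adding $x$ cannot disturb the sticky information. This is why the smaller operand of an addition may be rounded to odd before the sum is formed.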

PROOF: Since is -exact,

If is -exact, then

On the other hand, if , then

David Russinoff 2017-08-01