# 6.2  Rounding Away from Zero

The dual of truncation is defined similarly, using the ceiling instead of the floor.

Definition 6.2.1   (raz) For all and ,

Example: Let and . Then , , ,

and

Note that this value is the smallest 5-exact number not exceeded by .

The next three lemmas list simple properties of RAZ that it shares with .

Lemma 6.2.1   (sgn-raz) Let and . If , then

Lemma 6.2.2   (raz-minus) For all and ,

PROOF: By Lemma 4.1.14,

The negative-precision case is less than intuitive.

PROOF: By Lemma 4.1.8,

and hence, by Lemma 1.1.9,

We have the following bounds on .

PROOF: By Lemmas 1.1.9 and 4.1.1,

PROOF: By Definitions 6.2.1 and 1.1.1 and Lemma 4.1.1,

The second inequality follows from Definition 4.1.1

PROOF: By Lemmas 6.2.1 and 6.2.5,

The corollary now follows from Lemma 6.2.6

Unlike , RAZ is not guaranteed to preserve the exponent of its argument, but the only exception is the case in which a number is rounded up to a power of 2.

PROOF: By Lemma 4.1.8,

PROOF: The first inequality follows from Lemmas 6.2.5 and 4.1.5, and the second from Lemmas 6.2.8 and 4.1.3

Corollary 6.2.10   (expo-raz) For all and , if , then

The standard rounding mode properties may now be derived.

PROOF: By Corollary 6.2.10 and Lemma 4.2.6, we may assume that

Consequently, it suffices to observe that

(raz-diff-expo) Let and . If and x is not -exact, then

PROOF: According to Lemma 6.2.11, the hypothesis implies that , and hence Lemma 4.1.3 applies.

(raz-exactp-b) Let and . If and x is -exact, then

PROOF: By Definition 4.2.1 and Lemma 1.1.10,

and hence by Definition 6.2.1 and Lemma 4.1.1,

(raz-exactp-c) Let , , and , . If is -exact and , then .

PROOF: If , then

by Lemmas 6.2.1 and 6.2.5. Therefore, we may assume that . Suppose . Then by Lemmas 4.2.16 and 4.1.5,

(raz-squeeze) Let , and . Assume that is -exact, where and . Then .

PROOF: By Lemmas 6.2.5, 6.2.11, and 4.2.16, . By Lemmas 6.2.14 and 4.2.16,

(raz-up) Let , , , and , with and . If , then .

PROOF: We may assume that . Suppose . By Lemmas 6.2.13 and 4.2.19, , and by Lemmas 6.2.15 and 4.2.20,

PROOF: First suppose . By Lemma 6.2.5, . Since is -exact by Lemma 6.2.11, Lemma 6.2.14 implies

Now suppose . By Lemma 6.2.1, we may assume that . Thus, since , we have and by Lemma 6.2.2,

If is not -exact, then and are successive -exact numbers.

(rtz-raz) Let and . If , , and is not -exact, then

PROOF: Let . By Lemma 6.1.6, . Since , . Since and are both -exact and , Lemmas 4.2.15 and 4.2.16 imply that is -exact and . But by Lemma 6.1.7, , and therefore, by Lemma 6.2.14,

The next four results correspond to Lemmas 6.1.14, 6.1.15 6.1.16, and 6.1.17 of the preceding section.

(raz-midpoint) Let and . If is -exact but not -exact, then

PROOF: For the case , we have by Definition 4.2.1, and by Lemmas 4.1.1 and 6.2.4,

Thus, we may assume , and by Lemma 6.2.2, we may also assume . Let and . As noted in the proof of Lemma 6.1.14, and are both -exact and . Now by Lemma 6.2.14, , but if , then since is -exact, Lemma 4.2.16 would imply , contradicting . Therefore,

PROOF: We may assume . Consider first the case

In this case, is -exact, so that

By Lemma 6.2.8, we need only show that . But since , is -exact, and since , by Lemma 6.2.14.

Thus, we may assume . By Corollary 6.2.10,

and hence by Lemma 1.1.14,

(plus-raz) Let , , and . If , , and is -exact, then

PROOF: Let . Since is -exact,

Let . Then by Lemma 1.1.13,

(minus-rtz-raz) Let , , and . If , , and is -exact, then

PROOF: Let . Since is -exact,

Let . Then by Lemma 1.1.4,

The following result combines Lemmas 6.1.17 and 6.2.22.

(rtz-plus-minus) Let and such that , , and . Let ,

and

If , and is -exact, then

PROOF: By Lemmas 6.1.3 and 6.2.2, we may assume that . The case is handled by Lemma 6.1.16. For the case , Lemmas 6.1.17 and 6.2.22 cover the subcases and , respectively.

We turn now to the problem of bit-level implementation of rounding. Truncation, according to Lemma 6.1.18, is equivalent to a bit slice operation, which may be implemented as a logical operation using Corollary 6.1.21. Other rounding modes may be reduced to the case of truncation by a method known as constant injection. Let be -exact with , say

to be rounded to bits, where , according to a rounding mode . Our goal is to construct a rounding constant , depending on , , , and , such that

The appropriate constant for the case is

which consists of a string of 1's at the bit positions corresponding to the least significant bits of , as illustrated below.

As suggested by the diagram, the addition generates a carry into the position of unless , i.e., unless is -exact, and is rounded up accordingly. This observation is formalized by the following lemma.

(raz-imp) Let , , and . If is -exact, , and , then

PROOF: Let . Since and are both -exact and

by Lemma 4.2.16.

If is -exact, then , and hence . Thus, we may assume is not -exact. But then since and is -exact,

and hence

which implies

David Russinoff 2017-08-01