It is often possible to produce the rounded result of a computation without explicitly
generating every bit of the precise result. For example, a single-precision multiplication
returns a 24-bit approximation to a 48-bit product, and one would like to avoid computing
all 48 of those bits. Naturally, the information required depends on the rounding mode.
For trunc, obviously, the most significant 24 bits of the result are sufficient;
for rounding to nearest with ties away from zero, 25 are needed. For away or near rounding, no result bits may
be ignored, but we shall show that for any of the modes of interest, a correctly rounded
$n$-bit result may always be recovered from an $(n+1)$-bit truncation together with an
appended sticky bit, which simply indicates whether any accuracy was lost in the
truncation. Although the operation of computing this $(n+2)$-bit intermediate result is not
conventionally viewed as a rounding mode itself, it shares some important properties with
the modes discussed in the preceding sections, and we therefore find this view useful.
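As a concrete illustration of the truncation-plus-sticky idea for the single-precision example above, the following Python sketch works directly on integers. The function names, the treatment of the product as a bare positive integer, and the choice of ties-to-even for near are assumptions made for illustration, not definitions taken from the text.

```python
def truncate_with_sticky(value, keep):
    """Keep the top `keep` bits of a positive integer and append one sticky bit."""
    dropped = value.bit_length() - keep
    assert dropped >= 0, "value must have at least `keep` bits"
    top = value >> dropped
    lost = value & ((1 << dropped) - 1)      # bits discarded by the truncation
    return (top << 1) | (1 if lost else 0)   # sticky bit records any loss

def round_near_even(value, keep):
    """Round a positive integer to its top `keep` bits, ties to even."""
    dropped = value.bit_length() - keep
    assert dropped >= 1, "nothing to round away"
    q, r = divmod(value, 1 << dropped)
    half = 1 << (dropped - 1)
    if r > half or (r == half and q & 1):
        q += 1
    return q

a = 0xC00001   # hypothetical 24-bit significands
b = 0xAAAAAB
product = a * b

# Rounding the full product and rounding the 25-bit truncation with its
# sticky bit (26 bits in all) give the same 24-bit result.
direct = round_near_even(product, 24)
via_sticky = round_near_even(truncate_with_sticky(product, 25), 24)
assert direct == via_sticky
```

Only 26 bits of information about the product are consulted on the second path, which is the saving the paragraph above describes.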
The following function computes an $n$-bit “sticky-rounded” result.
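A minimal sketch of such a function in Python, on exact rationals, might look as follows. It assumes the usual reading of sticky rounding: truncate to $n-1$ bits and, if anything was lost, set the least significant of the $n$ bits. The helper names `expo`, `sgn`, `trunc`, and `sticky` and the use of `Fraction` are illustrative assumptions rather than the text's own definitions.

```python
from fractions import Fraction
from math import floor

def expo(x):
    """Return e with 2**e <= |x| < 2**(e+1), for nonzero x."""
    x = abs(Fraction(x))
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e - 1 if Fraction(2) ** e > x else e

def sgn(x):
    """Sign of x as -1, 0, or 1."""
    return (x > 0) - (x < 0)

def trunc(x, n):
    """Round x toward zero to n significant bits."""
    x = Fraction(x)
    if x == 0:
        return x
    ulp = Fraction(2) ** (expo(x) + 1 - n)
    return sgn(x) * floor(abs(x) / ulp) * ulp

def sticky(x, n):
    """Truncate x to n-1 bits; if that was inexact, set bit n as a sticky bit."""
    x = Fraction(x)
    t = trunc(x, n - 1)
    if t == x:
        return x
    return t + sgn(x) * Fraction(2) ** (expo(x) + 1 - n)

print(sticky(45, 4))   # 45 = 0b101101 -> prints 44 (0b101100)
print(sticky(40, 4))   # 40 = 0b101000 is already 3-bit exact -> prints 40
```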
PROOF: This is an obvious consequence of Lemma 5.1.2.
PROOF: If
is
-exact, then so is
, and
Otherwise, by Lemmas 5.1.3 and 4.1.11,
PROOF: If
is
-exact, then so is
, and
Otherwise, by Lemmas 5.1.4 and 4.1.12,
PROOF: By Lemmas 5.4.2 and 5.4.1, we may assume
that
. If
is
-exact, the claim is trivial. Suppose
is not
-exact. By Lemma 5.1.6,
.
Since
and
are both
-exact,
Lemma 4.2.15 implies
PROOF: If
is
-exact, then
. Suppose
is not
-exact.
By Lemma 5.1.9,
is
-exact, i.e.,
Thus, by Lemma 5.4.4,
PROOF: We may assume that
is not
-exact, and hence Lemma 5.1.14
yields
Thus,
PROOF: Using Lemmas 5.4.2 and 5.4.1, we may
assume that
. Suppose first that
is
-exact. If
is also
-exact, then the claim is trivial, and if not, then by Lemmas 5.1.11
and 5.1.13,
The following property, which is not held by any of the other modes that we have considered, is the basis of this mode's utility.
PROOF: According to Lemmas 5.4.2 and 5.4.1, we may
assume that
. But clearly we may also assume that
is not
-exact,
and hence, by Lemma 4.2.5,
is not
-exact. On the other hand,
is
-exact, and since
For directed rounding, an $n$-bit rounded result may be derived from an $(n+1)$-bit
sticky rounding.
PROOF: We may assume that
and
is not
-exact; the other cases follow trivially.
First, note that by Lemmas 5.4.5 and 5.4.8,
is
-exact but not
-exact,
and therefore, according to Lemmas 5.1.14 and 5.4.4,
The corresponding result for away may be similarly derived.
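The directed-rounding claims can be spot-checked numerically. The sketch below assumes that the sticky rounding carries one more bit than the target precision and that sticky rounding means truncation with a sticky bit in the last position; all helper names (`expo`, `sgn`, `ulp`, `trunc`, `away`, `sticky`) are hypothetical and chosen for illustration.

```python
from fractions import Fraction
from math import floor, ceil

def expo(x):
    """Return e with 2**e <= |x| < 2**(e+1), for nonzero x."""
    x = abs(Fraction(x))
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e - 1 if Fraction(2) ** e > x else e

def sgn(x):
    return (x > 0) - (x < 0)

def ulp(x, n):
    """Weight of the last of n significant bits at x's exponent."""
    return Fraction(2) ** (expo(x) + 1 - n)

def trunc(x, n):
    """Round nonzero x toward zero to n bits."""
    x = Fraction(x)
    return sgn(x) * floor(abs(x) / ulp(x, n)) * ulp(x, n)

def away(x, n):
    """Round nonzero x away from zero to n bits."""
    x = Fraction(x)
    return sgn(x) * ceil(abs(x) / ulp(x, n)) * ulp(x, n)

def sticky(x, n):
    """Truncate nonzero x to n-1 bits and set bit n if anything was lost."""
    x = Fraction(x)
    t = trunc(x, n - 1)
    return x if t == x else t + sgn(x) * ulp(x, n)

# Exhaustive check over a small grid of positive and negative rationals.
for num in range(1, 257):
    for x in (Fraction(num, 16), Fraction(-num, 3)):
        for n in range(1, 7):
            s = sticky(x, n + 1)
            assert trunc(s, n) == trunc(x, n)
            assert away(s, n) == away(x, n)
```

A passing run is only evidence on the sampled values, not a substitute for the proof.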
For unbiased rounding, one extra bit is required.
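A small numeric example shows why the extra bit matters. The sketch below assumes that near breaks ties toward the even neighbor and that sticky rounding means truncation with a sticky bit in the last position; the helper names are hypothetical and serve only this illustration.

```python
from fractions import Fraction
from math import floor

def expo(x):
    """Return e with 2**e <= |x| < 2**(e+1), for nonzero x."""
    x = abs(Fraction(x))
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e - 1 if Fraction(2) ** e > x else e

def sgn(x):
    return (x > 0) - (x < 0)

def ulp(x, n):
    """Weight of the last of n significant bits at x's exponent."""
    return Fraction(2) ** (expo(x) + 1 - n)

def trunc(x, n):
    """Round nonzero x toward zero to n bits."""
    x = Fraction(x)
    return sgn(x) * floor(abs(x) / ulp(x, n)) * ulp(x, n)

def sticky(x, n):
    """Truncate nonzero x to n-1 bits and set bit n if anything was lost."""
    x = Fraction(x)
    t = trunc(x, n - 1)
    return x if t == x else t + sgn(x) * ulp(x, n)

def near(x, n):
    """Round nonzero x to the nearest n-bit value, ties to even."""
    x = Fraction(x)
    u = ulp(x, n)
    q, r = divmod(abs(x), u)
    if 2 * r > u or (2 * r == u and q % 2 == 1):
        q += 1
    return sgn(x) * q * u

# 37 = 0b100101 lies just above the midpoint between 32 and 40.
assert near(37, 3) == 40
assert near(sticky(37, 5), 3) == 40   # two extra bits: correct
assert near(sticky(37, 4), 3) == 32   # one extra bit: looks like a tie, rounds to even

# With two extra bits the results agree on a small grid of rationals.
for num in range(1, 257):
    for x in (Fraction(num, 16), Fraction(-num, 3)):
        for n in range(1, 6):
            assert near(sticky(x, n + 2), n) == near(x, n)
```

With only one extra bit, the sticky bit occupies the position that near needs in order to tell an exact tie from a value slightly above it, which is what goes wrong for 37.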
PROOF: The second equation follows easily from Lemmas 5.3.35 and 5.4.9:
The following property is essential for computing a rounded sum or difference.
PROOF: Since
is
-exact,
Thus,
Thus, we may assume that
Now if