Next, we examine the mode that the IEEE standard designates as “round to nearest”, which may round in either direction, selecting the representable number that is closest to its argument. This mode is also sometimes called “round to nearest even” because of the manner in which it resolves the ambiguous case of a midpoint, i.e., a number that is equidistant from two successive representable numbers.
Example: Letand
. Then
and
indicating a “tie”, i.e., that the argument is equidistant from two successive 5-exact numbers. Since
is even, the tie is broken in favor of the lesser of the two:
As with all rounding modes, the value of near is always that of either trunc or away.
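The tie-breaking rule may be made concrete with a small sketch in exact rational arithmetic. The helper names `expo`, `trunc`, `away`, and `near`, and the formulation in terms of a scaled significand, are assumptions chosen for illustration rather than a transcription of Definition 5.3.1; the sample values are likewise illustrative and are not those of the example above.

```python
from fractions import Fraction
from math import floor, ceil

def expo(x):
    """Integer e with 2**e <= |x| < 2**(e+1), for nonzero x."""
    x = abs(Fraction(x))
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e if Fraction(2) ** e <= x else e - 1

def trunc(x, n):
    """Round x toward zero to n significant bits."""
    x = Fraction(x)
    if x == 0:
        return x
    ulp = Fraction(2) ** (expo(x) - n + 1)
    r = floor(abs(x) / ulp) * ulp
    return r if x > 0 else -r

def away(x, n):
    """Round x away from zero to n significant bits."""
    x = Fraction(x)
    if x == 0:
        return x
    ulp = Fraction(2) ** (expo(x) - n + 1)
    r = ceil(abs(x) / ulp) * ulp
    return r if x > 0 else -r

def near(x, n):
    """Round to nearest, breaking ties toward the even n-bit significand."""
    x = Fraction(x)
    if x == 0:
        return x
    ulp = Fraction(2) ** (expo(x) - n + 1)
    q = abs(x) / ulp          # scaled significand
    z = floor(q)              # integer part: the truncated significand
    f = q - z                 # discarded fraction, 0 <= f < 1
    half = Fraction(1, 2)
    if f < half or (f == half and z % 2 == 0):
        return trunc(x, n)
    return away(x, n)

# 45 = 0b101101 is midway between the 5-exact numbers 44 and 46.
print(near(45, 5))               # tie, z = 22 is even, so trunc is chosen: 44
print(near(47, 5))               # tie, z = 23 is odd, so away is chosen: 48
print(near(Fraction(93, 2), 5))  # not a tie (f = 1/4), so trunc is chosen: 46
```

In every case the result coincides with either `trunc(x, n)` or `away(x, n)`, as stated above.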
We list several properties of near that may be derived from Lemma 5.3.1 and the corresponding properties of trunc and away.
|
(a) If |
|
(b) If |
PROOF: This follows from Lemmas 5.1.5 and 5.2.5,
which together ensure that
.
PROOF: It is clear from Definition 5.3.1 that the choice between
and
depends only on
. Thus, for example,
if
, then since
,
as well, and by Lemma 5.1.4,
PROOF: This may be derived from Lemmas 5.1.3 and 5.2.3 by following the same reasoning as used in the proof of Lemma 5.3.8.
In the computation of
, the choice between
and
is
governed by their relative distances from
.
PROOF: We may assume that
, for otherwise
Let
It follows that no
-exact number can be closer to
than is
.
PROOF: Assume
. We shall only consider the
case
, as the case
is handled similarly.
First suppose
. Since
, we must
have
and hence
by Lemma 5.1.12. But since
by Lemma 5.3.10, we also have
, and hence
by
Lemma 5.2.14.
In the remaining case,
. Now
and by Lemma 5.2.14,
. But in this case, Lemma 5.3.10 implies
, and
hence
by
Lemma 5.1.12.
Consequently, the maximum near rounding error is half the distance between successive representable numbers.
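This bound may be checked numerically. The sketch below assumes that the distance between successive n-exact numbers in the binade of x is 2**(expo(x) - n + 1), so that half of it is 2**(expo(x) - n); the function names and the sampling grid are hypothetical choices made for illustration.

```python
from fractions import Fraction
from math import floor

def expo(x):
    """Integer e with 2**e <= |x| < 2**(e+1), for nonzero x."""
    x = abs(Fraction(x))
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e if Fraction(2) ** e <= x else e - 1

def near(x, n):
    """Round to nearest with ties to even (illustrative formulation)."""
    x = Fraction(x)
    if x == 0:
        return x
    ulp = Fraction(2) ** (expo(x) - n + 1)
    q = abs(x) / ulp
    z = floor(q)
    f = q - z
    half = Fraction(1, 2)
    if f > half or (f == half and z % 2 == 1):
        z += 1                      # round the significand up
    r = z * ulp
    return r if x > 0 else -r

def max_relative_gap(n, denom=64, count=2000):
    """Largest |x - near(x, n)| / 2**(expo(x) - n) over x = k/denom."""
    worst = Fraction(0)
    for k in range(1, count + 1):
        x = Fraction(k, denom)
        bound = Fraction(2) ** (expo(x) - n)
        worst = max(worst, abs(x - near(x, n)) / bound)
    return worst

for n in range(1, 7):
    # The error never exceeds half the spacing of n-exact numbers.
    assert max_relative_gap(n) <= 1
```

The ratio reaches 1 exactly at midpoints, for instance at x = 9/8 with n = 3, where the two nearest 3-exact numbers are 1 and 5/4.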
PROOF: By Lemma 5.3.9, we may assume
. Let
If the statement fails, then since
and hence
Another consequence of Lemma 5.3.11 is the monotonicity of near rounding.
PROOF: Suppose
and
. Then
Lemma 5.3.11 implies
, otherwise
and
.
Similarly,
, and thus
.
Applying Lemma 5.3.11
again, we have
, and hence
.
Similarly,
, and hence
.
Consequently,
, contradicting
.
A midpoint with respect to a precision
may be characterized
as a number that is
-exact but not
-exact. By virtue of the
following result, the term rounding boundary is sometimes used
as well.
PROOF: By Lemma 5.3.13,
. Let
,
and
Since
Moreover,
PROOF: By Lemma 5.3.13, we may assume
, so that
. By Lemmas 4.2.14 and 4.2.15,
and
are successive
-exact numbers, and hence
by Lemma 5.3.14.
We also have the following partial converse of Lemma 5.3.14.
PROOF: Let
. Since
is not
-exact,
.
Let
By Lemma 4.2.19,
and it follows that
By hypothesis,
but
,
and therefore,
is odd, i.e.,
, where
.
Now since
By Lemma 4.2.14,
If
, then by Lemma 4.2.15,
,
which implies
, contradicting Lemma 5.3.11.
Similarly, if
, then
and
.
Thus,
The meaning of “round to nearest even” is that in the case of
a midpoint
,
is defined to be the “rounder” of the
two nearest
-exact numbers, i.e., the one that is
-exact.
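The characterization of midpoints and the behavior of near on them may be illustrated as follows. The sketch assumes that a midpoint with respect to precision n is a number that is (n+1)-exact but not n-exact, and that n > 1, so that (n-1)-exactness is meaningful; `exactp` and `midpoints` are hypothetical helper names.

```python
from fractions import Fraction
from math import floor

def expo(x):
    """Integer e with 2**e <= |x| < 2**(e+1), for nonzero x."""
    x = abs(Fraction(x))
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e if Fraction(2) ** e <= x else e - 1

def exactp(x, n):
    """True if x is n-exact, i.e., representable with n significant bits."""
    x = Fraction(x)
    if x == 0:
        return True
    return (abs(x) / Fraction(2) ** (expo(x) - n + 1)).denominator == 1

def near(x, n):
    """Round to nearest with ties to even (illustrative formulation)."""
    x = Fraction(x)
    if x == 0:
        return x
    ulp = Fraction(2) ** (expo(x) - n + 1)
    q = abs(x) / ulp
    z = floor(q)
    f = q - z
    half = Fraction(1, 2)
    if f > half or (f == half and z % 2 == 1):
        z += 1
    r = z * ulp
    return r if x > 0 else -r

def midpoints(n, denom=16, count=512):
    """Sampled numbers that are (n+1)-exact but not n-exact."""
    samples = (Fraction(k, denom) for k in range(1, count + 1))
    return [x for x in samples if exactp(x, n + 1) and not exactp(x, n)]

for n in range(2, 6):
    # Every sampled midpoint rounds to an (n-1)-exact number.
    assert all(exactp(near(x, n), n - 1) for x in midpoints(n))

print(near(Fraction(13, 2), 3))  # 6.5 lies between 6 and 7; 6 = 0b110 is 2-exact
print(near(Fraction(15, 2), 3))  # 7.5 lies between 7 and 8; 8 is 2-exact
```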
PROOF: Again we may assume
. Let
and
. Since
,
. But
,
hence
and
.
If
is even, then
and by Lemma 5.1.6,
If
is odd, then
We may assume
One consequence of this result is that a midpoint is sometimes rounded up and sometimes down, and therefore, over the course of a long series of computations and approximations, rounding error is less likely to accumulate to a significant degree in one particular direction than it would be if the choice were made more consistently. The cost of this feature is a more complicated definition, requiring a more expensive implementation.
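The effect on accumulated error may be seen in a small experiment. The sketch below compares ties-to-even with a hypothetical variant that always rounds a midpoint away from zero, summing the signed rounding errors over a run of consecutive midpoints; the function names and the choice of sample values are assumptions made for illustration.

```python
from fractions import Fraction
from math import floor

def expo(x):
    """Integer e with 2**e <= |x| < 2**(e+1), for nonzero x."""
    x = abs(Fraction(x))
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e if Fraction(2) ** e <= x else e - 1

def round_nearest(x, n, ties_to_even=True):
    """Round to nearest; ties go to even, or away from zero if disabled."""
    x = Fraction(x)
    if x == 0:
        return x
    ulp = Fraction(2) ** (expo(x) - n + 1)
    q = abs(x) / ulp
    z = floor(q)
    f = q - z
    half = Fraction(1, 2)
    if f > half or (f == half and (not ties_to_even or z % 2 == 1)):
        z += 1
    r = z * ulp
    return r if x > 0 else -r

def accumulated_error(values, n, ties_to_even):
    """Sum of the signed errors round(x) - x over the given values."""
    return sum(round_nearest(x, n, ties_to_even) - x for x in values)

# 8.5, 9.5, ..., 15.5 are all midpoints with respect to precision 4.
samples = [Fraction(2 * k + 1, 2) for k in range(8, 16)]

print(accumulated_error(samples, 4, ties_to_even=True))   # errors cancel: 0
print(accumulated_error(samples, 4, ties_to_even=False))  # errors add up: 4
```

With ties-to-even the eight errors alternate in sign and cancel, whereas the consistent choice contributes +1/2 each time.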
When the goal of a computation is provable accuracy rather than IEEE-compliance, a simpler version of “round to nearest” may be appropriate. The critical feature of this mode then becomes the relative error bound guaranteed by Lemma 5.3.12, since this is likely to be the basis for any formal error analysis. The following definition presents an alternative to near that respects the same error bound (see Lemma 5.3.29) but admits a simpler implementation and is therefore commonly used for internal floating-point calculations.
Example: Letand
. Since
Naturally, many of the properties of near are held by
as well. We list some of them here, omitting the proofs, which are
essentially the same as those given above for near.
|
(a) If |
|
(b) If |
The difference between near and
is that the latter always rounds a
midpoint away from 0.
PROOF: By Lemmas 5.3.9 and 5.2.3, we may assume
. Let
and
. Since
,
. But
, hence
and
. Therefore, according to Definition 5.3.2,
. The second inequality is a restatement of
Lemma 5.2.17.
There is one case of a midpoint for which near and
are guaranteed to produce the same result: if the greater of the two
representable numbers that are equidistant from
is a power of 2,
i.e.,
, then both modes round to
this number.
PROOF: Suppose
. Then Lemma 5.3.6
implies
and by Lemmas 4.2.15 and 5.3.12,
Now suppose
. Using Lemmas 5.3.26
and 5.3.29, we may show in the same way as above that
. Once again,
is
-exact but not
-exact,
and hence, by Lemma 5.3.31,
a contradiction.
Finally, suppose
.
Since
is
-exact,
by
Lemma 5.1.12. But then by Lemma 4.2.15,
The additive property shared by trunc and away that is described in Lemmas 5.1.16
and 5.2.19, respectively, does not hold for near in precisely the same form.
For example, let
,
, and
. Then
and
Although
while
However, this property is shared by
PROOF:
(a) Applying Lemmas 5.1.16 and 5.2.19, we need only show that either
or
Let
for some
by Lemma 4.2.1.
(b) Here we must show that either
or
According to Definition 5.3.2, this is true whenever
which is equivalent to the hypothesis that
The rounding constant (see the discussion preceding Lemma 5.2.22) for
both of the modes
and
is a simple power of 2, equal to half the
value of the least significant bit of the rounded result. That is, if the rounding
precision is
and the unrounded result is
then
The following lemma exposes the extra expense of implementing
as compared to
. While the correctly rounded result is given
by
in most cases, special attention is required
for the computation of
in the case where it differs from
, i.e., when
is
-exact and
. In this
case, the least significant bit must be forced to 0. This is
accomplished by truncating
to
bits rather than
.
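The add-and-truncate scheme described above may be sketched as follows. Several assumptions are made here and are not taken from the lemma itself: the argument x is positive, the precision n exceeds 1, the rounding constant is 2**(expo(x) - n) (half the least significant bit of an n-bit result), and the special case is identified as x being (n+1)-exact but not n-exact. The names `near_by_constant` and `near_away_by_constant` are hypothetical, the latter standing for the mode that rounds midpoints away from zero.

```python
from fractions import Fraction
from math import floor

def expo(x):
    """Integer e with 2**e <= |x| < 2**(e+1), for nonzero x."""
    x = abs(Fraction(x))
    e = x.numerator.bit_length() - x.denominator.bit_length()
    return e if Fraction(2) ** e <= x else e - 1

def exactp(x, n):
    """True if the positive number x is n-exact."""
    return (Fraction(x) / Fraction(2) ** (expo(x) - n + 1)).denominator == 1

def trunc(x, n):
    """Round the positive number x toward zero to n significant bits."""
    ulp = Fraction(2) ** (expo(x) - n + 1)
    return floor(Fraction(x) / ulp) * ulp

def round_nearest(x, n, ties_to_even):
    """Reference rounding to nearest, computed from the discarded fraction."""
    ulp = Fraction(2) ** (expo(x) - n + 1)
    q = Fraction(x) / ulp
    z = floor(q)
    f = q - z
    half = Fraction(1, 2)
    if f > half or (f == half and (not ties_to_even or z % 2 == 1)):
        z += 1
    return z * ulp

def near_away_by_constant(x, n):
    """Ties-away rounding: add the constant, then truncate to n bits."""
    c = Fraction(2) ** (expo(x) - n)
    return trunc(x + c, n)

def near_by_constant(x, n):
    """Ties-to-even rounding: a midpoint is truncated to n-1 bits instead,
    which forces the least significant bit of the n-bit result to 0."""
    c = Fraction(2) ** (expo(x) - n)
    is_midpoint = exactp(x, n + 1) and not exactp(x, n)
    return trunc(x + c, n - 1 if is_midpoint else n)

# Compare both schemes against the reference on a grid of positive values.
for n in range(2, 6):
    for k in range(1, 2049):
        x = Fraction(k, 32)
        assert near_by_constant(x, n) == round_nearest(x, n, True)
        assert near_away_by_constant(x, n) == round_nearest(x, n, False)
```

The extra expense of the ties-to-even mode shows up as the midpoint test in `near_by_constant`, which the ties-away mode does not need.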
PROOF: If
, then by
Lemma 5.3.32,
But then, by Lemmas 5.1.15, 4.2.6, and 5.1.11,
Case 1:
By Lemma 5.1.12,
. But since
Lemma 4.2.15 yields
Case 2:
We have
, for otherwise Lemma 5.3.12
would imply
and since
Since
The same argument applies to
, but with
Lemma 5.3.29 invoked in place of Lemma 5.3.12.
Case 3:
is
-exact but not
-exact
The identity for
is given by Lemma 5.3.31.
To prove the claim for
, we first consider the case
.
Since
is
-exact,
, hence
,
and by Lemma 5.3.17,
Now suppose
. Then
implies
. But since
As a consequence of the preceding lemma,
depends only on
the most significant
bits of
.
PROOF: By Lemmas 5.3.21 and 5.1.3, we may assume that
.
Furthermore, it will suffice to consider the case
, because then for
,
but after applying Lemma 5.1.5, we need only show
Let
By Lemmas 5.1.9 and 4.2.14,