The objective of floating-point rounding is an approximation of an
arbitrary rational number by one that is representable with respect to
a given floating-point format. In the usual case, when the number to
be approximated lies within the exponent range of the target format,
this amounts to an $n$-exact approximation, where $n$ is the number of
bits of precision provided by the format's significand field.
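To make $n$-exactness concrete, here is a minimal Python sketch using exact rational arithmetic. It assumes the usual characterization: a nonzero rational $x$ is $n$-exact when $2^{n-1-\mathrm{expo}(x)}|x|$ is an integer, where $\mathrm{expo}(x)$ is the integer $e$ with $2^e \le |x| < 2^{e+1}$, and $0$ is $n$-exact for every $n$. The helper names `expo` and `is_exact` are hypothetical, chosen for illustration.

```python
from fractions import Fraction


def expo(x) -> int:
    """Return the integer e with 2**e <= |x| < 2**(e+1). x must be nonzero."""
    a = abs(Fraction(x))
    if a == 0:
        raise ValueError("expo is undefined for 0")
    # Estimate from bit lengths; the true exponent is this value or one less.
    e = a.numerator.bit_length() - a.denominator.bit_length()
    # Adjust the estimate until 2**e <= a < 2**(e+1) holds exactly.
    while Fraction(2) ** e > a:
        e -= 1
    while Fraction(2) ** (e + 1) <= a:
        e += 1
    return e


def is_exact(x, n: int) -> bool:
    """Hypothetical predicate: True if x fits in an n-bit significand."""
    x = Fraction(x)
    if x == 0:
        return True
    # Scale |x| so its leading bit sits at position n-1; x is n-exact
    # exactly when the scaled value is an integer.
    scaled = abs(x) * Fraction(2) ** (n - 1 - expo(x))
    return scaled.denominator == 1
```

For instance, $5 = 101_2$ is 3-exact but not 2-exact, while $1/3$ is not $n$-exact for any $n$, since its binary expansion never terminates.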
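We define a rounding mode to be a mapping
$\mathcal{R} : \mathbb{Q} \times \mathbb{Z}^+ \to \mathbb{Q}$
such that for all $x \in \mathbb{Q}$, $n \in \mathbb{Z}^+$, and $k \in \mathbb{Z}$, the
following axioms are satisfied: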
Similarly, if
In the first two sections of this chapter, we examine the two basic rounding modes trunc and away, characterized by the inequalities
$|\mathrm{trunc}(x, n)| \le |x|$ and $|\mathrm{away}(x, n)| \ge |x|$,
respectively. It is clear that for any given arguments
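The two basic modes can be illustrated with a short Python sketch. It assumes the common definitions in terms of $\mathrm{expo}$: scale $|x|$ so that $n$ significant bits lie above the binary point, take the floor (for trunc) or ceiling (for away), scale back, and restore the sign. The function names mirror the text, but the implementation itself is only an illustration, not the chapter's formal definition.

```python
import math
from fractions import Fraction


def expo(x) -> int:
    """Return the integer e with 2**e <= |x| < 2**(e+1). x must be nonzero."""
    a = abs(Fraction(x))
    e = a.numerator.bit_length() - a.denominator.bit_length()
    while Fraction(2) ** e > a:
        e -= 1
    while Fraction(2) ** (e + 1) <= a:
        e += 1
    return e


def _round_magnitude(x, n: int, step) -> Fraction:
    """Apply step (floor or ceil) to |x| at n-bit precision, keeping the sign of x."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    scale = Fraction(2) ** (n - 1 - expo(x))
    # step(...) yields an integer with at most n+1 bits; dividing restores magnitude.
    mag = Fraction(step(abs(x) * scale)) / scale
    return mag if x > 0 else -mag


def trunc(x, n: int) -> Fraction:
    """Round toward zero: the n-exact value of largest magnitude not exceeding |x|."""
    return _round_magnitude(x, n, math.floor)


def away(x, n: int) -> Fraction:
    """Round away from zero: the n-exact value of smallest magnitude at least |x|."""
    return _round_magnitude(x, n, math.ceil)


# Example: x = 11/3 (binary 11.1010...) at 3 bits of precision.
# trunc(Fraction(11, 3), 3) == Fraction(7, 2) and away(Fraction(11, 3), 3) == 4
```

Because both results share the sign of $x$ and bracket it in magnitude, the inequalities above hold by construction; when $x$ is already $n$-exact, the floor and ceiling coincide and both modes return $x$ unchanged.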
Other considerations are involved in the rounding of results that lie outside of the normal range of a format. In the case of overflow, which occurs when the magnitude of a computed result exceeds the representable range, the IEEE standard prescribes rounding either to the maximum representable number or to infinity. The rules that govern this choice are quite arbitrary from a mathematical perspective and will not be discussed here. The more interesting case of underflow, involving a nonzero result whose magnitude lies below the normal range, is the subject of Section 5.6.