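The objective of floating-point rounding is an approximation of an arbitrary real number by one that is representable with respect to a given floating-point format. We define a rounding mode to be a mapping $\mathcal{R}: \mathbb{R} \times \mathbb{Z}^+ \rightarrow \mathbb{R}$ such that for all $x \in \mathbb{R}$, $y \in \mathbb{R}$, and $n \in \mathbb{Z}^+$, the following axioms are satisfied:
(a) $\mathcal{R}(x, n)$ is $n$-exact.
(b) If $x$ is $n$-exact, then $\mathcal{R}(x, n) = x$.
(c) If $x \leq y$, then $\mathcal{R}(x, n) \leq \mathcal{R}(y, n)$.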
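Note that if $x \geq 0$, then $\mathcal{R}(x, n) \geq \mathcal{R}(0, n) = 0$. Similarly, if $x \leq 0$, then $\mathcal{R}(x, n) \leq 0$. Another consequence is that the approximation given by $\mathcal{R}$ is optimal in the sense that there can exist no $n$-exact number in the open interval between $x$ and $\mathcal{R}(x, n)$. For example, if $y$ is $n$-exact and $x < y$, then $\mathcal{R}(x, n) \leq \mathcal{R}(y, n) = y$.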
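The requirements on a rounding mode can be spot-checked numerically. The following is a minimal Python sketch under stated assumptions: it takes $x$ to be $n$-exact when $x = 0$ or $x \cdot 2^{n-1-e}$ is an integer, where $2^e \leq |x| < 2^{e+1}$, and it assumes that a rounding mode returns $n$-exact values, leaves $n$-exact arguments unchanged, and preserves order. The names `expo`, `is_exact`, `rdn`, and `check_axioms` are hypothetical, and `rdn` (rounding toward $-\infty$) serves only as a sample mapping.

```python
from fractions import Fraction
from math import floor

def expo(x):
    """Return the integer e with 2**e <= |x| < 2**(e+1), for x != 0."""
    a = abs(Fraction(x))
    e = a.numerator.bit_length() - a.denominator.bit_length()
    return e if Fraction(2) ** e <= a else e - 1

def is_exact(x, n):
    """Assumed definition: x is n-exact if x == 0 or x * 2**(n-1-expo(x)) is an integer."""
    x = Fraction(x)
    return x == 0 or (x * Fraction(2) ** (n - 1 - expo(x))).denominator == 1

def rdn(x, n):
    """Hypothetical sample mode: round x toward minus infinity to n bits."""
    x = Fraction(x)
    if x == 0:
        return x
    ulp = Fraction(2) ** (expo(x) - n + 1)  # spacing of n-exact numbers near x
    return floor(x / ulp) * ulp

def check_axioms(R, n, samples):
    """Spot-check the assumed axioms and the optimality consequence on the samples."""
    xs = sorted(Fraction(s) for s in samples)
    exact = [x for x in xs if is_exact(x, n)]
    for x in xs:
        r = R(x, n)
        assert is_exact(r, n)                 # the result is n-exact
        if is_exact(x, n):
            assert r == x                     # n-exact arguments are fixed
        lo, hi = min(x, r), max(x, r)
        # optimality: no sampled n-exact number strictly between x and R(x, n)
        assert not any(lo < y < hi for y in exact)
    for x, y in zip(xs, xs[1:]):
        assert R(x, n) <= R(y, n)             # order is preserved
    return True

samples = [Fraction(k, 64) for k in range(-256, 257)]
print(check_axioms(rdn, 4, samples))  # True
```

Exact rational arithmetic is used so that the check itself introduces no rounding. A passing run is evidence on the sampled points only, not a proof.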
In the first two sections of this chapter, we examine the two basic rounding modes RTZ (“round toward 0”) and RAZ (“round away from 0”), characterized by the inequalities
$$|\mathit{RTZ}(x, n)| \leq |x|$$
and
$$|\mathit{RAZ}(x, n)| \geq |x|,$$
respectively. It is clear that for any rounding mode $\mathcal{R}$ and arguments $x$ and $n$, either $\mathcal{R}(x, n) = \mathit{RTZ}(x, n)$ or $\mathcal{R}(x, n) = \mathit{RAZ}(x, n)$. It is natural, therefore, to define other rounding modes in terms of these two. In Sections 6.3, 6.4, and 6.5, we discuss the modes that are prescribed by the IEEE standard as well as others that are commonly used in implementations of floating-point operations.
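As an illustration of the two basic modes, the following Python sketch computes them with exact rational arithmetic. It assumes the usual formulation in which the $n$-bit neighbors of $x$ are obtained by taking the floor or ceiling of $|x|$ on a grid of spacing $2^{e-n+1}$, where $2^e \leq |x| < 2^{e+1}$; the function names `expo`, `rtz`, `raz`, and `to_double` are hypothetical. The last check treats Python's conversion of a `Fraction` to a `float` as a 53-bit rounding of values in the normal double-precision range, which is an assumption about the host implementation.

```python
from fractions import Fraction
from math import floor, ceil

def expo(x):
    """Return the integer e with 2**e <= |x| < 2**(e+1), for x != 0."""
    a = abs(Fraction(x))
    e = a.numerator.bit_length() - a.denominator.bit_length()
    return e if Fraction(2) ** e <= a else e - 1

def rtz(x, n):
    """Round x toward 0 to n bits of precision."""
    x = Fraction(x)
    if x == 0:
        return x
    ulp = Fraction(2) ** (expo(x) - n + 1)
    m = floor(abs(x) / ulp)          # truncate the magnitude
    return (m if x > 0 else -m) * ulp

def raz(x, n):
    """Round x away from 0 to n bits of precision."""
    x = Fraction(x)
    if x == 0:
        return x
    ulp = Fraction(2) ** (expo(x) - n + 1)
    m = ceil(abs(x) / ulp)           # round the magnitude up
    return (m if x > 0 else -m) * ulp

def to_double(x):
    """Convert to a Python float and read the result back exactly."""
    return Fraction(float(x))

x = Fraction(11, 16)                 # 0.1011 in binary
print(rtz(x, 3), raz(x, 3))          # 5/8 3/4
print(rtz(-x, 3), raz(-x, 3))        # -5/8 -3/4

for k in range(-20, 21):
    x = Fraction(k, 7)
    # the characterizing inequalities
    assert abs(rtz(x, 53)) <= abs(x) <= abs(raz(x, 53))
    # the host's 53-bit rounding agrees with one of the two basic modes
    assert to_double(x) in (rtz(x, 53), raz(x, 53))
```

When $x$ is already $n$-exact the two results coincide, so the membership test also covers exactly representable inputs such as $k/7$ with $k$ a multiple of 7.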
Considerations other than $n$-exactness are involved in the rounding of results that lie outside of the normal range of a format. In the case of overflow, which occurs when the result of a computation exceeds the representable range, the standard prescribes rounding either to the maximum representable number or to infinity. The rules that govern this choice are quite arbitrary from a mathematical perspective and will not be discussed here. The more interesting case of underflow, involving a non-zero result that lies below the normal range, is the subject of Section 6.6.