# 4.1  Floating-Point Decomposition

Floating-point arithmetic is based on the observation that every nonzero real number admits a representation of the form

where is an integer, called the exponent of , and is a number in the range , called the significand of . These components are defined as follows.

Definition 4.1.1   (sgn, expo, sig) Let . If , then

is the unique integer that satisfies

and

If , then .

The decomposition property is immediate.

Lemma 4.1.1   (fp-rep, fp-abs) For all , .

Lemma 4.1.2   (expo-minus, sig-minus) For all ,

(a) ;

(b) .

The definition of may be restated as follows:

(expo<=, expo>=) For all and ,

(a) If , then ;

(b) If , then .

PROOF:

(a) If , then , which implies

(b) If , then , which implies

Corollary 4.1.4   (expo-2**n) For all , .

PROOF: since , the lemma follows from Lemma 4.1.3(b).

The width of a bit vector is determined by its exponent.

(bvecp-expo) For all , is a bit vector of width .

PROOF: This is just a restatement of the second inequality of Definition 4.1.1

PROOF: This is an instance of Lemma 1.2.16

:

We have the following bounds on .

PROOF: Definition 4.1.1 yields

Corollary 4.1.9   (expo-sig) For all , .

PROOF: Since , , and hence

Corollary 4.1.11   (sig-sig) For all , .

(fp-rep-unique) Let . If , where , , and , then and .

PROOF: Since , , where . It follows from Definition 4.1.1 that , and therefore

Changing the sign of a number does not affect its exponent or significand.

Lemma 4.1.13   (sgn-minus, expo-minus, sig-minus) For all ,

 (a) (b) (c) .

A shift does not affect the sign or significand.

(sgn-shift, expo-shift, sig-shift) If , , and , then

 (a) (b) (c) .

PROOF:

(a) .

(b) .

(c)

We have the following formulas for the components of a product.

(sgn-prod, expo-prod, sig-prod) Let and . If , then

 (a) (b) (c)

PROOF:

(a) .

(b) Since and , we have

If , then

and by Definition 4.1.1, .

On the other hand, if , then similarly,

and Definition 4.1.1 yields .

(c) If , then

Otherwise,

David Russinoff 2017-08-01