As we shall see in this section, a partitioning of our multiplier into slices of four bits instead of three leads to a decomposition

The choice between radix-8 and radix-4 multiplication is generally decided by the timing details of a hardware design. In a typical implementation, the partial products are computed in one clock and the compression tree is executed in the next. If there is sufficient time during the first clock to perform the addition required for radix-8 encoding (which is more likely to be the case, for example, for a low-precision operation), then this scheme is feasible. Since most of the silicon area allocated to a multiplier is associated with the compression tree, the resulting reduction in the number of partial products may represent a significant gain in efficiency.

For the purpose of this analysis, which is otherwise quite similar to that of Section 12.1, we shall assume that and are bit vectors of widths and , respectively. The multiplier is now partitioned into 3-bit slices, , , and encoded as follows.

The proof of the following identity is essentially the same as that of Lemma 12.1.1.

PROOF: We shall prove, by induction, that for ,

+- | + | ||

+- | |||

++- | |||

++- | |||

+ | |||

which completes the induction. In particular, substituting for , we have

The partial products are computed with an 9-1 multiplexer.

Once again, our theorem is formulated as generally as possible, although in this case, we shall use it only once, in order to establish Corollary 12.4.2.

PROOF: Let . After a trivial reformulation of Definition 12.4.2, we have

Next, we observe that the final term of the sum may be rewritten as

PROOF: Since ,

David Russinoff 2017-08-01