Preface

It is not the purpose of this book to expound the principles of computer arithmetic algorithms, nor does it presume to offer instruction in the art of arithmetic circuit design. A variety of publications spanning these subjects are readily available, including general texts as well as more specialized treatments, covering all areas of functionality and aspects of implementation. There is one relevant issue, however, that remains to be adequately addressed: the problem of eliminating human error from arithmetic hardware designs and establishing their ultimate correctness.

As in all areas of computer architecture, the designer of arithmetic circuitry is preoccupied with efficiency. His objective is the rapid development of logic that maximizes execution speed, guided by established practices and intuition. Subtle conceptual errors and miscalculations are accepted as inevitable, with the expectation that they will be eliminated through extensive testing. Naturally, as implementations grow in complexity through the use of increasingly sophisticated techniques, errors become more difficult to detect. Experience has shown that testing alone is insufficient to provide a satisfactory level of confidence in the functional correctness of a state-of-the-art floating-point unit.

In order to ensure the correctness of complex implementations of arithmetic operations while reducing reliance on testing, a more scrupulous approach to the design and analysis of arithmetic circuits is imperative. Loose concepts, intuition, and arguments by example must be replaced by formal development, explicit theorems, and rigorous proofs.

One problem to be addressed is the semantic gap between abstract behavioral specifications and concrete hardware models. While the design of a circuit is modeled for most purposes at the bit level, its prescribed behavior is naturally expressed in terms of high-level arithmetic concepts and algorithms. It is often easier to prove the correctness of an algorithm than to demonstrate that it has been implemented accurately.

As a simple illustration, consider the addition of two numbers, $x$ and $y$, as diagrammed below in binary notation. Let us suppose that the sum $z = x+y$ is to be truncated at the dotted line, and that the precision of $x$ is such that its least significant non-zero bit lies to the left of the line.


[Diagram: the binary expansions of $x$ and $y$ aligned for addition, with the sum $z$ below and a dashed vertical line marking the truncation point; the bits of $y$ and $z$ continue to the right of the line, while every bit of $x$ to the right of the line is 0.]

Rather than computing the exact value of $z$ and then extracting the truncated result, an implementation may choose to perform the truncation on $y$ instead (at the same dotted line) before adding it to $x$. The designer, in order to convince himself of the equivalence of these two approaches, might resort to a diagram like the one above.
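The principle can be checked mechanically. In the following sketch (the function names are illustrative, not those of the text), truncation at bit $k$ discards the $k$ low-order bits of a natural number; the equivalence holds whenever $x$ is a multiple of $2^k$, i.e., whenever all of its nonzero bits lie to the left of the line.

```python
def truncate(z, k):
    """Discard the k least significant bits of a natural number z."""
    return (z >> k) << k

def check(x, y, k):
    """Truncating the sum agrees with truncating y before the addition,
    provided x vanishes below the truncation line."""
    assert x % (1 << k) == 0, "x must have no nonzero bits below bit k"
    return truncate(x + y, k) == x + truncate(y, k)

# e.g., x = 0b1011000, y = 0b0010111, truncated at k = 3 bits
print(check(0b1011000, 0b0010111, 3))  # True
```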

The next logical step would be to formulate the underlying principle in precise terms (see Lemma 6.1.16 in Section 6.1), explicitly identifying the necessary conditions for equivalence, and establishing its correctness by rigorous mathematical proof. The result could then be integrated into an evolving theory of computer arithmetic, and thus become available for reuse and subject to extension and generalization (e.g., Lemma 6.5.13 in Section 6.5) as appropriate for new applications.

Unfortunately, such results are nowhere to be found in the existing literature, which is more concerned with advanced techniques and optimizations than with their theoretical underpinnings. What prevents the organization of the essential properties of the basic data objects and operations of this domain into a theory suitable for systematic application? The foregoing example, however trivial, serves to illustrate one of the main obstacles to this objective: the modeling of data at different levels of abstraction. Numbers are naturally represented as bit vectors, which, for some purposes, may be viewed simply as strings of Boolean values. Truncation, for example, is conveniently described as the extraction of an initial segment of bits. For the purpose of analyzing arithmetic operations, on the other hand, the same data must be interpreted as abstract numbers. Although the correspondence between a number and its binary representation is straightforward, a rigorous proof derived from this correspondence requires more effort than an appeal to intuition based on a simple diagram. Consequently, the essential properties of bit vector arithmetic have never been formalized and compiled into a well-founded comprehensive theory. In the absence of such a theory, the designer must rediscover the basics as needed, relying on examples and intuition rather than theorems and proofs.

Nearly two centuries ago, the mathematician Niels Abel complained about a similar state of affairs in another area of mathematical endeavor [KLI72]:

It lacks so completely all plan and system that it is peculiar that so many men could have studied it. The worst is, it has never been treated stringently. There are very few theorems in [this subject] which have been demonstrated in a logically tenable manner. Everywhere one finds this miserable way of concluding from the specific to the general ....

The subject in question was the calculus, a major mathematical development with a profound impact on the sciences, but lacking a solid logical foundation. Various attempts to base it on geometry or on intuition derived from other areas of mathematics proved inadequate as the discipline grew in complexity. The result was a climate of uncertainty, controversy, and stagnation. Abel and others resolved to restore order by rebuilding the theory of calculus solely on the basis of arithmetical concepts, thereby laying the groundwork for modern mathematical analysis.

While the contemporary hardware engineer may not be susceptible to the same philosophical qualms that motivated nineteenth century mathematicians, he is certainly concerned with the “bugs” that inevitably attend undisciplined reasoning. Even if the analogy overreaches the present problem, it charts a course for its solution and sets the direction of this investigation.

Our objective is a unified mathematical theory, derived from the first principles of arithmetic, encompassing two distinct domains of discourse. The first of these, which is the subject of Part I, is the realm of register-transfer logic (RTL), comprising the primitive data types and operations on which microprocessor designs are built, i.e., bit vectors and logical operations. A critical first step is the careful formulation of these primitives in a manner consistent with our goals, which sometimes requires resistance to intuition. Thus, notwithstanding its name, we define a bit vector of width $k$ to be a natural number less than $2^k$ rather than a sequence of $k$ Boolean values. This decision will seem unnatural to those who are accustomed to dealing with these objects in the context of hardware description languages, but it is a critical step in the master plan of an arithmetic-based theory. As a consequence, which may also be disturbing to some, the logical operations are defined recursively rather than bit-wise.
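To illustrate what such a recursive definition looks like (this is a Python sketch, not the book's ACL2 formulation), a bitwise conjunction of natural numbers can be defined by recursion on the quotient by 2, combining the low-order bits at each step:

```python
def logand(x, y):
    """Bitwise AND of natural numbers, defined recursively
    rather than by operating on a string of Booleans."""
    if x == 0 or y == 0:
        return 0
    return 2 * logand(x // 2, y // 2) + (x % 2) * (y % 2)

# Agrees with the built-in bitwise AND on natural numbers:
print(all(logand(x, y) == (x & y) for x in range(32) for y in range(32)))  # True
```

The recursion mirrors the arithmetic view of a bit vector: the low bit is `x % 2` and the remaining bits form the natural number `x // 2`, so no sequence type is ever needed.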

Part II addresses the second domain of interest, the more abstract world of rational arithmetic, focusing on floating-point representations of rational numbers as bit vectors. The benefits of a rigorous approach are most evident in the chapter on floating-point rounding, which includes a variety of results that would otherwise be difficult to state or prove, especially those that relate abstract arithmetic operations on rationals to lower-level properties of bit vectors. All of the architectural rounding modes prescribed by IEEE Standard 754 [IEEE08] are thoroughly analyzed. Moreover, since hardware implementation is of central interest, further attention is given to several other modes that are commonly used for internal computations but are not normally covered in treatises on floating-point arithmetic.

Although IEEE 754 is routinely cited as a specification of correctness of floating-point implementations, it contains a number of ambiguities and leaves many aspects of behavior unspecified, as reflected in the divergent behavior exhibited by various architectures, especially in the treatment of exceptional conditions. In particular, the two primary floating-point instruction sets of the x86 architecture, known as SSE and x87, employ distinct exception-handling procedures that have been implicitly established across the microprocessor industry but are not adequately described in any programming manual. Another important floating-point instruction set, with its own variations of exception handling, is provided by the ARM architecture. In Part III, we present comprehensive behavioral specifications for the elementary arithmetic instructions of these three sets, based on the theory developed in the preceding chapters, which may serve as a basis for formal verification of an implementation.

In Part IV, the theory is applied to the analysis of several well known algorithms and techniques used in the implementation of elementary arithmetic operations. The purpose here is not to present a comprehensive survey of the field, but merely to demonstrate a methodology for proving the correctness of implementations, providing guidance to those who are interested in applying it further. There is a chapter on addition, including a discussion of leading one prediction, and another on multiplication, describing several versions of Booth encoding. Two division algorithms are analyzed: a multiplicative algorithm based on a fused multiply-add operation and a subtractive SRT algorithm. The latter is also extended to square root extraction.

This book is an outgrowth of the author's work on formal verification of the arithmetic components of microprocessor designs, conducted at Advanced Micro Devices, Inc. between 1995 and 2011, and continuing at Intel Corp. from 2012 to the present. In order to maximize confidence in the validity of our results and hence the correctness of these designs, the ACL2 theorem prover [ACL205] has played a central role in this effort. In Part V, we describe a methodology, developed at Intel, for generating representations of RTL models in the logical language of ACL2 and proofs of correctness that may be mechanically checked by the ACL2 prover. Several examples of such models and proofs are included [not yet].

During the course of this project, many thousands of definitions, lemmas, and theorems, all referred to as events in the ACL2 world, were formulated and admitted to the logic by the ACL2 prover. Those events pertaining to the general principles of RTL design and computer arithmetic, i.e., not tied to a specific implementation, were collected in an evolving library for reuse in support of subsequent proofs. The current version of the library comprises approximately six hundred events. It has not only proved to be an effective tool in reducing the effort required in the verification of successive AMD processors, but has found use outside of AMD as well and is now included as a subdirectory of the standard ACL2 release. The sequence of library releases resides in the directories books/rtl/rel*/lib/.

This book is an exposition of the theory underlying the ACL2 RTL library. Here is the sense in which the word formal is used in the title: all of the definitions and results presented herein are transcriptions of mechanically checked ACL2 events. The accompanying proofs are derived from notes that were used to guide the prover interactively through this process. Each chapter corresponds to a “book” of the library.

The role of ACL2 in the development of the theory is evidenced in various ways, including its emphasis on recursion and induction, but mainly in the level of rigor that is inherent in the logic and enforced by the prover. Any vague arguments, miscalculations, or missing hypotheses may be assumed to have been corrected through the mechanization process. With respect to style of presentation, the result will appeal to those readers whose thought processes most closely resemble the workings of a mechanical theorem prover; others may find it pedantic.

The claim regarding the correspondence between the results presented here and their ACL2 formalizations requires a caveat. Since the ACL2 logic is limited to rational arithmetic, properties that hold generally for real numbers can only be stated formally as properties of rationals. In the chapters relevant to the theory of rounding (Chapters 1, 4, and 6), all results, with the single exception of Lemma 4.2.9, are stated and proved more generally, with real variables appearing in place of the rational variables found in the corresponding ACL2 code.

This limitation of the logic is especially relevant to the square root operator, which cannot be defined as an ACL2 function. In our treatment of the square root in Chapter 14, every result, along with its proof, admits a straightforward formulation in the realm of rational arithmetic. In particular, a relation of the form $a < \sqrt{b}$ is formalized as $a^2 < b$. The more challenging problem of formalizing IEEE compliance (i.e., correctly rounded results) of a square root implementation is addressed in Chapter 7. Here we introduce an equivalent formulation of the IEEE requirement that is formalizable in ACL2. It is only in the informal discussion motivating the results of this chapter that the rounding of irrational numbers is relevant.

On the other hand, the ACL2 connection may be safely ignored by those uninterested in pursuing the formal aspect of the theory. Outside of Section 15.6, no familiarity with ACL2, LISP, or formal logic is required of (or even useful to) the reader. The exposition adheres to the most basic conventions of mathematical notation, supplemented only by several RTL constructs common to hardware description languages, all of which are explicitly defined upon first use. Nor is any uncommon knowledge of mathematics presupposed. The entire contents should be accessible to a competent high school student who has been exposed to the algebra of real numbers and the principle of mathematical induction, especially one with an assiduous capacity for detail. It should be noted, however, that repeated attempts to support this claim have been uniformly unsuccessful.

The book has been written with several purposes in mind. For the ACL2 programmer interested in applying or extending the associated RTL library, it may be read as a user's manual. The theory might also be used to guide other verification efforts, either without mechanical support or encoded in the formalism of another theorem prover. A more ambitious hope is that arithmetic circuit designers who appreciate the value of a rigorous approach to their art may use this as a guide to reasoning about their designs. It is in this direction that the potential impact of this methodology is greatest.

This work has benefited from a variety of contributions from colleagues and friends. J Moore and Matt Kaufmann, the authors of ACL2, were the first to demonstrate its utility as a floating-point verification tool. What little I know about arithmetic circuits was explained to me by Mike Achenbach, Kelvin Goveas, Tom Lynch, Stuart Oberman, Carl Lemonds, Steve Espy, Simon Rubanovitch, and Warren Ferguson. The Verilog-ACL2 translator, a critical component of the AMD project, was implemented by Kaufmann, Art Flatau, Rob Sumners, and Hanbing Liu. The ACL2 mechanization of some of the proofs was done by Kaufmann and Eric Smith. The MASC project (Part V) is a collaboration with John O'Leary. I am also grateful to Matthew Bottkol and Cooky Goldblatt for their suggestions for improvement of an earlier manuscript. Finally, I cannot deny myself the pleasure of thanking Lin Russinoff, who designed the cover graphics and is my continual source of inspiration.

D.M.R.
Block Island, Rhode Island
August, 2005
