🤖 AI Summary
This work addresses the lack of an efficient, consistent, and parameterized standard for low-bit-width floating-point formats and arithmetic in machine learning. The IEEE P3109 draft standard defines a family of binary floating-point formats parameterized over bit width, precision, signedness, and the presence of infinities, with operation semantics defined by decoding values to the closed extended reals. It specifies rounding and saturation modes, including stochastic rounding, and makes operations exception-free: exceptional situations are reported through return values such as NaN, which accelerates throughput. A scale-invariant kappa-approximation measure, akin to units in the last place, lets system vendors describe approximate implementations, and standard function definitions and other properties are mechanically verified and generated from formal specifications.
📝 Abstract
The IEEE P3109 draft standard defines a parameterized family of binary floating-point formats and associated operations, with a focus on facilitating machine learning. These formats allow efficient and consistent representation of values in a small number of bits. The defined formats are parameterized over width and precision in bits, signedness, and the presence of infinities. Operations are defined by decoding floating-point values to the set of closed extended reals: the reals augmented with positive and negative infinity and NaN (Not a Number). Explicit treatment of NaN and infinite operands ensures that only real arithmetic is invoked in operation definitions. Extensive rounding and saturation modes are defined; stochastic rounding is included. Operations are exception-free, accelerating throughput, with exceptional situations communicated through return values, e.g., NaN. Operations on blocks of values sharing a common scale factor are defined in terms of the underlying operations in a uniform manner. System vendors may describe approximate implementations via a novel scale-invariant measure, akin to units in the last place, called kappa-approximation. Standard function definitions and various other properties are mechanically verified and generated using formal specifications.
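The decode-then-operate approach described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the draft: the bit layout (sign bit, then `width - precision` exponent bits, then `precision - 1` trailing significand bits), the bias, the choice of the negative-zero bit pattern as NaN, the choice of the largest-magnitude codes as infinities, and the function names `decode`, `add_extended`, and `stochastic_round`. Only signed formats are covered.

```python
import math
from fractions import Fraction


def decode(code, width=8, precision=3, has_inf=True):
    """Decode a bit pattern to a Fraction, +/-inf, or NaN (hypothetical layout)."""
    assert 0 <= code < (1 << width)
    sign_bit = 1 << (width - 1)
    if code == sign_bit:                      # assumed NaN encoding
        return math.nan
    negative = bool(code & sign_bit)
    mag = code & (sign_bit - 1)
    if has_inf and mag == sign_bit - 1:       # assumed infinity encoding
        return -math.inf if negative else math.inf
    t_bits = precision - 1                    # trailing significand bits
    exp_bits = width - precision
    bias = 1 << (exp_bits - 1)                # assumed bias
    e_field = mag >> t_bits
    t_field = mag & ((1 << t_bits) - 1)
    if e_field == 0:                          # zero and subnormals
        value = Fraction(t_field, 1 << t_bits) * Fraction(2) ** (1 - bias)
    else:                                     # normal values
        value = (1 + Fraction(t_field, 1 << t_bits)) * Fraction(2) ** (e_field - bias)
    return -value if negative else value


def _is_nan(v):
    return isinstance(v, float) and math.isnan(v)


def _is_inf(v):
    return isinstance(v, float) and math.isinf(v)


def add_extended(x, y):
    """Add two decoded values: special operands first, then exact real arithmetic."""
    if _is_nan(x) or _is_nan(y):
        return math.nan
    if _is_inf(x) and _is_inf(y):
        return x if x == y else math.nan      # assumed convention: inf + (-inf) is NaN
    if _is_inf(x):
        return x
    if _is_inf(y):
        return y
    return x + y                              # both are Fractions, so the sum is exact


def stochastic_round(x, ulp, r):
    """Round exact x to a multiple of ulp; r in [0, 1) is a caller-supplied random draw.

    Rounds up with probability equal to the fractional position of x between
    the two neighbouring multiples of ulp.
    """
    q = x / ulp
    lo = math.floor(q)
    return (lo + (1 if r < q - lo else 0)) * ulp
```

Under these assumptions, `decode(0x40)` yields exactly 1, and `add_extended(decode(0x7F), decode(0xFF))` yields NaN as a return value rather than raising. Using `Fraction` keeps the arithmetic step exact, so rounding is a separate, explicit step; the draft's own rounding and saturation definitions may differ from this sketch.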