DHFP-PE: Dual-Precision Hybrid Floating Point Processing Element for AI Acceleration

2026-04-06
Citations: 0
Influential: 0
AI Summary
This work addresses the urgent demand for highly energy-efficient and flexible low-precision floating-point multiply-accumulate (MAC) units driven by AI and edge computing applications. The paper proposes a fully pipelined dual-precision floating-point MAC engine supporting FP8 (E4M3/E5M2) and FP4 (E2M1/E1M2) formats. Its key innovation lies in a novel bit-partitioning architecture that enables a single 4-bit multiplier to be dynamically configured as either one 4×4 or two parallel 2×2 multipliers, achieving 100% hardware utilization with no logic redundancy. Implemented in 28 nm CMOS technology, the design integrates mixed-precision support and dynamic bit-width reconfiguration, operating at 1.94 GHz while occupying only 0.00396 mm² and consuming 2.13 mW, yielding up to 60.4% area and 86.6% power savings compared to state-of-the-art alternatives.
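For readers unfamiliar with the ExMy naming used above, the following is a minimal Python sketch of how such a bit pattern maps to a value (1 sign bit, x exponent bits, y mantissa bits). It is not taken from the paper. The function name `decode_minifloat` is hypothetical, the bias values in the usage lines (7 for E4M3, 15 for E5M2, 1 for E2M1) are assumptions following the common OCP conventions, and special encodings such as NaN and infinity are deliberately ignored. The bias for E1M2 is not stated in the source, so it is left as a parameter rather than guessed.

```python
def decode_minifloat(bits: int, exp_bits: int, man_bits: int, bias: int) -> float:
    """Decode a sign/exponent/mantissa bit pattern into a Python float.

    Illustrative only: NaN/Inf encodings are NOT handled, and the bias
    must be supplied by the caller (assumption, not from the paper).
    """
    width = 1 + exp_bits + man_bits
    if not 0 <= bits < (1 << width):
        raise ValueError(f"pattern does not fit in {width} bits")

    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    frac = man / (1 << man_bits)

    if exp == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * frac * 2.0 ** (1 - bias)
    # Normal: implicit leading 1 in front of the stored mantissa.
    return sign * (1.0 + frac) * 2.0 ** (exp - bias)


# Usage with the assumed biases:
one_e4m3 = decode_minifloat(0x38, exp_bits=4, man_bits=3, bias=7)    # 1.0
one_e5m2 = decode_minifloat(0x3C, exp_bits=5, man_bits=2, bias=15)   # 1.0
max_e2m1 = decode_minifloat(0b0111, exp_bits=2, man_bits=1, bias=1)  # 6.0
```

Note that E4M3 stores 3 mantissa bits, so its significand with the implicit leading 1 is 4 bits wide, while E2M1 stores 1 mantissa bit and has a 2-bit significand.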
Abstract
The rapid adoption of low-precision arithmetic in artificial intelligence and edge computing has created a strong demand for energy-efficient and flexible floating-point multiply-accumulate (MAC) units. This paper presents a fully pipelined dual-precision floating-point MAC processing engine supporting FP8 formats (E4M3, E5M2) and FP4 formats (E2M1, E1M2), specifically optimized for low-power and high-throughput AI workloads. The proposed architecture employs a novel bit-partitioning technique that enables a single 4-bit unit multiplier to operate either as a standard 4x4 multiplier for FP8 or as two parallel 2x2 multipliers for 2-bit operands, achieving 100 percent hardware utilization without duplicating logic. Implemented in 28 nm technology, the proposed processing engine achieves an operating frequency of 1.94 GHz with an area of 0.00396 mm^2 and power consumption of 2.13 mW, resulting in up to 60.4 percent area reduction and 86.6 percent power savings compared to state-of-the-art designs.
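The dual-mode behaviour described in the abstract can be illustrated with a small behavioural model in Python. This is only a sketch of the arithmetic, not the authors' circuit: the function names `split2`, `mul_4x4` and `mul_dual_2x2` are hypothetical, and the decomposition into 2-bit halves is the textbook one. In particular, this naive model leaves the two cross terms unused in dual mode, so it does not reproduce the 100 percent utilization the paper claims for its bit-partitioning technique.

```python
def split2(x: int) -> tuple[int, int]:
    """Split a 4-bit value into (high 2 bits, low 2 bits)."""
    if not 0 <= x < 16:
        raise ValueError("operand must fit in 4 bits")
    return x >> 2, x & 0b11


def mul_4x4(a: int, b: int) -> int:
    """Single-mode: one 4x4 product assembled from 2x2 partial products."""
    a_hi, a_lo = split2(a)
    b_hi, b_lo = split2(b)
    # a*b = (a_hi*b_hi)<<4 + (a_hi*b_lo + a_lo*b_hi)<<2 + a_lo*b_lo
    return ((a_hi * b_hi) << 4) + ((a_hi * b_lo + a_lo * b_hi) << 2) + a_lo * b_lo


def mul_dual_2x2(a: int, b: int) -> tuple[int, int]:
    """Dual-mode: treat each 4-bit input as two packed 2-bit operands.

    Returns (high-lane product, low-lane product); cross terms are dropped.
    """
    a_hi, a_lo = split2(a)
    b_hi, b_lo = split2(b)
    return a_hi * b_hi, a_lo * b_lo


# Usage: the same packed inputs interpreted in both modes.
full = mul_4x4(0b1101, 0b1011)        # 13 * 11
lanes = mul_dual_2x2(0b1101, 0b1011)  # (3 * 2, 1 * 3)
```

Both modes share the `a_hi * b_hi` and `a_lo * b_lo` partial products, which is the intuition behind reusing one multiplier array for two operand widths.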
Problem

Research questions and friction points this paper is trying to address.

low-precision arithmetic
floating-point MAC
AI acceleration
energy efficiency
flexible computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

dual-precision
bit-partitioning
floating-point MAC
FP8/FP4
hardware utilization
Shubham Kumar
Ph.D. Student at University of Illinois at Urbana Champaign
Vijay Pratap Sharma
NSDCS Research Group, Dept of Electrical Engineering, Indian Institute of Technology Indore, Madhya Pradesh 453552, India
Vaibhav Neema
IET DAVV, Khandwa Road, Indore, Madhya Pradesh, 452001, India
Santosh Kumar Vishvakarma
NSDCS Research Group, Dept of Electrical Engineering, Indian Institute of Technology Indore, Madhya Pradesh 453552, India