🤖 AI Summary
Implicit Neural Representations (INRs) rely on full-precision computation, incurring high hardware overhead; existing quantization methods primarily target weights while neglecting activation quantization, yielding limited hardware benefits. To address this, the authors propose DHQ, a distribution-aware Hadamard quantization scheme that uses the Hadamard transform to unify the statistical distributions of weights and activations, enabling joint quantization of both. DHQ integrates with standard quantizers to co-quantize weights and activations within MLPs and comes with an FPGA-friendly inference implementation. Evaluated on image reconstruction, DHQ achieves a 32.7% latency reduction, 40.1% energy savings, and up to 98.3% reduction in FPGA resource utilization versus full-precision baselines, outperforming prior INR quantization approaches.
📝 Abstract
Implicit Neural Representations (INRs) encode discrete signals using Multi-Layer Perceptrons (MLPs) with complex activation functions. While INRs achieve superior performance, they depend on full-precision number representation for accurate computation, resulting in significant hardware overhead. Previous INR quantization approaches have primarily focused on weight quantization, offering only limited hardware savings due to the lack of activation quantization. To fully exploit the hardware benefits of quantization, we propose DHQ, a novel distribution-aware Hadamard quantization scheme that targets both weights and activations in INRs. Our analysis shows that the weights in the first and last layers have distributions distinct from those in the intermediate layers, while the activations in the last layer differ significantly from those in the preceding layers. Instead of customizing quantizers individually, we utilize the Hadamard transformation to standardize these diverse distributions into a unified bell-shaped form, supported by both empirical evidence and theoretical analysis, before applying a standard quantizer. To demonstrate the practical advantages of our approach, we present an FPGA implementation of DHQ that highlights its hardware efficiency. Experiments on diverse image reconstruction tasks show that DHQ outperforms previous quantization methods, reducing latency by 32.7%, energy consumption by 40.1%, and resource utilization by up to 98.3% compared to full-precision counterparts.
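The core idea described above — rotating weights (or activations) with an orthonormal Hadamard matrix so their diverse distributions become bell-shaped before a standard uniform quantizer is applied — can be sketched as follows. This is a minimal illustration of the general technique, not the paper's actual implementation; the function names, the symmetric per-tensor quantizer, and the Sylvester construction of the Hadamard matrix are our assumptions.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two (assumed here)
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def uniform_quantize(x, bits):
    # standard symmetric uniform quantizer (quantize + dequantize)
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def hadamard_quantize(W, bits=8):
    n = W.shape[0]
    H = hadamard(n) / np.sqrt(n)   # orthonormal: H @ H.T == I
    Wt = H @ W                     # rotate: mixes entries, yielding a
                                   # bell-shaped, quantizer-friendly distribution
    Wq = uniform_quantize(Wt, bits)
    return H.T @ Wq                # rotate back (H.T is the exact inverse)
```

Because the scaled Hadamard matrix is orthonormal, the rotation is lossless in exact arithmetic and preserves the quantization error norm when rotating back, so a single standard quantizer can serve layers whose raw distributions differ widely.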