AI Summary
This work addresses the substantial performance degradation commonly observed in speaker verification under low-bit quantization, whose underlying mechanisms remain poorly understood. Through layer-wise weight analysis and score-level error tracing, the study reveals, for the first time, a performance knee point at 2-bit precision and harmful decision flips concentrated near the FP32 decision threshold. Building on these insights, the authors propose a calibrated multi-precision cascade that escalates to higher quantization precision only for ambiguous trials, combining uniform K-means quantization-aware training with an efficient inference scheme. This approach substantially reduces compute and memory costs while keeping verification accuracy close to that of FP32 models, enabling efficient and reliable low-bit speaker verification.
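The cascade idea can be illustrated with a minimal Python sketch. It assumes each precision level (for example 2-bit, then a higher bit width, then FP32) exposes a scoring function, that all levels share one calibrated accept/reject threshold, and that a trial counts as "ambiguous" when its score lies within a fixed margin of that threshold. `cascade_verify`, `scorers`, `threshold` and `margin` are hypothetical names chosen for illustration, not the authors' implementation.

```python
from __future__ import annotations

from typing import Callable, Sequence


def cascade_verify(
    trial: object,
    scorers: Sequence[Callable[[object], float]],
    threshold: float,
    margin: float,
) -> tuple[bool, int]:
    """Accept/reject a trial, escalating precision only for ambiguous scores.

    Hypothetical sketch: scorers[i] returns the verification score of the
    trial under the i-th precision level, ordered from cheapest (e.g. 2-bit)
    to most precise. A single calibrated threshold is assumed for all levels.
    Returns (accept, index of the level that made the decision).
    """
    for level, scorer in enumerate(scorers):
        score = scorer(trial)
        is_last = level == len(scorers) - 1
        # Scores outside the ambiguity band are resolved at this level;
        # the most precise level always makes a final decision.
        if is_last or abs(score - threshold) > margin:
            return score >= threshold, level
    raise ValueError("scorers must be non-empty")


# Usage with toy precomputed scores standing in for real models.
scorers = [lambda t: t["s2"], lambda t: t["s8"], lambda t: t["fp32"]]
print(cascade_verify({"s2": 0.9, "s8": 0.88, "fp32": 0.87}, scorers, 0.5, 0.1))
```

Under this design the fraction of trials that escalate past the first level is what determines how much of the low-bit compute and memory saving is retained, so the margin trades efficiency against accuracy.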
Abstract
Although low-bit quantization offers a practical means of deploying speaker verification on resource-constrained devices, its effects on verification performance remain poorly understood. In this paper, we study uniform K-means quantization-aware training of ResNet-36 and ResNet-200 through joint layer-wise and score-level analyses. Our layer-wise analysis highlights fragile components and shows that score degradation is not fully explained by weight distortion alone. We identify a clear knee point at 2 bits, with larger score drift and harmful decision flips concentrated near the FP32 threshold. Our score-level analysis reveals where and how score errors emerge under extreme quantization. Building on these findings, we propose a calibrated multi-precision cascade that resolves most trials at 2 bits and escalates only ambiguous cases, achieving performance close to FP32 while retaining the efficiency of low-bit inference, with substantially lower compute and memory costs.
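The score-level measurements mentioned above (score drift and decision flips relative to the FP32 threshold) can be sketched as follows. This is an illustrative reading, not the paper's exact protocol: it assumes drift is the mean absolute score difference between the quantized and FP32 models, that a flip is any trial whose accept/reject decision at the FP32 threshold changes, that a "harmful" flip is one that turns a correct FP32 decision into an error, and that "near threshold" means within a hypothetical `band` of it. `analyze_flips` and `FlipStats` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FlipStats:
    mean_abs_drift: float      # mean |quantized score - FP32 score|
    flips: int                 # trials whose decision changed
    harmful_flips: int         # assumed: correct FP32 decision became wrong
    flips_near_threshold: int  # flips with FP32 score within `band` of threshold


def analyze_flips(fp32_scores, quant_scores, labels, threshold, band):
    """Compare quantized scores with FP32 scores at the FP32 threshold.

    labels[i] is True for target (same-speaker) trials. Definitions of
    "harmful" and "near threshold" are illustrative assumptions.
    """
    n = len(labels)
    if n == 0 or not (len(fp32_scores) == len(quant_scores) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    drift_total = 0.0
    flips = harmful = near = 0
    for s_fp, s_q, is_target in zip(fp32_scores, quant_scores, labels):
        drift_total += abs(s_q - s_fp)
        dec_fp = s_fp >= threshold
        dec_q = s_q >= threshold
        if dec_fp != dec_q:
            flips += 1
            # FP32 decision matched the label, so the flipped one is an error.
            if dec_fp == is_target:
                harmful += 1
            if abs(s_fp - threshold) <= band:
                near += 1
    return FlipStats(drift_total / n, flips, harmful, near)
```

Counting flips inside the band separately makes it possible to check whether errors cluster around the threshold, which is the condition a margin-based cascade relies on.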