NSVQ: Mitigating Codebook Collapse by Stabilizing Encoder Drift in Vector Quantization

📅 2026-06-09

🤖 AI Summary

This work addresses codebook collapse in large-codebook vector quantization, a failure mode characterized by unassigned codewords and increased quantization error, and identifies encoder drift as a key driver. To mitigate it, the authors propose Non-stationary-aware Vector Quantization (NSVQ), a training strategy that combines a dense non-stationary embedding loss, codebook replacement, and stage-wise encoder freezing. Training proceeds in three stages: the codebook first tracks early-stage encoder drift, the encoder is then frozen so the codebook can consolidate under a fixed latent geometry, and adversarial refinement is reintroduced last. On ImageNet-1k at 128×128 resolution with 65,536 codes, NSVQ reduces the reconstruction Fréchet Inception Distance (rFID) from 2.39 to 2.10 relative to SimVQ, with both methods maintaining 100% codebook utilization, and it also improves the generation FID of downstream latent diffusion models.
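The two symptoms named above, unassigned codewords and quantization error, can be measured directly from nearest-neighbor assignments. The following is a minimal NumPy sketch of those generic VQ diagnostics, not the authors' code; the function names `assign_codes`, `codebook_utilization`, and `quantization_error` are hypothetical, and squared Euclidean distance is an assumption.

```python
import numpy as np


def assign_codes(latents, codebook):
    """Return the index of the nearest code for each latent vector.

    latents:  (N, D) encoder outputs
    codebook: (K, D) code vectors
    Distance is squared Euclidean (an assumption for this sketch).
    """
    diff = latents[:, None, :] - codebook[None, :, :]  # (N, K, D)
    return (diff ** 2).sum(axis=-1).argmin(axis=1)     # (N,)


def codebook_utilization(indices, num_codes):
    """Fraction of codes that received at least one assignment."""
    return np.unique(indices).size / num_codes


def quantization_error(latents, codebook, indices):
    """Mean squared distance between each latent and its assigned code."""
    return float(((latents - codebook[indices]) ** 2).sum(axis=-1).mean())
```

With three codes where the third sits far from every latent, utilization falls to 2/3: the third code is one of the "unassigned codewords" the summary refers to.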

📝 Abstract

Vector quantization is central to modern generative modeling pipelines, but large-codebook VQ models often suffer from codebook collapse. We identify encoder drift as a key driver of this failure: as the encoder moves the latent distribution, sparsely updated code vectors can lag behind, lose assignments, and increase quantization error, creating a feedback loop through the straight-through estimator. We propose NSVQ, a non-stationary-aware VQ training strategy that combines a dense non-stationary embedding loss, codebook replacement, and stage-wise encoder freezing. NSVQ first helps the codebook track encoder drift during early training, then freezes the encoder to consolidate the codebook under a fixed latent geometry, and finally reintroduces adversarial refinement. Experiments on ImageNet-1k show that NSVQ improves reconstruction quality while maintaining full codebook utilization. On ImageNet-1k at 128×128 with 65,536 codes, NSVQ reduces rFID from 2.39 to 2.10 compared with SimVQ, while both methods maintain 100% utilization. Additional latent diffusion experiments show that NSVQ also improves downstream ImageNet generation FID.
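To make the drift mechanism concrete, the sketch below simulates latents that have moved away from a stale codebook and then applies a common dead-code replacement heuristic: re-seeding unassigned codes from current latents. The abstract does not specify NSVQ's replacement rule, so `replace_dead_codes`, the drift offset, and the sizes used here are illustrative assumptions rather than the paper's method.

```python
import numpy as np


def nearest_code(latents, codebook):
    """Index of the nearest code (squared Euclidean) for each latent."""
    diff = latents[:, None, :] - codebook[None, :, :]
    return (diff ** 2).sum(axis=-1).argmin(axis=1)


def replace_dead_codes(codebook, latents, rng):
    """Re-seed codes with zero assignments from randomly chosen latents.

    This is a generic heuristic, assumed here for illustration only.
    Returns the updated codebook and the indices of the replaced codes.
    """
    indices = nearest_code(latents, codebook)
    counts = np.bincount(indices, minlength=codebook.shape[0])
    dead = np.flatnonzero(counts == 0)
    new_codebook = codebook.copy()
    if dead.size:
        # Sample without replacement when enough latents are available.
        with_replacement = bool(dead.size > latents.shape[0])
        picks = rng.choice(latents.shape[0], size=dead.size,
                           replace=with_replacement)
        new_codebook[dead] = latents[picks]
    return new_codebook, dead


rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 2))            # codes fit to the old latents
drifted = rng.normal(size=(256, 2)) + 6.0     # hypothetical encoder drift
used_before = np.unique(nearest_code(drifted, codebook)).size
codebook, dead = replace_dead_codes(codebook, drifted, rng)
used_after = np.unique(nearest_code(drifted, codebook)).size
print(f"codes used before: {used_before}, after: {used_after}")
```

Each re-seeded code coincides with a current latent, so it is guaranteed at least one assignment. Replacement alone does not stop the encoder from drifting again, which is consistent with the abstract pairing it with a dense embedding loss and stage-wise encoder freezing.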

Problem

Research questions and friction points this paper is trying to address.

codebook collapse

vector quantization

encoder drift

quantization error

generative modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

vector quantization

codebook collapse

encoder drift