Balancing Image Compression and Generation with Bootstrapped Tokenization

📅 2026-06-03

🤖 AI Summary

This work addresses the redundancy and computational burden of conventional image tokenization methods, which entangle information of multiple granularities in a single token set. To overcome this limitation, the authors propose SelfBootTok, a framework that uses self-bootstrapped learning to disentangle image representations into separate global and local token groups, with local details predicted solely from the global tokens. Because the generator only has to model a lightweight set of global tokens, the modeling of fine visual details is offloaded to the tokenizer. This design improves both generation efficiency and quality: the generator's computational cost drops by approximately 40%, and with only 64 tokens the method achieves a new state-of-the-art gFID of 1.56 on ImageNet, while also delivering better reconstruction and generation than existing approaches.

📝 Abstract

Despite progress in image tokenization, standard methods encode redundant information: they mix all granularities within each token, so redundancy persists between tokens. Mixing information of different granularities also complicates the training of generators. This paper introduces SelfBootTok, a method that resolves this by cleanly decomposing information into global and local token groups. Through self-bootstrapped learning, the model predicts local details exclusively from global tokens, shifting the burden of visual details from the generator to the tokenizer. Consequently, our generator is far more efficient: it requires only global tokens and reduces computation by approximately 40%, while delivering superior reconstruction and generation. Moreover, this paradigm scales elegantly: by leveraging more data or parameters to self-supervise local representation learning, SelfBootTok achieves a new state-of-the-art gFID of 1.56 using only 64 tokens.
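
The abstract does not describe SelfBootTok's actual architecture, so the following is only a toy 1-D analogy of the data flow it outlines, written under stated assumptions. Block means play the role of global tokens, per-value residuals play the role of local tokens, and a fixed linear-interpolation rule (`predict_local`, hypothetical) stands in for the learned predictor that infers local details from global tokens alone. `bootstrap_loss` is likewise a hypothetical illustration of using the tokenizer's own local tokens as the prediction target. None of these function names come from the paper.

```python
from math import floor
from typing import List

# Toy 1-D analogy only (hypothetical, not SelfBootTok's architecture):
# block means stand in for "global tokens", per-value residuals for "local tokens".


def encode_global(x: List[float], num_global: int) -> List[float]:
    """Global tokens: the mean of each of `num_global` equal-length blocks."""
    block = len(x) // num_global
    return [sum(x[i * block:(i + 1) * block]) / block for i in range(num_global)]


def encode_local(x: List[float], g: List[float]) -> List[float]:
    """Local tokens: each value minus the global token of its block."""
    block = len(x) // len(g)
    return [v - g[i // block] for i, v in enumerate(x)]


def predict_local(g: List[float], length: int) -> List[float]:
    """Stand-in local predictor that sees ONLY the global tokens.

    It linearly interpolates between neighbouring block means (placed at
    block centres) and returns the offset from each value's own block mean.
    """
    n, block = len(g), length // len(g)
    out = []
    for i in range(length):
        # Position of value i in units of blocks, clamped to [0, n - 1].
        t = min(max((i + 0.5) / block - 0.5, 0.0), n - 1.0)
        lo = int(floor(t))
        smooth = g[lo] if lo == n - 1 else g[lo] + (t - lo) * (g[lo + 1] - g[lo])
        out.append(smooth - g[i // block])
    return out


def decode(g: List[float], local: List[float]) -> List[float]:
    """Reconstruct values by adding each local token back onto its block's global token."""
    block = len(local) // len(g)
    return [g[i // block] + r for i, r in enumerate(local)]


def bootstrap_loss(x: List[float], num_global: int) -> float:
    """Assumed self-bootstrapped objective: the target is the tokenizer's own local tokens.

    Mean squared error between local tokens predicted from global tokens and the
    local tokens encoded from the input. A real model would train the predictor
    to minimise this; here the predictor is a fixed rule.
    """
    g = encode_global(x, num_global)
    target = encode_local(x, g)
    pred = predict_local(g, len(x))
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(x)


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    g = encode_global(x, 2)                   # tokenizer: global tokens
    print(decode(g, encode_local(x, g)))      # reconstruction from both token groups
    sampled_g = g                             # placeholder for a generator that emits global tokens only
    print(decode(sampled_g, predict_local(sampled_g, len(x))))  # detail filled in from global tokens
    print(bootstrap_loss(x, 2))
```

In this toy, the first reconstruction is exact because both token groups are used, while the second uses only the two global tokens and recovers a coarse version of the signal. A learned predictor would aim to shrink that gap, which is the burden the abstract describes as shifting from the generator to the tokenizer.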

Problem

Research questions and friction points this paper is trying to address.

image tokenization

redundancy

granularity

generator training

visual details

Innovation

Methods, ideas, or system contributions that make the work stand out.

SelfBootTok

image tokenization

self-bootstrapped learning