Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications

📅 2025-10-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Data-free dense image inversion of Vision Transformer (ViT) models suffers from low efficiency, redundant background noise, and spurious correlations, particularly for high-resolution inputs. Method: We propose a sparse inversion framework that (i) first identifies and models the "hallucination" phenomenon inherent in inversion processes, (ii) introduces a plug-and-play gradient masking mechanism, and (iii) integrates semantic saliency detection to enable selective foreground reconstruction while actively suppressing background noise and spurious feature correlations. Contribution/Results: Our approach requires no modification to the original loss function, achieves up to a 3.79× speedup in inversion latency, and maintains, or even surpasses, the performance of dense inversion on downstream tasks including data-free quantization and knowledge transfer.

📝 Abstract
Model inversion, which aims to reconstruct the original training data from pre-trained discriminative models, is especially useful when the original training data is unavailable due to privacy, usage rights, or size constraints. However, existing dense inversion methods attempt to reconstruct the entire image area, making them extremely inefficient when inverting high-resolution images from large-scale Vision Transformers (ViTs). We further identify two underlying causes of this inefficiency: the redundant inversion of noisy backgrounds and the unintended inversion of spurious correlations, a phenomenon we term "hallucination" in model inversion. To address these limitations, we propose a novel sparse model inversion strategy as a plug-and-play extension that speeds up existing dense inversion methods with no need to modify their original loss functions. Specifically, we selectively invert semantic foregrounds while stopping the inversion of noisy backgrounds and potential spurious correlations. Through both theoretical and empirical studies, we validate the efficacy of our approach in achieving significant inversion acceleration (up to 3.79× faster) while maintaining comparable or even enhanced downstream performance in data-free model quantization and data-free knowledge transfer. Code is available at https://github.com/Egg-Hu/SMI.
Problem

Research questions and friction points this paper is trying to address.

Reconstruct training data from pre-trained Vision Transformers efficiently
Address inefficiency of dense inversion for high-resolution images
Prevent inversion of noisy backgrounds and spurious correlations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selectively inverts semantic foregrounds only
Stops inversion of noisy background areas
Prevents spurious correlation hallucination in inversion
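The core idea above, masking gradients so that only foreground regions keep being updated while the rest of the image stops changing, can be sketched in a toy form. The snippet below is a minimal illustration, not the paper's actual ViT pipeline: it assumes a simple linear scorer with an L2 prior, and a precomputed binary saliency mask (in the paper this would come from semantic saliency detection on patch tokens). The function name `sparse_invert` and all parameters are hypothetical.

```python
import numpy as np

def sparse_invert(W, target, mask, steps=200, lr=0.1):
    """Toy sketch of sparse model inversion via gradient masking.

    W      : (C, D) class-weight matrix of an assumed linear scorer.
    target : index of the class whose "training data" we reconstruct.
    mask   : (D,) binary saliency mask; 1 = foreground dim, 0 = background.

    We ascend the target-class score (with an L2 prior) w.r.t. the input x,
    but multiply the gradient by the mask, so background dimensions are
    never updated. The loss itself is untouched -- the masking is
    plug-and-play, mirroring the paper's claim.
    """
    x = np.zeros(W.shape[1])
    for _ in range(steps):
        grad = W[target] - x          # gradient of score minus L2 prior
        x += lr * (grad * mask)       # masked update: background stays frozen
    return x
```

Because masked (background) dimensions receive zero gradient, they remain at their initial value forever, so no compute is wasted refining them; foreground dimensions converge as in ordinary dense inversion.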
Zixuan Hu
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Yongxian Wei
Tsinghua University
Machine Learning
Li Shen
School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen, China
Zhenyi Wang
Assistant Professor of Computer Science, University of Central Florida
continual learning, AI security, data-efficient learning, foundation model
Lei Li
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Chun Yuan
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Dacheng Tao
Nanyang Technological University
artificial intelligence, machine learning, computer vision, image processing, data mining