Beyond Random: Automatic Inner-loop Optimization in Dataset Distillation

📅 2025-10-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the inflexibility and inefficiency of inner-loop optimization in dataset distillation, where conventional methods rely on fixed random truncation, this paper proposes Automatic Truncated Backpropagation Through Time (AT-BPTT). Its core contributions are threefold: (1) the first stage-aware probabilistic timestep selection mechanism, enabling dynamic prioritization of informative timesteps; (2) an adaptive truncation window adjustment strategy guided by gradient variance detection, improving convergence stability and speed; and (3) a low-rank Hessian approximation to stabilize gradient estimation under truncation. Evaluated across multiple image benchmarks, AT-BPTT achieves an average 6.16% improvement in distilled-model accuracy, accelerates inner-loop optimization by 3.9x, and reduces memory overhead by 63%, establishing new state-of-the-art performance.
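A minimal sketch of what the stage-aware probabilistic timestep selection could look like, assuming the inner-loop trajectory is split into early, middle, and late thirds and that timesteps are scored by their gradient norms; the function name, stage weights, and scoring rule below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def sample_truncation_timestep(grad_norms, stage_weights=(0.2, 0.5, 0.3), rng=None):
    """Pick the inner-loop timestep at which to truncate backpropagation.

    grad_norms    : per-timestep gradient-norm statistics, length T (assumed available)
    stage_weights : probability of sampling from the early / middle / late third
                    of the trajectory (hypothetical default values)
    """
    rng = np.random.default_rng() if rng is None else rng
    grad_norms = np.asarray(grad_norms, dtype=np.float64)
    T = len(grad_norms)
    assert T >= 3, "sketch assumes at least one timestep per stage"

    # Split the unrolled trajectory into early / middle / late stages.
    bounds = [0, T // 3, 2 * T // 3, T]
    weights = np.asarray(stage_weights, dtype=np.float64)
    stage = rng.choice(3, p=weights / weights.sum())
    lo, hi = bounds[stage], bounds[stage + 1]

    # Within the chosen stage, favour timesteps with larger gradient norms.
    scores = grad_norms[lo:hi] + 1e-12
    probs = scores / scores.sum()
    return lo + rng.choice(hi - lo, p=probs)
```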

📝 Abstract
The growing demand for efficient deep learning has positioned dataset distillation as a pivotal technique for compressing training datasets while preserving model performance. However, existing inner-loop optimization methods for dataset distillation typically rely on random truncation strategies, which lack flexibility and often yield suboptimal results. In this work, we observe that neural networks exhibit distinct learning dynamics across different training stages (early, middle, and late), making random truncation ineffective. To address this limitation, we propose Automatic Truncated Backpropagation Through Time (AT-BPTT), a novel framework that dynamically adapts both truncation positions and window sizes according to intrinsic gradient behavior. AT-BPTT introduces three key components: (1) a probabilistic mechanism for stage-aware timestep selection, (2) an adaptive window sizing strategy based on gradient variation, and (3) a low-rank Hessian approximation to reduce computational overhead. Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet-1K show that AT-BPTT achieves state-of-the-art performance, improving accuracy by an average of 6.16% over baseline methods. Moreover, our approach accelerates inner-loop optimization by 3.9x while saving 63% of the memory cost.
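A minimal sketch of the adaptive window sizing idea from the abstract, assuming a simple coefficient-of-variation statistic over recent gradient norms drives the decision; the thresholds and growth/shrink factors are hypothetical:

```python
import numpy as np

def adapt_window_size(window, recent_grad_norms,
                      low_var=0.05, high_var=0.5,
                      min_window=2, max_window=32):
    """Grow the truncation window when gradients are stable, shrink it when
    they fluctuate strongly.

    window            : current truncation window length (timesteps)
    recent_grad_norms : gradient norms observed over the current window
    """
    norms = np.asarray(recent_grad_norms, dtype=np.float64)
    # Coefficient of variation as a scale-free measure of gradient variation.
    variation = norms.std() / (norms.mean() + 1e-12)

    if variation < low_var:        # stable gradients: unroll further
        window = int(round(window * 1.5))
    elif variation > high_var:     # noisy gradients: truncate sooner
        window = max(1, window // 2)
    return int(np.clip(window, min_window, max_window))
```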
Problem

Research questions and friction points this paper is trying to address.

Replacing fixed random truncation strategies in the inner loop of dataset distillation
Adapting truncation positions and window sizes dynamically based on gradient behavior
Reducing computational and memory overhead while improving distilled-model accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic adaptation of both truncation positions and window sizes
Probabilistic stage-aware timestep selection mechanism
Low-rank Hessian approximation for computational efficiency (see the sketch below)
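A minimal sketch of a low-rank Hessian approximation built from a handful of Hessian-vector products (a Nystrom-style factorization); the construction, rank, and helper names are illustrative assumptions, since the paper's exact approximation is not detailed on this page:

```python
import torch

def flat_hvp(loss, params, vec):
    """Hessian-vector product of `loss` w.r.t. `params`, returned flattened."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad(torch.dot(flat, vec), params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv]).detach()

def low_rank_hessian_factors(loss, params, rank=4):
    """Build (Y, C) with H ~= Y @ C @ Y.T from only `rank` Hessian-vector
    products, instead of ever forming the full Hessian."""
    n = sum(p.numel() for p in params)
    omega = torch.randn(n, rank)                      # random probe directions
    Y = torch.stack([flat_hvp(loss, params, omega[:, i])
                     for i in range(rank)], dim=1)    # n x rank, equals H @ omega
    C = torch.linalg.pinv(omega.T @ Y)                # rank x rank core matrix
    return Y, C

def approx_hessian_vec(Y, C, vec):
    """Apply the low-rank Hessian approximation to a vector in O(n * rank)."""
    return Y @ (C @ (Y.T @ vec))
```

Storing only Y and C keeps the approximation's memory linear in the number of parameters rather than quadratic, which is the kind of saving a truncated inner loop over many unrolled steps needs.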
🔎 Similar Papers
No similar papers found.
Muquan Li
University of Electronic Science and Technology of China
Hang Gou
University of Electronic Science and Technology of China
Dongyang Zhang
University of Electronic Science and Technology of China
Image restoration, super-resolution
Shuang Liang
University of Electronic Science and Technology of China
Xiurui Xie
University of Electronic Science and Technology of China
Deqiang Ouyang
Chongqing University
Ke Qin
University of Electronic Science and Technology of China