🤖 AI Summary
To address the inflexibility and inefficiency of inner-loop optimization in dataset distillation, where conventional methods rely on fixed random truncation, this paper proposes Automatic Truncated Backpropagation Through Time (AT-BPTT). Its core contributions are threefold: (1) a stage-aware probabilistic timestep-selection mechanism that dynamically prioritizes informative timesteps; (2) an adaptive truncation-window sizing strategy guided by gradient variation, improving convergence stability and speed; and (3) a low-rank Hessian approximation that reduces the computational overhead of truncated backpropagation. Evaluated across multiple image benchmarks, AT-BPTT improves distilled-model accuracy by an average of 6.16%, accelerates inner-loop optimization by 3.9×, and reduces memory cost by 63%, establishing new state-of-the-art performance.
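The first two components can be illustrated with a minimal sketch. The paper does not publish this pseudocode; the function names, stage boundaries, and variance thresholds below are hypothetical, chosen only to show the idea: bias the truncation start toward a stage-preferred region of the unrolled trajectory, and grow or shrink the truncation window depending on how much recent gradient norms fluctuate.

```python
import random
import statistics

def stage_probabilities(step, total_steps):
    # Hypothetical stage-aware weighting over thirds of the trajectory:
    # early training favours early timesteps, late training favours late ones.
    frac = step / total_steps
    if frac < 1 / 3:      # early stage
        return [0.6, 0.3, 0.1]
    elif frac < 2 / 3:    # middle stage
        return [0.2, 0.6, 0.2]
    else:                 # late stage
        return [0.1, 0.3, 0.6]

def select_truncation_start(step, total_steps, n_timesteps, rng):
    # Sample which third of the unrolled trajectory to truncate in,
    # then pick a uniform start index inside that third.
    probs = stage_probabilities(step, total_steps)
    third = rng.choices([0, 1, 2], weights=probs)[0]
    lo = third * n_timesteps // 3
    hi = (third + 1) * n_timesteps // 3 - 1
    return rng.randint(lo, hi)

def adapt_window(window, grad_norm_history, min_w=2, max_w=16):
    # Grow the window when recent gradient norms are stable (low variance),
    # shrink it when they fluctuate strongly; thresholds are illustrative.
    if len(grad_norm_history) < 4:
        return window
    var = statistics.pvariance(grad_norm_history[-4:])
    if var < 0.01:
        return min(window + 1, max_w)
    if var > 0.1:
        return max(window - 1, min_w)
    return window
```

In an inner loop, one would call `select_truncation_start` once per unroll and `adapt_window` after logging each gradient norm, so the truncation position and window size both track the current training stage rather than being fixed at random.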
📝 Abstract
The growing demand for efficient deep learning has positioned dataset distillation as a pivotal technique for compressing training datasets while preserving model performance. However, existing inner-loop optimization methods for dataset distillation typically rely on random truncation strategies, which lack flexibility and often yield suboptimal results. In this work, we observe that neural networks exhibit distinct learning dynamics across different training stages (early, middle, and late), making random truncation ineffective. To address this limitation, we propose Automatic Truncated Backpropagation Through Time (AT-BPTT), a novel framework that dynamically adapts both truncation positions and window sizes according to intrinsic gradient behavior. AT-BPTT introduces three key components: (1) a probabilistic mechanism for stage-aware timestep selection, (2) an adaptive window sizing strategy based on gradient variation, and (3) a low-rank Hessian approximation to reduce computational overhead. Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet-1K show that AT-BPTT achieves state-of-the-art performance, improving accuracy by an average of 6.16% over baseline methods. Moreover, our approach accelerates inner-loop optimization by 3.9× while reducing memory cost by 63%.
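The third component, low-rank Hessian approximation, can also be sketched in a toy form. The paper's exact factorization is not given here; the sketch below, under the common rank-1 assumption, extracts the dominant eigenpair of a small symmetric Hessian by power iteration and uses it to approximate Hessian-vector products, which is how such approximations typically cut the cost of second-order terms in truncated unrolls.

```python
def matvec(H, v):
    # Dense matrix-vector product for a small symmetric Hessian H.
    return [sum(H[i][j] * v[j] for j in range(len(v))) for i in range(len(H))]

def top_eigenpair(H, iters=100):
    # Power iteration: repeatedly apply H and renormalize to find the
    # dominant eigenvector, then recover its eigenvalue via a Rayleigh quotient.
    v = [1.0] * len(H)
    for _ in range(iters):
        w = matvec(H, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    lam = sum(v[i] * matvec(H, v)[i] for i in range(len(v)))
    return lam, v

def lowrank_hvp(lam, v, g):
    # Rank-1 approximation of a Hessian-vector product: H g ~= lam * (v.g) * v.
    # Storing only (lam, v) avoids materializing the full Hessian.
    coef = lam * sum(vi * gi for vi, gi in zip(v, g))
    return [coef * vi for vi in v]
```

For H = diag(2, 1), the dominant eigenpair is (2, [1, 0]), so the rank-1 product with g = [1, 1] keeps the dominant curvature direction and drops the rest; in practice the rank would be chosen to trade accuracy against the memory and compute savings reported for AT-BPTT.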