ATOM: Attention Mixer for Efficient Dataset Distillation

📅 2024-05-02
🏛️ 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
📈 Citations: 3 · Influential: 1
🤖 AI Summary
Existing dataset distillation methods suffer from limited downstream performance gains, inadequate contextual modeling, and poor cross-architecture generalization. To address these issues, we propose a dual-path attention mechanism for efficient distillation that, for the first time, jointly integrates spatial attention (to enhance class-localization consistency) and channel attention (to model intra-class semantic context) directly into the feature matching process. This enables synergistic modeling of class-level semantics and spatial structure without bi-level optimization, and it remains fully compatible with arbitrary CNN architectures. On CIFAR-10/100 and Tiny-ImageNet, the method significantly outperforms state-of-the-art approaches: with only 1–5 synthetic images per class, top-1 accuracy improves by up to 8.2%. It also maintains strong performance on downstream tasks, including neural architecture search, and mitigates the degradation typical of low-sample regimes.
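The dual-path design can be pictured concretely. Below is a minimal PyTorch sketch of such an attention mixer, assuming absolute-value power pooling and L2 normalization; the class name `AttentionMixer`, the exponent, and the concatenation step are illustrative assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionMixer(nn.Module):
    """Hypothetical dual-path attention mixer (illustrative, not ATOM's exact design).

    Given a CNN feature map, it derives a spatial attention map (where the
    class sits in the image) and a channel attention vector (which semantic
    channels matter for the class), then concatenates them into a single
    embedding usable for feature matching.
    """

    def __init__(self, power: int = 4):
        super().__init__()
        self.power = power  # assumed sharpening exponent for attention pooling

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) activations from any CNN backbone.
        # Spatial path: pool |activation|^p over channels -> (B, H*W).
        spatial = F.normalize(feat.abs().pow(self.power).mean(dim=1).flatten(1), dim=1)
        # Channel path: pool |activation|^p over spatial positions -> (B, C).
        channel = F.normalize(feat.abs().pow(self.power).mean(dim=(2, 3)), dim=1)
        # Mix both attention descriptors into one per-sample embedding.
        return torch.cat([spatial, channel], dim=1)
```

Because both paths are computed from raw activations of whatever backbone is in use, a mixer of this form stays architecture-agnostic, consistent with the compatibility claim above.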

📝 Abstract
Recent works in dataset distillation seek to minimize training expenses by generating a condensed synthetic dataset that encapsulates the information present in a larger real dataset. These approaches ultimately aim to attain test accuracy levels akin to those achieved by models trained on the entirety of the original dataset. Previous studies in feature and distribution matching have achieved significant results without incurring the costs of bi-level optimization in the distillation process. Despite their convincing efficiency, many of these methods suffer from marginal downstream performance improvements, limited distillation of contextual information, and subpar cross-architecture generalization. To address these challenges in dataset distillation, we propose the ATtentiOn Mixer (ATOM) module to efficiently distill large datasets using a mixture of channel-wise and spatial-wise attention in the feature matching process. Spatial-wise attention helps guide the learning process based on consistent localization of classes in their respective images, allowing for distillation from a broader receptive field. Meanwhile, channel-wise attention captures the contextual information associated with the class itself, thus making the synthetic images more informative for training. By integrating both types of attention, our ATOM module demonstrates superior performance across various computer vision datasets, including CIFAR-10/100 and Tiny-ImageNet. Notably, our method significantly improves performance in scenarios with a low number of images per class, thereby enhancing its practical value. Furthermore, we maintain the improvement across architectures and in applications such as neural architecture search.
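To make the matching step concrete, here is a hedged sketch of a per-layer objective in the spirit the abstract describes: align class-mean attention embeddings of real and synthetic batches and backpropagate only to the synthetic pixels. The helper names and the plain MSE form are assumptions for illustration; the paper's exact loss and weighting may differ.

```python
import torch
import torch.nn.functional as F

def attention_embedding(feat: torch.Tensor, p: int = 4) -> torch.Tensor:
    # Concatenated spatial + channel attention descriptor (see the sketch above).
    spatial = F.normalize(feat.abs().pow(p).mean(dim=1).flatten(1), dim=1)
    channel = F.normalize(feat.abs().pow(p).mean(dim=(2, 3)), dim=1)
    return torch.cat([spatial, channel], dim=1)

def attention_matching_loss(real_feats, syn_feats):
    """Illustrative per-layer matching loss, not the paper's exact objective.

    real_feats, syn_feats: lists of (B, C, H, W) feature maps, one entry per
    network layer, computed on same-class real and synthetic batches.
    Synthetic images are the optimization variables, so gradients flow
    through syn_feats back to the synthetic pixels; no bi-level loop.
    """
    loss = torch.zeros(())
    for fr, fs in zip(real_feats, syn_feats):
        a_real = attention_embedding(fr).mean(dim=0).detach()  # target from real data
        a_syn = attention_embedding(fs).mean(dim=0)            # depends on synthetic pixels
        loss = loss + F.mse_loss(a_syn, a_real)
    return loss
```

Matching first-order attention statistics this way avoids unrolling an inner training loop, which is the efficiency argument the abstract makes against bi-level methods.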
Problem

Research questions and friction points this paper is trying to address.

Feature- and distribution-matching methods yield only marginal downstream performance gains
Contextual information is poorly captured in distilled images, and cross-architecture generalization is weak
Performance degrades sharply in low-data regimes (few images per class) and in downstream tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

ATOM module mixes channel-wise and spatial-wise attention in the feature matching process
Spatial-wise attention enforces consistent class localization, distilling from a broader receptive field
Channel-wise attention captures the contextual information associated with each class
👥 Authors
Samir Khaki, University of Toronto
A. Sajedi, University of Toronto
Kai Wang, National University of Singapore
Lucy Z. Liu, Royal Bank of Canada (RBC)
Y. Lawryshyn, University of Toronto
K. Plataniotis, University of Toronto