TED: Accelerate Model Training by Internal Generalization

📅 2024-05-06
🏛️ European Conference on Artificial Intelligence
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address data redundancy in large-model training and overfitting under high pruning rates, this paper proposes TED, a data pruning framework. Methodologically, it introduces "internal generalization" (IG), a metric quantifying the model's ability to improve performance on pruned data while fitting the retained data, and formulates the "internal generalization distance" (IGD) as an implicit regularization objective. TED further develops an efficient IGD estimator based on a first-order masked Taylor approximation, together with a progressive pruning strategy. Evaluated across image classification, natural language understanding, and large language model fine-tuning tasks, TED achieves lossless accuracy while retaining only 60–70% of the original training data (i.e., pruning 30–40% of it), substantially reducing computational and memory overhead. Key contributions include: (i) the formal definition of IG and IGD as principled criteria for data pruning; (ii) a scalable, gradient-based IGD estimation technique; and (iii) empirical validation of high-fidelity pruning across diverse modalities and scales.
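The first-order masked Taylor idea in the summary can be sketched as a per-sample score: attach a mask m_i to each retained sample's loss term, and approximate the effect of flipping m_i by the dot product between that sample's gradient and an aggregate gradient. This is a generic illustration under assumed notation, not the paper's exact estimator; `taylor_prune_scores` and its signature are hypothetical.

```python
def taylor_prune_scores(per_sample_grads, full_grad):
    # First-order masked Taylor sketch: with a mask m_i on sample i's
    # loss term, dL/dm_i is approximated by g_i . g_full, so a low score
    # means removing the sample barely moves the aggregate loss
    # (hypothetical simplification of the paper's IGD estimator).
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return [dot(g, full_grad) for g in per_sample_grads]

# Toy example: 4 samples with 3-dimensional gradients.
per_sample = [[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 3.0]]
full = [sum(col) / len(per_sample) for col in zip(*per_sample)]
scores = taylor_prune_scores(per_sample, full)
# Keep the half of the data with the highest scores.
keep = sorted(range(len(scores)), key=scores.__getitem__)[-2:]
```

Because only per-sample gradients and one aggregate gradient are needed, a score of this shape can be computed in a single backward pass per sample, which is what makes a Taylor-style estimate cheap relative to retraining.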

📝 Abstract
Large language models have demonstrated strong performance in recent years, but the high cost of training drives the need for efficient methods to compress dataset sizes. We propose TED pruning, a method that addresses overfitting under high pruning ratios by quantifying the model's ability to improve performance on pruned data while fitting retained data, a property we call Internal Generalization (IG). TED uses an optimization objective based on Internal Generalization Distance (IGD), which measures the change in IG before and after pruning; this aligns with true generalization performance and provides implicit regularization. We verify that optimizing the IGD objective allows the model to achieve the smallest upper bound on generalization error. By studying the impact of small mask fluctuations on IG through masks and a Taylor approximation, we enable fast estimation of IGD. In analyzing continuous training dynamics, we validate the prior effect of IGD and propose a progressive pruning strategy. Experiments on image classification, natural language understanding, and large language model fine-tuning show TED achieves lossless performance with 60-70% of the data. Upon acceptance, our code will be made publicly available.
Problem

Research questions and friction points this paper is trying to address.

Reduces dataset size for efficient model training
Addresses overfitting under high pruning ratios
Optimizes internal generalization to improve performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

TED pruning method using Internal Generalization
Optimization based on Internal Generalization Distance metric
Progressive pruning strategy achieving lossless performance
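The progressive pruning strategy listed above could be approximated by a schedule that tightens the keep ratio over training instead of pruning once up front. The linear schedule and parameter names below are assumptions for illustration, not the paper's actual schedule; only the ~60-70% target ratio comes from the reported results.

```python
def progressive_keep_ratio(epoch, total_epochs, final_ratio=0.65, start_ratio=1.0):
    # Linear schedule sketch (hypothetical): start by keeping all data,
    # then anneal toward the target ratio (~60-70% per the paper's results).
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return start_ratio + frac * (final_ratio - start_ratio)

# At each epoch, retain the top-scoring fraction of samples.
ratios = [round(progressive_keep_ratio(e, 8), 3) for e in range(8)]
```

Pruning gradually rather than in one shot gives the scoring criterion time to stabilize on a partially trained model, which is the usual motivation for schedules of this form.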
Jinying Xiao
Changsha University of Science and Technology
Ping Li
Changsha University of Science and Technology
Jie Nie
Changsha University of Science and Technology