TED: Accelerate Model Training by Internal Generalization

📅 2024-05-06
🏛️ European Conference on Artificial Intelligence
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address data redundancy in large-model training and overfitting under high pruning rates, this paper proposes TED, a data pruning framework. Methodologically, it introduces "internal generalization" (IG), a metric quantifying the model's ability to improve performance on pruned data while fitting the retained data, and formulates the "internal generalization distance" (IGD) as an implicit regularization objective. TED further develops an efficient IGD estimator based on a first-order masked Taylor approximation, together with a progressive pruning strategy. Evaluated across image classification, natural language understanding, and large language model fine-tuning tasks, TED achieves lossless accuracy while retaining only 60–70% of the original training data (i.e., pruning 30–40% of it), substantially reducing computational and memory overhead. Key contributions include: (i) the formal definition of IG and IGD as principled criteria for data pruning; (ii) a scalable, gradient-based IGD estimation technique; and (iii) empirical validation of high-fidelity pruning across diverse modalities and scales.
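The first-order masked Taylor idea in the summary can be sketched as a per-sample score: attach a mask m_i to each retained sample's loss term, and approximate the effect of flipping m_i by the dot product between that sample's gradient and an aggregate gradient. This is a generic illustration under assumed notation, not the paper's exact estimator; `taylor_prune_scores` and its signature are hypothetical.

```python
def taylor_prune_scores(per_sample_grads, full_grad):
    # First-order masked Taylor sketch: with a mask m_i on sample i's
    # loss term, dL/dm_i is approximated by g_i . g_full, so a low score
    # means removing the sample barely moves the aggregate loss
    # (hypothetical simplification of the paper's IGD estimator).
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return [dot(g, full_grad) for g in per_sample_grads]

# Toy example: 4 samples with 3-dimensional gradients.
per_sample = [[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 3.0]]
full = [sum(col) / len(per_sample) for col in zip(*per_sample)]
scores = taylor_prune_scores(per_sample, full)
# Keep the half of the data with the highest scores.
keep = sorted(range(len(scores)), key=scores.__getitem__)[-2:]
```

Because only per-sample gradients and one aggregate gradient are needed, a score of this shape can be computed in a single backward pass per sample, which is what makes a Taylor-style estimate cheap relative to retraining.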

📝 Abstract
Large language models have demonstrated strong performance in recent years, but the high cost of training drives the need for efficient methods to compress dataset sizes. We propose TED pruning, a method that addresses overfitting under high pruning ratios by quantifying the model's ability to improve performance on pruned data while fitting retained data, a property we call Internal Generalization (IG). TED uses an optimization objective based on Internal Generalization Distance (IGD), which measures the change in IG before and after pruning; this aligns with true generalization performance and provides implicit regularization. We verify that optimizing the IGD objective allows the model to achieve the smallest upper bound on generalization error. By studying the impact of small mask fluctuations on IG through masks and a Taylor approximation, we enable fast estimation of IGD. In analyzing continuous training dynamics, we validate the prior effect of IGD and propose a progressive pruning strategy. Experiments on image classification, natural language understanding, and large language model fine-tuning show TED achieves lossless performance with 60-70% of the data. Upon acceptance, our code will be made publicly available.
Problem

Research questions and friction points this paper is trying to address.

Reduces dataset size for efficient model training
Addresses overfitting under high pruning ratios
Optimizes internal generalization to improve performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

TED pruning method using Internal Generalization
Optimization based on Internal Generalization Distance metric
Progressive pruning strategy achieving lossless performance
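The progressive pruning strategy listed above could be approximated by a schedule that tightens the keep ratio over training instead of pruning once up front. The linear schedule and parameter names below are assumptions for illustration, not the paper's actual schedule; only the ~60-70% target ratio comes from the reported results.

```python
def progressive_keep_ratio(epoch, total_epochs, final_ratio=0.65, start_ratio=1.0):
    # Linear schedule sketch (hypothetical): start by keeping all data,
    # then anneal toward the target ratio (~60-70% per the paper's results).
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return start_ratio + frac * (final_ratio - start_ratio)

# At each epoch, retain the top-scoring fraction of samples.
ratios = [round(progressive_keep_ratio(e, 8), 3) for e in range(8)]
```

Pruning gradually rather than in one shot gives the scoring criterion time to stabilize on a partially trained model, which is the usual motivation for schedules of this form.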
Jinying Xiao
Changsha University of Science and Technology
Ping Li
Changsha University of Science and Technology
Jie Nie
Changsha University of Science and Technology