🤖 AI Summary
The generalization mechanisms underlying multimodal parameter-efficient fine-tuning (PEFT) remain poorly understood. This work reveals, for the first time, a prevalent “flatness preference” phenomenon in PEFT: generalization performance is predominantly governed by only a few sharp optimization dimensions. Building on this insight, we propose Flatness Preference Optimization (FlatPO), a method that selectively smooths these critical dimensions. Remarkably, FlatPO achieves substantial improvements in model generalization while updating merely ~5% of the parameters. Extensive experiments demonstrate that FlatPO consistently outperforms full fine-tuning across multiple multimodal downstream tasks, thereby challenging conventional fine-tuning paradigms and offering a novel pathway toward highly efficient adaptation of large models.
📝 Abstract
Parameter-Efficient Fine-Tuning (PEFT) methods provide a streamlined and efficient way to adapt large models to domain-specific multimodal downstream tasks. Although these methods have proven effective in practice, the principles behind them remain under-explored. We therefore investigate the generalization mechanisms underlying various PEFT methods and how they can be further enhanced. In this paper, we reveal a flatness preference that is widely present across PEFT methods, in which a small fraction of sharp dimensions dominates generalization. This finding suggests an appealing possibility: better generalization may be achievable by attending only to this small fraction of sharp dimensions rather than to all of them. Building on this, we propose Flatness Preference Optimization (FlatPO), which flattens these key sharp dimensions and steers various PEFT methods toward better generalization. Extensive experiments demonstrate the validity of our findings and the effectiveness of the proposed method. Code is available at https://github.com/Can-Lin/FlatPO.
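To make the idea of "a small fraction of sharp dimensions" concrete, the following is a minimal sketch on a toy quadratic loss. It is not the FlatPO procedure from the paper or its repository. It assumes that per-dimension sharpness can be approximated by the diagonal curvature, and that flattening can be approximated by a SAM-style (Sharpness-Aware Minimization) ascent step restricted to the selected dimensions. The names `top_sharp_dimensions` and `masked_sam_gradient`, the 5% fraction, and the radius `rho` are all hypothetical choices made for illustration.

```python
import numpy as np

def top_sharp_dimensions(curvature, fraction=0.05):
    """Indices of the `fraction` of dimensions with the largest curvature.

    Assumption: diagonal curvature is used as a stand-in for per-dimension sharpness.
    """
    k = max(1, int(round(fraction * curvature.size)))
    return np.argsort(curvature)[-k:]

def masked_sam_gradient(grad_fn, w, mask, rho=0.05):
    """SAM-style gradient whose ascent perturbation touches only masked dimensions.

    Assumption: this masked variant is an illustration, not the authors' method.
    """
    g = grad_fn(w) * mask                      # keep only the selected dimensions
    norm = np.linalg.norm(g)
    eps = rho * g / norm if norm > 0 else np.zeros_like(w)
    return grad_fn(w + eps)                    # gradient at the perturbed point

# Toy loss 0.5 * sum(h_i * w_i^2): 5 sharp dimensions out of 100.
rng = np.random.default_rng(0)
h = np.full(100, 0.1)
h[:5] = 50.0
w = rng.normal(size=100)
grad_fn = lambda x: h * x

sharp = top_sharp_dimensions(h, fraction=0.05)
mask = np.zeros_like(w)
mask[sharp] = 1.0

# Share of the total curvature (Hessian trace) carried by the selected dimensions.
share = h[sharp].sum() / h.sum()
g = masked_sam_gradient(grad_fn, w, mask)
```

In this toy setup the selected 5% of dimensions carry most of the total curvature, and the perturbed gradient differs from the plain gradient only on those dimensions. Real PEFT losses are not diagonal quadratics, so this only illustrates the selection idea.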