🤖 AI Summary
Existing auditory large language models generalize poorly under few-shot prompting, particularly on low-resource tasks such as children's speech recognition. To address this, the work proposes FSA-GRPO, a reinforcement learning-based post-training approach built on Group Relative Policy Optimization (GRPO). FSA-GRPO combines an example-aware reward function with a data selection strategy and auxiliary reward weighting, guiding the model to make better use of few-shot examples at inference time. Although it is trained only on high-resource adult speech data, with no target-domain supervision, FSA-GRPO improves few-shot generalization across multiple tasks, including children's speech recognition, speech translation, and audio understanding. When in-domain data are unavailable, it outperforms conventional fine-tuning on related out-of-domain data.
📝 Abstract
Few-shot prompting provides an effective way to adapt auditory large language models to low-resource tasks such as children's speech recognition. However, most auditory large language models are not explicitly trained to perform inference in this demonstration-conditioned format, limiting the extent to which they can benefit from few-shot prompting. To address this limitation, we introduce Few-Shot Aware GRPO (FSA-GRPO), an RL-based post-training recipe that uses a specially designed reward to encourage the model to leverage few-shot demonstrations, thereby strengthening its few-shot adaptation ability. Notably, training with only high-resource adult ASR data improves the model's general few-shot adaptation ability, yielding gains not only in children's speech recognition but also in speech translation and audio understanding. We further study data selection and auxiliary reward weighting to identify an effective training recipe. Our experiments show that when in-domain data are unavailable or cannot be used for training, FSA-GRPO is more effective than direct tuning on related out-of-domain data.
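The abstract does not spell out the reward, so the following is only a minimal sketch of the generic GRPO step that such a recipe builds on: several candidate outputs are sampled for the same prompt, each is scored, and the scores are normalized within the group to obtain advantages. The accuracy reward here (one minus word error rate, clipped to [0, 1]) is an assumption made for illustration, and the names `word_error_rate`, `accuracy_reward`, and `group_relative_advantages` are hypothetical. It is not the paper's few-shot-aware reward.

```python
from statistics import mean, pstdev


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


def accuracy_reward(reference: str, hypothesis: str) -> float:
    """Hypothetical reward (an assumption, not from the paper): 1 - WER, clipped to [0, 1]."""
    return 1.0 - min(word_error_rate(reference, hypothesis), 1.0)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standard GRPO-style normalization: center and scale rewards within one group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group: several sampled transcriptions for the same utterance and prompt.
reference = "the cat sat on the mat"
group = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a cat sat on mat",
    "the cat sat on the mat",
]
rewards = [accuracy_reward(reference, hyp) for hyp in group]
advantages = group_relative_advantages(rewards)
for hyp, r, a in zip(group, rewards, advantages):
    print(f"{r:.3f}  {a:+.3f}  {hyp}")
```

Candidates that score above the group mean receive positive advantages and are reinforced; those below the mean are discouraged. A few-shot-aware variant would presumably change how the reward is computed for demonstration-conditioned prompts, but those details are not given in the abstract.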