🤖 AI Summary
To address the limited semantic coherence and relevance of textless spoken language models (SLMs), this paper proposes Align-SLM, the first AI-feedback-based reinforcement learning alignment framework designed specifically for textless spoken-language modeling. The method introduces the Reinforcement Learning from AI Feedback (RLAIF) paradigm into SLM training, integrating semantics-driven multi-candidate generation, zero-shot speech modeling, semantic-metric-guided preference-data construction, and Direct Preference Optimization (DPO), with a GPT-4o-based automatic evaluation mechanism employed throughout data curation and result validation. On benchmarks including ZeroSpeech 2021 and the spoken version of StoryCloze, the approach achieves state-of-the-art performance; both GPT-4o and human evaluations show significant improvements over existing methods, marking the first systematic enhancement of implicit semantic understanding in SLMs.
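The preference-data loop described above can be sketched in a few lines. This is a minimal illustration of the idea, not the authors' implementation: the `generate` and `score` callables stand in for the paper's SLM continuation sampling and its semantic-metric/GPT-4o-style scoring, and the candidate count of 8 is an arbitrary choice.

```python
from typing import Callable, Dict, List

def build_preference_pair(
    prompt: str,
    generate: Callable[[str, int], List[str]],  # SLM sampler: (prompt, n) -> n continuations
    score: Callable[[str], float],              # semantic metric or AI-judge score; higher is better
    n_candidates: int = 8,                      # hypothetical candidate count
) -> Dict[str, str]:
    """Sample several continuations and keep the best/worst-scored pair for DPO."""
    candidates = generate(prompt, n_candidates)
    ranked = sorted(candidates, key=score, reverse=True)
    # The highest-scored continuation becomes "chosen", the lowest "rejected".
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}
```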
📝 Abstract
While textless Spoken Language Models (SLMs) have shown potential in end-to-end speech-to-speech modeling, they still lag behind text-based Large Language Models (LLMs) in semantic coherence and relevance. This work introduces the Align-SLM framework, which leverages preference optimization inspired by Reinforcement Learning from AI Feedback (RLAIF) to enhance the semantic understanding of SLMs. Our approach generates multiple speech continuations from a given prompt and uses semantic metrics to create preference data for Direct Preference Optimization (DPO). We evaluate the framework using the ZeroSpeech 2021 benchmarks for lexical and syntactic modeling, the spoken version of the StoryCloze dataset for semantic coherence, and other speech generation metrics, including the GPT-4o score and human evaluation. Experimental results show that our method achieves state-of-the-art performance for SLMs on most benchmarks, highlighting the importance of preference optimization for improving the semantics of SLMs.
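For context, DPO then optimizes a pairwise objective over such (chosen, rejected) pairs. Below is a standard PyTorch sketch of the DPO loss, not code from the paper; it assumes sequence log-probabilities have already been summed over the discrete speech-unit tokens, and `beta` is an illustrative hyperparameter.

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logp: torch.Tensor,    # log pi_theta(chosen | prompt), one value per example
    policy_rejected_logp: torch.Tensor,  # log pi_theta(rejected | prompt)
    ref_chosen_logp: torch.Tensor,       # same quantities under the frozen reference SLM
    ref_rejected_logp: torch.Tensor,
    beta: float = 0.1,                   # controls how far the policy may drift from the reference
) -> torch.Tensor:
    """Standard DPO objective: increase the margin by which the policy prefers
    the chosen continuation over the rejected one, relative to the reference."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```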