The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning

📅 2025-02-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses fundamental challenges in reinforcement learning (RL)—including sparse prior knowledge, difficulty in long-horizon planning, and complex reward function design—by systematically surveying recent advances in integrating large language models (LLMs) and vision-language models (VLMs) into RL. We propose a novel tripartite functional taxonomy wherein LLMs/VLMs serve as agents, planners, and reward generators, and introduce a unified language-vision-action modeling framework that integrates multimodal alignment, prompt engineering, and interpretability analysis. Our contributions include: (i) the first structured survey framework for LLM/VLM-augmented RL; (ii) identification of four critical open problems—grounding fidelity, bias mitigation, representation optimization, and action grounding; and (iii) theoretical foundations and evolutionary pathways for multimodal intelligent decision-making. The work bridges symbolic reasoning and embodied control, advancing principled integration of foundation models into RL systems. (149 words)

Technology Category

Application Category

📝 Abstract

Reinforcement learning (RL) has shown impressive results in sequential decision-making tasks. Meanwhile, Large Language Models (LLMs) and Vision-Language Models (VLMs) have emerged, exhibiting impressive capabilities in multimodal understanding and reasoning. These advances have led to a surge of research integrating LLMs and VLMs into RL. In this survey, we review representative works in which LLMs and VLMs are used to overcome key challenges in RL, such as lack of prior knowledge, long-horizon planning, and reward design. We present a taxonomy that categorizes these LLM/VLM-assisted RL approaches into three roles: agent, planner, and reward. We conclude by exploring open problems, including grounding, bias mitigation, improved representations, and action advice. By consolidating existing research and identifying future directions, this survey establishes a framework for integrating LLMs and VLMs into RL, advancing approaches that unify natural language and visual understanding with sequential decision-making.

Problem

Research questions and friction points this paper is trying to address.

Integrates LLMs and VLMs into RL

Addresses RL challenges like planning and rewards

Explores grounding and bias mitigation in RL

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates LLMs and VLMs in RL

Addresses RL challenges via multimodal models

Categorizes LLM/VLM roles in RL

🔎 Similar Papers

No similar papers found.

Authors to Follow