World Model Implanting for Test-time Adaptation of Embodied Agents

📅 2025-09-04
🤖 AI Summary
To address the retraining cost and poor generalization that embodied AI agents face in unseen environments, this paper proposes a test-time adaptation framework that requires neither fine-tuning nor additional training data. Instead, it dynamically injects plug-and-play, domain-specific world models, tightly coupled with large language models (LLMs), to enable cross-domain reasoning. Key contributions include: (1) a trajectory-abstraction-based prototype retrieval mechanism for efficient domain identification; (2) a dynamic world model injection strategy that adapts model components to environmental affordances at inference time; and (3) a compound attention mechanism that jointly leverages LLM-derived semantic understanding and world-model-derived physical representations. Evaluated on VirtualHome and ALFWorld, the method achieves significant zero-shot and few-shot performance gains over LLM-based baselines, demonstrating cross-task and cross-environment generalization as well as scalable deployment capability.

📝 Abstract
In embodied AI, a persistent challenge is enabling agents to robustly adapt to novel domains without requiring extensive data collection or retraining. To address this, we present a world model implanting framework (WorMI) that combines the reasoning capabilities of large language models (LLMs) with independently learned, domain-specific world models through test-time composition. By allowing seamless implantation and removal of the world models, the embodied agent's policy achieves and maintains cross-domain adaptability. In the WorMI framework, we employ a prototype-based world model retrieval approach, utilizing efficient trajectory-based abstract representation matching, to incorporate relevant models into test-time composition. We also develop a world-wise compound attention method that not only integrates the knowledge from the retrieved world models but also aligns their intermediate representations with the reasoning model's representation within the agent's policy. This framework design effectively fuses domain-specific knowledge from multiple world models, ensuring robust adaptation to unseen domains. We evaluate WorMI on the VirtualHome and ALFWorld benchmarks, demonstrating superior zero-shot and few-shot performance compared to several LLM-based approaches across a range of unseen domains. These results highlight the framework's potential for scalable, real-world deployment in embodied agent scenarios where adaptability and data efficiency are essential.
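The prototype-based retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each trajectory step is already embedded as a fixed-length vector, that each world model is represented by a single prototype vector, and that matching uses cosine similarity with top-k selection. The function names, the example prototypes, and the value of `k` are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def retrieve_world_models(trajectory_embeddings, prototypes, k=2):
    """Return the names of the k world models whose prototype best matches the trajectory.

    Assumption: the trajectory is abstracted by averaging its step embeddings.
    """
    query = mean_vector(trajectory_embeddings)
    scored = sorted(
        ((cosine(query, proto), name) for name, proto in prototypes.items()),
        reverse=True,
    )
    return [name for _, name in scored[:k]]


# Hypothetical prototypes, one per domain-specific world model.
prototypes = {
    "kitchen": [1.0, 0.0, 0.0],
    "bathroom": [0.0, 1.0, 0.0],
    "living_room": [0.0, 0.0, 1.0],
}

# Hypothetical embeddings of two observed trajectory steps.
trajectory = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.1]]

selected = retrieve_world_models(trajectory, prototypes, k=2)
print(selected)
```

Under these assumptions, only the selected world models would be implanted for the current episode; the others stay detached, which is what keeps retrieval cheap relative to running every world model.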
Problem

Research questions and friction points this paper is trying to address.

Enabling embodied agents to adapt to novel domains without extensive retraining
Combining LLM reasoning with domain-specific world models at test time
Achieving robust cross-domain adaptability through seamless model implantation
Innovation

Methods, ideas, or system contributions that make the work stand out.

World model implanting framework for test-time adaptation
Prototype-based retrieval with trajectory matching
World-wise compound attention for knowledge integration
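One way to picture world-wise compound attention is as two stacked attention steps: the reasoning model's hidden state first attends within each retrieved world model, and then attends across the resulting per-model summaries. The sketch below is only an illustration under strong assumptions: it uses plain scaled dot-product attention with no learned projections, a single query vector, and a residual sum as the fusion step. None of these details are confirmed by the source, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, key) / scale for key in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def compound_attention(reasoner_state, world_states):
    """Fuse hidden states from several world models into the reasoner's state.

    reasoner_state: hidden vector from the reasoning model (assumed shape [d]).
    world_states:   one list of hidden vectors per retrieved world model.
    """
    # Step 1 (within each world model): summarize its states w.r.t. the reasoner.
    summaries = [attend(reasoner_state, states, states) for states in world_states]
    # Step 2 (across world models): weight each model's summary by relevance.
    fused = attend(reasoner_state, summaries, summaries)
    # Residual fusion, so removing all world-model input would leave the state unchanged.
    return [h + f for h, f in zip(reasoner_state, fused)]


# Hypothetical hidden states: one reasoner vector and two retrieved world models.
reasoner_state = [1.0, 0.0]
world_states = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]

out = compound_attention(reasoner_state, world_states)
print(out)
```

Because the second step operates over a list of per-model summaries, world models can be added to or dropped from `world_states` without changing the function, which mirrors the implant-and-remove behavior the summary describes.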