Trajectory-Class-Aware Multi-Agent Reinforcement Learning

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of policy generalization in multi-task multi-agent reinforcement learning, this paper proposes a trajectory-class-aware framework enabling agents to adapt to diverse tasks requiring distinct coordination strategies after a single training phase. Methodologically, it introduces (1) a novel quantized autoencoder-based trajectory embedding and unsupervised clustering approach to automatically discover task-relevant trajectory classes; and (2) agent-level trajectory-class predictors coupled with class-specific representation models, jointly trained via multi-head attention and POMDP-based policy networks for class-aware learning. Evaluated on the StarCraft II multi-task benchmark, the method substantially outperforms state-of-the-art approaches, achieving significant improvements in cross-task policy generalization and trajectory-class identification accuracy (>89%).
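The quantized trajectory embedding and unsupervised clustering step can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the learned encoder is replaced by mean-pooling, the codebook is random instead of trained jointly with the autoencoder, and all names (`encode_trajectory`, `quantize`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: in TRAMA the codebook would be learned jointly
# with the quantized autoencoder; here it is random for illustration.
codebook = rng.normal(size=(8, 4))  # 8 discrete codes, 4-dim latents

def encode_trajectory(traj):
    """Mean-pool per-step features into one latent (encoder stand-in)."""
    return traj.mean(axis=0)

def quantize(z, codebook):
    """Snap a latent onto its nearest codebook entry (VQ step)."""
    dists = np.linalg.norm(codebook - z, axis=1)
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

traj = rng.normal(size=(20, 4))  # one trajectory: 20 timesteps, 4 features
idx, z_q = quantize(encode_trajectory(traj), codebook)
```

Grouping trajectories by their quantized code index then gives a simple form of the unsupervised trajectory clustering the summary refers to.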

📝 Abstract
In the context of multi-agent reinforcement learning, generalization is the challenge of solving various tasks that may require different joint policies or coordination, without relying on policies specialized for each task. We refer to this type of problem as multi-task, and we train agents to be versatile in this multi-task setting through a single training process. To address this challenge, we introduce TRajectory-class-Aware Multi-Agent reinforcement learning (TRAMA). In TRAMA, agents recognize a task type by identifying the class of trajectories they are experiencing through partial observations, and the agents use this trajectory awareness or prediction as additional information for the action policy. To this end, we introduce three primary objectives in TRAMA: (a) constructing a quantized latent space to generate trajectory embeddings that reflect key similarities among them; (b) conducting trajectory clustering using these trajectory embeddings; and (c) building a trajectory-class-aware policy. Specifically for (c), we introduce a trajectory-class predictor that performs agent-wise predictions on the trajectory class, and we design a trajectory-class representation model for each trajectory class. Each agent takes actions based on this trajectory-class representation along with its partial observation for task-aware execution. The proposed method is evaluated on various tasks, including multi-task problems built upon StarCraft II. Empirical results show further performance improvements over state-of-the-art baselines.
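Objective (c), the trajectory-class-aware policy, can be sketched as follows: each agent predicts the trajectory class from its partial observation, then conditions its action policy on that class's representation. This is a toy numpy sketch, not the paper's architecture: the predictor, per-class representations, and policy head are random linear maps standing in for trained networks, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N_CLASSES, OBS_DIM, N_ACTIONS = 3, 5, 4

# Hypothetical stand-ins for learned networks: random linear maps.
W_pred = rng.normal(size=(OBS_DIM, N_CLASSES))      # trajectory-class predictor
class_repr = rng.normal(size=(N_CLASSES, OBS_DIM))  # per-class representation model
W_pi = rng.normal(size=(2 * OBS_DIM, N_ACTIONS))    # class-aware policy head

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def act(obs):
    """Agent-wise: predict the trajectory class from a partial observation,
    then condition the action policy on that class's representation."""
    cls = int(np.argmax(softmax(obs @ W_pred)))   # class prediction
    inp = np.concatenate([obs, class_repr[cls]])  # observation + class representation
    probs = softmax(inp @ W_pi)                   # class-aware action distribution
    return cls, int(np.argmax(probs))

obs = rng.normal(size=OBS_DIM)  # one agent's partial observation
cls, action = act(obs)
```

The design point illustrated here is the conditioning: the class representation is concatenated with the partial observation so the same policy network can execute differently across tasks.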
Problem

Research questions and friction points this paper is trying to address.

Generalization in multi-agent reinforcement learning tasks
Versatile agents in multi-task settings without task-specific policies
Trajectory-class-aware policy for task-aware execution using partial observations
Innovation

Methods, ideas, or system contributions that make the work stand out.

TRAMA uses trajectory-class-aware multi-agent reinforcement learning.
Agents predict trajectory class for task-aware action policies.
Quantized latent space and clustering enhance trajectory embeddings.
Hyungho Na
Postdoctoral Researcher, Korea Advanced Institute of Science and Technology
(Multi-agent) Reinforcement Learning · Representation Learning · Multi-agent Systems
Kwanghyeon Lee
Korea Advanced Institute of Science and Technology (KAIST)
Sumin Lee
Korea Advanced Institute of Science and Technology (KAIST)
IL-Chul Moon
Korea Advanced Institute of Science and Technology (KAIST), summary.ai