🤖 AI Summary
Existing approaches struggle to ensure both route feasibility and destination accuracy when jointly predicting ship trajectories and destinations at the month scale. This work proposes a post-training framework based on Reinforcement Learning with Verifiable Reward (RLVR), which encodes AIS trajectories as semantic text sequences and uses large language models (LLMs) for long-horizon joint forecasting. By combining physical-validity constraints, early-weighted trajectory supervision, and hierarchical destination matching, the method aligns the LLM's semantic reasoning with verifiable forecasting objectives. Experiments show that a 4B-parameter LLM trained with RLVR substantially outperforms both zero-shot LLMs and deep learning baselines, especially on destination-related metrics, suggesting that task-specific optimization matters more than simply scaling up model size.
📝 Abstract
Long-horizon maritime trajectory prediction is important for shipping management, logistics planning, and maritime risk analysis, yet month-level forecasting remains insufficiently studied. Existing deep learning methods mainly focus on short- and mid-term coordinate extrapolation and often struggle to preserve route feasibility and destination correctness over extended horizons. This paper investigates joint long-horizon vessel trajectory and destination forecasting with reasoning-capable large language models, and develops a Maritime LLM post-training framework based on Reinforcement Learning with Verifiable Reward (RLVR). An AIS-based benchmark is constructed with 60-day historical trajectories and 30-day forecasting horizons, where trajectories are converted into semantic textual representations for RL prompt construction. RLVR aligns LLMs with maritime forecasting objectives by enforcing physical validity, providing early-weighted trajectory supervision, and evaluating destination correctness through hierarchical matching and curriculum learning. Experimental results show that RLVR-trained LLMs substantially improve over zero-shot LLMs and representative deep learning baselines, especially on destination-related metrics. Among the evaluated RLVR-trained variants, 4B LLMs achieve the best overall performance, suggesting that reward-compatible optimization and task-specific capacity matching are more important than simply using larger 8B or 14B LLMs. The results also show that LSTM remains a strong deep learning baseline under limited fine-tuning data, while Transformer-style spatio-temporal models typically require larger datasets and richer structured inputs. Overall, this work advances semantic, verifier-aligned maritime forecasting for operational decision support.
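To make the three reward ingredients named in the abstract concrete, the following is a minimal sketch of how a verifiable reward of this kind could be computed. It is not the paper's implementation. Everything specific is an assumption for illustration: the function names, the speed cap, the daily step, the geometric decay used for early weighting, the port/country/region hierarchy with its partial credits, and the equal mixing weights. The curriculum-learning component is not modelled.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def is_physically_valid(track, max_speed_kmh=60.0, step_hours=24.0):
    """Assumed validity check: coordinates in range and plausible speed between steps."""
    for lat, lon in track:
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            return False
    return all(haversine_km(p, q) / step_hours <= max_speed_kmh
               for p, q in zip(track, track[1:]))

def early_weighted_score(pred, truth, decay=0.9, scale_km=500.0):
    """Score in (0, 1]; weights decay with the step index so early steps count more."""
    weights = [decay ** i for i in range(len(truth))]
    err = sum(w * haversine_km(p, t)
              for w, p, t in zip(weights, pred, truth)) / sum(weights)
    return math.exp(-err / scale_km)

def destination_score(pred, truth):
    """Assumed hierarchy: full credit for the port, partial for country or region."""
    if pred["port"] == truth["port"]:
        return 1.0
    if pred["country"] == truth["country"]:
        return 0.5
    if pred["region"] == truth["region"]:
        return 0.2
    return 0.0

def reward(pred_track, true_track, pred_dest, true_dest):
    """Zero reward for malformed or infeasible tracks, else an equal-weight mix."""
    if len(pred_track) != len(true_track) or not is_physically_valid(pred_track):
        return 0.0
    return (0.5 * early_weighted_score(pred_track, true_track)
            + 0.5 * destination_score(pred_dest, true_dest))

# Hypothetical example: three daily positions and a destination record.
true_track = [(1.0, 103.0), (2.0, 104.0), (3.0, 105.0)]
true_dest = {"port": "A", "country": "X", "region": "R1"}
near_miss = {"port": "B", "country": "X", "region": "R1"}
print(reward(true_track, true_track, near_miss, true_dest))
```

Gating on physical validity before scoring means an infeasible route earns nothing regardless of how close its endpoint is, which is one simple way to keep the reward checkable by a program rather than by a learned judge.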