🤖 AI Summary
To address the challenge of joint reasoning over textual and tabular data in open-domain multi-hop question answering, this paper proposes an end-to-end cross-modal reasoning framework based on deep reinforcement learning. Methodologically, it introduces Proximal Policy Optimization (PPO), presented as the first application of policy gradient methods to text-table joint QA, and integrates a multimodal encoder (BERT + TabTransformer) with a differentiable table-operation module to dynamically plan reading order and table operations, eliminating the need for predefined alignments or intermediate supervision. Its key innovations are the joint optimization of cross-modal reasoning paths and Monte Carlo policy evaluation. The framework achieves state-of-the-art results on WikiTableQuestions and HybridQA, improving accuracy by 3.2% and 4.7%, respectively, with the largest gains on complex, cross-table, multi-hop questions.
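The paper itself is not reproduced here, but the PPO component mentioned above rests on a standard, well-documented objective: the clipped surrogate loss from Schulman et al. (2017). The sketch below shows that objective in plain NumPy; the idea of treating each action as a reasoning step (e.g. "read next passage", "select table row", "aggregate column") reflects the summary's description, while all function and variable names are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective from PPO (Schulman et al., 2017).

    In a text-table QA setting like the one summarized above, each action
    would be one reasoning step (read a passage, pick a table row, apply a
    table operation); here actions are abstract, and the per-action log
    probabilities and advantage estimates are assumed to be given.
    """
    # Importance ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    # Clipping the ratio to [1 - eps, 1 + eps] bounds the policy update.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # PPO maximizes the elementwise minimum of the two surrogates.
    return np.mean(np.minimum(unclipped, clipped))
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage; when a positive-advantage action's ratio exceeds 1 + eps, the clip caps its contribution, which is what keeps each PPO update conservative.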