MRT at SemEval-2025 Task 8: Maximizing Recovery from Tables with Multiple Steps

📅 2025-05-28
🤖 AI Summary
This work addresses SemEval-2025 Task 8, a table question answering challenge that requires semantic understanding of, and reasoning over, structured data. Method: We propose a multi-step Python code generation framework with three stages: (1) understanding the content of the input table, (2) generating natural-language reasoning steps, and (3) translating those steps into Python code that is executed dynamically, with error recovery when execution fails. The stages form a pipeline (instruction generation → code translation → dynamic execution) that relies on fine-grained prompts for each step and on open-source large language models. Contribution/Results: Our approach requires no model fine-tuning; it relies on program synthesis, prompt engineering, and exception handling to answer questions end to end and to recover from failed executions. On Task 8 Subtask 1, it achieves 70.50% accuracy, substantially outperforming baseline methods, which supports stepwise, controllable code generation as an effective and robust approach to table QA.
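As an illustration of the first stage (understanding the table), the sketch below turns a small CSV table into a plain-text description that could be placed in a prompt. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `describe_table`, the CSV input format, and the two-way number/text typing are all hypothetical choices made for this example.

```python
import csv
import io


def _is_number(value: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def describe_table(csv_text: str, max_examples: int = 3) -> str:
    """Summarize each column (name, inferred type, sample values) as plain text.

    Hypothetical helper: the summary does not specify how the table is described.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return "Empty table."
    lines = [f"Table with {len(rows)} rows and {len(rows[0])} columns:"]
    for column in rows[0]:
        # Skip empty cells so they do not affect type inference
        values = [row[column] for row in rows if row[column] not in ("", None)]
        kind = "number" if values and all(_is_number(v) for v in values) else "text"
        # dict.fromkeys drops duplicates while keeping first-seen order
        examples = list(dict.fromkeys(values))[:max_examples]
        lines.append(f"- {column} ({kind}), e.g. {', '.join(examples)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_table("name,score\nana,3\nbob,5\n"))
```

A description like this keeps the prompt short even for large tables, since only column names, inferred types, and a few sample values are passed to the model.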

📝 Abstract
In this paper we present our approach to the SemEval 2025 Task 8: Question-Answering over Tabular Data challenge. Our strategy uses Python code generation with LLMs to interact with the table and obtain the answers to the questions. The process consists of multiple steps: understanding the content of the table, generating natural language instructions in the form of steps to follow in order to get the answer, translating these instructions into code, running the code, and handling potential errors or exceptions. These steps use open-source LLMs and fine-grained, optimized prompts for each task (step). With this approach, we achieved a score of 70.50% for subtask 1.
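The last two steps (running the generated code and handling errors) can be sketched as a small retry loop. This is a minimal sketch under stated assumptions, not the authors' code: `run_with_recovery` and `generate_code` are hypothetical names, `generate_code` stands in for the LLM call, and the convention that generated code stores its result in a variable called `answer` is assumed for the example.

```python
import traceback
from typing import Callable, Optional


def run_with_recovery(
    generate_code: Callable[[Optional[str]], str],
    table: list[dict],
    max_attempts: int = 3,
):
    """Execute generated code, feeding any error back to the generator.

    `generate_code` (hypothetical) receives the previous error message, or
    None on the first attempt, and returns Python source that stores its
    result in a variable named `answer`.
    """
    error: Optional[str] = None
    for _ in range(max_attempts):
        source = generate_code(error)
        namespace = {"table": table}
        try:
            # NOTE: generated code is untrusted; a real system should sandbox it
            exec(source, namespace)
            return namespace["answer"]
        except Exception:
            # Keep only the final traceback line, e.g. "KeyError: 'scores'"
            error = traceback.format_exc().strip().splitlines()[-1]
    return None  # every attempt failed


def stub_generator(error: Optional[str]) -> str:
    """Stand-in for an LLM: the first attempt uses a wrong column name."""
    if error is None:
        return "answer = sum(row['scores'] for row in table)"
    return "answer = sum(row['score'] for row in table)"


if __name__ == "__main__":
    print(run_with_recovery(stub_generator, [{"score": 3}, {"score": 5}]))
```

Passing only the final traceback line back to the generator keeps the repair prompt short; whether the authors pass more context is not stated in the abstract.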
Problem

Research questions and friction points this paper is trying to address.

Solving the question-answering over tabular data challenge
Generating Python code with LLMs for table interaction
Optimizing a multi-step process for accurate answer retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Python code generation with LLMs
Multi-step natural language instructions
Fine-grained optimized prompts
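One way to organize a separate prompt per step is a small template registry, sketched below. The template wording, the step names (`instructions`, `code`, `repair`), and the `build_prompt` helper are hypothetical and purely illustrative; the actual prompts used in the paper are not given here.

```python
# Hypothetical prompt templates, one per step; the wording is illustrative only.
PROMPTS = {
    "instructions": (
        "Table description:\n{table_description}\n\n"
        "Question: {question}\n"
        "List the steps needed to answer the question, one per line."
    ),
    "code": (
        "Table description:\n{table_description}\n\n"
        "Steps:\n{steps}\n"
        "Write Python code that follows these steps and stores the result in `answer`."
    ),
    "repair": (
        "The code below failed.\n\nCode:\n{code}\n\nError: {error}\n"
        "Return a corrected version."
    ),
}


def build_prompt(step: str, **fields: str) -> str:
    """Fill the template for `step`.

    Raises KeyError for an unknown step or a missing template field.
    """
    return PROMPTS[step].format(**fields)


if __name__ == "__main__":
    print(
        build_prompt(
            "instructions",
            table_description="- score (number), e.g. 3, 5",
            question="What is the total score?",
        )
    )
```

Keeping each step's prompt separate makes it possible to tune one step without touching the others, which is the kind of per-step optimization the list above refers to.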
Maximiliano Hormazábal Lagos
PhD Student
Computer Vision, Natural Language Processing, Document Image Analysis, Vision Language Models
Álvaro Bueno Sáez
Fundación Centro Tecnolóxico de Telecomunicacións de Galicia (GRADIANT), Vigo, Spain
Héctor Cerezo-Costas
Fundación Centro Tecnolóxico de Telecomunicacións de Galicia (GRADIANT), Vigo, Spain
Pedro Alonso Doval
Fundación Centro Tecnolóxico de Telecomunicacións de Galicia (GRADIANT), Vigo, Spain
Jorge Alcalde Vesteiro
Fundación Centro Tecnolóxico de Telecomunicacións de Galicia (GRADIANT), Vigo, Spain