🤖 AI Summary
This work addresses zero-shot tabular question answering (TQA) with a fine-tuning-free, large language model (LLM)-driven code generation approach. Given an input question, the method automatically synthesizes executable Python code that retrieves the answer from a structured table. To enhance robustness, it employs a modular three-stage pipeline: (1) column importance identification and data type analysis to improve semantic understanding of the table; (2) initial code generation guided by structured prompting; and (3) error-feedback-driven iterative code regeneration. Because no parameters are fine-tuned, the LLM's generalization is preserved while code correctness and cross-domain adaptability improve in zero-shot settings. Evaluated on SemEval 2025 Task 8, the method ranks 33rd among 53 participating teams, providing empirical evidence that a purely zero-shot code-generation paradigm is feasible for complex TQA without task-specific adaptation.
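To make the first two stages concrete, here is a minimal sketch under stated assumptions: the `llm_generate` helper is hypothetical (any chat-completion client could back it), the prompt wording is illustrative rather than the authors' exact prompts, and the table is assumed to arrive as a pandas DataFrame.

```python
import pandas as pd

def llm_generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with any chat-completion client."""
    raise NotImplementedError

def identify_columns(df: pd.DataFrame, question: str) -> str:
    """Stage 1: surface the column names and data types to the model
    so it can pick the columns relevant to the question."""
    schema = "\n".join(f"- {col} ({df[col].dtype})" for col in df.columns)
    prompt = (
        f"Table columns and types:\n{schema}\n\n"
        f"Question: {question}\n"
        "List only the columns needed to answer the question."
    )
    return llm_generate(prompt)

def generate_code(question: str, column_hint: str) -> str:
    """Stage 2: structured prompt asking for executable pandas code."""
    prompt = (
        f"Relevant columns:\n{column_hint}\n\n"
        f"Question: {question}\n"
        "Write Python code that answers the question using the pandas "
        "DataFrame `df` and stores the result in a variable named `answer`."
    )
    return llm_generate(prompt)
```

Constraining the generated code to a fixed contract (operate on `df`, write to `answer`) keeps the execution and verification stages simple, which is one plausible reading of the structured prompting described above.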
📝 Abstract
This paper describes our participation in SemEval 2025 Task 8, focused on Tabular Question Answering. We developed a zero-shot pipeline that leverages a Large Language Model to generate functional code that extracts the relevant information from tabular data based on an input question. Our approach is a modular pipeline in which the main code generator module is supported by additional components that identify the most relevant columns and analyze their data types to improve extraction accuracy. If the generated code fails, an iterative refinement process is triggered that incorporates the error feedback into a new generation prompt to enhance robustness. Our results show that zero-shot code generation is a valid approach for Tabular QA, achieving rank 33 of 53 in the test phase despite the lack of task-specific fine-tuning.
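The iterative refinement step can be sketched as a simple execute-and-retry loop. This reuses the hypothetical `llm_generate` helper from the sketch above; the retry budget and repair-prompt wording are assumptions, not the authors' exact setup.

```python
import traceback
import pandas as pd

def run_with_refinement(df: pd.DataFrame, question: str, code: str,
                        max_attempts: int = 3):
    """Execute generated code; on failure, feed the traceback back
    into a new generation prompt and retry (error-feedback loop)."""
    for _ in range(max_attempts):
        scope = {"df": df}
        try:
            exec(code, scope)  # generated code is expected to define `answer`
            return scope["answer"]
        except Exception:
            error = traceback.format_exc()
            prompt = (
                f"This code failed:\n{code}\n\n"
                f"Error:\n{error}\n\n"
                f"Question: {question}\n"
                "Fix the code so it runs on the DataFrame `df` and stores "
                "the result in a variable named `answer`."
            )
            code = llm_generate(prompt)  # hypothetical helper from above
    return None  # all attempts failed
```

Feeding the raw traceback back to the model gives it concrete evidence of what went wrong (a missing column, a type mismatch), which is what allows the loop to recover from the most common failure modes without any fine-tuning.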