Training with Pseudo-Code for Instruction Following

📅 2025-05-23

🤖 AI Summary

Large language models (LLMs) still struggle with relatively simple, unambiguous instructions, especially compositional ones. This paper proposes pseudo-code-augmented instruction tuning: the instruction-tuning data additionally includes each instruction re-expressed in pseudo-code, together with the final response, so users do not have to write pseudo-code or craft few-shot demonstrations at inference time. The authors evaluate five models on eleven public benchmarks spanning instruction following, mathematical reasoning, and common-sense reasoning. Models trained this way show relative gains of 3% to 19% on instruction-following benchmarks and an average gain of up to 14% across all tasks, while retaining their mathematical and common-sense reasoning capabilities.

📝 Abstract

Despite the rapid progress in the capabilities of Large Language Models (LLMs), they continue to have difficulty following relatively simple, unambiguous instructions, especially when compositions are involved. In this paper, we take inspiration from recent work suggesting that models may follow instructions better when they are expressed in pseudo-code. However, writing pseudo-code programs can be tedious, and using few-shot demonstrations to craft code representations at inference time can be unnatural for non-expert users of LLMs. To overcome these limitations, we propose fine-tuning LLMs with instruction-tuning data that additionally includes instructions re-expressed in pseudo-code along with the final response. We evaluate models trained using our method on $11$ publicly available benchmarks comprising tasks related to instruction following, mathematics, and common-sense reasoning. We conduct rigorous experiments with $5$ different models and find that not only do models follow instructions better when trained with pseudo-code, they also retain their capabilities on the other tasks related to mathematical and common-sense reasoning. Specifically, we observe a relative gain of $3$--$19$% on instruction-following benchmarks, and an average gain of up to $14$% across all tasks.
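The gains above are reported as relative gains, which measure the improvement against the baseline score rather than as an absolute difference in points. The sketch below shows only the arithmetic; the function name and the scores are hypothetical and are not taken from the paper.

```python
def relative_gain(baseline: float, tuned: float) -> float:
    """Return the improvement of `tuned` over `baseline`, as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (tuned - baseline) / baseline


# Hypothetical benchmark scores, chosen only to illustrate the calculation.
baseline_score = 50.0
tuned_score = 55.0

gain = relative_gain(baseline_score, tuned_score)
print(f"absolute gain: {tuned_score - baseline_score:.1f} points")
print(f"relative gain: {gain:.1f}%")
```

With these illustrative numbers, a 5-point absolute improvement on a baseline of 50 corresponds to a 10% relative gain, which is why relative figures can look larger than the raw score difference.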

Problem

Research questions and friction points this paper is trying to address.

LLMs struggle with simple, unambiguous instruction-following tasks

Pseudo-code improves instruction-following but is tedious to write

Can fine-tuning with pseudo-code improve instruction following without degrading mathematical and common-sense reasoning?

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning LLMs with pseudo-code instructions

Enhancing instruction-following via pseudo-code training

Improving instruction following while retaining mathematical and common-sense reasoning performance across 11 benchmarks
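To make the idea of pseudo-code-augmented training data concrete, here is a minimal sketch of how one training example might be assembled. The abstract only states that the data includes the instruction re-expressed in pseudo-code along with the final response; the `[PSEUDOCODE]` and `[RESPONSE]` markers, the `TrainingExample` class, the `build_example` helper, and the sample instruction are all assumptions made for illustration, not the paper's actual format.

```python
from dataclasses import dataclass


@dataclass
class TrainingExample:
    prompt: str  # natural-language instruction shown to the model
    target: str  # text the model is trained to generate


def build_example(instruction: str, pseudo_code: str, response: str) -> TrainingExample:
    """Pair an instruction with a target that restates it in pseudo-code before answering.

    Assumed layout (not confirmed by the source): pseudo-code first, then the response.
    """
    target = (
        "[PSEUDOCODE]\n"
        f"{pseudo_code.strip()}\n"
        "[RESPONSE]\n"
        f"{response.strip()}"
    )
    return TrainingExample(prompt=instruction.strip(), target=target)


# Hypothetical example with a small compositional instruction.
example = build_example(
    instruction="List three fruits in upper case, separated by commas.",
    pseudo_code=(
        "def respond():\n"
        "    fruits = pick_fruits(n=3)\n"
        "    return ', '.join(f.upper() for f in fruits)"
    ),
    response="APPLE, BANANA, CHERRY",
)
print(example.target)
```

Under this assumed layout the user still writes only the natural-language instruction; the pseudo-code appears on the training-target side, which matches the stated goal of not requiring users to write pseudo-code themselves.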
