I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses

📅 2024-02-17
🏛️ Conference on Empirical Methods in Natural Language Processing
📈 Citations: 11
Influential: 1
🤖 AI Summary
This paper investigates why supervised fine-tuning (SFT) using LLM-generated responses—rather than human-authored ones—yields superior performance on reasoning tasks. Method: We conduct perplexity analysis, cross-task generalization evaluation, and controlled ablation experiments; additionally, we employ multi-LLM collaborative response generation for SFT. Contribution/Results: We identify, for the first time, that the key mechanism is the model’s “inherent familiarity” with LLM-generated text—evidenced by lower pre-fine-tuning perplexity—rather than differences in response length or information content. Our approach not only improves performance on target reasoning tasks but also significantly enhances zero-shot generalization to unseen reasoning benchmarks. These findings offer a novel perspective on LLM self-optimization mechanisms and empirically validate the efficacy and transferability of familiarity-driven learning in language model adaptation.

📝 Abstract
This paper explores an intriguing observation: fine-tuning a large language model (LLM) with responses generated by an LLM often yields better results than using responses generated by humans, particularly in reasoning tasks. We conduct an in-depth investigation to understand why this occurs. Contrary to the common belief that this advantage stems from the more detailed nature of LLM-generated content, our study identifies another contributing factor: an LLM is inherently more “familiar” with LLM-generated responses. This familiarity is evidenced by lower perplexity before fine-tuning. We design a series of experiments to understand the impact of this “familiarity”, and our conclusions reveal that it significantly affects learning performance. Training with LLM-generated responses not only enhances performance but also helps maintain the model’s capabilities in other reasoning tasks after fine-tuning on a specific task.
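The “familiarity” measure the abstract refers to is perplexity: the exponential of the average negative log-likelihood the model assigns to a response before any fine-tuning. A minimal sketch of this computation, using made-up per-token log-probabilities purely for illustration (the variable names and values below are hypothetical, not from the paper):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token.

    Lower values mean the model finds the text more predictable
    ("more familiar"), higher values mean less predictable.
    """
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Hypothetical per-token log-probabilities a base model might assign to two
# candidate responses to the same prompt (closer to 0 = more probable).
llm_response_logprobs = [-0.9, -1.1, -0.8, -1.0]    # LLM-style phrasing
human_response_logprobs = [-2.1, -1.8, -2.4, -2.0]  # human-authored phrasing

# The paper's observation corresponds to the first value being lower.
print(perplexity(llm_response_logprobs))    # ≈ 2.59
print(perplexity(human_response_logprobs))  # ≈ 7.96
```

In practice the log-probabilities would come from a forward pass of the pre-fine-tuning model over each response; the comparison above is what “lower perplexity before fine-tuning” operationalizes.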
Problem

Research questions and friction points this paper is trying to address.

Why LLM-generated responses outperform human ones in fine-tuning
Investigates the role of LLM familiarity with its own generated content
Explores how this familiarity enhances and maintains reasoning performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning LLMs with LLM-generated responses improves performance
LLM familiarity with its own responses enhances learning efficiency
Using LLM-generated data maintains reasoning capabilities across tasks
Xuan Ren
University of Adelaide
Biao Wu
University of Technology Sydney
Lingqiao Liu
Associate Professor at the University of Adelaide
computer vision · machine learning