Improving Consistency in Large Language Models through Chain of Guidance

📅 2025-02-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) lack built-in mechanisms for enforcing output consistency at inference time, so semantically equivalent inputs can elicit different responses, undermining system reliability. To address this, we propose Chain of Guidance (CoG), a prompt-only, multi-step reasoning strategy that requires no external modules, combined with consistency-aware supervised fine-tuning. CoG explicitly models semantic equivalence through structured prompt chains, and lightweight fine-tuning is performed on synthetically generated, consistency-annotated data. Under a closed-book question-answering evaluation, the fine-tuned model is more than twice as consistent across equivalent inputs as baseline models, outperforms direct prompting, template-based responses, and majority voting, and generalizes well to datasets unseen during fine-tuning. To our knowledge, this is the first approach to improve LLMs' intrinsic semantic consistency solely through the combination of prompt engineering and internal fine-tuning.
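The paper does not publish its exact prompts here, but the multi-step idea behind CoG can be sketched as a small prompt chain: first normalize the input question into a canonical wording, then answer that canonical form. Everything below is illustrative; `llm` is a stand-in for any text-completion callable, and `stub_llm` is a hypothetical deterministic stub used only to make the sketch runnable.

```python
# Hedged sketch of a Chain-of-Guidance-style prompt chain, assuming a
# generic text-completion callable `llm(prompt) -> str`. Not the paper's
# actual prompts, which are not reproduced on this page.

def chain_of_guidance(question: str, llm) -> str:
    # Step 1: restate the question in a single canonical form, so that
    # paraphrases of the same question converge on one wording.
    canonical = llm(
        "Rewrite the following question in a single canonical form, "
        "preserving its meaning exactly:\n" + question
    )
    # Step 2: answer the canonical form with an instruction to be short
    # and factual, encouraging identical answers across paraphrases.
    answer = llm(
        "Answer the question below with a single short factual phrase:\n"
        + canonical
    )
    return answer.strip()

# Hypothetical deterministic stub standing in for a real LLM, so the
# sketch runs without any model dependency.
def stub_llm(prompt: str) -> str:
    if prompt.startswith("Rewrite"):
        return "What is the capital of France?"
    return "Paris"
```

With the stub, two paraphrases of the same question pass through the same canonical form and therefore yield the same answer, which is the consistency property CoG targets.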

📝 Abstract
Consistency is a fundamental dimension of trustworthiness in Large Language Models (LLMs). For humans to be able to trust LLM-based applications, their outputs should be consistent when prompted with inputs that carry the same meaning or intent. Despite this need, there is no known mechanism to control and guide LLMs to be more consistent at inference time. In this paper, we introduce a novel alignment strategy to maximize semantic consistency in LLM outputs. Our proposal is based on Chain of Guidance (CoG), a multi-step prompting technique that generates highly consistent outputs from LLMs. For closed-book question-answering (Q&A) tasks, the outputs generated using CoG show improved consistency compared to direct prompting. While other approaches like template-based responses and majority voting may offer alternative paths to consistency, our work focuses on exploring the potential of guided prompting. We use synthetic data sets composed of consistent input-output pairs to fine-tune LLMs to produce consistent and correct outputs. Our fine-tuned models are more than twice as consistent as base models and show strong generalization by producing consistent outputs over datasets not used in the fine-tuning process.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM output consistency
Introducing Chain of Guidance method
Improving trust in LLM applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain of Guidance technique
Synthetic data fine-tuning
Multi-step prompting strategy
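The synthetic-data contribution can also be sketched briefly: paraphrases of the same question are tied to one shared reference answer, and each (paraphrase, answer) pair becomes a fine-tuning example, so the model is trained to emit identical outputs for equivalent inputs. The cluster format and field names below are assumptions for illustration, not the paper's actual data schema.

```python
# Hedged sketch of building consistency-annotated fine-tuning pairs.
# `clusters` is a list of (paraphrase_list, answer) tuples; in the paper
# such data is generated synthetically (e.g. via CoG outputs).

def build_finetune_pairs(clusters):
    """Expand each paraphrase cluster into (prompt, completion) examples
    that all share the same completion, encoding the consistency target."""
    pairs = []
    for paraphrases, answer in clusters:
        for question in paraphrases:
            pairs.append({"prompt": question, "completion": answer})
    return pairs
```

Fine-tuning on pairs like these is what the summary calls "consistency-aware supervised fine-tuning": the supervision signal is not just correctness but sameness of the answer across a paraphrase cluster.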