Align to Structure: Aligning Large Language Models with Structural Information

📅 2025-04-04

📈 Citations: 0

✨ Influential: 0

career value

189K/year

🤖 AI Summary

Large language models often suffer from poor coherence and logical consistency in long-text generation due to the absence of hierarchical planning and structured organization. To address this, we propose a Structural Alignment Framework that explicitly models human discourse structure—grounded in linguistic discourse frameworks and hierarchical discourse motifs—as fine-grained, token-level structural rewards. Our method integrates Proximal Policy Optimization (PPO)-based reinforcement learning, a dual-reward mechanism (combining surface-level readability with global discourse-pattern fidelity), and dense structural representation techniques. Evaluated on argumentative essay generation and long-document summarization, our approach significantly outperforms strong baselines and RLHF-enhanced models, achieving measurable improvements in logical coherence, structural completeness, and rhetorical depth. All code and datasets are publicly released.

Technology Category

Application Category

📝 Abstract

Generating long, coherent text remains a challenge for large language models (LLMs), as they lack hierarchical planning and structured organization in discourse generation. We introduce Structural Alignment, a novel method that aligns LLMs with human-like discourse structures to enhance long-form text generation. By integrating linguistically grounded discourse frameworks into reinforcement learning, our approach guides models to produce coherent and well-organized outputs. We employ a dense reward scheme within a Proximal Policy Optimization framework, assigning fine-grained, token-level rewards based on the discourse distinctiveness relative to human writing. Two complementary reward models are evaluated: the first improves readability by scoring surface-level textual features to provide explicit structuring, while the second reinforces deeper coherence and rhetorical sophistication by analyzing global discourse patterns through hierarchical discourse motifs, outperforming both standard and RLHF-enhanced models in tasks such as essay generation and long-document summarization. All training data and code will be publicly shared at https://github.com/minnesotanlp/struct_align.

Problem

Research questions and friction points this paper is trying to address.

Enhancing long-form text coherence in LLMs

Aligning LLMs with human discourse structures

Improving readability and rhetorical sophistication

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns LLMs with human-like discourse structures

Uses dense reward scheme in PPO framework

Evaluates two complementary reward models

🔎 Similar Papers

Is Free Self-Alignment Possible?