🤖 AI Summary
Cross-lingual RST treebanks employ incompatible rhetorical relation taxonomies, hindering the development of a unified discourse parser. Method: UniRST is the first end-to-end unified rhetorical structure parsing framework supporting 18 heterogeneous treebanks in 11 languages. Its core components are: (1) a Masked-Union training strategy that shares classifier parameters across inventories via selective label masking; (2) a Multi-Head alternative that assigns a separate relation classification layer to each inventory; and (3) a simple data augmentation technique that improves generalization in low-resource settings. Results: UniRST outperforms mono-treebank baselines on 16 of the 18 benchmarks. It is the first approach to demonstrate high-performance multilingual discourse parsing without modifying the original relation inventories, validating both the effectiveness and the scalability of unified rhetorical parsing across diverse languages and annotation schemes.
📝 Abstract
We introduce UniRST, the first unified RST-style discourse parser capable of handling 18 treebanks in 11 languages without modifying their relation inventories. To overcome inventory incompatibilities, we propose and evaluate two training strategies: Multi-Head, which assigns a separate relation classification layer to each inventory, and Masked-Union, which enables shared-parameter training through selective label masking. We first benchmark mono-treebank parsing with a simple yet effective augmentation technique for low-resource settings. We then train a unified model and show that (1) the parameter-efficient Masked-Union approach is also the strongest, and (2) UniRST outperforms 16 of 18 mono-treebank baselines, demonstrating the advantages of single-model, multilingual end-to-end discourse parsing across diverse resources.
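The core idea behind Masked-Union can be sketched in a few lines: a single shared classification head is trained over the union of all relation labels, and for each treebank the logits of labels outside that treebank's inventory are masked out before the softmax, so probability mass (and gradient) flows only through valid labels. The sketch below is purely illustrative and not from the paper; the inventory names and label sets are invented assumptions.

```python
# Conceptual sketch of the Masked-Union idea (illustrative, not the
# paper's implementation). One shared head scores the union of all
# relation labels; per-treebank masks hide labels that the treebank's
# inventory does not contain.
import math

# Hypothetical union label space and two hypothetical inventories.
UNION = ["elaboration", "contrast", "cause", "background"]
INVENTORIES = {
    "treebank_A": {"elaboration", "contrast"},
    "treebank_B": {"elaboration", "cause", "background"},
}

def masked_softmax(logits, treebank):
    """Mask logits of labels outside this treebank's inventory to -inf,
    then apply a numerically stable softmax over the remaining labels."""
    allowed = INVENTORIES[treebank]
    masked = [z if lbl in allowed else float("-inf")
              for z, lbl in zip(logits, UNION)]
    m = max(masked)
    exps = [math.exp(z - m) if z != float("-inf") else 0.0 for z in masked]
    total = sum(exps)
    return [e / total for e in exps]

probs = masked_softmax([1.0, 0.5, 2.0, 0.0], "treebank_A")
# "cause" and "background" are outside treebank_A's inventory, so they
# receive zero probability even though "cause" had the largest raw logit.
assert probs[2] == 0.0 and probs[3] == 0.0
assert abs(sum(probs) - 1.0) < 1e-9
```

In a real parser the same masking would be applied to the logits inside the cross-entropy loss, which is what lets all treebanks share one set of classifier parameters; the Multi-Head alternative instead instantiates one small output layer per inventory on top of a shared encoder.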