🤖 AI Summary
Cross-lingual RST treebanks employ incompatible rhetorical relation taxonomies, hindering the development of a unified discourse parser. Method: UniRST is the first end-to-end unified rhetorical structure parsing framework supporting 18 heterogeneous treebanks in 11 languages. Its core components are: (1) a Masked-Union training strategy that shares classifier parameters across inventories via selective label masking; (2) a Multi-Head alternative that assigns a separate relation classification layer to each inventory; and (3) a simple data augmentation technique that improves generalization in low-resource settings. Results: UniRST outperforms mono-treebank baselines on 16 of the 18 benchmarks. It is the first approach to demonstrate high-performance multilingual discourse parsing without modifying the original relation inventories, validating both the effectiveness and the scalability of unified rhetorical parsing across diverse languages and annotation schemes.
📝 Abstract
We introduce UniRST, the first unified RST-style discourse parser capable of handling 18 treebanks in 11 languages without modifying their relation inventories. To overcome inventory incompatibilities, we propose and evaluate two training strategies: Multi-Head, which assigns a separate relation classification layer to each inventory, and Masked-Union, which enables shared-parameter training through selective label masking. We first benchmark mono-treebank parsing with a simple yet effective augmentation technique for low-resource settings. We then train a unified model and show that (1) the parameter-efficient Masked-Union approach is also the strongest, and (2) UniRST outperforms 16 of 18 mono-treebank baselines, demonstrating the advantages of single-model, multilingual end-to-end discourse parsing across diverse resources.
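The core idea behind Masked-Union can be sketched in a few lines: a single shared classification head is trained over the union of all relation labels, and for each treebank the logits of labels outside that treebank's inventory are masked out before the softmax, so probability mass (and gradient) flows only through valid labels. The sketch below is purely illustrative and not from the paper; the inventory names and label sets are invented assumptions.

```python
# Conceptual sketch of the Masked-Union idea (illustrative, not the
# paper's implementation). One shared head scores the union of all
# relation labels; per-treebank masks hide labels that the treebank's
# inventory does not contain.
import math

# Hypothetical union label space and two hypothetical inventories.
UNION = ["elaboration", "contrast", "cause", "background"]
INVENTORIES = {
    "treebank_A": {"elaboration", "contrast"},
    "treebank_B": {"elaboration", "cause", "background"},
}

def masked_softmax(logits, treebank):
    """Mask logits of labels outside this treebank's inventory to -inf,
    then apply a numerically stable softmax over the remaining labels."""
    allowed = INVENTORIES[treebank]
    masked = [z if lbl in allowed else float("-inf")
              for z, lbl in zip(logits, UNION)]
    m = max(masked)
    exps = [math.exp(z - m) if z != float("-inf") else 0.0 for z in masked]
    total = sum(exps)
    return [e / total for e in exps]

probs = masked_softmax([1.0, 0.5, 2.0, 0.0], "treebank_A")
# "cause" and "background" are outside treebank_A's inventory, so they
# receive zero probability even though "cause" had the largest raw logit.
assert probs[2] == 0.0 and probs[3] == 0.0
assert abs(sum(probs) - 1.0) < 1e-9
```

In a real parser the same masking would be applied to the logits inside the cross-entropy loss, which is what lets all treebanks share one set of classifier parameters; the Multi-Head alternative instead instantiates one small output layer per inventory on top of a shared encoder.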