Understanding LLM Behavior in Multi-Target Cross-Lingual Summarization

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Multi-target cross-lingual summarization (MTXLS) lags well behind English monolingual summarization, and how large language models (LLMs) carry out the task internally remains poorly understood. This work introduces MEA, a new MTXLS benchmark covering 24 target languages, and uses it to evaluate end-to-end and pipeline approaches across various LLMs. It proposes a layer-wise analysis framework for examining how these models perform MTXLS internally. The analyses suggest that translation and summarization behaviors emerge jointly within later layers rather than as distinct stages, and that errors tend to arise at similar depths. Building on this insight, the authors design an inference-time activation steering method that uses hidden representations from English summarization to guide generation in the target languages. Experiments show that the method consistently improves summary quality across target languages.

📝 Abstract

Multi-target cross-lingual text summarization (MTXLS), which summarizes a source document into multiple target languages, is increasingly important as users consume content in diverse languages, but remains underexplored. To address this gap, we introduce multi-target cross-lingual element-aware (MEA), a new MTXLS benchmark covering 24 target languages. We benchmark end-to-end and pipeline approaches across various LLMs and show that MTXLS performance still substantially lags behind English monolingual summarization. To better understand MTXLS in LLMs, we propose a layer-wise analysis framework for investigating how LLMs internally perform MTXLS. Our analyses suggest that translation and summarization behaviors emerge jointly within later layers rather than as distinctly decomposed stages. Most task-relevant processing occurs within these layers, and errors also tend to arise at similar depths. Motivated by these findings, we introduce an inference-time activation steering method that leverages hidden representations from English summarization to guide MTXLS generation. Experiments show that our method consistently improves MTXLS quality across target languages.
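
The abstract does not spell out how the steering is computed, so the following is only a minimal sketch of one common form of activation steering, not the authors' method. It assumes that hidden states from a single later layer have already been extracted as arrays of shape (num_tokens, hidden_dim), that the steering direction is the difference between the mean English-summarization state and the mean target-language state, and that it is added with a scaling factor. The names `steering_vector`, `steer`, and `alpha` are hypothetical, and random arrays stand in for real model activations.

```python
import numpy as np


def steering_vector(english_states, target_states):
    """Mean-difference direction at one layer (assumed form, not from the paper).

    Both inputs have shape (num_tokens, hidden_dim) and come from the same layer.
    """
    return english_states.mean(axis=0) - target_states.mean(axis=0)


def steer(hidden_states, vector, alpha=1.0):
    """Shift every token position by alpha * vector at inference time."""
    return hidden_states + alpha * vector


# Random stand-ins for activations; a real run would read these from the model.
rng = np.random.default_rng(0)
hidden_dim = 16
english_states = rng.normal(loc=1.0, size=(12, hidden_dim))
target_states = rng.normal(loc=0.0, size=(9, hidden_dim))

vector = steering_vector(english_states, target_states)
steered = steer(target_states, vector, alpha=0.5)

# Distance between the mean English state and the mean target state,
# before and after steering.
english_mean = english_states.mean(axis=0)
gap_before = np.linalg.norm(english_mean - target_states.mean(axis=0))
gap_after = np.linalg.norm(english_mean - steered.mean(axis=0))
print(gap_before, gap_after)
```

With `alpha=0.5` the mean target-language state moves halfway toward the mean English state, so the printed gap is halved. In practice the addition would be applied inside the model's forward pass at the chosen layers, and both the layers and `alpha` would need tuning.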

Problem

Research questions and friction points this paper is trying to address.

multi-target cross-lingual summarization

large language models

MTXLS

cross-lingual generation

summarization behavior

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-target cross-lingual summarization

layer-wise analysis

activation steering