🤖 AI Summary
This work proposes a lightweight, data-driven graph representation of documents, addressing a limitation of traditional NLP systems that treat documents as linear sequences and therefore struggle to model long-range dependencies and global structure. The approach automatically constructs a sentence-level graph using a dynamic sliding-window attention mechanism, capturing local and mid-range semantic dependencies between sentences as well as structural relations within the document. The graph is then processed by a Graph Attention Network (GAT) for downstream tasks. On document classification benchmarks, the method achieves competitive performance with lower computational overhead than previous approaches. An exploratory evaluation on extractive summarization shows both the method's potential and its current limitations.
📝 Abstract
This paper proposes a data-driven method to automatically construct graph-based document representations. Building upon the recent work of Bugueño and de Melo (2025), we leverage the dynamic sliding-window attention module to effectively capture local and mid-range semantic dependencies between sentences, as well as structural relations within documents. Graph Attention Networks (GATs) trained on our learned graphs achieve competitive results on document classification while requiring lower computational resources than previous approaches. We further present an exploratory evaluation of the proposed graph construction method for extractive document summarization, highlighting both its potential and current limitations. The implementation of this project can be found on GitHub.
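To make the idea of a sentence-level graph built from windowed attention more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes sentence embeddings are already available as plain lists of floats, uses a fixed window and unlearned scaled dot-product attention in place of the paper's dynamic, trained module, and keeps an edge whenever the attention weight reaches a cutoff. The function name `window_attention_graph` and the parameters `window` and `threshold` are hypothetical.

```python
import math


def window_attention_graph(embeddings, window=2, threshold=0.2):
    """Return {(i, j): weight} for sentence pairs at most `window` positions apart.

    Illustrative assumption: weights are a softmax over scaled dot products,
    with no learned parameters.
    """
    n = len(embeddings)
    edges = {}
    for i in range(n):
        # Candidate neighbours inside the sliding window, excluding sentence i itself.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbours = [j for j in range(lo, hi) if j != i]
        if not neighbours:
            continue
        scale = math.sqrt(len(embeddings[i]))
        scores = [
            sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / scale
            for j in neighbours
        ]
        # Numerically stable softmax over the window.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        for j, e in zip(neighbours, exps):
            weight = e / total
            if weight >= threshold:  # keep only sufficiently strong edges
                edges[(i, j)] = weight
    return edges


# Toy usage: four 2-dimensional "sentence embeddings".
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = window_attention_graph(toy, window=1, threshold=0.0)
```

The resulting weighted edge list could then be passed, together with the sentence embeddings as node features, to a GAT implementation of the reader's choice. Restricting attention to a window keeps the number of candidate edges linear in the number of sentences rather than quadratic.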