SFDLA: Source-Free Document Layout Analysis

📅 2025-03-24
🤖 AI Summary
Cross-domain transfer is a bottleneck in document layout analysis (DLA): existing adaptation methods often require access to source-domain data and target-domain labels, which makes them unsuitable for privacy-sensitive domains such as finance and healthcare. This paper proposes DLAdapter, the first fully source-free unsupervised domain adaptation framework for DLA. Methodologically, it introduces a feature disentanglement mechanism that jointly incorporates geometric priors and semantic context, coupled with pseudo-label refinement and layout-structure consistency constraints to align content and structure together. The paper also establishes the first Source-Free DLA (SFDLA) benchmark and releases it publicly. On the PubLayNet→DocLayNet transfer task, DLAdapter achieves a 4.21% improvement over the source-only model and outperforms prior source-free methods by 2.26%. The motivation is that directly transferring a source-trained model to a target domain, with no source data and no target labels, causes an average performance drop of 32.64%.

📝 Abstract
Document Layout Analysis (DLA) is a fundamental task in document understanding. However, existing DLA and adaptation methods often require access to large-scale source data and target labels. These requirements severely limit their real-world applicability, particularly in privacy-sensitive and resource-constrained domains such as financial statements, medical records, and proprietary business documents. We observe that directly transferring models fine-tuned on a source domain to target domains often results in a significant performance drop (Avg. -32.64%). In this work, we introduce Source-Free Document Layout Analysis (SFDLA), which aims to adapt a pre-trained source DLA model to an unlabeled target domain without access to any source data. To address this challenge, we establish the first SFDLA benchmark, covering three major DLA datasets for geometric- and content-aware adaptation. Furthermore, we propose the Document Layout Analysis Adapter (DLAdapter), a novel framework designed to improve source-free adaptation across document domains. Our method achieves a +4.21% improvement over the source-only baseline and a +2.26% gain over existing source-free methods when adapting from PubLayNet to DocLayNet. We believe this work will inspire the DLA community to further investigate source-free document understanding. To support future research, the benchmark, models, and code will be publicly available at https://github.com/s3setewe/sfdla-DLAdapter.
Problem

Research questions and friction points this paper is trying to address.

Adapts pre-trained DLA models without source data access
Addresses performance drop in cross-domain document layout analysis
Enhances privacy-sensitive document understanding in resource-limited settings
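
To make the source-free setting concrete: with neither source data nor target labels available, adaptation methods commonly train on the model's own confident predictions for target pages (pseudo-labels). The sketch below is a generic illustration of that idea and not DLAdapter's actual procedure. The `Detection` type, the `select_pseudo_labels` helper, both thresholds, and the class-agnostic overlap suppression are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One predicted layout region (hypothetical type for this sketch)."""
    box: tuple    # (x1, y1, x2, y2) in pixels
    label: str    # e.g. "title", "text", "table"
    score: float  # model confidence in [0, 1]


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_pseudo_labels(dets, score_thr=0.8, iou_thr=0.5):
    """Keep confident, mutually non-overlapping predictions as pseudo-labels.

    Assumption: layout regions rarely overlap, so overlap is suppressed
    across classes (class-agnostic), keeping the higher-scoring box.
    """
    kept = []
    for det in sorted(dets, key=lambda d: d.score, reverse=True):
        if det.score < score_thr:
            break  # remaining detections are even less confident
        if all(iou(det.box, k.box) < iou_thr for k in kept):
            kept.append(det)
    return kept


# Predictions of a source-trained model on one unlabeled target page (made-up values).
preds = [
    Detection((0, 0, 100, 50), "title", 0.95),
    Detection((2, 1, 101, 52), "text", 0.85),    # near-duplicate of the title box
    Detection((0, 60, 100, 200), "text", 0.90),
    Detection((0, 210, 100, 260), "table", 0.40),  # too uncertain to trust
]
pseudo_labels = select_pseudo_labels(preds)
```

In a self-training loop, the kept regions would serve as training targets for the next adaptation step on the target domain. The thresholds trade pseudo-label precision against coverage and would need tuning for each dataset.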
Innovation

Methods, ideas, or system contributions that make the work stand out.

Source-Free Document Layout Analysis (SFDLA) introduced
DLAdapter improves source-free domain adaptation
First SFDLA benchmark covering three datasets
Authors

Sebastian Tewes
CV:HCI lab, Karlsruhe Institute of Technology, Germany.

Yufan Chen
CV:HCI lab, Karlsruhe Institute of Technology, Germany.

Omar Moured
Karlsruhe Institute of Technology
Computer Vision, Vision-Language Models, Document Analysis, Assistive Tech

Jiaming Zhang
CV:HCI lab, Karlsruhe Institute of Technology, Germany; CVG, ETH Zurich, Switzerland.

Rainer Stiefelhagen
Karlsruhe Institute of Technology, Karlsruhe, Germany
Computer vision, Multimodal interaction, Accessibility