🤖 AI Summary
Scientific poster layout analysis has been held back by a scarcity of annotated datasets and dedicated models. To address this, we propose a structural analysis framework for scientific posters. We introduce SciPostLayoutTree, a dataset of approximately 8,000 posters annotated with reading order and hierarchical parent-child relations. We further design Layout Tree Decoder, a Transformer-based model that jointly encodes visual features, bounding box coordinates, and semantic class labels to capture both spatial and semantic dependencies; it uses beam search to predict relations while accounting for sequence-level plausibility. Experiments show that the model improves prediction accuracy for spatially challenging relations (upward, horizontal, and long-distance) and establishes a solid baseline for poster structure analysis. The dataset and code are publicly released.
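To make the beam search idea concrete, here is a minimal sketch of beam search over parent assignments. It is an illustration under stated assumptions, not the Layout Tree Decoder itself: the function names, the `ROOT` index, the score table, and the cycle check as a stand-in for sequence-level plausibility are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

ROOT = -1  # hypothetical index standing for the poster root


def creates_cycle(parents: Dict[int, int], child: int, parent: int) -> bool:
    """Return True if linking child -> parent would close a cycle."""
    node = parent
    while node != ROOT:
        if node == child:
            return True
        if node not in parents:  # parent not assigned yet: no cycle so far
            return False
        node = parents[node]
    return False


def beam_search_parents(
    log_probs: List[Dict[int, float]], beam_width: int = 3
) -> Tuple[Dict[int, int], float]:
    """Pick one parent per element, keeping the best `beam_width` partial trees.

    log_probs[i][p] is an assumed log-score that element i has parent p.
    Every element is assumed to list ROOT as a candidate, so at least one
    valid expansion always exists.
    """
    beams: List[Tuple[Dict[int, int], float]] = [({}, 0.0)]
    for child, candidates in enumerate(log_probs):
        expanded = []
        for parents, score in beams:
            for parent, lp in candidates.items():
                if parent == child or creates_cycle(parents, child, parent):
                    continue  # drop assignments that cannot form a tree
                expanded.append(({**parents, child: parent}, score + lp))
        expanded.sort(key=lambda beam: beam[1], reverse=True)
        beams = expanded[:beam_width]
    return beams[0]


# Made-up scores for three elements; not taken from the paper.
example_scores = [
    {ROOT: math.log(0.4), 1: math.log(0.6)},
    {ROOT: math.log(0.3), 0: math.log(0.7)},
    {ROOT: math.log(0.1), 0: math.log(0.9)},
]
greedy_tree, greedy_score = beam_search_parents(example_scores, beam_width=1)
beam_tree, beam_score = beam_search_parents(example_scores, beam_width=2)
```

In this toy input, the greedy run (`beam_width=1`) commits to element 1 as the parent of element 0 and is then forced to attach element 1 to the root. With `beam_width=2` the lower-scored first choice survives long enough to yield a higher-scoring overall tree, which is the kind of sequence-level effect beam search is meant to capture.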
📝 Abstract
Scientific posters play a vital role in academic communication by presenting ideas through visual summaries. Analyzing reading order and parent-child relations of posters is essential for building structure-aware interfaces that facilitate clear and accurate understanding of research content. Despite their prevalence in academic communication, posters remain underexplored in structural analysis research, which has primarily focused on papers. To address this gap, we constructed SciPostLayoutTree, a dataset of approximately 8,000 posters annotated with reading order and parent-child relations. Compared to an existing structural analysis dataset, SciPostLayoutTree contains more instances of spatially challenging relations, including upward, horizontal, and long-distance relations. As a solution to these challenges, we develop Layout Tree Decoder, which incorporates visual features as well as bounding box features including position and category information. The model also uses beam search to predict relations while capturing sequence-level plausibility. Experimental results demonstrate that our model improves the prediction accuracy for spatially challenging relations and establishes a solid baseline for poster structure analysis. The dataset is publicly available at https://huggingface.co/datasets/omron-sinicx/scipostlayouttree. The code is also publicly available at https://github.com/omron-sinicx/scipostlayouttree.
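As a rough illustration of the annotations described above, the following sketch represents a poster as a tree of layout elements, derives a reading order by pre-order traversal, and labels the direction of a step between two elements from their bounding-box centers. Everything here is an assumption made for illustration: the `LayoutNode` class, the category names, the box format, the traversal convention, and the direction rule are hypothetical and are not the dataset's actual schema or the paper's definitions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), y grows downward


@dataclass
class LayoutNode:
    name: str
    category: str  # hypothetical class label such as "Title" or "Figure"
    box: Box
    children: List["LayoutNode"] = field(default_factory=list)


def reading_order(node: LayoutNode) -> List[str]:
    """Pre-order traversal: a node first, then its children in stored order."""
    order = [node.name]
    for child in node.children:
        order.extend(reading_order(child))
    return order


def center(box: Box) -> Tuple[float, float]:
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def relation_direction(src: Box, dst: Box) -> str:
    """Coarse direction of the step from src to dst (assumed rule, by box centers)."""
    (sx, sy), (dx, dy) = center(src), center(dst)
    if abs(dx - sx) > abs(dy - sy):
        return "horizontal"
    return "upward" if dy < sy else "downward"


# A made-up two-column poster.
intro_text = LayoutNode("intro_text", "Text", (0, 22, 48, 68))
intro_figure = LayoutNode("intro_figure", "Figure", (0, 70, 48, 98))
intro = LayoutNode("intro", "Section", (0, 10, 48, 20), [intro_text, intro_figure])
method = LayoutNode("method", "Section", (52, 10, 100, 20))
title = LayoutNode("title", "Title", (0, 0, 100, 8))
poster = LayoutNode("poster", "Root", (0, 0, 100, 100), [title, intro, method])

order = reading_order(poster)
column_jump = relation_direction(intro_figure.box, method.box)
```

In this toy layout, the step from the figure at the bottom of the left column to the section heading at the top of the right column comes out as an upward relation, the kind of case the abstract describes as spatially challenging.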