PathAR: Structure-First Autoregressive Synthesis of Multimodal Pathology Images

📅 2026-05-31
🤖 AI Summary
This work addresses the scarcity of multimodal histopathology data and the difficulty existing generative models have in synthesizing modality-specific appearance while keeping anatomical structure consistent. The authors propose PathAR, a structure-first autoregressive framework that explicitly separates structure from appearance for modality-label-conditioned image synthesis. The method combines a dual vector quantization (Dual-VQ) tokenizer with an interleaved autoregressive (IAR) Transformer that uses asymmetric attention visibility, so that appearance depends on structure and the model can generate spatially aligned image–mask pairs. Experiments show that PathAR improves structural consistency and modality fidelity over baselines while maintaining sample diversity, supports downstream segmentation in data-scarce regimes, and extends to finer-grained organ-label variation within a modality.
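To make the Dual-VQ idea concrete, the following is a minimal sketch of quantizing structure and appearance features against two separate codebooks. It is not the authors' tokenizer: the function names, the toy 2-D features, the codebook sizes, and the plain nearest-neighbour lookup are all assumptions chosen for illustration, and the summary does not describe how PathAR's encoders or codebooks are actually built.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def dual_vq_tokenize(
    struct_feats: Sequence[Vector],
    appear_feats: Sequence[Vector],
    struct_codebook: Sequence[Vector],
    appear_codebook: Sequence[Vector],
) -> Tuple[List[int], List[int]]:
    """Quantize the two feature streams with two independent codebooks.

    Hypothetical helper: structure features (e.g. derived from a mask) and
    appearance features never share a codebook, so each token stream keeps
    its own vocabulary.
    """
    s_tokens = [nearest_code(f, struct_codebook) for f in struct_feats]
    a_tokens = [nearest_code(f, appear_codebook) for f in appear_feats]
    return s_tokens, a_tokens


# Toy data (illustrative values only).
struct_codebook = [[0.0, 0.0], [1.0, 1.0]]
appear_codebook = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
struct_feats = [[0.1, -0.1], [0.9, 1.2]]
appear_feats = [[0.4, 0.6], [0.9, 0.1]]

s_tokens, a_tokens = dual_vq_tokenize(
    struct_feats, appear_feats, struct_codebook, appear_codebook
)
print(s_tokens, a_tokens)  # [0, 1] [2, 1]
```

Keeping the two codebooks separate is what lets a downstream model treat structure tokens and appearance tokens as distinct vocabularies rather than one homogeneous stream.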
📝 Abstract
Data scarcity in multimodal pathology motivates unified generative models that synthesize modality-specific appearance while preserving anatomically coherent structure. Although modalities differ in appearance statistics, morphological structures such as cellular topology and tissue boundaries are largely preserved across acquisition protocols. However, existing methods often model these factors within a homogeneous token stream, implicitly coupling structure with appearance and weakening structural controllability under modality shifts. To address this, we propose pathology Autoregressive modeling (PathAR), a structure-first autoregressive synthesis framework that explicitly factorizes structure and appearance for modality-label-conditioned pathology generation. PathAR employs a dual vector quantization (Dual-VQ) tokenizer to decompose samples into mask-grounded structure and appearance tokens, and an interleaved autoregressive (IAR) transformer with asymmetric attention visibility to enforce structure-to-appearance dependence. PathAR stabilizes morphology under heterogeneous modality-specific appearances and enables spatially aligned image–mask pair generation. Extensive experiments show that PathAR improves structural consistency and modality fidelity over baselines, maintains sample diversity, supports downstream segmentation in data-scarce regimes, and demonstrates extensibility to finer-grained intra-modality organ-label variation.
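One way to picture "asymmetric attention visibility" over an interleaved token stream is as a boolean attention mask. The sketch below is an assumption, not the paper's specification: it interleaves structure (S) and appearance (A) tokens as S, A, S, A, ..., applies ordinary causal masking, and additionally hides appearance keys from structure queries, so structure is predicted from structure alone while appearance can see both. The abstract does not state the exact interleaving order or visibility rule, and `interleaved_mask` is a hypothetical name.

```python
from typing import List, Tuple


def interleaved_mask(num_pairs: int) -> Tuple[List[str], List[List[bool]]]:
    """Build a causal mask for an S, A, S, A, ... token sequence.

    mask[q][k] is True when query position q may attend to key position k.
    Assumed rule: structure queries never see appearance keys; appearance
    queries see all earlier tokens (and themselves).
    """
    kinds = ["S", "A"] * num_pairs
    n = len(kinds)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(q + 1):  # causal: only current and earlier positions
            if kinds[q] == "S" and kinds[k] == "A":
                continue  # asymmetric: structure is blind to appearance
            mask[q][k] = True
    return kinds, mask


kinds, mask = interleaved_mask(2)
for kind, row in zip(kinds, mask):
    print(kind, "".join("1" if v else "0" for v in row))
# S 1000
# A 1100
# S 1010
# A 1111
```

Under this rule the structure stream is generated independently of appearance, which is one plausible reading of "structure-to-appearance dependence": appearance tokens condition on structure, but not the other way around.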
Problem

Research questions and friction points this paper is trying to address.

multimodal pathology
data scarcity
structural consistency
appearance synthesis
modality shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

structure-appearance disentanglement
autoregressive synthesis
dual vector quantization
pathology image generation
modality-conditioned generation
Yuan Zhang
Southeast University
Computer vision, Medical image analysis
Jiahao Xia
Research Fellow, University of Technology Sydney
Deep Learning
Junzhang Huang
Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, Nanjing 210096, China
Meng Wang
Centre for Innovation and Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119228, Singapore; and Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119228, Singapore
Feng Chen
Nanjing Medical University
Biostatistics
Guanyu Yang
School of Computer Science and Engineering, Southeast University
Medical image processing, deep learning
Huazhu Fu
Principal Scientist, IHPC, A*STAR
Medical Image Analysis, AI for Healthcare, Medical AI, Trustworthy AI