🤖 AI Summary
This work addresses cross-domain image registration, where systematic intensity discrepancies invalidate the brightness constancy assumption underlying conventional methods. To overcome this limitation, the authors propose a scene-appearance disentanglement mechanism that decomposes each image into a domain-invariant scene representation and a domain-specific appearance code; cross-domain alignment is then achieved through re-rendering, circumventing direct intensity matching. Theoretical analysis establishes sufficient conditions under which the decomposition supports consistent cross-domain alignment and the scene consistency loss guarantees geometric correspondence in the shared latent space. The method is implemented as a unified end-to-end deep framework, SAR-Net, that jointly optimizes scene consistency and domain alignment losses. On the ANHIR multi-stain histopathology benchmark, SAR-Net achieves a median relative Target Registration Error (rTRE) of 0.25% with 99.1% robustness, improving on the state-of-the-art MEVIS method (0.27% rTRE).
📝 Abstract
Image registration under domain shift remains a fundamental challenge in computer vision and medical imaging: when source and target images exhibit systematic intensity differences, the brightness constancy assumption underlying conventional registration methods is violated, rendering correspondence estimation ill-posed. We propose SAR-Net, a unified framework that addresses this challenge through principled scene-appearance disentanglement. Our key insight is that observed images can be decomposed into domain-invariant scene representations and domain-specific appearance codes, enabling registration via re-rendering rather than direct intensity matching. We establish theoretical conditions under which this decomposition enables consistent cross-domain alignment (Proposition 1) and prove that our scene consistency loss provides a sufficient condition for geometric correspondence in the shared latent space (Proposition 2). Empirically, we validate SAR-Net on the ANHIR (Automatic Non-rigid Histological Image Registration) challenge benchmark, where multi-stain histopathology images exhibit coupled domain shift from different staining protocols and geometric distortion from tissue preparation. Our method achieves a median relative Target Registration Error (rTRE) of 0.25%, outperforming the state-of-the-art MEVIS method (0.27% rTRE) by 7.4% in relative terms, with a robustness of 99.1%. Code is available at https://github.com/D-ST-Sword/SAR-NET.
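To make the disentanglement idea concrete, below is a minimal, hypothetical PyTorch sketch of the decomposition described above: an image is encoded into a domain-invariant scene map and a global domain-specific appearance vector, a decoder re-renders the image from the pair, and a scene consistency loss penalizes disagreement between the scene codes of corresponding cross-domain images. All layer sizes, module names, and the exact loss form are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn


class SceneAppearanceModel(nn.Module):
    """Hypothetical scene-appearance disentanglement sketch (not SAR-Net itself).

    Each image x is split into a spatial, domain-invariant scene code s
    and a global, domain-specific appearance code a; the decoder
    re-renders x_hat from (s, a), so cross-domain alignment can be
    supervised on s instead of on raw intensities.
    """

    def __init__(self, ch: int = 16):
        super().__init__()
        # Spatial scene encoder: keeps resolution, outputs the scene map s.
        self.scene_enc = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))
        # Appearance encoder: pools to a single global appearance vector a.
        self.app_enc = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        # Decoder re-renders the image from the concatenated (s, a) pair.
        self.dec = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, x):
        s = self.scene_enc(x)                      # (B, ch, H, W)
        a = self.app_enc(x)                        # (B, ch, 1, 1)
        a_map = a.expand(-1, -1, s.shape[2], s.shape[3])
        x_hat = self.dec(torch.cat([s, a_map], dim=1))
        return s, a, x_hat


def scene_consistency_loss(model, x_src, x_tgt):
    """Penalize scene-code disagreement between corresponding images
    from the two domains (an MSE stand-in for the paper's loss)."""
    s_src, _, _ = model(x_src)
    s_tgt, _, _ = model(x_tgt)
    return torch.mean((s_src - s_tgt) ** 2)


model = SceneAppearanceModel()
x_src = torch.rand(2, 1, 32, 32)   # e.g. one staining protocol
x_tgt = torch.rand(2, 1, 32, 32)   # corresponding images, other domain
s, a, recon = model(x_src)
loss = scene_consistency_loss(model, x_src, x_tgt)
```

In a full pipeline this loss would be combined with a reconstruction term on `recon` and a domain alignment term on the appearance codes, with geometric warping estimated in the shared scene space; those components are omitted here for brevity.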