🤖 AI Summary
This paper addresses privacy preservation in semantic segmentation by proposing a perceptual-encryption and domain-adaptation co-design method tailored to Vision Transformers (ViTs). To enable accurate segmentation directly on encrypted data, the method embeds a lightweight domain-adaptation mechanism in the ViT's embedding layer, aligning features between the encrypted and plain domains. As a result, both training and inference operate natively on perceptually encrypted images without decryption. Evaluated on state-of-the-art architectures, including Segmentation Transformer, on Cityscapes and ADE20K, the approach loses only 0.3–0.8 percentage points of mIoU relative to unencrypted baselines while substantially reducing image intelligibility and improving robustness against reconstruction and recognition attacks. This work is reportedly the first to integrate lightweight domain adaptation into the ViT embedding structure to bridge the encrypted–plain domain gap, simultaneously providing strong privacy protection and high model fidelity.
📝 Abstract
We propose a privacy-preserving semantic-segmentation method in which perceptual encryption is applied not only to test images but also to the images used for model training. The method achieves almost the same accuracy as models trained without any encryption. This performance is obtained by applying a domain-adaptation technique to the embedding structure of the Vision Transformer (ViT). The effectiveness of the proposed method was experimentally confirmed in terms of semantic-segmentation accuracy using Segmentation Transformer, a powerful ViT-based semantic-segmentation model.
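The abstract does not spell out the adaptation mechanism, but the core idea of aligning encrypted and plain domains at the ViT embedding layer can be illustrated with a minimal sketch. Here we assume a simple perceptual encryption (a secret pixel permutation inside each flattened patch) and show that reordering the columns of the patch-embedding weights with the same key makes encrypted patches produce the same embeddings as plain ones; the patch and embedding dimensions, the key, and the exact encryption are illustrative assumptions, and the paper's actual lightweight domain adaptation may be learned rather than exact.

```python
import numpy as np

rng = np.random.default_rng(42)   # secret key (hypothetical)
patch_dim, embed_dim = 48, 16     # e.g. 4x4x3 flattened patches (illustrative sizes)

# A simple perceptual encryption: scramble pixel order inside each patch
perm = rng.permutation(patch_dim)

def encrypt_patch(patch):
    """Apply the secret per-patch pixel permutation."""
    return patch[perm]

# Plain ViT patch-embedding weights (a linear projection of each patch)
W = rng.normal(size=(embed_dim, patch_dim))

# Adapted embedding: reorder weight columns with the same key, so the
# projection of an encrypted patch matches the projection of the plain one
W_adapted = W[:, perm]

patch = rng.normal(size=patch_dim)
plain_embed = W @ patch
enc_embed = W_adapted @ encrypt_patch(patch)

# The two domains are now aligned at the embedding layer
print(np.allclose(plain_embed, enc_embed))  # → True
```

Because the two embeddings coincide exactly under this assumed block-wise encryption, the rest of the transformer sees no domain gap at all; a learned adaptation, as the paper's small accuracy drop suggests, would instead approximate this alignment for encryptions without such a closed-form inverse.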