🤖 AI Summary
This work addresses the challenge of texture incompleteness and surface artifacts in novel view synthesis under sparse multi-camera configurations, where limited viewpoints often result in missing visual details. To this end, the authors propose a post-processing inpainting method tailored for real-time 3D streaming. It leverages a multi-view-aware Transformer architecture to perform texture completion independently after rendering, making it compatible with any calibrated multi-camera system. The approach incorporates spatio-temporal embeddings to ensure inter-frame consistency, and combines a resolution-agnostic design with an adaptive patch selection strategy to achieve high visual fidelity under real-time performance constraints. Experimental results demonstrate that, under identical real-time requirements, the proposed method outperforms existing techniques in both image and video quality metrics, establishing a new state-of-the-art trade-off between quality and speed.
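The summary does not give implementation details for the spatio-temporal embeddings. As a minimal sketch only, one common way to realize such an embedding is to sum sinusoidal encodings of each patch's spatial position and frame index, so the transformer can relate tokens across frames; every name and shape below is a hypothetical illustration, not the paper's actual design.

```python
import numpy as np

def sincos_embedding(positions, dim):
    """Standard sinusoidal encoding of a 1-D coordinate (hypothetical helper)."""
    positions = np.asarray(positions, dtype=np.float64)[:, None]          # (N, 1)
    freqs = np.exp(-np.log(10000.0) * np.arange(dim // 2) / (dim // 2))   # (dim/2,)
    angles = positions * freqs                                            # (N, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)       # (N, dim)

def spatio_temporal_embedding(xs, ys, ts, dim):
    """Sum of per-axis encodings: patch x, patch y, and frame index t.
    The temporal term distinguishes otherwise identical patches from
    different frames, which is one way to encourage inter-frame consistency."""
    assert dim % 2 == 0
    return (sincos_embedding(xs, dim)
            + sincos_embedding(ys, dim)
            + sincos_embedding(ts, dim))

# Example: 4 patch tokens, two spatial positions in each of two consecutive frames
emb = spatio_temporal_embedding(xs=[0, 1, 0, 1], ys=[0, 0, 0, 0], ts=[0, 0, 1, 1], dim=8)
print(emb.shape)  # (4, 8)
```

Because the encoding depends only on coordinates, not on image resolution, the same scheme extends naturally to the resolution-agnostic setting the summary mentions.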
📄 Abstract
High-quality 3D streaming from multiple cameras is crucial for immersive experiences in many AR/VR applications. The limited number of views, often due to real-time constraints, leads to missing information and incomplete surfaces in the rendered images. Existing approaches typically rely on simple heuristics for hole filling, which can result in inconsistencies or visual artifacts. We propose to complete the missing textures with a novel, application-targeted inpainting method that operates as an image-based post-processing step after novel view rendering and is independent of the underlying representation. The method is designed as a standalone module compatible with any calibrated multi-camera system. To this end, we introduce a multi-view-aware, transformer-based network architecture using spatio-temporal embeddings to ensure consistency across frames while preserving fine details. Additionally, our resolution-independent design allows adaptation to different camera setups, while an adaptive patch selection strategy balances inference speed and quality, enabling real-time performance. We evaluate our approach against state-of-the-art inpainting techniques under the same real-time constraints and demonstrate that our model achieves the best trade-off between quality and speed, outperforming competitors in both image-based and video-based metrics.
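The abstract does not specify how the adaptive patch selection works. As a hedged sketch of the general idea, one plausible realization is to run the expensive inpainting network only on the fixed-size patches that actually contain holes, ranked by hole size and capped by a per-frame budget so inference cost stays bounded; the function name, parameters, and ranking rule below are assumptions for illustration, not the paper's method.

```python
import numpy as np

def select_patches(hole_mask, patch=32, budget=16):
    """Hypothetical adaptive patch selection: score each non-overlapping
    patch by its number of missing pixels and keep at most `budget`
    patches, largest holes first. `hole_mask` is a boolean (H, W) array,
    True where texture is missing after novel view rendering."""
    H, W = hole_mask.shape
    scored = []
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            n_missing = int(hole_mask[y:y + patch, x:x + patch].sum())
            if n_missing > 0:                 # skip fully complete patches
                scored.append((n_missing, y, x))
    scored.sort(reverse=True)                 # largest holes first
    return [(y, x) for _, y, x in scored[:budget]]

# Example: a 128x128 frame with one 40x40 hole spanning four patches
mask = np.zeros((128, 128), dtype=bool)
mask[20:60, 50:90] = True
print(select_patches(mask, patch=32, budget=4))
```

Lowering `budget` trades completion quality for speed, which matches the speed/quality balance the abstract describes; patches outside the selection would simply keep their rendered pixels.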