To View Transform or Not to View Transform: NeRF-based Pre-training Perspective

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of integrating Neural Radiance Fields (NeRFs) with view transformation for 3D perception, where the conflict between discrete, rigid representations and continuous, adaptive priors yields ambiguous 3D features and hinders reuse of pre-trained NeRF models. To resolve this, the authors propose NeRP3D, a NeRF-Resembled Point-based 3D detector that embeds continuous NeRF representations into an end-to-end 3D detection framework. By adopting a point-based formulation that directly learns a continuous 3D radiance field, NeRP3D circumvents the representational mismatch introduced by conventional view transformations. The approach supports self-supervised pre-training and, on the nuScenes benchmark, achieves state-of-the-art performance in both scene reconstruction and 3D object detection, significantly outperforming existing methods while preserving and leveraging the pre-trained NeRF network throughout the pipeline.
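To make the contrast with discrete view-transform features concrete, below is a minimal PyTorch sketch of a point-based continuous radiance field: an MLP over positionally encoded coordinates that can be queried at arbitrary real-valued 3D locations, with no voxel snapping. This is an illustrative assumption, not the authors' architecture; the class name `ContinuousRadianceField`, the layer sizes, and the feature dimension are all hypothetical.

```python
import torch
import torch.nn as nn

class ContinuousRadianceField(nn.Module):
    """Toy MLP radiance field: maps a continuous 3D coordinate to
    (density, feature). Unlike voxelized view-transform features,
    it can be evaluated at any real-valued position."""

    def __init__(self, num_freqs: int = 6, hidden: int = 128, feat_dim: int = 32):
        super().__init__()
        self.num_freqs = num_freqs
        in_dim = 3 * 2 * num_freqs  # sin/cos positional encoding of (x, y, z)
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + feat_dim),  # 1 density channel + feature vector
        )

    def positional_encoding(self, xyz: torch.Tensor) -> torch.Tensor:
        freqs = 2.0 ** torch.arange(self.num_freqs, device=xyz.device)
        angles = xyz[..., None] * freqs               # (N, 3, F)
        enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
        return enc.flatten(start_dim=-2)              # (N, 3 * 2F)

    def forward(self, xyz: torch.Tensor):
        out = self.mlp(self.positional_encoding(xyz))
        density = torch.relu(out[..., :1])            # non-negative density
        feature = out[..., 1:]                        # per-point feature a detection head could consume
        return density, feature

# Query at arbitrary continuous locations -- no discretization into a grid.
field = ContinuousRadianceField()
points = torch.rand(1024, 3) * 50.0                   # e.g. metric driving-scene coordinates
density, feats = field(points)
```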
📝 Abstract
Neural radiance fields (NeRFs) have emerged as a prominent pre-training paradigm for vision-centric autonomous driving, enhancing 3D geometry and appearance understanding in a fully self-supervised manner. To apply NeRF-based pre-training to 3D perception models, recent approaches have simply applied NeRFs to volumetric features obtained from view transformation. However, coupling NeRFs with view transformation combines conflicting priors: view transformation imposes discrete and rigid representations, whereas radiance fields assume continuous and adaptive functions. When these opposing assumptions are forced into a single pipeline, the misalignment surfaces as blurry and ambiguous 3D representations that ultimately limit 3D scene understanding. Moreover, the NeRF network used for pre-training is discarded in downstream tasks, so the 3D representations enhanced through NeRF are not fully utilized. In this paper, we propose NeRP3D, a novel NeRF-Resembled Point-based 3D detector that learns a continuous 3D representation and thus avoids the misaligned priors of view transformation. NeRP3D preserves the pre-trained NeRF network regardless of the task, inheriting the principle of continuous 3D representation learning and unlocking greater potential for both scene reconstruction and detection. Experiments on the nuScenes dataset demonstrate that our approach significantly outperforms previous state-of-the-art methods on not only the pretext scene reconstruction task but also the downstream detection task.
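The self-supervised pre-training signal the abstract refers to is photometric: a NeRF renders camera pixels by volume rendering and is supervised against the recorded images, so no 3D labels are needed. The sketch below shows the standard NeRF volume-rendering computation and loss as a hedged illustration; the tensor shapes, sample counts, and sampling scheme are assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def volume_render(density: torch.Tensor, rgb: torch.Tensor, deltas: torch.Tensor):
    """Standard NeRF volume rendering along rays.
    density: (R, S, 1) per-sample density sigma
    rgb:     (R, S, 3) per-sample color
    deltas:  (R, S, 1) distance between consecutive samples
    Returns (R, 3) rendered pixel colors."""
    alpha = 1.0 - torch.exp(-density * deltas)                    # opacity per sample
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1),
        dim=1,
    )[:, :-1]                                                     # transmittance T_i
    weights = alpha * trans                                       # (R, S, 1)
    return (weights * rgb).sum(dim=1)                             # (R, 3)

# Self-supervised pre-training step: photometric loss against camera pixels.
R, S = 512, 64                                                    # rays, samples per ray (assumed)
density = torch.rand(R, S, 1)                                     # would come from the radiance field
rgb = torch.rand(R, S, 3)
deltas = torch.full((R, S, 1), 0.5)
target_pixels = torch.rand(R, 3)                                  # colors sampled from real images

rendered = volume_render(density, rgb, deltas)
loss = F.mse_loss(rendered, target_pixels)                        # no 3D annotations required
```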
Problem

Research questions and friction points this paper is trying to address.

Neural Radiance Fields
View Transformation
3D Perception
Pre-training
Continuous Representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

NeRF
View Transformation
Continuous 3D Representation
Pre-training
3D Object Detection
Hyeonjun Jeong
KAIST, Daejeon, Korea
Juyeb Shin
KAIST, Daejeon, Korea
Dongsuk Kum
KAIST, Vehicle Dynamics & Control