CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering

📅 2025-07-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Neural rendering struggles to achieve efficiency, adaptivity, and representational fidelity at the same time. To address this, we propose Compressive Light-Field Tokens (CLiFT): a compact, variable-length light-field token representation that lets a single trained network render at multiple fidelity levels. The method combines a multi-view encoder, latent-space K-means clustering, a multi-view condenser that compresses the information of all tokens into the centroid tokens, and a compute-adaptive renderer that varies the number of tokens to trade computational cost against reconstruction quality. The key contribution is a scalable light-field token mechanism in which the token count sets the compute budget. Evaluated on RealEstate10K and DL3DV, CLiFT achieves significant data reduction at comparable rendering quality and the highest overall rendering score, with adjustable trade-offs among data size, rendering quality, and rendering speed.

📝 Abstract

This paper proposes a neural rendering approach that represents a scene as "compressed light-field tokens (CLiFTs)", retaining rich appearance and geometric information of the scene. CLiFT enables compute-efficient rendering through compressed tokens, and a single trained network can change the number of tokens used to represent a scene or to render a novel view. Concretely, given a set of images, a multi-view encoder tokenizes the images together with their camera poses. Latent-space K-means then selects a reduced set of rays as cluster centroids using the tokens. A multi-view "condenser" compresses the information of all the tokens into the centroid tokens to construct CLiFTs. At test time, given a target view and a compute budget (i.e., the number of CLiFTs), the system collects the specified number of nearby tokens and synthesizes a novel view using a compute-adaptive renderer. Extensive experiments on the RealEstate10K and DL3DV datasets validate the approach quantitatively and qualitatively: it achieves significant data reduction with comparable rendering quality and the highest overall rendering score, while offering trade-offs among data size, rendering quality, and rendering speed.
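The centroid-selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes tokens are plain feature vectors, runs textbook K-means in that feature space, and then snaps each cluster mean to the closest real token so that the result is a subset of the input. The function names `sq_dist` and `select_centroid_tokens`, the iteration count, and the seeding are all hypothetical choices made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_centroid_tokens(tokens, k, iters=10, seed=0):
    """Cluster token features with plain K-means and return, for each
    cluster, the index of the input token closest to the cluster mean.

    Assumption: `tokens` is a list of equal-length float sequences.
    Two clusters may snap to the same token, so indices can repeat.
    """
    rng = random.Random(seed)
    # Initialise the centers from k distinct input tokens.
    centers = [list(tokens[i]) for i in rng.sample(range(len(tokens)), k)]
    for _ in range(iters):
        # Assignment step: each token joins its nearest center.
        clusters = [[] for _ in range(k)]
        for idx, tok in enumerate(tokens):
            nearest = min(range(k), key=lambda c: sq_dist(tok, centers[c]))
            clusters[nearest].append(idx)
        # Update step: move each center to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(centers[c])
                centers[c] = [
                    sum(tokens[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    # Snap every center to the closest actual token.
    return [
        min(range(len(tokens)), key=lambda i: sq_dist(tokens[i], centers[c]))
        for c in range(k)
    ]


if __name__ == "__main__":
    toy_tokens = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(select_centroid_tokens(toy_tokens, k=2))
```

In the pipeline the abstract describes, the selected tokens would then act as the targets into which the condenser compresses the information of all remaining tokens; that learned step is outside the scope of this sketch.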

Problem

Research questions and friction points this paper is trying to address.

Compress scene data into efficient light-field tokens

Adapt rendering compute budget via token selection

Balance data size, quality, and speed trade-offs
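The budget-driven token selection listed above can be sketched as a nearest-neighbour query. The summary does not say how "nearby" is measured, so this example assumes each token carries a 3D position (for instance a ray origin) and ranks tokens by Euclidean distance to the target camera position. The name `collect_nearby_tokens` and both position arguments are hypothetical.

```python
import heapq
import math


def collect_nearby_tokens(token_positions, target_position, budget):
    """Return the indices of the `budget` tokens closest to the target
    camera position, nearest first.

    Assumption: proximity is plain Euclidean distance between a
    per-token position and the target camera position.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    return heapq.nsmallest(
        budget,
        range(len(token_positions)),
        key=lambda i: math.dist(token_positions[i], target_position),
    )


if __name__ == "__main__":
    positions = [(0, 0, 0), (5, 0, 0), (1, 0, 0), (2, 0, 0)]
    target = (0.9, 0, 0)
    # A larger budget hands more tokens to the renderer.
    print(collect_nearby_tokens(positions, target, budget=2))
    print(collect_nearby_tokens(positions, target, budget=3))
```

Raising `budget` passes more tokens to the renderer at higher compute cost, and lowering it does the opposite. That is the trade-off among data size, quality, and speed that the list describes.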

Innovation

Methods, ideas, or system contributions that make the work stand out.

Compressed light-field tokens for efficient rendering

Adaptive token count with one trained network

Latent-space K-means for reduced ray selection

🔎 Similar Papers

Expansive Supervision for Neural Radiance Field