🤖 AI Summary
Existing neural field methods rely on per-sample meta-learning, whose inner-loop optimization is memory-intensive and scales poorly, while feed-forward encoding typically imposes modality-specific assumptions that compromise generality. This work proposes LH-NeF, a framework that uses spatial locality and hierarchical structure as modality-agnostic priors. A locality-preserving hierarchical feed-forward encoder maps coordinate-value observations of a continuous signal into structured tokens in a single forward pass, with no inner-loop optimization. Compared with the strongest modality-agnostic baseline, LH-NeF uses 42× less memory and supports 133× larger batches. Across images, 3D shapes, and climate fields, the learned representations match or exceed modality-agnostic, modality-specific, and specialized generative neural field baselines on both reconstruction and downstream tasks.
📝 Abstract
Neural fields parameterize data as functions from coordinates to values, providing a unified framework for representation learning across modalities. Existing approaches are dominated by per-sample meta-learning, which scales poorly due to memory-intensive inner-loop optimization. The natural alternative, feed-forward encoding, typically introduces modality-specific assumptions, sacrificing the generality that makes learning with neural fields attractive. We argue that locality and hierarchy are useful priors for learning field representations, and that they can be injected without compromising modality-agnosticism. We propose LH-NeF, a framework for learning general-purpose tokenized representations of continuous signals. A locality-preserving hierarchical encoder maps raw coordinate-value field observations to structured tokens, from which the field is reconstructed during training. By replacing meta-learning's inner loop with a single forward pass, LH-NeF uses 42$\times$ less memory and supports 133$\times$ larger batches than the strongest modality-agnostic baseline. Across images, 3D shapes, and climate fields, our learned representations match or exceed the performance of modality-agnostic, modality-specific, and specialized generative neural field baselines on both reconstruction and downstream tasks.
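The abstract does not specify the encoder's architecture, so the following is only a toy intuition for how locality and hierarchy can be applied to coordinate-value observations without assuming a modality. It is not LH-NeF: there is no learning involved, and the function names (`local_tokens`, `hierarchical_tokens`), the grid binning, and the mean pooling are all assumptions made for illustration. Observations with coordinates in [0, 1) are grouped by grid cell (locality) at several resolutions (hierarchy), and each cell's mean value stands in for a "token".

```python
from collections import defaultdict


def local_tokens(coords, values, cells_per_axis):
    """Average the values of all observations that fall in the same grid cell.

    coords: list of tuples with entries in [0, 1); any number of dimensions.
    values: list of floats, one per coordinate.
    Returns {cell_index_tuple: mean_value}; each entry is one toy "token".
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, v in zip(coords, values):
        # Locality: nearby coordinates map to the same cell index.
        cell = tuple(min(int(c * cells_per_axis), cells_per_axis - 1) for c in x)
        sums[cell] += v
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def hierarchical_tokens(coords, values, levels=3):
    """Hierarchy: token sets from coarse (1 cell per axis) to fine (2**(levels-1))."""
    return [local_tokens(coords, values, 2 ** level) for level in range(levels)]


# A 1D field observed at four points (hypothetical data).
coords_1d = [(0.1,), (0.3,), (0.6,), (0.9,)]
values_1d = [1.0, 3.0, 5.0, 7.0]
pyramid_1d = hierarchical_tokens(coords_1d, values_1d, levels=3)

# The same code handles a 2D field, since nothing depends on the dimension.
coords_2d = [(0.1, 0.1), (0.2, 0.2), (0.8, 0.9)]
values_2d = [2.0, 4.0, 10.0]
pyramid_2d = hierarchical_tokens(coords_2d, values_2d, levels=2)
```

Because the binning only reads coordinate tuples, the same routine applies to 1D, 2D, or 3D inputs. A learned encoder of the kind the abstract describes would presumably replace the fixed mean pooling with trainable aggregation, but that detail is not given here.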