Building The Ph(ysical)AI Layer Of Machine Intelligence

📅 2026-06-02
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the challenge of generalizing foundation models to novel domains when no paired data is available. It proposes a principle-driven modeling paradigm that embeds signal-theoretic principles (Fourier decomposition, energy conservation, and symmetry) directly into the network architecture and loss functions. The model is trained exclusively on radio-frequency data and, with no fine-tuning of the encoder, transfers across modalities to audio, image, text, and video tasks. Linear probes on the frozen 1.99-million-parameter encoder attain an average accuracy of 77.7% (Top-3: 91.9%) across 15 diverse tasks, with 84.5% on physical tasks and 70.0% on semantic tasks. The authors take this as evidence that physical priors enable efficient cross-modal generalization.
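As a rough illustration of the energy-conservation principle named in the summary, the sketch below checks Parseval's theorem numerically and turns an energy mismatch into a penalty term. The paper's actual losses are not described on this page, so `energy_conservation_penalty` and the other function names are hypothetical and only show the general idea.

```python
import numpy as np

def time_energy(x):
    """Signal energy computed in the time domain."""
    return float(np.sum(np.abs(x) ** 2))

def freq_energy(x):
    """Signal energy computed from the DFT (Parseval's theorem).

    NumPy's FFT is unnormalized, so the sum is divided by the length.
    """
    spectrum = np.fft.fft(x)
    return float(np.sum(np.abs(spectrum) ** 2) / len(x))

def energy_conservation_penalty(x, y):
    """Hypothetical penalty: relative energy mismatch between x and y.

    Assumption for illustration only; not the loss used in the paper.
    """
    ex, ey = time_energy(x), time_energy(y)
    return abs(ex - ey) / (ex + 1e-12)

# Toy complex baseband signal standing in for RF samples.
rng = np.random.default_rng(0)
signal = rng.standard_normal(256) + 1j * rng.standard_normal(256)

print("time-domain energy:", time_energy(signal))
print("freq-domain energy:", freq_energy(signal))
print("penalty for a 2x gain:", energy_conservation_penalty(signal, 2 * signal))
```

A transformation that preserves energy gives a penalty near zero, while a pure gain of 2 quadruples the energy and is penalized accordingly.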
📝 Abstract
Foundation models achieve generalization through massive-scale training on diverse data, but transfer poorly to truly unseen domains without paired training data. We propose principle-driven foundation models that encode signal-theoretic principles (Fourier decomposition, energy conservation, symmetry) rather than learn untethered statistical correlations. We hypothesize that domains differ not in fundamental physics, but in learnable transformations in time, frequency, magnitude, or phase. Training exclusively on radio-frequency (RF) data with co-designed architecture and losses incorporating these principles, we achieve cross-modal transfer to audio, images, text, and video using only frozen representations learned from RF data, requiring no fine-tuning of the encoder on target domains. Our 1.99M-parameter frozen encoder achieves 77.7% average accuracy (91.9% top-3) across 15 diverse tasks via linear probing, with systematic variation: 84.5% on physically grounded tasks (speaker recognition, seismology, RF fingerprinting) versus 70.0% on semantic tasks (music genre, language recognition). This reveals that principle-driven and scale-driven approaches offer complementary paths: physical principles enable efficient cross-modal transfer while naturally establishing the boundary between physical and semantic understanding.
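The evaluation protocol in the abstract, linear probing on a frozen encoder with top-1 and top-3 accuracy, can be sketched as follows. This is a minimal sketch under stated assumptions: `frozen_encoder` is a fixed log-magnitude spectrum standing in for the pretrained RF encoder, the probe is fit by ridge regression on one-hot labels, and the data are synthetic sinusoids. None of these choices come from the paper.

```python
import numpy as np

def frozen_encoder(x):
    """Stand-in for a pretrained encoder (assumption): a fixed
    log-magnitude spectrum with no trainable parameters."""
    return np.log1p(np.abs(np.fft.rfft(x, axis=-1)))

def add_bias(feats):
    """Append a constant column so the probe can learn an offset."""
    return np.hstack([feats, np.ones((len(feats), 1))])

def fit_linear_probe(feats, labels, n_classes, ridge=1e-3):
    """Fit only a linear map on top of frozen features
    (ridge regression onto one-hot targets)."""
    X = add_bias(feats)
    Y = np.eye(n_classes)[labels]
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)

def probe_scores(W, feats):
    """Class scores from the linear probe."""
    return add_bias(feats) @ W

def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    top_k = np.argsort(scores, axis=1)[:, -k:]
    return float(np.mean(np.any(top_k == labels[:, None], axis=1)))

def make_toy_data(rng, n_per_class, n_classes=4, length=64):
    """Synthetic task (assumption): each class is a noisy sinusoid
    at its own frequency with a random phase."""
    t = np.arange(length)
    xs, ys = [], []
    for c in range(n_classes):
        freq = (c + 1) * 4  # cycles per window
        phase = rng.uniform(0, 2 * np.pi, size=(n_per_class, 1))
        tone = np.sin(2 * np.pi * freq * t / length + phase)
        xs.append(tone + 0.3 * rng.standard_normal((n_per_class, length)))
        ys.append(np.full(n_per_class, c))
    return np.vstack(xs), np.concatenate(ys)

rng = np.random.default_rng(0)
x_train, y_train = make_toy_data(rng, n_per_class=50)
x_test, y_test = make_toy_data(rng, n_per_class=50)

# The encoder is never updated; only W is fit on the target task.
W = fit_linear_probe(frozen_encoder(x_train), y_train, n_classes=4)
scores = probe_scores(W, frozen_encoder(x_test))
top1 = top_k_accuracy(scores, y_test, k=1)
top3 = top_k_accuracy(scores, y_test, k=3)
print(f"top-1: {top1:.3f}, top-3: {top3:.3f}")
```

Because the encoder stays fixed, probe accuracy measures how linearly separable the target classes already are in the frozen representation, which is what the reported per-task numbers compare.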
Problem

Research questions and friction points this paper is trying to address.

foundation models
cross-modal transfer
generalization
unseen domains
physical principles
Innovation

Methods, ideas, or system contributions that make the work stand out.

principle-driven foundation models
signal-theoretic principles
cross-modal transfer
frozen encoder
physical AI