Building The Ph(ysical)AI Layer Of Machine Intelligence

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of generalizing foundation models to novel domains without paired training data. It proposes a principle-driven modeling paradigm that embeds signal-theoretic principles (Fourier decomposition, energy conservation, and symmetry) directly into the network architecture and loss functions. Trained exclusively on radio-frequency (RF) data, with no fine-tuning of the encoder on target domains, the model transfers across modalities to audio, image, text, and video tasks. With a frozen encoder of only 1.99 million parameters, linear probing reaches an average accuracy of 77.7% (top-3: 91.9%) across 15 diverse tasks: 84.5% on physically grounded tasks and 70.0% on semantic tasks. The results suggest that physical priors enable efficient cross-modal generalization, while the gap between the two task groups marks the boundary between physical and semantic understanding.
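To make the energy-conservation principle concrete: for an unnormalized discrete Fourier transform, Parseval's theorem states that a signal carries the same energy in time and frequency, sum_n |x[n]|^2 = (1/N) sum_k |X[k]|^2. The plain-Python sketch below checks this identity and turns any mismatch into a penalty that could be added to a training loss when the spectrum comes from a learned layer rather than an exact DFT. The function names (`dft`, `time_energy`, `freq_energy`, `energy_conservation_penalty`) and the squared-relative-error form of the penalty are illustrative assumptions, not the paper's actual loss.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples) for n in range(n_samples))
        for k in range(n_samples)
    ]


def time_energy(x):
    """Signal energy in the time domain: sum |x[n]|^2."""
    return sum(abs(v) ** 2 for v in x)


def freq_energy(X):
    """Energy implied by an unnormalized DFT spectrum: (1/N) * sum |X[k]|^2."""
    return sum(abs(v) ** 2 for v in X) / len(X)


def energy_conservation_penalty(x, X_hat):
    """Hypothetical loss term: squared relative mismatch between the energy of x
    and the energy implied by a (possibly learned) spectrum X_hat."""
    e_t = time_energy(x)
    e_f = freq_energy(X_hat)
    return ((e_t - e_f) / max(e_t, 1e-12)) ** 2


if __name__ == "__main__":
    # Two tones on a 32-sample grid: a sine at bin 3 and a half-amplitude cosine at bin 5.
    x = [math.sin(2 * math.pi * 3 * n / 32) + 0.5 * math.cos(2 * math.pi * 5 * n / 32)
         for n in range(32)]
    X = dft(x)
    print("exact DFT penalty:", energy_conservation_penalty(x, X))  # essentially zero
    # A spectrum that overstates magnitudes by 10% violates energy conservation.
    X_bad = [1.1 * v for v in X]
    print("scaled spectrum penalty:", energy_conservation_penalty(x, X_bad))
```

For an exact DFT the penalty is zero up to floating-point error; a spectrum scaled by 1.1 carries 1.21 times the energy, giving a penalty of about (1 - 1.21)^2 = 0.0441. A penalty of this kind only constrains a model when the frequency representation is learned.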

📝 Abstract

Foundation models achieve generalization through massive-scale training on diverse data, but struggle to transfer to truly unseen domains without paired training data. We propose principle-driven foundation models that encode signal-theoretic principles (Fourier decomposition, energy conservation, symmetry) rather than learn untethered statistical correlations. We hypothesize that domains differ not in fundamental physics, but in learnable transformations in time, frequency, magnitude, or phase. Training exclusively on radio-frequency (RF) data with a co-designed architecture and losses incorporating these principles, we achieve cross-modal transfer to audio, images, text, and video using only frozen representations learned from RF data, requiring no fine-tuning of the encoder on target domains. Our 1.99M-parameter frozen encoder achieves 77.7% average accuracy (91.9% top-3) across 15 diverse tasks via linear probing, with systematic variation: 84.5% on physically grounded tasks (speaker recognition, seismology, RF fingerprinting) versus 70.0% on semantic tasks (music genre, language recognition). This reveals that principle-driven and scale-driven approaches offer complementary paths: physical principles enable efficient cross-modal transfer while naturally establishing the boundary between physical and semantic understanding.
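The evaluation protocol in the abstract (a frozen encoder, a linear probe, and top-1/top-3 accuracy) can be sketched as follows. This is a minimal illustration under stated assumptions: `frozen_encoder` is a hypothetical stand-in (a fixed random projection with ReLU, not the paper's RF-trained 1.99M-parameter model), and the probe is a closed-form ridge-regression classifier on one-hot targets, swapped in for brevity; the paper's probe may be trained differently. Only the probe weights are fit; the encoder weights `W` are never updated.

```python
import numpy as np


def frozen_encoder(x, W):
    """Hypothetical stand-in for a frozen pretrained encoder:
    a fixed random projection followed by ReLU. W is never updated."""
    return np.maximum(x @ W, 0.0)


def add_bias(feats):
    """Append a constant column so the probe learns a bias term."""
    return np.hstack([feats, np.ones((feats.shape[0], 1))])


def fit_linear_probe(feats, labels, n_classes, ridge=1e-2):
    """Closed-form ridge regression on one-hot targets (assumed probe type)."""
    Z = add_bias(feats)
    Y = np.eye(n_classes)[labels]  # one-hot targets, shape (n, n_classes)
    A = Z.T @ Z + ridge * np.eye(Z.shape[1])
    return np.linalg.solve(A, Z.T @ Y)  # weights, shape (d + 1, n_classes)


def probe_scores(feats, weights):
    """Per-class scores for each sample; higher means more likely."""
    return add_bias(feats) @ weights


def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    top_k = np.argsort(scores, axis=1)[:, -k:]
    return float(np.mean(np.any(top_k == labels[:, None], axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_classes, dim, width = 5, 20, 64
    # Synthetic "target domain": Gaussian clusters around random class means.
    means = rng.normal(scale=3.0, size=(n_classes, dim))

    def sample(n):
        labels = rng.integers(0, n_classes, size=n)
        return means[labels] + rng.normal(size=(n, dim)), labels

    W = rng.normal(size=(dim, width)) / np.sqrt(dim)  # frozen, never trained
    x_train, y_train = sample(400)
    x_test, y_test = sample(200)

    weights = fit_linear_probe(frozen_encoder(x_train, W), y_train, n_classes)
    scores = probe_scores(frozen_encoder(x_test, W), weights)
    print("top-1:", top_k_accuracy(scores, y_test, 1))
    print("top-3:", top_k_accuracy(scores, y_test, 3))
```

Because the encoder stays frozen, probe accuracy measures how linearly separable the target classes already are in the pretrained feature space; top-3 accuracy is always at least top-1, which is why the 91.9% top-3 figure sits above the 77.7% average.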

Problem

Research questions and friction points this paper is trying to address.

foundation models

cross-modal transfer

generalization

unseen domains

physical principles

Innovation

Methods, ideas, or system contributions that make the work stand out.

principle-driven foundation models

signal-theoretic principles

cross-modal transfer