🤖 AI Summary
This study addresses the challenge of unifying the measurement of functional and representational similarity in neural networks by introducing a framework grounded in the notion of "usable information." Leveraging tools such as conditional mutual information, Centered Kernel Alignment (CKA), and Representational Similarity Analysis (RSA), combined with bidirectional stitching analysis and a task-granularity hierarchy, the work elucidates the relationship between the two forms of similarity. The findings show that representational similarity is a sufficient but not necessary condition for functional similarity, that functional mappings between representations are inherently asymmetric, and that similarity depends critically on the capacity of the predictor family. The work thereby establishes a theoretical bridge between the two notions of similarity, clarifies the conditions under which existing metrics validly estimate usable information, and shows that similarity on a complex task guarantees similarity on its coarser-grained subtasks.
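The stitching asymmetry mentioned above can be illustrated with a toy sketch. Everything here is a hypothetical construction (synthetic representations, a linear least-squares stitch), not the paper's actual setup: representation B is built as a lossy linear view of representation A, so a stitch from A to B succeeds while the reverse direction cannot recover A's extra dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic representations: B's features are a lossy
# linear projection of A's, so A "usably" contains B but not vice versa.
Za = rng.standard_normal((500, 8))            # richer representation A
Zb = Za[:, :3] @ rng.standard_normal((3, 3))  # coarser representation B

def stitching_error(src, dst):
    """Fit a least-squares linear stitch src -> dst; return the mean squared residual."""
    W, *_ = np.linalg.lstsq(src, dst, rcond=None)
    return float(np.mean((src @ W - dst) ** 2))

# Stitching A into B yields near-zero error; stitching B into A leaves
# a large residual on the dimensions of A that B never encoded.
```

A unidirectional comparison would rate these two representations as functionally similar or dissimilar depending on which direction happened to be tested, which is why the abstract argues for bidirectional analysis.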
📝 Abstract
We present a unified framework for quantifying the similarity between representations through the lens of *usable information*, offering a rigorous theoretical and empirical synthesis across three key dimensions. First, addressing functional similarity, we establish a formal link between stitching performance and conditional mutual information. We further show that stitching is inherently asymmetric: robust functional comparison requires a bidirectional analysis rather than a unidirectional mapping. Second, concerning representational similarity, we prove that reconstruction-based metrics and standard tools (e.g., CKA, RSA) act as estimators of usable information under specific constraints. Crucially, we show that similarity is relative to the capacity of the predictive family: representations that appear distinct to a rigid observer may be identical to a more expressive one. Third, we demonstrate that representational similarity is sufficient but not necessary for functional similarity. We unify these concepts through a task-granularity hierarchy: similarity on a complex task guarantees similarity on any coarser derivative, establishing representational similarity as the limiting case of maximum granularity, namely input reconstruction.
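As a concrete reference point for the metrics the abstract analyzes, here is a minimal sketch of linear CKA in its standard Gram-matrix formulation (a common baseline implementation, not code from this work):

```python
import numpy as np

def _center(K):
    """Double-center a Gram matrix: HKH with H = I - (1/n) * ones."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def linear_cka(X, Y):
    """Linear CKA between representations X (n x d1) and Y (n x d2),
    rows paired across the same n stimuli."""
    K, L = _center(X @ X.T), _center(Y @ Y.T)
    return float(np.sum(K * L) / (np.linalg.norm(K) * np.linalg.norm(L)))
```

Linear CKA is invariant to orthogonal transformations and isotropic rescaling of either representation; this restricted invariance class is one way to read it as an estimator tied to a particular capacity of predictive family, as the abstract discusses.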