Feature-Based Lie Group Transformer for Real-World Applications

📅 2025-06-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing unsupervised representation learning methods struggle with high-resolution, background-rich real-world images due to reliance on pixel-level translation modeling and insufficient semantic feature extraction. Method: We propose a semantics-structured group decomposition representation framework. Its core innovation lies in the first integration of Galois-theoretic group decomposition with feature-level Lie group transformations—extending group actions from pixel translations to semantic feature translations—and introducing an object-segmentation-guided, transformation-driven feature grouping mechanism. Contribution/Results: Our approach relaxes the strong conditional independence assumptions inherent in conventional disentangled representation learning. Evaluated on background-contaminated real-object datasets, it demonstrates preserved transformation structure and consistent object recognition performance. It significantly enhances interpretability and generalization capability of unsupervised representations in complex scenes, advancing semantic-aware, geometry-grounded representation learning.

Technology Category

Application Category

📝 Abstract
The main goal of representation learning is to acquire meaningful representations from real-world sensory inputs without supervision. Representation learning explains some aspects of human development. Various neural network (NN) models have been proposed that acquire empirically good representations. However, the formulation of a good representation has not been established. We recently proposed a method for categorizing changes between a pair of sensory inputs. A unique feature of this approach is that transformations between two sensory inputs are learned to satisfy algebraic structural constraints. Conventional representation learning often assumes that disentangled independent feature axes is a good representation; however, we found that such a representation cannot account for conditional independence. To overcome this problem, we proposed a new method using group decomposition in Galois algebra theory. Although this method is promising for defining a more general representation, it assumes pixel-to-pixel translation without feature extraction, and can only process low-resolution images with no background, which prevents real-world application. In this study, we provide a simple method to apply our group decomposition theory to a more realistic scenario by combining feature extraction and object segmentation. We replace pixel translation with feature translation and formulate object segmentation as grouping features under the same transformation. We validated the proposed method on a practical dataset containing both real-world object and background. We believe that our model will lead to a better understanding of human development of object recognition in the real world.
Problem

Research questions and friction points this paper is trying to address.

Develop unsupervised representation learning for real-world sensory inputs
Overcome limitations of disentangled feature axes in representation learning
Apply group decomposition theory to realistic scenarios with feature extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines feature extraction with group decomposition
Replaces pixel translation with feature translation
Formulates object segmentation via transformation grouping
🔎 Similar Papers
No similar papers found.
T
Takayuki Komatsu
Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
Y
Yoshiyuki Ohmura
Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
K
Kayato Nishitsunoi
Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
Yasuo Kuniyoshi
Yasuo Kuniyoshi
School of Information Science and Technology, The University of Tokyo
Intelligent SystemsEmbodied Artificial IntelligenceDevelopmental Cognitive NeuroscienceComplex Emergent SystemsHumanoid