$\text{H}^2$em: Learning Hierarchical Hyperbolic Embeddings for Compositional Zero-Shot Learning

📅 2025-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing compositional zero-shot learning (CZSL) methods often overlook both the semantic hierarchy of primitives (e.g., “apple” → “fruit”) and the conceptual hierarchy between primitives and compositions (e.g., “sliced apple” → “apple”). The few that do model these hierarchies work in Euclidean space, whose polynomial volume growth cannot match the exponential branching of large real-world taxonomies, which limits generalization. This work introduces H2em, a hierarchical hyperbolic embedding framework for CZSL that models states, objects, and their compositions at both hierarchy levels. It combines a dual-hierarchical entailment loss built on hyperbolic entailment cones, a discriminative alignment loss with hard negative mining, and a hyperbolic cross-modal attention mechanism. The two losses are designed to counter hierarchical collapse and poor fine-grained discrimination. On three standard benchmarks, in both closed-world and open-world settings, the method reports state-of-the-art performance, including improved recognition of unseen compositions.

📝 Abstract
Compositional zero-shot learning (CZSL) aims to recognize unseen state-object compositions by generalizing from a training set of their primitives (state and object). Current methods often overlook the rich hierarchical structures, such as the semantic hierarchy of primitives (e.g., apple → fruit) and the conceptual hierarchy between primitives and compositions (e.g., sliced apple → apple). A few recent efforts have shown effectiveness in modeling these hierarchies through loss regularization within Euclidean space. In this paper, we argue that they fail to scale to the large-scale taxonomies required for real-world CZSL: the polynomial volume growth of flat geometry cannot match the exponential growth of such hierarchies, impairing generalization capacity. To this end, we propose H2em, a new framework that learns Hierarchical Hyperbolic EMbeddings for CZSL. H2em leverages the unique properties of hyperbolic geometry, a space naturally suited for embedding tree-like structures with low distortion. However, a naive hyperbolic mapping may suffer from hierarchical collapse and poor fine-grained discrimination. We therefore design two learning objectives to structure this space: a Dual-Hierarchical Entailment Loss that uses hyperbolic entailment cones to enforce the predefined hierarchies, and a Discriminative Alignment Loss with hard negative mining to establish a large geodesic distance between semantically similar compositions. Furthermore, we devise Hyperbolic Cross-Modal Attention to realize instance-aware cross-modal infusion within hyperbolic geometry. Extensive ablations on three benchmarks demonstrate that H2em establishes a new state-of-the-art in both closed-world and open-world scenarios. Our code will be released.
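To make the "geodesic distance" in the abstract concrete, here is a minimal NumPy sketch of distances on the Poincaré ball with curvature -1. The choice of the Poincaré model, the fixed curvature, and the helper names `expmap0` and `poincare_distance` are assumptions for illustration; the abstract does not say which hyperbolic model or curvature H2em uses.

```python
import numpy as np

def expmap0(v):
    """Map a Euclidean tangent vector at the origin onto the unit Poincare ball.

    Assumes curvature -1; the result always has Euclidean norm below 1.
    """
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    return np.tanh(norm) * v / norm

def poincare_distance(x, y):
    """Geodesic distance between two points of the unit Poincare ball."""
    sq_diff = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return np.arccosh(1.0 + 2.0 * sq_diff / denom)

origin = np.zeros(2)
for r in (0.5, 1.0, 2.0):
    # Two points at the same depth, in orthogonal directions.
    p = expmap0(np.array([r, 0.0]))
    q = expmap0(np.array([0.0, r]))
    print(r, poincare_distance(origin, p), poincare_distance(p, q))
```

With this convention, a tangent vector of Euclidean norm r lands at geodesic distance 2r from the origin. As r grows, the distance between the two orthogonal points approaches the length of the path through the origin, which is the tree-like behavior that makes hyperbolic space a natural fit for hierarchies.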
Problem

Research questions and friction points this paper is trying to address.

Learning hyperbolic embeddings for compositional zero-shot learning
Modeling hierarchical structures in state-object compositions
Improving generalization with hyperbolic geometry and loss objectives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hyperbolic geometry for hierarchical embeddings
Dual-Hierarchical Entailment Loss with hyperbolic cones
Hyperbolic Cross-Modal Attention for instance-aware infusion
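The hyperbolic cones named in the list above can be illustrated with a short NumPy sketch of an entailment-cone penalty on the Poincaré ball. This follows a cone formulation commonly used in the hyperbolic embedding literature and is not taken from the paper: the aperture constant `K`, the function names, and the use of the Poincaré model are all assumptions, and the actual Dual-Hierarchical Entailment Loss may differ.

```python
import numpy as np

K = 0.1  # hypothetical aperture constant, not a value from the paper

def half_aperture(x, k=K):
    """Half-aperture of the cone rooted at x; cones narrow toward the boundary."""
    norm = np.linalg.norm(x)
    # Simplification: clip so points very close to the origin do not break arcsin.
    return np.arcsin(np.clip(k * (1.0 - norm ** 2) / norm, -1.0, 1.0))

def exterior_angle(x, y):
    """Angle at x between the cone axis (ray from the origin through x) and y."""
    xy = np.dot(x, y)
    nx2 = np.dot(x, x)
    ny2 = np.dot(y, y)
    num = xy * (1.0 + nx2) - nx2 * (1.0 + ny2)
    den = np.sqrt(nx2) * np.linalg.norm(x - y) * np.sqrt(1.0 + nx2 * ny2 - 2.0 * xy)
    return np.arccos(np.clip(num / den, -1.0, 1.0))

def entailment_energy(parent, child, k=K):
    """Zero when child lies inside the parent's cone, positive otherwise."""
    return max(0.0, exterior_angle(parent, child) - half_aperture(parent, k))

parent = np.array([0.5, 0.0])          # e.g. a general concept such as "apple"
inside = np.array([0.8, 0.0])          # farther out on the same ray
outside = np.array([0.2, 0.0])         # closer to the origin than the parent
print(entailment_energy(parent, inside), entailment_energy(parent, outside))
```

In this sketch, more general concepts sit nearer the origin and more specific ones sit farther out inside the parent's cone, so minimizing the energy over (parent, child) pairs pushes the embedding toward the intended hierarchy.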
Lin Li
HKUST
Jiahui Li
Zhejiang University
Jiaming Lei
Zhejiang University
Jun Xiao
Zhejiang University
Feifei Shao
Zhejiang University
Machine learning, computer vision, weakly supervised learning, active learning
Long Chen
HKUST