Hyperbolic Hierarchical Alignment Reasoning Network for Text-3D Retrieval

📅 2025-11-14
🤖 AI Summary
Text-to-3D retrieval faces two key challenges: Hierarchy Representation Collapse (HRC) and Redundancy-Induced Saliency Dilution (RISD). To address them, this work introduces a hyperbolic-space approach, the Hyperbolic Hierarchical Alignment Reasoning Network (H$^{2}$ARN). H$^{2}$ARN embeds text and 3D data in a shared Lorentz-model hyperbolic space, so the hierarchical semantics of both modalities are modeled in one geometry. A hierarchical ordering loss builds an entailment cone around each text embedding and requires the matched 3D instance to fall inside it, while an instance-level contrastive loss separates non-matching samples. To mitigate RISD, a contribution-aware hyperbolic aggregation module weights local features by Lorentzian distance, with no additional supervision. On the expanded T3DR-HIT v2 benchmark (8,935 text-to-3D pairs), H$^{2}$ARN is reported to outperform state-of-the-art methods, particularly on cultural heritage artifacts and complex indoor scenes, where it discriminates better against hard negative samples.
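To make the Lorentz-model embedding concrete, the following is a minimal sketch in plain Python. It assumes a fixed curvature of -1 and an exponential map at the hyperboloid origin; the function names are illustrative and are not taken from the H$^{2}$ARN code base, which may use a learnable curvature or a different lifting step.

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi (index 0 is the time axis)."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def exp_map_origin(v):
    """Lift a Euclidean feature vector v onto the unit hyperboloid (curvature -1)."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm < 1e-12:
        # The zero vector maps to the hyperboloid origin (1, 0, ..., 0).
        return [1.0] + [0.0] * len(v)
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * a for a in v]

def lorentz_distance(x, y):
    """Geodesic distance; the clamp guards against rounding slightly below 1."""
    return math.acosh(max(1.0, -lorentz_inner(x, y)))
```

Under these assumptions, every lifted point satisfies `lorentz_inner(p, p) == -1` up to rounding, and the distance from the origin to `exp_map_origin(v)` equals the Euclidean norm of `v`.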

📝 Abstract
With the daily influx of 3D data on the internet, text-3D retrieval has gained increasing attention. However, current methods face two major challenges: Hierarchy Representation Collapse (HRC) and Redundancy-Induced Saliency Dilution (RISD). HRC compresses abstract-to-specific and whole-to-part hierarchies in Euclidean embeddings, while RISD averages noisy fragments, obscuring critical semantic cues and diminishing the model's ability to distinguish hard negatives. To address these challenges, we introduce the Hyperbolic Hierarchical Alignment Reasoning Network (H$^{2}$ARN) for text-3D retrieval. H$^{2}$ARN embeds both text and 3D data in a Lorentz-model hyperbolic space, where exponential volume growth inherently preserves hierarchical distances. A hierarchical ordering loss constructs a shrinking entailment cone around each text vector, ensuring that the matched 3D instance falls within the cone, while an instance-level contrastive loss jointly enforces separation from non-matching samples. To tackle RISD, we propose a contribution-aware hyperbolic aggregation module that leverages Lorentzian distance to assess the relevance of each local feature and applies contribution-weighted aggregation guided by hyperbolic geometry, enhancing discriminative regions while suppressing redundancy without additional supervision. We also release the expanded T3DR-HIT v2 benchmark, which contains 8,935 text-to-3D pairs, 2.6 times the original size, covering both fine-grained cultural artefacts and complex indoor scenes. Our code is available at https://github.com/liwrui/H2ARN.
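The entailment-cone idea in the abstract can be sketched as follows. This is a hedged illustration, not the authors' loss: it assumes curvature -1 and uses a commonly used Lorentz-model cone formulation, in which the half-aperture shrinks as the text embedding moves away from the origin. The constant `k` and all function names are hypothetical choices made for this example.

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def exp_map_origin(v):
    """Lift a Euclidean vector onto the unit hyperboloid (curvature -1)."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm < 1e-12:
        return [1.0] + [0.0] * len(v)
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * a for a in v]

def half_aperture(x, k=0.1):
    """Cone half-aperture at x; k is an assumed boundary constant."""
    space_norm = math.sqrt(sum(a * a for a in x[1:]))
    return math.asin(min(1.0, 2.0 * k / max(space_norm, 1e-12)))

def exterior_angle(x, y):
    """Angle at x between the outward ray from the origin through x and the geodesic to y."""
    inner = lorentz_inner(x, y)
    space_norm = math.sqrt(sum(a * a for a in x[1:]))
    denom = space_norm * math.sqrt(max(inner * inner - 1.0, 1e-12))
    cos_angle = (y[0] + x[0] * inner) / max(denom, 1e-12)
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def ordering_loss(text, shape, k=0.1):
    """Zero when the 3D embedding lies inside the text's cone, positive otherwise."""
    return max(0.0, exterior_angle(text, shape) - half_aperture(text, k))
```

In this sketch, a 3D embedding that lies further out along the same direction as its text embedding incurs no penalty, while one on the opposite side of the origin is penalized by roughly the angular gap to the cone boundary. In training, such a term would be summed with the instance-level contrastive loss.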
Problem

Research questions and friction points this paper is trying to address.

Addresses Hierarchy Representation Collapse in text-3D retrieval embeddings
Solves Redundancy-Induced Saliency Dilution by suppressing noisy fragments
Enables hierarchical reasoning between text descriptions and 3D objects
Innovation

Methods, ideas, or system contributions that make the work stand out.

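Embedding text and 3D data in a shared Lorentz-model hyperbolic space
Using a hierarchical ordering loss that builds a shrinking entailment cone around each text embedding
Applying a contribution-aware hyperbolic aggregation module that weights local features by Lorentzian distance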
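The contribution-aware aggregation listed above could look roughly like the sketch below. It is an assumption-laden illustration rather than the paper's module: relevance is taken to be a softmax over negative Lorentzian distances to a hypothetical reference point (for example a global feature), the temperature is an arbitrary choice, and pooling uses the Lorentzian centroid (a weighted sum renormalized onto the hyperboloid). All names are illustrative.

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def exp_map_origin(v):
    """Lift a Euclidean vector onto the unit hyperboloid (curvature -1)."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm < 1e-12:
        return [1.0] + [0.0] * len(v)
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * a for a in v]

def lorentz_distance(x, y):
    """Geodesic distance on the unit hyperboloid."""
    return math.acosh(max(1.0, -lorentz_inner(x, y)))

def contribution_weights(local_feats, reference, temperature=0.5):
    """Softmax over negative distances: closer local features get larger weights."""
    logits = [-lorentz_distance(f, reference) / temperature for f in local_feats]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_centroid(local_feats, weights):
    """Lorentzian centroid: weighted sum projected back onto the hyperboloid."""
    dim = len(local_feats[0])
    s = [sum(w * f[i] for w, f in zip(weights, local_feats)) for i in range(dim)]
    scale = math.sqrt(max(-lorentz_inner(s, s), 1e-12))
    return [a / scale for a in s]
```

Compared with uniform averaging, which corresponds to the dilution the paper calls RISD, this weighting keeps the pooled embedding near the features that are closest to the reference and reduces the influence of distant, noisy fragments.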
Authors
Wenrui Li
Assistant Professor, University of Connecticut
Statistics, Network science, Biostatistics
Yidan Lu
Harbin Institute of Technology
Yeyu Chai
Harbin Institute of Technology
Rui Zhao
Nanyang Technological University
Hengyu Man
Harbin Institute of Technology
Xiaopeng Fan
Professor, Harbin Institute of Technology
Video/Image, Wireless