Training-Free Dual Hyperbolic Adapters for Better Cross-Modal Reasoning

📅 2025-12-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing vision-language models (VLMs) either lose substantial accuracy under domain shift or require costly fine-tuning for each new domain. To address this, we propose Training-free Dual Hyperbolic Adapters (T-DHA), which embed the hierarchical relationships between vision-language concepts in the Poincaré ball, a model of hyperbolic space well suited to tree-like structures, and couple this embedding with negative learning. The method avoids conventional fine-tuning, adapting to a new domain by representing features in hyperbolic space rather than Euclidean space. On few-shot image recognition and domain generalization benchmarks, it outperforms state-of-the-art methods while using fewer feature dimensions. The result is a lightweight approach to cross-modal transfer that requires no training while preserving semantic hierarchy and discriminative capacity.

📝 Abstract
Recent research in Vision-Language Models (VLMs) has significantly advanced our capabilities in cross-modal reasoning. However, existing methods suffer from performance degradation with domain changes or require substantial computational resources for fine-tuning in new domains. To address this issue, we develop a new adaptation method for large vision-language models, called Training-free Dual Hyperbolic Adapters (T-DHA). We characterize the vision-language relationship between semantic concepts, which typically has a hierarchical tree structure, in the hyperbolic space instead of the traditional Euclidean space. Hyperbolic spaces exhibit exponential volume growth with radius, unlike the polynomial growth in Euclidean space. We find that this unique property is particularly effective for embedding hierarchical data structures using the Poincaré ball model, achieving significantly improved representation and discrimination power. Coupled with negative learning, it provides more accurate and robust classifications with fewer feature dimensions. Our extensive experimental results on various datasets demonstrate that the T-DHA method significantly outperforms existing state-of-the-art methods in few-shot image recognition and domain generalization tasks.
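
The Poincaré ball model mentioned in the abstract can be illustrated with a minimal NumPy sketch. It uses the standard textbook formulas for the exponential map at the origin and for the geodesic distance in a ball of curvature -c. The function names `expmap0` and `poincare_dist` and the default `c=1.0` are assumptions made for this illustration and are not taken from the paper.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-9):
    """Map Euclidean (tangent) vectors at the origin into the Poincare ball of curvature -c."""
    v = np.asarray(v, dtype=np.float64)
    sqrt_c = np.sqrt(c)
    # Clamp the norm to avoid dividing by zero for the zero vector.
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_dist(x, y, c=1.0, eps=1e-9):
    """Geodesic distance between points x and y that lie inside the Poincare ball."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - c * np.sum(x ** 2, axis=-1)) * (1.0 - c * np.sum(y ** 2, axis=-1))
    arg = 1.0 + 2.0 * c * sq_diff / np.maximum(denom, eps)
    return np.arccosh(arg) / np.sqrt(c)

# Two pairs of points with the same angular layout, one near the origin
# and one near the boundary of the unit ball.
d_near = poincare_dist([0.1, 0.0], [0.0, 0.1])
d_far = poincare_dist([0.9, 0.0], [0.0, 0.9])
print(d_near, d_far)
```

Distances grow rapidly toward the boundary of the ball, which is the property that leaves room for the many leaves of a tree-like hierarchy. This is only a geometric illustration, not the adapter described in the paper.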
Problem

Research questions and friction points this paper is trying to address.

Addresses performance degradation in cross-modal reasoning with domain changes.
Reduces computational resources needed for fine-tuning in new domains.
Improves representation and classification accuracy in hierarchical data structures.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hyperbolic space embedding for hierarchical vision-language relationships
Training-free adapters using Poincaré ball model for improved representation
Negative learning coupled with hyperbolic adapters for robust classification
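
To make the idea of a training-free classifier in hyperbolic space concrete, here is a minimal sketch of a nearest-prototype classifier that measures distances in the Poincaré ball. It is a generic illustration under stated assumptions, not the T-DHA method: the dual-adapter design and the negative-learning component are not specified in this summary and are not modeled here. All names (`predict`, `pairwise_poincare_dist`, `scale`) are hypothetical, and averaging support features in Euclidean space before mapping them into the ball is a simplification chosen for brevity.

```python
import numpy as np

def l2_normalize(f, eps=1e-9):
    """Scale each feature vector to unit Euclidean length."""
    return f / np.maximum(np.linalg.norm(f, axis=-1, keepdims=True), eps)

def expmap0(v, c=1.0, eps=1e-9):
    """Map Euclidean vectors at the origin into the Poincare ball of curvature -c."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def pairwise_poincare_dist(x, y, c=1.0, eps=1e-9):
    """Distances between every row of x (n, d) and every row of y (m, d); returns (n, m)."""
    sq_diff = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    fx = 1.0 - c * np.sum(x ** 2, axis=-1)
    fy = 1.0 - c * np.sum(y ** 2, axis=-1)
    denom = np.maximum(fx[:, None] * fy[None, :], eps)
    return np.arccosh(1.0 + 2.0 * c * sq_diff / denom) / np.sqrt(c)

def predict(query_feats, support_feats, support_labels, num_classes, scale=1.0):
    """Assign each query to the class whose prototype is nearest in the Poincare ball.

    No parameters are trained: prototypes are computed directly from the few-shot
    support features (assumed to come from a frozen encoder).
    """
    support = l2_normalize(np.asarray(support_feats, dtype=np.float64))
    query = l2_normalize(np.asarray(query_feats, dtype=np.float64))
    labels = np.asarray(support_labels)
    # Simplification: Euclidean mean per class, re-normalized, then mapped into the ball.
    protos = np.stack([
        l2_normalize(support[labels == k].mean(axis=0)) for k in range(num_classes)
    ])
    dists = pairwise_poincare_dist(expmap0(scale * query), expmap0(scale * protos))
    return np.argmin(dists, axis=1)

# Toy usage with hand-made 3-D "features" for two classes.
support = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
labels = [0, 0, 1, 1]
queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
preds = predict(queries, support, labels, num_classes=2)
print(preds)
```

In practice the features would come from the frozen image and text encoders of a VLM. The `scale` parameter, also an assumption, controls how close to the boundary of the ball the embedded features land.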
Authors

Yi Zhang
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Chun-Wun Cheng
PhD student, University of Cambridge
Implicit Deep Learning, Applied Mathematics, Generative AI
Junyi He
Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
Ke Yu
Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
Yushun Tang
Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
C. Schonlieb
Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
Zhihai He
Southern University of Science and Technology
Deep learning, computer vision, machine learning, smart cyber-physical systems
Angelica I. Aviles-Rivero
Yau Mathematical Sciences Center, Tsinghua University
Computational Mathematics, Applied Mathematics, Machine Learning