Learning Structured Representations by Embedding Class Hierarchy with Fast Optimal Transport

📅 2024-10-04
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional CPCC regularization relies on distances between class means, which fail to capture semantic structure when class-conditional distributions are multimodal. To address this, the paper embeds the label hierarchy into the feature space by replacing class-mean distances with the Earth Mover's Distance (EMD), so that distances between multimodal class distributions are aligned with tree-structured semantic distances. It integrates exact EMD into the CPCC framework and introduces the OT-CPCC family of EMD approximations, whose most efficient member, Fast FlowTree, runs in O(N) time in the dataset size N. Across multiple benchmark datasets, the method achieves state-of-the-art or comparable performance, and Fast FlowTree is reported to be over two orders of magnitude faster than exact EMD.

📝 Abstract
To embed structured knowledge within labels into feature representations, prior work [Zeng et al., 2022] proposed using the Cophenetic Correlation Coefficient (CPCC) as a regularizer during supervised learning. This regularizer calculates pairwise Euclidean distances of class means and aligns them with the corresponding shortest path distances derived from the label hierarchy tree. However, class means may not be good representatives of the class-conditional distributions, especially when these distributions are multimodal. To address this limitation, under the CPCC framework, we propose to use the Earth Mover's Distance (EMD) to measure the pairwise distances among classes in the feature space. We show that our exact EMD method generalizes previous work, and recovers the existing algorithm when class-conditional distributions are Gaussian. To further improve the computational efficiency of our method, we introduce the Optimal Transport-CPCC family by exploring four EMD approximation variants. Our most efficient OT-CPCC variant, the proposed Fast FlowTree algorithm, runs in time linear in the size of the dataset, while maintaining competitive performance across datasets and tasks. The code is available at https://github.com/uiuctml/OTCPCC.
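
As a rough illustration of the class-mean CPCC described in the abstract, the following Python sketch computes the Pearson correlation between pairwise Euclidean distances of class means and the corresponding tree distances. The function name `cpcc_class_means`, the toy features, and the hand-written tree-distance matrix (two superclasses with two leaf classes each) are assumptions made for illustration and are not taken from the authors' code.

```python
import numpy as np

def cpcc_class_means(features, labels, tree_dist):
    """Correlation between class-mean distances and label-tree distances.

    features:  (n, d) array of feature vectors
    labels:    (n,) integer class ids in [0, k)
    tree_dist: (k, k) symmetric matrix of shortest-path distances
               between leaf classes in the label hierarchy
    """
    k = tree_dist.shape[0]
    # One mean vector per class.
    means = np.stack([features[labels == c].mean(axis=0) for c in range(k)])
    # Visit each unordered class pair exactly once.
    iu = np.triu_indices(k, k=1)
    feat_d = np.linalg.norm(means[:, None] - means[None, :], axis=-1)[iu]
    tree_d = tree_dist[iu]
    # CPCC is the Pearson correlation of the two distance vectors.
    return float(np.corrcoef(feat_d, tree_d)[0, 1])

# Hypothetical toy data: classes 0 and 1 share a parent, as do classes 2 and 3.
features = np.array([[-0.1, 0.0], [0.1, 0.0],   # class 0, mean (0, 0)
                     [0.9, 0.0], [1.1, 0.0],    # class 1, mean (1, 0)
                     [4.9, 0.0], [5.1, 0.0],    # class 2, mean (5, 0)
                     [5.9, 0.0], [6.1, 0.0]])   # class 3, mean (6, 0)
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
# Leaf-to-leaf path lengths: 2 within a superclass, 4 across superclasses.
tree_dist = np.array([[0, 2, 4, 4],
                      [2, 0, 4, 4],
                      [4, 4, 0, 2],
                      [4, 4, 2, 0]], dtype=float)

score = cpcc_class_means(features, labels, tree_dist)
print(f"CPCC (class means): {score:.3f}")
```

A value close to 1 means the geometry of the class means mirrors the hierarchy. When used as a regularizer, a quantity like this would be computed per batch and combined with the supervised loss.
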
Problem

Research questions and friction points this paper is trying to address.

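Class means can misrepresent class-conditional distributions, especially multimodal ones
The CPCC regularizer needs a class distance that generalizes the Euclidean distance between class means
Exact EMD is costly to compute, which motivates faster approximations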
Innovation

Methods, ideas, or system contributions that make the work stand out.

Use Earth Mover's Distance for class distances
Introduce Optimal Transport-CPCC family variants
Propose Fast FlowTree for linear-time efficiency
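
To make the motivation for EMD concrete, here is a minimal Python sketch comparing the class-mean distance with an exact EMD on a toy bimodal class. It assumes both classes have the same number of samples with uniform weights, in which case exact EMD reduces to an assignment problem that SciPy's `linear_sum_assignment` can solve. The helper name `emd_uniform` and the toy data are hypothetical. This is not FlowTree or Fast FlowTree, and it does not cover unequal class sizes.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd_uniform(x, y):
    """Exact EMD between two equal-size point sets with uniform weights."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal-size point sets")
    # Pairwise Euclidean ground costs between samples of the two classes.
    cost = np.linalg.norm(x[:, None] - y[None, :], axis=-1)
    # With uniform weights and equal sizes, an optimal transport plan is a
    # one-to-one matching, so the assignment solver gives the exact optimum.
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].mean())

# Class A is bimodal (two modes at x = -2 and x = +2); class B sits at the origin.
class_a = np.array([[-2.0, 0.0], [2.0, 0.0]])
class_b = np.array([[0.0, 0.0], [0.0, 0.0]])

mean_gap = float(np.linalg.norm(class_a.mean(axis=0) - class_b.mean(axis=0)))
emd = emd_uniform(class_a, class_b)
print(f"class-mean distance: {mean_gap:.1f}, EMD: {emd:.1f}")
```

The two class means coincide, so the class-mean distance is zero, while EMD still registers that every sample of class A has to move two units to reach class B.
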