High-Performance Self-Supervised Learning by Joint Training of Flow Matching

📅 2025-12-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models in self-supervised learning (SSL) face two key bottlenecks: a trade-off between generative fidelity and discriminative performance, and high computational cost due to iterative sampling. This paper proposes FlowFM, a framework that jointly trains a representation encoder and a conditional flow-matching generator under a decoupled training paradigm. By modeling a deterministic velocity field via flow matching in place of stochastic diffusion, FlowFM improves semantic discriminability while preserving generation quality and training stability, unifying self-supervised representation learning with differentiable generative modeling. Evaluated on wearable sensor data, FlowFM reduces training time by 50.4% and outperforms SSL-Wearables across all downstream tasks. At inference, it achieves up to a 51.0× speedup over diffusion baselines while maintaining high-fidelity generation.
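To make the core objective concrete, here is a minimal sketch of conditional flow matching in PyTorch: the network regresses the constant velocity of a straight interpolation path between noise and data, conditioned on the encoder's output. The architecture, names (`VelocityNet`), and dimensions are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal conditional flow-matching sketch. All names and sizes are
# illustrative assumptions; FlowFM's actual architecture may differ.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Predicts the velocity field v(x_t, t, c) from a sample on the
    interpolation path, a time step, and an encoder conditioning vector."""
    def __init__(self, dim: int, cond_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1 + cond_dim, 256),
            nn.SiLU(),
            nn.Linear(256, dim),
        )

    def forward(self, x_t, t, cond):
        return self.net(torch.cat([x_t, t, cond], dim=-1))

def flow_matching_loss(model, x1, cond):
    """Regress the velocity of the straight path x_t = (1 - t) x0 + t x1,
    whose target velocity is simply x1 - x0."""
    x0 = torch.randn_like(x1)           # noise endpoint of the path
    t = torch.rand(x1.size(0), 1)       # uniform time in [0, 1]
    x_t = (1 - t) * x0 + t * x1         # linear interpolation path
    target_v = x1 - x0                  # constant velocity along the path
    pred_v = model(x_t, t, cond)
    return nn.functional.mse_loss(pred_v, target_v)
```

The simple regression target is what the summary means by a "simpler velocity field": there is no noise schedule or stochastic reverse process to tune.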

📝 Abstract
Diffusion models can learn rich representations during data generation, showing potential for Self-Supervised Learning (SSL), but they face a trade-off between generative quality and discriminative performance. Their iterative sampling also incurs substantial computational and energy costs, hindering industrial and edge AI applications. To address these issues, we propose the Flow Matching-based Foundation Model (FlowFM), which jointly trains a representation encoder and a conditional flow matching generator. This decoupled design achieves both high-fidelity generation and effective recognition. By using flow matching to learn a simpler velocity field, FlowFM accelerates and stabilizes training, improving its efficiency for representation learning. Experiments on wearable sensor data show that FlowFM reduces training time by 50.4% compared to a diffusion-based approach. On downstream tasks, FlowFM surpasses the state-of-the-art SSL method (SSL-Wearables) on all five datasets while achieving up to a 51.0x inference speedup and maintaining high generative quality. The implementation code is available at https://github.com/Okita-Laboratory/jointOptimizationFlowMatching.
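The inference speedup comes from sampling: a flow-matching generator integrates a deterministic ODE with a handful of Euler steps, rather than the hundreds of denoising iterations a diffusion sampler typically needs. The sketch below uses the illustrative `VelocityNet` interface from above; the step count is an assumption, not the paper's setting.

```python
# Hedged sketch of fast sampling: integrate dx/dt = v(x, t, cond) from
# t = 0 (noise) to t = 1 (data) with explicit Euler steps. The 10-step
# budget is an assumption for illustration.
import torch

@torch.no_grad()
def sample(model, cond, dim: int, num_steps: int = 10):
    x = torch.randn(cond.size(0), dim)        # start from Gaussian noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((cond.size(0), 1), i * dt)
        x = x + model(x, t, cond) * dt        # one explicit Euler step
    return x
```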
Problem

Research questions and friction points this paper is trying to address.

Addresses the trade-off between generative quality and discriminative performance in self-supervised learning
Reduces the computational and energy costs of iterative sampling in diffusion models
Improves the efficiency and speed of representation learning for industrial and edge AI applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint training of a representation encoder and a conditional flow matching generator (see the sketch after this list)
Flow matching learns a simpler velocity field, making training faster and more stable
A decoupled design achieves both high-fidelity generation and effective recognition
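One plausible reading of the decoupled joint training, sketched below, trains the encoder with its own representation objective while the generator receives the flow-matching loss on a detached condition, so generative gradients do not flow back into the encoder. This interpretation, along with the `repr_loss` placeholder and optimizer setup, is an assumption; the paper's exact coupling may differ. `flow_matching_loss` is the sketch defined earlier.

```python
# Illustrative decoupled joint-training step (assumption, not FlowFM's
# confirmed procedure). `repr_loss` is a placeholder for whatever
# representation objective the encoder is trained with.
def joint_step(encoder, generator, x1, repr_loss, enc_opt, gen_opt):
    cond = encoder(x1)

    # Encoder branch: its own representation objective.
    loss_repr = repr_loss(cond)

    # Generator branch: conditional flow matching on a detached condition,
    # so the generative loss cannot perturb the encoder's representation.
    loss_gen = flow_matching_loss(generator, x1, cond.detach())

    enc_opt.zero_grad()
    loss_repr.backward()
    enc_opt.step()

    gen_opt.zero_grad()
    loss_gen.backward()
    gen_opt.step()
    return loss_repr.item(), loss_gen.item()
```

Detaching the condition is one simple way to realize the claimed decoupling: each network is optimized against a single objective, avoiding the fidelity-versus-discriminability tug-of-war that a shared loss would create.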