🤖 AI Summary
Fine-grained domain generalization (FGDG) confronts small inter-class margins, large intra-class variations, and the high sensitivity of subtle discriminative features to distribution shifts. To address these challenges, we propose a Feature Structuralization (FS) paradigm that disentangles representations into three complementary components: generality, specificity, and confounding factors. The disentanglement is guided by five joint constraints (decorrelation regularization, cross-domain consistency, intra-class specificity preservation, semantic alignment, and prediction calibration), enabling multi-granularity knowledge-guided decomposition and explicit concept-channel matching. The method is architecture-agnostic and integrates seamlessly with mainstream backbone networks. Evaluated on three standard FGDG benchmarks, it achieves an average performance gain of 6.2%, markedly improving fine-grained discriminative robustness under domain shifts. Moreover, FS offers strong interpretability through its explicit structural decomposition while remaining broadly applicable across diverse network architectures.
📝 Abstract
Fine-grained domain generalization (FGDG) is more challenging than conventional DG because of its small inter-class variations and relatively large intra-class disparities. When the domain distribution shifts, the vulnerability of subtle discriminative features leads to severe deterioration in model performance. Humans, however, inherently generalize to out-of-distribution data by leveraging structured multi-granularity knowledge that emerges from discerning the commonality and specificity within categories. Inspired by this, we propose a Feature Structuralized Domain Generalization (FSDG) model, in which features are structuralized into common, specific, and confounding segments and aligned with their relevant semantic concepts, to improve FGDG performance. Specifically, feature structuralization (FS) is accomplished through the joint optimization of five constraints: a decorrelation function applied to the disentangled segments, three constraints enforcing common-feature consistency and specific-feature distinctiveness, and a prediction calibration term. Under these constraints, FSDG is prompted to disentangle and align features according to multi-granularity knowledge, enabling robust subtle distinctions among categories. Extensive experiments on three benchmarks consistently validate the superiority of FSDG over state-of-the-art counterparts, with an average improvement of 6.2% in FGDG performance. Beyond that, an explainability analysis of the explicit concept-matching intensity between concepts shared among categories and the model's channels, together with experiments on various mainstream model architectures, substantiates the validity of FS.
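To make the five-constraint structure concrete, the sketch below combines toy surrogates for each term into one objective. This is a minimal, pure-Python illustration under stated assumptions: the cosine-based surrogates, the hinge form, the function and argument names, and the uniform weights are all hypothetical choices for exposition, not the paper's actual loss definitions.

```python
import math

def _cos(u, v):
    """Cosine similarity between two feature vectors (plain lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def fs_objective(common_a, common_b, specific_a, specific_b,
                 confound_a, probs, labels,
                 weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Toy surrogate for the five FS constraints (illustrative only).

    common_a / common_b : common segments of one class seen in two domains
    specific_a / specific_b : specific segments of two different classes
    confound_a : confounding segment from the same sample as common_a
    probs / labels : predicted distribution and one-hot target (calibration)
    """
    # (1) decorrelation: disentangled segments should be mutually uncorrelated
    l_dec = abs(_cos(common_a, specific_a)) + abs(_cos(common_a, confound_a))
    # (2) cross-domain consistency: common features agree across domains
    l_con = 1.0 - _cos(common_a, common_b)
    # (3) distinctiveness: specific features of different classes stay apart
    l_spec = max(0.0, _cos(specific_a, specific_b))
    # (4) alignment surrogate: keep confounders decoupled from class-specific cues
    l_align = abs(_cos(confound_a, specific_b))
    # (5) prediction calibration: cross-entropy on the fused prediction
    l_cal = -sum(y * math.log(p + 1e-12) for p, y in zip(probs, labels))
    terms = (l_dec, l_con, l_spec, l_align, l_cal)
    return sum(w * t for w, t in zip(weights, terms)), terms
```

For example, feeding identical common segments from two domains drives the consistency term to (numerically) zero, while a confident wrong prediction inflates the calibration term; in a real model these terms would be computed batch-wise on backbone channel groups and jointly minimized with the task loss.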