🤖 AI Summary
This work addresses the challenges of simulation-to-reality (sim2real) image translation, where scarce real-world labelled data and the difficulty of modelling structured realism factors hinder existing approaches. The authors propose a neuro-symbolic, zero-shot image translation framework that, for the first time, encodes interpretable attributes, such as lighting and material properties, into an ontological knowledge graph. This graph, in conjunction with a graph neural network and a symbolic planner, guides a pretrained diffusion model to generate photorealistic images. By integrating structured ontological knowledge into the diffusion process, the method achieves interpretable, data-efficient, and highly generalisable zero-shot transfer. Experiments demonstrate that the proposed approach outperforms current diffusion-based methods across multiple benchmarks, with graph embeddings effectively discriminating between real and synthetic images and substantially enhancing translation fidelity.
📝 Abstract
Bridging the simulation-to-reality (sim2real) gap remains challenging as labelled real-world data is scarce. Existing diffusion-based approaches rely on unstructured prompts or statistical alignment, which do not capture the structured factors that make images look real. We introduce Ontology-Guided Diffusion (OGD), a neuro-symbolic zero-shot sim2real image translation framework that represents realism as structured knowledge. OGD decomposes realism into an ontology of interpretable traits -- such as lighting and material properties -- and encodes their relationships in a knowledge graph. From a synthetic image, OGD infers trait activations and uses a graph neural network to produce a global embedding. In parallel, a symbolic planner uses the ontology traits to compute a consistent sequence of visual edits needed to narrow the realism gap. The graph embedding conditions a pretrained instruction-guided diffusion model via cross-attention, while the planned edits are converted into a structured instruction prompt. Across benchmarks, our graph-based embeddings better distinguish real from synthetic imagery than baselines, and OGD outperforms state-of-the-art diffusion methods in sim2real image translation. Overall, OGD shows that explicitly encoding realism structure enables interpretable, data-efficient, and generalisable zero-shot sim2real transfer.
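The pipeline the abstract describes (trait inference over an ontology, graph pooling into a global embedding, and a symbolic planner that emits an edit sequence turned into an instruction prompt) can be sketched in miniature. Everything below is a toy illustration under assumed names and structures: the trait ontology, the scalar "embedding", and the threshold-based planner are stand-ins, not the authors' actual model or code.

```python
# Toy sketch of an OGD-style pipeline (all names and structures are
# illustrative assumptions, not the paper's implementation).

# Ontology of interpretable realism traits and their neighbour relations.
ONTOLOGY = {
    "lighting": ["shadows", "specular_highlights"],
    "material": ["surface_roughness", "specular_highlights"],
    "shadows": [],
    "specular_highlights": [],
    "surface_roughness": [],
}

def infer_trait_activations(gap_scores):
    """Stand-in for trait inference: clamp per-trait realism-gap scores
    (e.g. from a lightweight classifier) into [0, 1] for every trait."""
    return {t: max(0.0, min(1.0, gap_scores.get(t, 0.0))) for t in ONTOLOGY}

def graph_embed(activations, rounds=2):
    """Toy message passing in place of a GNN: each trait averages its own
    activation with its neighbours', then all traits are mean-pooled into
    a single global scalar 'embedding'."""
    h = dict(activations)
    for _ in range(rounds):
        h = {t: (h[t] + sum(h[n] for n in ONTOLOGY[t])) / (1 + len(ONTOLOGY[t]))
             for t in ONTOLOGY}
    return sum(h.values()) / len(h)

def plan_edits(activations, threshold=0.5):
    """Symbolic-planner stand-in: traits whose gap exceeds a threshold are
    ordered (largest gap first) into a deterministic edit sequence."""
    gaps = [(t, a) for t, a in activations.items() if a >= threshold]
    return [f"adjust {t}" for t, _ in sorted(gaps, key=lambda x: -x[1])]

def build_instruction(edits):
    """Convert the planned edits into a structured instruction prompt for
    an instruction-guided diffusion model (the embedding would separately
    condition the model via cross-attention)."""
    return ("Make photorealistic: " + "; ".join(edits)) if edits else "No edits needed."
```

A usage pass over a hypothetical synthetic image whose lighting and shadows look off:

```python
acts = infer_trait_activations({"lighting": 0.9, "shadows": 0.7})
embedding = graph_embed(acts)          # would condition the diffusion model
prompt = build_instruction(plan_edits(acts))
# prompt == "Make photorealistic: adjust lighting; adjust shadows"
```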