🤖 AI Summary
This work addresses the lack of formal safety guarantees for existing generative models under hard constraints, a critical limitation in safety-critical applications. The authors propose a safety filtering framework that enforces safety online during the streaming generation process of any pre-trained generative model, without requiring retraining or architectural modifications. The approach constructs a progressively refined “safety tube” that narrows from coarse to fine as generation proceeds; at each sampling step, it synthesizes a minimally perturbative feedback control via Control Barrier Functions (CBFs) and a convex Quadratic Program (QP). Experiments demonstrate that the framework achieves 100% constraint satisfaction across diverse tasks (constrained image generation, physically consistent trajectory sampling, and safe robotic manipulation) while preserving high semantic fidelity.
📝 Abstract
Flow-based generative models, such as diffusion models and flow matching models, have achieved remarkable success in learning complex data distributions. However, a critical gap remains for their deployment in safety-critical domains: the lack of formal guarantees that generated samples will satisfy hard constraints. We address this by proposing a safety filtering framework that acts as an online shield for any pre-trained generative model. Our key insight is to cooperate with the generative process rather than override it. We define a constricting safety tube that is relaxed at the initial noise distribution and progressively tightens to the target safe set at the final data distribution, mirroring the coarse-to-fine structure of the generative process itself. By characterizing this tube via Control Barrier Functions (CBFs), we synthesize a feedback control input through a convex Quadratic Program (QP) at each sampling step. Because the tube is loosest when noise is high and intervention is cheapest in terms of control energy, most constraint enforcement occurs when it least disrupts the model's learned structure. We prove that this mechanism guarantees safe sampling while minimizing the distributional shift from the original model at each sampling step, as quantified by the KL divergence. Our framework applies to any pre-trained flow-based generative model, requiring no retraining or architectural modifications. We validate the approach on constrained image generation, physically consistent trajectory sampling, and safe robotic manipulation policies, achieving 100% constraint satisfaction while preserving semantic fidelity.