🤖 AI Summary
This work proposes a safety-compliant multimodal Transformer architecture designed to meet automotive functional safety standards, addressing the lack of fault-tolerant and robust designs in existing Transformer models. By employing independent encoders to map heterogeneous sensor inputs into a shared latent space, the architecture structurally embeds redundancy and diversity at the representation level, enabling continuous operation under modality-level failures. This approach represents the first integration of multimodal foundation models with established automotive functional safety practices, supporting the development of certifiable autonomous driving systems. The model maintains consistent scene understanding even when individual modalities degrade, thereby offering a viable pathway for deploying Transformers in safety-critical applications.
📝 Abstract
Transformer-based architectures have shown remarkable performance in vision and language tasks but pose unique challenges for safety-critical applications. This paper presents a conceptual framework for integrating Transformers into automotive systems from a safety perspective. We outline how multimodal Foundation Models can leverage sensor diversity and redundancy to improve fault tolerance and robustness. Our proposed architecture combines multiple independent modality-specific encoders that fuse their representations into a shared latent space, supporting fail-operational behavior if one modality degrades. We demonstrate how different input modalities could be fused to maintain consistent scene understanding. By structurally embedding redundancy and diversity at the representational level, this approach bridges the gap between modern deep learning and established functional safety practices, paving the way for certifiable AI systems in autonomous driving.
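The core idea in the abstract (independent per-modality encoders projecting into a shared latent space, with fusion restricted to whichever modalities are currently available) can be illustrated with a minimal sketch. This is not the paper's implementation: the modality names, input dimensions, linear encoders, and masked-mean fusion below are all illustrative assumptions standing in for the modality-specific Transformer encoders and learned fusion the paper describes.

```python
import numpy as np

# Hypothetical sketch, not the paper's code: each modality gets its own
# independent encoder into a shared D-dimensional latent space; fusion
# averages only the latents of modalities that are currently available,
# so the fused representation stays well-defined when a sensor drops out
# (fail-operational degradation rather than hard failure).

rng = np.random.default_rng(0)
D = 8  # shared latent dimension (assumed)

# Independent linear encoder weights per modality; the input sizes
# (camera 16, lidar 12, radar 6) are purely illustrative.
encoders = {
    "camera": rng.standard_normal((16, D)),
    "lidar": rng.standard_normal((12, D)),
    "radar": rng.standard_normal((6, D)),
}

def encode(name, x):
    """Map one modality's raw features into the shared latent space."""
    return x @ encoders[name]

def fuse(latents, available):
    """Masked mean over the available modalities only."""
    live = [latents[m] for m in available]
    if not live:
        raise RuntimeError("no modality available: fail-operational limit reached")
    return np.mean(live, axis=0)

# Nominal operation: all three sensors contribute.
inputs = {m: rng.standard_normal(W.shape[0]) for m, W in encoders.items()}
latents = {m: encode(m, x) for m, x in inputs.items()}
z_full = fuse(latents, ["camera", "lidar", "radar"])

# Camera failure: fusion continues on the remaining modalities,
# producing a latent of the same shape for downstream consumers.
z_degraded = fuse(latents, ["lidar", "radar"])
print(z_full.shape, z_degraded.shape)  # both (8,)
```

Because every modality lands in the same latent space, downstream components see a representation of constant shape regardless of which sensors are alive, which is one way the redundancy-and-diversity argument could be realized structurally.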