🤖 AI Summary
This work addresses the bottleneck of requiring fully formalized inputs at the point of data entry, which arises from the gap between natural language and formal semantic models, by proposing the “Semantic Ladder” framework. The framework enables a progressive transformation from natural language text snippets to ontology-based and higher-order logical models, using modular semantic units as identifiable carriers of meaning. By integrating natural language, structured semantic models, and vector-based embeddings, it builds a multi-layered, traceable semantic representation system. The resulting knowledge infrastructure is extensible and interoperable, reduces the semantic parsing burden, and supports incremental integration of, and reasoning over, heterogeneous knowledge sources, giving AI systems a unified yet flexible semantic foundation.
📝 Abstract
Semantic data and knowledge infrastructures must reconcile two fundamentally different forms of representation: natural language, in which most knowledge is created and communicated, and formal semantic models, which enable machine-actionable integration, interoperability, and reasoning. Bridging this gap remains a central challenge, particularly when full semantic formalization is required at the point of data entry. Here, we introduce the Semantic Ladder, an architectural framework that enables the progressive formalization of data and knowledge. Building on the concept of modular semantic units as identifiable carriers of meaning, the framework organizes representations across levels of increasing semantic explicitness, ranging from natural language text snippets to ontology-based and higher-order logical models. Transformations between levels support semantic enrichment, statement structuring, and logical modelling while preserving semantic continuity and traceability. This approach enables the incremental construction of semantic knowledge spaces, reduces the semantic parsing burden, and supports the integration of heterogeneous representations, including natural language, structured semantic models, and vector-based embeddings. The Semantic Ladder thereby provides a foundation for scalable, interoperable, and AI-ready data and knowledge infrastructures.
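As a rough illustration of the idea of progressive formalization with traceability, the following minimal Python sketch models semantic units that move up a ladder while keeping a link to the unit they were derived from. It is not the authors' implementation: the names `Rung`, `SemanticUnit`, `new_unit`, `promote`, and `trace`, the specific choice of four rungs, and the `ex:` identifiers are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from enum import IntEnum
from itertools import count
from typing import Any, List, Optional


class Rung(IntEnum):
    """Hypothetical ladder levels, ordered by increasing semantic explicitness."""
    TEXT_SNIPPET = 1          # plain natural language
    STRUCTURED_STATEMENT = 2  # subject / predicate / object structure
    ONTOLOGY_BASED = 3        # terms bound to ontology identifiers
    LOGICAL_MODEL = 4         # higher-order logical model


_next_id = count(1)  # simple local identifier source (assumption)


@dataclass(frozen=True)
class SemanticUnit:
    """An identifiable carrier of meaning at one rung of the ladder."""
    unit_id: str
    rung: Rung
    content: Any
    derived_from: Optional["SemanticUnit"] = None  # traceability link


def new_unit(rung: Rung, content: Any,
             derived_from: Optional[SemanticUnit] = None) -> SemanticUnit:
    """Create a unit with a fresh identifier."""
    return SemanticUnit(f"unit-{next(_next_id)}", rung, content, derived_from)


def promote(unit: SemanticUnit, rung: Rung, content: Any) -> SemanticUnit:
    """Create a more explicit unit that records which unit it came from."""
    if rung <= unit.rung:
        raise ValueError("promotion must move to a higher rung")
    return new_unit(rung, content, derived_from=unit)


def trace(unit: SemanticUnit) -> List[SemanticUnit]:
    """Return the chain from this unit back to its original source unit."""
    chain = []
    current: Optional[SemanticUnit] = unit
    while current is not None:
        chain.append(current)
        current = current.derived_from
    return chain


# Usage: data enters as text and is formalized later, step by step.
snippet = new_unit(Rung.TEXT_SNIPPET, "Aspirin inhibits COX-1.")
statement = promote(
    snippet,
    Rung.STRUCTURED_STATEMENT,
    {"subject": "Aspirin", "predicate": "inhibits", "object": "COX-1"},
)
grounded = promote(
    statement,
    Rung.ONTOLOGY_BASED,
    # "ex:" identifiers are placeholders, not real ontology terms.
    {"subject": "ex:Aspirin", "predicate": "ex:inhibits", "object": "ex:COX1"},
)

for step in trace(grounded):
    print(step.unit_id, step.rung.name)
```

The design choice illustrated here is that promotion never overwrites a unit: each rung is a new, separately identified unit, so the original text stays reachable from any later formalization.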