Data2Concept2Text: An Explainable Multilingual Framework for Data Analysis Narration

📅 2025-02-11

🏛️ Electronic Proceedings in Theoretical Computer Science

🤖 AI Summary

Problem: Black-box generative models lack transparency, auditability, and accessibility. These are critical shortcomings in safety-critical domains (e.g., healthcare) and for visually impaired users. Method: We propose an explainable, multilingual data-to-narrative framework built on constraint logic programming (Prolog/CLP). Its Concept2Text stage takes concepts, expressed as classes and relations plus common knowledge derived from a generic ontology, and maps them to natural language through a hierarchical tree-rewriting system. Contributions/Results: (1) a transparent rule-based system whose rewriting steps can be inspected; (2) support for equivalent variants at the semantic, grammatical, and lexical levels; (3) modular multilingual generation. Examples show that a single input concept yields numerous diverse and equivalent rewritings, combining expressiveness with transparency.
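The hierarchical rewriting idea can be illustrated with a minimal Python sketch. This is not the authors' Prolog/CLP implementation. The tuple encoding of concepts, the `RULES` templates, the `LEXICON` tables, and the functions `rewrite` and `realize` are all hypothetical.

The sketch works as follows:
- Each node label is rewritten by a per-language template.
- Each leaf is looked up in a per-language lexicon.
- Every rule application is logged, so the trace shows how the sentence was produced.

```python
from typing import List, Tuple, Union

# Hypothetical concept encoding: a node is a tuple (label, child_0, child_1, ...),
# a leaf is a plain string.
Tree = Union[str, tuple]

# Hypothetical per-language rule modules. An int in a template means
# "rewrite child number i here"; a string is emitted literally.
RULES = {
    "en": {"trend": [0, 1, 2], "period": ["between", 0, "and", 1]},
    "it": {"trend": [0, 1, 2], "period": ["tra il", 0, "e il", 1]},
}

# Hypothetical per-language lexicons; leaves not listed (e.g. years) pass through unchanged.
LEXICON = {
    "en": {"sales": "sales", "increase": "increased"},
    "it": {"sales": "le vendite", "increase": "sono aumentate"},
}


def rewrite(tree: Tree, lang: str, trace: List[str], depth: int = 0) -> List[str]:
    """Rewrite a concept tree top-down into words, logging every step in `trace`."""
    indent = "  " * depth
    if isinstance(tree, str):
        word = LEXICON[lang].get(tree, tree)
        trace.append(f"{indent}lex {tree!r} -> {word!r}")
        return [word]
    label, *children = tree
    template = RULES[lang][label]
    trace.append(f"{indent}rule {lang}:{label} {template}")
    words: List[str] = []
    for token in template:
        if isinstance(token, int):
            # Recurse into the referenced child, one level deeper in the hierarchy.
            words.extend(rewrite(children[token], lang, trace, depth + 1))
        else:
            words.append(token)
    return words


def realize(concept: Tree, lang: str) -> Tuple[str, List[str]]:
    """Return the sentence for `concept` in `lang` together with its rule trace."""
    trace: List[str] = []
    text = " ".join(rewrite(concept, lang, trace))
    return text[0].upper() + text[1:] + ".", trace


if __name__ == "__main__":
    concept = ("trend", "sales", "increase", ("period", "2020", "2023"))
    for lang in ("en", "it"):
        sentence, trace = realize(concept, lang)
        print(sentence)
        print("\n".join(trace))
```

Switching `lang` swaps only the rule module and lexicon; the concept stays the same. This is one possible reading of modular multilingual generation. For the example concept, the English module yields "Sales increased between 2020 and 2023." and the Italian module yields "Le vendite sono aumentate tra il 2020 e il 2023."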

📝 Abstract

This paper presents a complete explainable system that interprets a set of data, abstracts its underlying features, and describes them in a natural language of choice. The system relies on two crucial stages: (i) identifying emerging properties from data and transforming them into abstract concepts, and (ii) converting these concepts into natural language. Large Language Models demonstrate impressive natural language generation capabilities. However, their statistical nature and the intricacy of their internal mechanisms still force us to use them as black boxes, forgoing trustworthiness. An explainable pipeline for data interpretation would facilitate its use in safety-critical environments, such as processing medical information. It would also allow non-experts and visually impaired people to access narrated information. To this end, we believe that knowledge representation and automated reasoning research could offer a valid alternative. Building on prior research that tackled the first stage (i), we focus on the second stage, named Concept2Text. Data translation can be easily modeled through explainable logic-based rules, once again emphasizing the role of declarative programming in achieving AI explainability. This paper explores a Prolog/CLP-based rewriting system that interprets concepts and generates natural language text. The concepts are articulated in terms of classes and relations, plus common knowledge derived from a generic ontology. The system's main features include hierarchical tree rewriting, modular multilingual generation, support for equivalent variants at the semantic, grammar, and lexical levels, and a transparent rule-based design. We outline the architecture and demonstrate its flexibility through examples that generate numerous diverse and equivalent rewritings of the input concept.
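Support for equivalent variants at the semantic, grammar, and lexical levels can be pictured as a grammar in which each symbol has several alternatives. All combinations are then enumerated by backtracking, much as Prolog does when asked for every solution. The Python sketch below is illustrative only and is not taken from the paper. The `GRAMMAR` table, the `?slot` convention for filling values from the concept, and the functions `expand` and `variants` are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical grammar: each symbol maps to a list of alternative expansions.
# Tokens starting with "?" are filled from the input concept; tokens that are
# not grammar symbols are literal words.
GRAMMAR: Dict[str, List[List[str]]] = {
    # Semantic level: describe the change as a trend or as a comparison.
    "S": [["TREND"], ["COMPARE"]],
    # Grammar level: verbal clause vs. nominal construction.
    "TREND": [["?subject", "V_UP", "SPAN"],
              ["there", "was", "N_UP", "in", "?subject", "SPAN"]],
    "COMPARE": [["?subject", "were", "higher", "in", "?to", "than", "in", "?from"]],
    "SPAN": [["from", "?from", "to", "?to"]],
    # Lexical level: interchangeable words.
    "V_UP": [["increased"], ["grew"], ["rose"]],
    "N_UP": [["a", "rise"], ["an", "increase"]],
}


def expand(symbol: str, concept: Dict[str, str]) -> Iterator[List[str]]:
    """Yield every word sequence derivable from `symbol`."""
    if symbol.startswith("?"):
        yield [concept[symbol[1:]]]
    elif symbol not in GRAMMAR:
        yield [symbol]
    else:
        for alternative in GRAMMAR[symbol]:
            # Expand each token on its own, then take every combination of choices.
            options = [list(expand(token, concept)) for token in alternative]
            for combo in product(*options):
                yield [word for part in combo for word in part]


def variants(concept: Dict[str, str]) -> List[str]:
    """Return all equivalent sentences for `concept`, capitalized and punctuated."""
    sentences = []
    for words in expand("S", concept):
        text = " ".join(words)
        sentences.append(text[0].upper() + text[1:] + ".")
    return sentences


if __name__ == "__main__":
    for sentence in variants({"subject": "sales", "from": "2020", "to": "2023"}):
        print(sentence)
```

For the concept `{"subject": "sales", "from": "2020", "to": "2023"}`, this prints six sentences. They run from "Sales increased from 2020 to 2023." to "Sales were higher in 2023 than in 2020." Because the alternatives multiply, a handful of rules per level already produces many equivalent rewritings. A real system would also need agreement and other grammatical constraints, which this sketch ignores.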

Problem

Research questions and friction points this paper is trying to address.

Explainable data analysis narration

Convert abstract concepts to natural language

Multilingual, modular, rule-based system

Innovation

Methods, ideas, or system contributions that make the work stand out.

Explainable data interpretation pipeline

Logic-based rules for data translation

Prolog/CLP-based rewriting system
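The first stage of the pipeline, identifying emerging properties from data and turning them into abstract concepts, is covered by prior work rather than by this paper. As a rough illustration only, the hypothetical `data_to_concept` function below applies explicit rules to a yearly series. It returns both a concept term and the reason the rule fired, so the abstraction stays inspectable. The rule set (increase, decrease, otherwise fluctuation) and the tuple encoding of concepts are assumptions, not the authors' design.

```python
from typing import Dict, Tuple

# Hypothetical concept encoding: ("trend", name, direction, ("period", start, end)).
Concept = Tuple[str, str, str, Tuple[str, str, str]]


def data_to_concept(name: str, series: Dict[int, float]) -> Tuple[Concept, str]:
    """Abstract a yearly series into a trend concept plus a human-readable justification."""
    if len(series) < 2:
        raise ValueError("need at least two data points to detect a trend")
    years = sorted(series)
    values = [series[year] for year in years]
    deltas = [later - earlier for earlier, later in zip(values, values[1:])]
    # Hypothetical rules, checked in order; the first one that matches fires.
    if all(d > 0 for d in deltas):
        direction, reason = "increase", f"all year-on-year changes are positive: {deltas}"
    elif all(d < 0 for d in deltas):
        direction, reason = "decrease", f"all year-on-year changes are negative: {deltas}"
    else:
        direction = "fluctuation"
        reason = f"year-on-year changes are neither all positive nor all negative: {deltas}"
    concept = ("trend", name, direction, ("period", str(years[0]), str(years[-1])))
    return concept, reason


if __name__ == "__main__":
    concept, reason = data_to_concept("sales", {2020: 120, 2021: 135, 2022: 150, 2023: 171})
    print(concept)
    print(reason)
```

Take sales values of 120, 135, 150 and 171 for 2020 to 2023. The function returns `('trend', 'sales', 'increase', ('period', '2020', '2023'))` with the reason `all year-on-year changes are positive: [15, 15, 21]`. A concept of this kind is what the Concept2Text stage would then verbalize.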
