🤖 AI Summary
Traditional information theory lacks a formal characterization of semantic fidelity, which hinders the deployment of generative AI in multimedia communication. To address this, we propose a semantic information-theoretic framework tailored to multimedia communication that introduces semantic entropy, semantic mutual information, semantic channel capacity, and semantic rate-distortion functions, extending the theory beyond syntactic-level models. We also review the recent generative models driving this shift, notably diffusion models and Transformers. The framework aims to provide a quantifiable theoretical foundation for generative AI-driven multimedia systems, supporting a paradigm shift from syntactic transmission to semantic delivery. It further identifies key technical pathways, including semantic compression, controllable generation, and cross-modal alignment, toward next-generation intelligent communication systems.
📝 Abstract
Recent breakthroughs in generative artificial intelligence (AI) are transforming multimedia communication. This paper systematically reviews key recent advances in generative AI for multimedia communication, emphasizing transformative models such as diffusion models and transformers. However, conventional information-theoretic frameworks fail to address semantic fidelity, which is critical to human perception. We propose an innovative semantic information-theoretic framework, introducing semantic entropy, semantic mutual information, semantic channel capacity, and semantic rate-distortion concepts specifically adapted to multimedia applications. This framework reframes multimedia communication as semantic information conveyance rather than purely syntactic data transmission. We further highlight future opportunities and critical research directions. By bridging generative AI innovations with information theory, we chart a path toward robust, efficient, and semantically meaningful multimedia communication systems. This exploratory paper aims to inspire a semantic-first paradigm shift, offering a fresh perspective with significant implications for future multimedia research.
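As a rough intuition for why a semantic measure can differ from a syntactic one, the sketch below assumes one simple notion of semantic entropy: Shannon entropy computed over meaning classes rather than over raw messages. This is an illustrative assumption, not the framework's own definition. The toy captions and the hand-written `meaning_of` map are hypothetical; a real system would need some learned or judged notion of meaning equivalence in their place.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Hypothetical toy source: four equally likely captions (syntactic messages).
messages = ["a dog runs", "a puppy runs", "the dog is running", "a cat sleeps"]

# Hypothetical semantic map: paraphrases collapse into one meaning class.
# Hand-written here purely for illustration.
meaning_of = {
    "a dog runs": "DOG_RUNNING",
    "a puppy runs": "DOG_RUNNING",
    "the dog is running": "DOG_RUNNING",
    "a cat sleeps": "CAT_SLEEPING",
}

# Entropy over raw strings vs. entropy over the meanings they carry.
h_syntactic = entropy(messages)
h_semantic = entropy([meaning_of[m] for m in messages])

print(f"syntactic entropy: {h_syntactic:.3f} bits")  # syntactic entropy: 2.000 bits
print(f"semantic entropy:  {h_semantic:.3f} bits")   # semantic entropy:  0.811 bits
```

Because three of the four captions share one meaning, the semantic entropy under this assumed definition (about 0.811 bits) is well below the 2 bits needed to distinguish the raw strings. That gap is the kind of redundancy a semantic-first codec could, in principle, avoid transmitting.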