🤖 AI Summary
Speech summarization lacks a well-defined task formulation, clear domain boundaries, and systematic survey coverage. Method: This work formally defines the task scope, clarifies how it differs from and relates to automatic speech recognition (ASR) and text summarization, traces its technical evolution (from ASR post-processing and fine-tuned cascaded architectures to end-to-end joint modeling), and establishes a cross-task evaluation framework integrating metrics (e.g., ROUGE, BERTScore) and benchmarks (e.g., AMI, ICSI, SummSpeech). Contribution/Results: It presents the first comprehensive, full-stack survey of speech summarization, unifying definitions, methodologies, evaluation protocols, and datasets. The survey thereby provides a principled foundation for algorithm design, benchmark development, and real-world deployment.
📝 Abstract
Speech summarization has become an essential tool for efficiently managing and accessing the growing volume of spoken and audiovisual content. Despite its increasing importance, however, the task is still not clearly defined, and it intersects with several research areas, including speech recognition, text summarization, and specific applications such as meeting summarization. This survey examines existing datasets and evaluation methodologies, which are crucial for assessing the effectiveness of summarization approaches, and synthesizes recent developments in the field, highlighting the shift from traditional systems to advanced models such as fine-tuned cascaded architectures and end-to-end solutions.