🤖 AI Summary
To address the challenges of poor modularity, high latency, and limited scalability in real-time audio labeling tools for IP-based broadcast workflows, this paper proposes a lightweight audio tagging microservice architecture tailored for broadcast applications. The architecture leverages Docker containerization and RESTful API design, integrates pre-trained models (e.g., PANNs), and natively supports SMPTE ST 2110-30 for low-latency real-time audio stream analysis. It introduces a novel pluggable architecture that simultaneously ensures IP network compatibility and end-to-end real-time performance, significantly enhancing system flexibility and vendor interoperability. Experimental evaluation demonstrates an end-to-end latency under 200 ms and noise event detection accuracy exceeding 92% in live news and music broadcasting scenarios. These results validate the architecture’s adaptability and practicality across broadcast workflows, from small-scale production environments to large enterprise deployments.
📝 Abstract
The broadcasting industry is increasingly adopting IP techniques, revolutionising both live and pre-recorded content production, from news gathering to live music events. IP broadcasting allows audio and video signals to be transported in an easily configurable way, in line with modern networking techniques. This shift towards an IP workflow allows much greater flexibility, not only in routing signals but also in integrating tools built with standard web development techniques. One such tool is live audio tagging, which has a number of uses in content production, ranging from automated closed captioning to identifying unwanted sound events within a scene. In this paper, we describe the process of containerising an audio tagging model into a microservice, a small, isolated code module that can be integrated into many different network setups. The goal is to develop a modular, accessible, and flexible tool capable of seamless deployment into broadcasting workflows of all sizes, from small productions to large corporations. We also discuss the challenges surrounding the latency of the selected audio tagging model and its effect on the usefulness of the end product.
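To make the idea of an audio tagging microservice more concrete, the following is a minimal sketch of what such a service's HTTP interface might look like, using only the Python standard library. It is not the implementation described in the paper. The `/tag` path, the port, the 16-bit little-endian mono PCM request body, and the names `placeholder_tagger`, `handle_tag_request`, `TagHandler`, and `serve` are all assumptions made for illustration, and the energy-threshold tagger is a stand-in for a real pre-trained model.

```python
import json
import math
import struct
from http.server import BaseHTTPRequestHandler, HTTPServer


def placeholder_tagger(samples):
    """Hypothetical stand-in for a tagging model: labels a chunk by RMS energy."""
    if not samples:
        return [{"label": "silence", "rms": 0.0}]
    # Normalise the RMS of 16-bit samples to the range [0, 1].
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0
    label = "sound" if rms >= 0.01 else "silence"
    return [{"label": label, "rms": round(rms, 4)}]


def handle_tag_request(body):
    """Decode the request body (assumed 16-bit LE mono PCM); return (status, payload)."""
    if len(body) % 2 != 0:
        return 400, {"error": "body must be 16-bit PCM (even byte count)"}
    samples = struct.unpack("<%dh" % (len(body) // 2), body)
    return 200, {"tags": placeholder_tagger(samples)}


class TagHandler(BaseHTTPRequestHandler):
    """Accepts POST /tag with raw PCM bytes and replies with JSON tags."""

    def do_POST(self):
        if self.path != "/tag":
            status, payload = 404, {"error": "unknown path"}
        else:
            length = int(self.headers.get("Content-Length", 0))
            status, payload = handle_tag_request(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Run the service until interrupted; the port number is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), TagHandler).serve_forever()
```

Calling `serve()` starts the service, which could then be used as the entry point of a container image. Keeping the request decoding in `handle_tag_request`, separate from the HTTP handler, means the placeholder tagger can be swapped for a real model without touching the network code, which is the kind of modularity the abstract aims for.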