🤖 AI Summary
This work proposes an end-to-end framework for automatic movie trailer generation that integrates large language models (LLMs) with multimodal analysis to overcome the inefficiencies of traditional manual editing, which often struggles to produce content that is both narratively coherent and emotionally engaging. For the first time, LLMs are comprehensively leveraged across the entire pipeline—including key scene selection, highlight dialogue extraction, soundtrack generation, and voice-over synthesis—enabling synergistic co-creation across visual, textual, and audio modalities. Experimental results demonstrate that the proposed method outperforms state-of-the-art approaches in terms of narrative tension, visual appeal, and overall viewer experience.
📝 Abstract
Trailers are short promotional videos designed to give audiences a glimpse of a movie. Creating a trailer typically involves selecting key scenes, dialogues, and action sequences from the main content and editing them together so that they effectively convey the tone, theme, and overall appeal of the movie. This often includes adding music, sound effects, visual effects, and text overlays to heighten the trailer's impact. In this paper, we present a framework that exploits a comprehensive multimodal strategy for automated trailer production, adopting a Large Language Model (LLM) across multiple stages of trailer creation. First, the LLM selects the key visual sequences most relevant to the movie's core narrative. Then, it extracts the most appealing quotes from the movie, aligning them with the trailer's narrative. Additionally, the LLM assists in creating background music and voice-overs to deepen audience engagement, helping make the trailer not just a summary of the movie's content but a narrative experience in itself. Results show that our framework generates trailers that viewers find more visually appealing than those produced by previous state-of-the-art methods.
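The four LLM-driven stages described above (key-scene selection, highlight-dialogue extraction, soundtrack generation, and voice-over synthesis) can be sketched as a simple orchestration loop. This is a minimal, hypothetical sketch: all function and field names (`build_trailer`, `TrailerDraft`, `llm`, etc.) are illustrative placeholders and not the paper's actual API, and the LLM call is stubbed rather than a real model invocation.

```python
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    """Stub standing in for a real LLM call (hypothetical)."""
    return f"LLM response to: {prompt[:40]}"

@dataclass
class TrailerDraft:
    """Container for the intermediate artifacts of each pipeline stage."""
    scenes: list = field(default_factory=list)
    quotes: list = field(default_factory=list)
    music_cue: str = ""
    voiceover: str = ""

def build_trailer(plot_summary: str, subtitles: list) -> TrailerDraft:
    draft = TrailerDraft()
    # 1) Key-scene selection: ask the LLM which narrative beats matter.
    draft.scenes = [llm(f"Select key visual sequences for: {plot_summary}")]
    # 2) Highlight-dialogue extraction: a crude stand-in heuristic that
    #    keeps emphatic lines (the paper uses the LLM for this step too).
    draft.quotes = [line for line in subtitles if "?" in line or "!" in line][:3]
    # 3) Soundtrack generation: prompt for a background-music description.
    draft.music_cue = llm("Describe background music matching the trailer tone")
    # 4) Voice-over synthesis: prompt for a short narration script.
    draft.voiceover = llm("Write a short voice-over script for the trailer")
    return draft

draft = build_trailer("A heist goes wrong.", ["Let's go!", "Where is it?", "Fine."])
```

The point of the sketch is only the control flow: each modality (visual, textual, audio) is produced by a separate LLM-guided stage and accumulated into one draft, mirroring the synergistic co-creation the summary describes.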