Demystifying ChatGPT: How It Masters Genre Recognition

📅 2025-07-04
🤖 AI Summary
This study systematically evaluates ChatGPT’s zero-shot and few-shot capabilities for movie genre prediction, a challenging multimodal classification task that requires semantic understanding across textual and visual cues.

Method: Using audio transcriptions and subtitle text from the trailers of movies in MovieLens-100K, we design multilingual, multi-label prompts and integrate IMDb poster images via a vision-language model (VLM), whose fine-grained poster descriptions augment the textual prompts. The approach combines prompt engineering (zero-shot and few-shot), large language model (LLM) inference, and joint modeling of textual and visual modalities.

Contribution/Results: Without fine-tuning, ChatGPT outperforms the other LLMs evaluated, and fine-tuned ChatGPT performs best overall. VLM-augmented prompting further improves multi-genre classification. This work demonstrates ChatGPT’s strong generalization capability for genre recognition without task-specific adaptation and proposes a lightweight, scalable text–vision collaborative prompting framework for multimodal content understanding.

📝 Abstract
The introduction of ChatGPT has garnered significant attention within the NLP community and beyond. Previous studies have demonstrated ChatGPT's substantial advancements across various downstream NLP tasks, highlighting its adaptability and potential to revolutionize language-related applications. However, its capabilities and limitations in genre prediction remain unclear. This work analyzes three Large Language Models (LLMs) using the MovieLens-100K dataset to assess their genre prediction capabilities. Our findings show that ChatGPT, without fine-tuning, outperformed other LLMs, and fine-tuned ChatGPT performed best overall. We set up zero-shot and few-shot prompts using audio transcripts/subtitles from movie trailers in the MovieLens-100K dataset, covering 1682 movies of 18 genres, where each movie can have multiple genres. Additionally, we extended our study by extracting IMDb movie posters to utilize a Vision Language Model (VLM) with prompts for poster information. This fine-grained information was used to enhance existing LLM prompts. In conclusion, our study reveals ChatGPT's remarkable genre prediction capabilities, surpassing other language models. The integration of VLM further enhances our findings, showcasing ChatGPT's potential for content-related applications by incorporating visual information from movie posters.
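The zero-shot and few-shot setup described in the abstract can be illustrated with a minimal sketch. The prompt wording, the function names `build_prompt` and `parse_genres`, and the way a poster description is appended are assumptions made for illustration, not the authors' actual prompts. The genre list is the 18 MovieLens-100K labels with the "unknown" flag left out.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

# The 18 MovieLens-100K genre labels (the dataset's "unknown" flag is omitted).
GENRES = [
    "Action", "Adventure", "Animation", "Children's", "Comedy", "Crime",
    "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", "Musical",
    "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]


def build_prompt(
    transcript: str,
    examples: Iterable[Tuple[str, Sequence[str]]] = (),
    poster_description: Optional[str] = None,
) -> str:
    """Build a multi-label genre prompt (hypothetical wording).

    With no examples this is a zero-shot prompt; each (transcript, genres)
    pair in `examples` adds one few-shot demonstration. An optional poster
    description (e.g. produced by a VLM) is appended as extra context.
    """
    lines = [
        "Classify the movie into one or more of these genres: "
        + ", ".join(GENRES) + ".",
        "Answer with a comma-separated list of genres only.",
        "",
    ]
    for ex_transcript, ex_genres in examples:
        lines.append(f"Trailer transcript: {ex_transcript}")
        lines.append(f"Genres: {', '.join(ex_genres)}")
        lines.append("")
    lines.append(f"Trailer transcript: {transcript}")
    if poster_description:
        lines.append(f"Poster description: {poster_description}")
    lines.append("Genres:")
    return "\n".join(lines)


def parse_genres(response: str) -> List[str]:
    """Keep only known genre labels from a model reply, without duplicates."""
    lookup = {g.lower(): g for g in GENRES}
    found: List[str] = []
    for part in response.replace("\n", ",").split(","):
        key = part.strip().strip(".").lower()
        if key in lookup and lookup[key] not in found:
            found.append(lookup[key])
    return found
```

The string returned by `build_prompt` would be sent to whichever LLM is under test. `parse_genres` drops any label outside the fixed list, so a reply such as "crime, Thriller, Noir" yields only "Crime" and "Thriller".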
Problem

Research questions and friction points this paper is trying to address.

Assess ChatGPT's genre prediction capabilities in movies
Compare ChatGPT with other LLMs using MovieLens-100K dataset
Enhance genre prediction by integrating visual data from posters
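Because each movie can carry several genres, comparing models requires a multi-label score. This page does not say which metrics the paper reports, so the sketch below is an assumption: it shows two common choices, per-movie Jaccard similarity and micro-averaged F1, with hypothetical function names.

```python
from typing import Iterable, Sequence


def jaccard(predicted: Iterable[str], actual: Iterable[str]) -> float:
    """Overlap between predicted and true genre sets for one movie."""
    p, a = set(predicted), set(actual)
    if not p and not a:
        return 1.0  # both empty: treat as a perfect match
    return len(p & a) / len(p | a)


def micro_f1(
    predictions: Sequence[Iterable[str]],
    references: Sequence[Iterable[str]],
) -> float:
    """Micro-averaged F1: pool label decisions across all movies."""
    tp = fp = fn = 0
    for pred, ref in zip(predictions, references):
        p, r = set(pred), set(ref)
        tp += len(p & r)   # genres predicted and correct
        fp += len(p - r)   # genres predicted but wrong
        fn += len(r - p)   # true genres that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For example, predicting Crime and Thriller for a movie labeled Crime and Drama gives a Jaccard score of 1/3, since one genre is shared out of three distinct genres.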
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses zero-shot and few-shot prompts for genre prediction
Integrates Vision Language Model with movie poster data
Fine-tunes ChatGPT for superior genre recognition performance
Subham Raj
Department of Computer Science and Engineering, Indian Institute of Technology Patna, India
Sriparna Saha
Department of Computer Science and Engineering, Indian Institute of Technology Patna, India
Brijraj Singh
Sony Research India
Niranjan Pedanekar
Sony Research India
Media and Entertainment, Image and Video Processing, Natural Language Processing, Machine and Deep