Exploring the Application of Visual Question Answering (VQA) for Classroom Activity Monitoring

📅 2025-07-30
📈 Citations: 0 (influential: 0)
🤖 AI Summary
This study addresses the challenge of automatically analyzing complex teacher-student interaction behaviors in classroom videos. Methodologically, it proposes a fine-grained behavioral understanding framework grounded in visual question answering (VQA), integrating multimodal VQA models—including LLaMA2/3, Qwen3, and NVILA—with frame-level semantic parsing of classroom videos to infer educationally salient behaviors such as student engagement and interaction types. Its primary contribution is the construction of BAV-Classroom-VQA, the first VQA benchmark dataset derived from authentic Vietnamese classroom recordings, thereby filling a critical gap in the evaluation of vision-language models for educational applications. Experimental results demonstrate strong performance across behavior-oriented VQA tasks, with average accuracy exceeding 78% for all evaluated models. These findings validate the feasibility and generalizability of VQA techniques for classroom analytics and establish a novel paradigm for quantitative, behavior-aware educational process research.

📝 Abstract
Classroom behavior monitoring is a critical aspect of educational research, with significant implications for student engagement and learning outcomes. Recent advancements in Visual Question Answering (VQA) models offer promising tools for automatically analyzing complex classroom interactions from video recordings. In this paper, we investigate the applicability of several state-of-the-art open-source VQA models, including LLaMA2, LLaMA3, Qwen3, and NVILA, in the context of classroom behavior analysis. To facilitate rigorous evaluation, we introduce our BAV-Classroom-VQA dataset, derived from real-world classroom video recordings at the Banking Academy of Vietnam. We present the methodology for data collection and annotation, and we benchmark the performance of the selected VQA models on this dataset. Our initial experimental results demonstrate that all four models achieve promising performance levels in answering behavior-related visual questions, showcasing their potential in future classroom analytics and intervention systems.
Problem

Research questions and friction points this paper is trying to address.

Evaluating VQA models for classroom behavior analysis
Introducing the BAV-Classroom-VQA dataset for real-world evaluation
Assessing model performance on behavior-related visual questions
Innovation

Methods, ideas, or system contributions that make the work stand out.

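Evaluates open-source VQA models (LLaMA2, LLaMA3, Qwen3, NVILA) on classroom behavior analysis
Introduces the BAV-Classroom-VQA dataset
Benchmarks the models on real classroom recordings from the Banking Academy of Vietnam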
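
The benchmarking step can be pictured as a per-model accuracy computation over question-answer pairs. The sketch below is only an illustration: this page does not describe the paper's actual scoring protocol, so exact-match scoring, the record keys (`model`, `prediction`, `reference`), and the toy records are all assumptions rather than the authors' method or data.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(answer.lower().split())


def accuracy_by_model(records):
    """Return exact-match accuracy per model.

    `records` is an iterable of dicts with the hypothetical keys
    'model', 'prediction', and 'reference' (an assumed schema, not the dataset's).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["model"]] += 1
        if normalize(record["prediction"]) == normalize(record["reference"]):
            correct[record["model"]] += 1
    return {model: correct[model] / total[model] for model in total}


# Toy records for illustration only; these are not results from the paper.
toy_records = [
    {"model": "model_a", "prediction": "Yes ", "reference": "yes"},
    {"model": "model_a", "prediction": "writing", "reference": "reading"},
    {"model": "model_b", "prediction": "Raising  hand", "reference": "raising hand"},
]

scores = accuracy_by_model(toy_records)
```

Exact match is the simplest possible scorer. Free-form answers from generative models would more likely need a softer comparison, such as keyword matching or human or model-based judging, which is outside this sketch.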
Sinh Trong Vu
Banking Academy of Vietnam, Hanoi, Vietnam
Hieu Trung Pham
Banking Academy of Vietnam, Hanoi, Vietnam
Dung Manh Nguyen
Banking Academy of Vietnam, Hanoi, Vietnam
Hieu Minh Hoang
Banking Academy of Vietnam, Hanoi, Vietnam
Nhu Hoang Le
Banking Academy of Vietnam, Hanoi, Vietnam
Thu Ha Pham
Banking Academy of Vietnam, Hanoi, Vietnam
Tai Tan Mai
Assistant Professor at Dublin City University
Learning Analytics, Process Mining, Complex Systems, Data Mining, Machine Learning