From Slides to Chatbots: Enhancing Large Language Models with University Course Materials

📅 2025-10-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the insufficient domain-specific accuracy of large language models (LLMs) in answering questions from university-level computer science course materials. We propose a multimodal retrieval-augmented generation (RAG) method tailored to small-scale, heterogeneous course resources—including slide images, mathematical formulas, and colloquial lecture transcripts. Unlike conventional text-only RAG or computationally intensive continual pretraining (CPT), our approach jointly models the visual and semantic content of slides, enabling lightweight adaptation directly over raw multimodal instructional data. Experiments on limited course datasets demonstrate that our multimodal RAG significantly improves question-answering accuracy over both text-only RAG and CPT baselines, while requiring substantially lower training overhead. Our key contribution is the first systematic empirical validation of image–text joint retrieval for precise recall of course-specific knowledge—establishing a novel, efficient paradigm for domain-adapting LLMs in professional educational settings.

📝 Abstract
Large Language Models (LLMs) have advanced rapidly in recent years. One application of LLMs is to support student learning in educational settings. However, prior work has shown that LLMs still struggle to answer questions accurately within university-level computer science courses. In this work, we investigate how incorporating university course materials can enhance LLM performance in this setting. A key challenge lies in leveraging diverse course materials such as lecture slides and transcripts, which differ substantially from typical textual corpora: slides also contain visual elements like images and formulas, while transcripts contain spoken, less structured language. We compare two strategies, Retrieval-Augmented Generation (RAG) and Continual Pre-Training (CPT), to extend LLMs with course-specific knowledge. For lecture slides, we further explore a multi-modal RAG approach, where we present the retrieved content to the generator in image form. Our experiments reveal that, given the relatively small size of university course materials, RAG is more effective and efficient than CPT. Moreover, incorporating slides as images in the multi-modal setting significantly improves performance over text-only retrieval. These findings highlight practical strategies for developing AI assistants that better support learning and teaching, and we hope they inspire similar efforts in other educational contexts.
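The retrieval step of the multi-modal RAG approach described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the slide ids, the toy embedding vectors, and the `retrieve`/`build_prompt` helpers are all hypothetical stand-ins for a real joint image–text embedding model and a multimodal generator.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Index of course slides: (slide_id, embedding). In the multimodal setting
# the embedding is computed from the slide *image*, so figures and formulas
# contribute to retrieval rather than only OCR'd text. Vectors are toy values.
slide_index = [
    ("lecture3_slide12", [0.9, 0.1, 0.0]),
    ("lecture5_slide04", [0.1, 0.8, 0.3]),
    ("lecture7_slide21", [0.0, 0.2, 0.9]),
]

def retrieve(query_embedding, k=1):
    """Return the ids of the top-k slides most similar to the query."""
    ranked = sorted(slide_index,
                    key=lambda s: cosine(query_embedding, s[1]),
                    reverse=True)
    return [sid for sid, _ in ranked[:k]]

def build_prompt(question, slide_ids):
    """Attach the retrieved slides as images (here just their ids) to the
    prompt for a multimodal generator, instead of pasting extracted text."""
    return {"question": question, "slide_images": slide_ids}

query_emb = [0.85, 0.15, 0.05]  # pretend embedding of a student question
prompt = build_prompt("Explain this data structure.", retrieve(query_emb, k=1))
print(prompt["slide_images"])  # -> ['lecture3_slide12']
```

The key design point the paper highlights is in `build_prompt`: retrieved slides are handed to the generator in image form, which the experiments found significantly outperforms text-only retrieval on the same material.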
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs to answer university computer science questions accurately
Incorporating diverse course materials like slides and transcripts into LLMs
Comparing strategies to extend LLMs with course-specific knowledge effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Used Retrieval-Augmented Generation to enhance LLMs
Applied multi-modal RAG with slides as images
Compared RAG with Continual Pre-Training for courses