🤖 AI Summary
Existing evaluation benchmarks for large language models (LLMs) in higher education lack a multilingual option grounded in open university textbooks.
Method: We introduce OpenStaxQA—an open, textbook-derived, multilingual educational question-answering benchmark covering English, Spanish, and Polish—built from openly licensed college textbooks. We employ QLoRA for efficient fine-tuning of LLMs with approximately 7 billion parameters and conduct zero-shot transfer experiments on an external reasoning task.
Contribution/Results: (1) OpenStaxQA fills a gap in multilingual educational QA evaluation datasets; (2) It probes LLMs’ comprehension of university-level domain knowledge across languages; (3) It evaluates zero-shot transfer to the AI2 Reasoning Challenge (ARC) dev set to test whether fine-tuning on OpenStaxQA improves performance on related tasks; the paper also discusses the broader impacts of such datasets.
📝 Abstract
We present OpenStaxQA, an evaluation benchmark for college-level educational applications based on 43 open-source college textbooks in English, Spanish, and Polish, available under a permissive Creative Commons license. We fine-tune and evaluate large language models (LLMs) with approximately 7 billion parameters on this dataset using quantized low-rank adapters (QLoRA). We also perform a zero-shot evaluation on the AI2 Reasoning Challenge (ARC) dev set to check whether OpenStaxQA can lead to improved performance on other tasks. Finally, we discuss broader impacts relevant to datasets such as OpenStaxQA.
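The abstract mentions fine-tuning with quantized low-rank adapters (QLoRA). This is not the authors' training code; it is a minimal numpy sketch of the low-rank adapter idea that QLoRA builds on: the base weight matrix stays frozen (and, in QLoRA, quantized to 4 bits), while only two small matrices `A` and `B` of rank `r` are trained, so the effective weight is `W + B @ A`. The dimensions and rank below are illustrative assumptions, not values from the paper.

```python
import numpy as np

d, k, r = 1024, 1024, 8  # hypothetical layer dimensions and adapter rank
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))  # frozen base weight (quantized in QLoRA)
A = rng.standard_normal((r, k))  # trainable down-projection (Gaussian init)
B = np.zeros((d, r))             # trainable up-projection (zero init)

# At initialization B @ A == 0, so the adapted model matches the base model.
W_eff = W + B @ A
assert np.allclose(W_eff, W)

# Only A and B receive gradients, cutting trainable parameters drastically.
adapter_params = A.size + B.size
full_params = W.size
print(f"trainable: {adapter_params} of {full_params} "
      f"({adapter_params / full_params:.2%})")
```

With these toy sizes the adapters hold under 2% of the layer's parameters; at 7B scale this is what makes fine-tuning feasible on a single GPU.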