Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading

📅 2025-07-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the inefficiency of manually grading handwritten, open-ended questions in university STEM courses, this paper proposes the first end-to-end AI-assisted scoring system. The system integrates optical character recognition (OCR) with large language models (LLMs) to provide high-accuracy transcription of handwritten work, semantics-driven automated scoring, confidence-aware evaluation, and personalized feedback generation. It pioneers deep LLM integration across the entire grading pipeline (transcription, scoring, calibration, and feedback), enabling instructor-in-the-loop intervention and dynamic alignment with evolving rubrics. Deployed at more than 20 universities in courses spanning computer science, mathematics, physics, and chemistry, the system reduces grading time by 65% on average, achieves a 95.4% agreement rate with human instructors on high-confidence predictions, and has processed over 300,000 student responses.
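The confidence-gated pipeline described above (transcribe the scan, score it against the rubric, then route by confidence) can be sketched roughly as follows. This is a minimal illustration only: the function names, the keyword-matching scorer, the toy calibration, and the 0.9 threshold are all assumptions for the sketch, not Pensieve's actual API or models.

```python
from dataclasses import dataclass

@dataclass
class GradeResult:
    transcription: str
    score: float       # fraction of rubric items satisfied
    confidence: float  # model's self-reported confidence in the score

# Assumed cutoff for what counts as a "high-confidence" prediction.
CONFIDENCE_THRESHOLD = 0.9

def transcribe(image_bytes: bytes) -> str:
    # Placeholder for the OCR + LLM transcription step.
    return image_bytes.decode("utf-8", errors="ignore")

def score_with_rubric(text: str, rubric: list[str]) -> tuple[float, float]:
    # Placeholder for LLM rubric-aligned scoring: award credit per rubric
    # item mentioned in the transcription, with a toy confidence value.
    hits = sum(1 for item in rubric if item.lower() in text.lower())
    score = hits / max(len(rubric), 1)
    confidence = 0.5 + 0.5 * score  # toy calibration, not a real model
    return score, confidence

def grade_submission(image_bytes: bytes, rubric: list[str]) -> GradeResult:
    text = transcribe(image_bytes)
    score, confidence = score_with_rubric(text, rubric)
    return GradeResult(text, score, confidence)

def route(result: GradeResult) -> str:
    # High-confidence predictions are accepted; everything else is queued
    # for instructor review (the human-in-the-loop step).
    return "auto" if result.confidence >= CONFIDENCE_THRESHOLD else "review"
```

The routing step is where the reported 95.4% agreement figure applies: only predictions above the confidence threshold bypass instructor review, while low-confidence responses still receive human grading.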


📝 Abstract
Grading handwritten, open-ended responses remains a major bottleneck in large university STEM courses. We introduce Pensieve (https://www.pensieve.co), an AI-assisted grading platform that leverages large language models (LLMs) to transcribe and evaluate student work, providing instructors with rubric-aligned scores, transcriptions, and confidence ratings. Unlike prior tools that focus narrowly on specific tasks like transcription or rubric generation, Pensieve supports the entire grading pipeline, from scanned student submissions to final feedback, within a human-in-the-loop interface. Pensieve has been deployed in real-world courses at over 20 institutions and has graded more than 300,000 student responses. We present system details and empirical results across four core STEM disciplines: Computer Science, Mathematics, Physics, and Chemistry. Our findings show that Pensieve reduces grading time by an average of 65%, while maintaining a 95.4% agreement rate with instructor-assigned grades for high-confidence predictions.
Problem

Research questions and friction points this paper is trying to address.

Automates grading of handwritten STEM responses efficiently
Integrates transcription and rubric-aligned scoring using LLMs
Reduces grading time while maintaining high accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-powered platform for handwritten STEM grading
Leverages LLMs for transcription and evaluation
Human-in-the-loop interface for entire grading pipeline
Yoonseok Yang
UC Berkeley EECS
NLP · Collaborative Filtering · Recommender Systems
Minjune Kim
Pensieve Inc.
Marlon Rondinelli
Pensieve Inc.
Keren Shao
Pensieve Inc.