An NLP-Driven Framework for Curriculum-Labor Market Alignment: Schema-Constrained LLM Extraction, ESCO-Anchored Semantic Matching, and Multi-Dimensional Gap Quantification

📅 2026-06-01

🤖 AI Summary
This study addresses a limitation of existing NLP pipelines for educational and labor-market texts: they rely on lexical-surface methods that miss implicit competencies, lack grounding in a shared taxonomy, and offer no formal measure of extraction reliability. The authors propose a four-stage framework: schema-constrained prompting of a two-model LLM ensemble against a JSON Schema-enforced seven-slot competency formalism, Sentence-BERT alignment with the ESCO v1.2.1 taxonomy, two-tier adjudication of inter-model disagreements, and verification through per-slot Cohen's kappa, schema conformance, and document-level completeness audits. Applied to the BSc Computer Science program at the United Arab Emirates University, the pipeline extracted 400 competency records from 85 courses and aligned them with 30 job postings. The extractor reached a Cohen's kappa of 0.79 on the skill slot, and the alignment showed a 25.0% supply-demand gap in general and transversal skills versus 1.8% in artificial intelligence and data science.
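For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of Cohen's kappa for two annotators (here, two extractors) labelling the same items. The function name `cohens_kappa` and the toy labels are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences (illustrative helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels (hypothetical): the two raters disagree on the last item only.
model_a = ["x", "x", "y", "y"]
model_b = ["x", "x", "y", "x"]
kappa = cohens_kappa(model_a, model_b)
```

Kappa corrects raw agreement for the agreement expected by chance, so a value such as 0.79 is a stricter statement than 79% raw agreement.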
📝 Abstract
Schema-constrained information extraction from diverse educational and labor-market corpora remains an open challenge in natural language processing because existing pipelines rely primarily on lexical-surface methods that cannot recover implicit competencies, lack grounding in shared taxonomies, and provide no formal measures of extraction reliability or document-level completeness. To address these limitations, this paper proposes a four-stage NLP framework that combines (i) schema-constrained prompting of a two-model frontier-LLM ensemble against a JSON Schema-enforced seven-slot competency formalism, (ii) Sentence-BERT (SBERT) alignment of the extracted records against an eleven-domain ESCO v1.2.1 controlled vocabulary, (iii) a two-tier adjudication protocol that resolves inter-model disagreements, and (iv) a verification mechanism that combines per-slot Cohen's kappa, schema conformance, and document-level completeness audits. The framework is instantiated for a critical application in higher-education quality assurance, namely curriculum-labor market alignment for the ABET-accredited BSc Computer Science program at the United Arab Emirates University. The pipeline extracts 400 competency records from the 85-course 2025-2026 study plan and aligns them, under a five-scope analysis ranging from the computing core to a probability-weighted student trajectory, with 30 job postings (483 requirement clauses) at an SBERT cosine threshold of 0.50. The extractor achieves Cohen's kappa of 0.79 on the skill slot, with 100% schema conformance and 100% document-level completeness. The alignment surfaces interpretable supply-demand gaps of 25.0% in general and transversal skills, 13.8% in algorithms and computational theory, and 12.2% in software engineering and project management, with a near-zero 1.8% gap in artificial intelligence and data science despite 38.6% supply coverage.
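The threshold-based alignment step described in the abstract can be illustrated with a minimal sketch. It assumes each job-posting requirement clause is matched to its most similar curriculum competency record and counted as covered when the cosine similarity is at least 0.50. The matching direction, the inclusive comparison, the function names, and the toy 2-D vectors (stand-ins for SBERT embeddings) are all assumptions and not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_clauses(demand_vecs, supply_vecs, threshold=0.50):
    """For each demand vector, return the index of the most similar supply
    vector whose cosine similarity is >= threshold, or None if there is none."""
    matches = []
    for d in demand_vecs:
        best_idx, best_sim = None, threshold
        for i, s in enumerate(supply_vecs):
            sim = cosine(d, s)
            if sim >= best_sim:
                best_idx, best_sim = i, sim
        matches.append(best_idx)
    return matches


# Toy vectors (hypothetical); real inputs would be SBERT sentence embeddings.
supply = [[1.0, 0.0], [0.0, 1.0]]                # curriculum competency records
demand = [[1.0, 0.1], [-1.0, 0.0], [0.6, 0.8]]   # job requirement clauses
matches = match_clauses(demand, supply)
unmatched_share = matches.count(None) / len(matches)
```

In this sketch, `unmatched_share` is only the fraction of demand clauses with no match at the threshold. The abstract does not spell out how the paper's gap percentages are computed, so this quantity should not be read as that metric.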
Problem

Research questions and friction points this paper is trying to address.

schema-constrained information extraction
competency alignment
labor market demand
curriculum analysis
semantic matching
Innovation

Methods, ideas, or system contributions that make the work stand out.

schema-constrained LLM extraction
ESCO-anchored semantic matching
multi-dimensional gap quantification
competency alignment
document-level completeness