🤖 AI Summary
Standard NLP models struggle to identify implicit cultural capital (CC) themes—such as aspirational goals and familial support—in STEM education narratives, primarily because they process sentences in isolation and lack domain-specific contextual understanding; these themes often manifest across sentence boundaries rather than through explicit keywords. To address this, we propose a “triple-aware” framework: domain-aware (incorporating STEM-adapted vocabulary), context-aware (modeling document-level sentence dependencies), and overlap-aware (capturing multi-label co-occurrence structures), implemented as an end-to-end Transformer-based multi-label classifier. Evaluated on real-world student reflection data, our model achieves a 2.1-percentage-point improvement in Macro-F1 over the strongest baseline, with consistent gains across all CC themes and strong generalization. This work constitutes the first systematic modeling of the deep semantic structure of cultural capital in educational narratives, establishing an interpretable NLP paradigm for equitable education assessment.
📝 Abstract
Identifying cultural capital (CC) themes in student reflections can offer valuable insights that help foster equitable learning environments in classrooms. However, themes such as aspirational goals or family support are often woven into narratives rather than appearing as direct keywords, so standard NLP models, which process sentences in isolation, struggle to detect them. The core challenge is a lack of awareness: standard models are pre-trained on general corpora, leaving them blind to the domain-specific language and narrative context inherent to the data. To address this, we introduce AWARE, a framework that systematically improves a Transformer model's awareness for this nuanced task. AWARE has three core components: 1) Domain Awareness, adapting the model's vocabulary to the linguistic style of student reflections; 2) Context Awareness, generating sentence embeddings that are aware of the full essay context; and 3) Class Overlap Awareness, employing a multi-label strategy to recognize the coexistence of themes in a single sentence. Our results show that by making the model explicitly aware of these properties of the input, AWARE outperforms a strong baseline by 2.1 percentage points in Macro-F1, with considerable improvements across all themes. This work provides a robust and generalizable methodology for any text classification task in which meaning depends on narrative context.
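The Class Overlap Awareness component treats each sentence as a multi-label problem: instead of a softmax that forces exactly one theme per sentence, independent sigmoid outputs let several themes fire at once, trained with a per-label binary cross-entropy objective. A minimal sketch of that decision rule (the theme names, logits, and 0.5 threshold here are illustrative assumptions, not values from the paper):

```python
import math

# Illustrative subset of CC theme labels (not the paper's full label set)
CC_THEMES = ["aspirational", "familial", "social", "navigational"]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def predict_themes(logits, threshold=0.5):
    """Multi-label decision: each theme is thresholded independently,
    so one sentence can carry several CC themes simultaneously."""
    return [theme for theme, z in zip(CC_THEMES, logits)
            if sigmoid(z) >= threshold]

def bce_loss(logits, targets):
    """Mean binary cross-entropy over labels -- the standard
    multi-label training objective (one Bernoulli per theme)."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

# A sentence about working hard for one's family can score high on
# both "aspirational" and "familial" at the same time:
print(predict_themes([2.0, 1.5, -1.0, -2.0]))
# → ['aspirational', 'familial']
```

A softmax head would force these two themes to compete for probability mass; the sigmoid-per-label design is what lets the model acknowledge their coexistence in a single sentence.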