🤖 AI Summary
This study addresses a gap in the evaluation of large language models (LLMs): existing evaluation relies mainly on task-oriented NLP benchmarks and neglects interpretive validity, cultural situatedness, and the models' role as knowledge mediators in the humanities and social sciences. To bridge this gap, the authors propose a multilingual LLM evaluation framework designed for these disciplines, drawing on hermeneutics, philosophy of technology, science and technology studies, multilingual NLP research, and computational social science methodology. The framework specifies operationalized metrics for cultural alignment, cross-lingual stability, and reasoning faithfulness, along with transparency requirements for interpretive research tasks. It is illustrated through an application scenario involving multilingual political discourse analysis, and it offers a conceptual and methodological foundation for the responsible integration of LLMs into computational social science infrastructures.
📝 Abstract
Large language models have rapidly evolved in multilingual competence and reasoning capacity, enabling their integration into Social Sciences and Humanities (SSH) research workflows. Yet existing evaluation paradigms remain anchored in task-based NLP benchmarks and fail to address interpretive validity, cultural situatedness, and epistemic mediation. This paper reconceptualizes multilingual reasoning LLMs as hermeneutic instruments that actively structure meaning production across linguistic and cultural contexts. Drawing on hermeneutics, philosophy of technology, science and technology studies, multilingual NLP research, and computational social science methodology, we develop a theoretically grounded framework for evaluating multilingual reasoning in SSH research. We articulate a rigorous experimental protocol with operationalized metrics for cultural alignment, cross-lingual stability, and reasoning faithfulness, along with transparency requirements tailored to interpretive research tasks. We illustrate the framework through a concrete application scenario involving multilingual political discourse analysis. The paper contributes a conceptual and methodological foundation for the responsible integration of multilingual reasoning LLMs into computational social science infrastructures.