Not All Jokes Land: Evaluating Large Language Models' Understanding of Workplace Humor

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit a previously overlooked deficiency in humor understanding within professional settings—a critical bottleneck in value alignment. Method: We introduce the first industry-oriented professional humor dataset, comprising humorous utterances annotated with multidimensional appropriateness labels. We propose a context-sensitive appropriateness evaluation framework, integrating human annotation with zero-shot and few-shot automated assessment to systematically benchmark five state-of-the-art LLMs. Results: All models underperform significantly relative to human annotators in judging humor appropriateness (average accuracy deficit of 28.6%), revealing fundamental gaps in modeling implicit workplace contexts—particularly power dynamics, role boundaries, and organizational norms. This work pioneers the integration of humor understanding into professional-domain LLM evaluation, establishing both a novel dimension for value alignment assessment and a foundational benchmark resource for future research.
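The benchmarking protocol described above reduces to a simple loop: collect model judgments of appropriateness on annotated utterances, compare them against the human labels, and report the accuracy gap. A minimal sketch follows; the items, labels, and the stand-in judge are hypothetical illustrations, not the paper's actual dataset, prompts, or models, and a real evaluation would replace `stub_llm_judge` with a zero-shot or few-shot LLM call.

```python
from dataclasses import dataclass


@dataclass
class HumorItem:
    utterance: str     # the workplace joke or quip
    context: str       # setting in which it is uttered (hypothetical)
    human_label: bool  # annotator consensus: appropriate or not


def stub_llm_judge(item: HumorItem) -> bool:
    """Stand-in for a zero-shot LLM judgment; here, a naive keyword rule."""
    risky = ("boss", "salary", "fired")
    return not any(word in item.utterance.lower() for word in risky)


def accuracy(items: list[HumorItem], judge) -> float:
    """Fraction of items where the judge agrees with the human label."""
    correct = sum(judge(it) == it.human_label for it in items)
    return correct / len(items)


# Toy items (invented for illustration only).
items = [
    HumorItem("Our printer retires more often than we do.", "team chat", True),
    HumorItem("Careful, the boss counts your coffee breaks.", "all-hands", False),
    HumorItem("Hope nobody gets fired over this typo!", "client email", False),
    HumorItem("Your promotion must be stuck in the printer too.",
              "performance review", False),  # keyword rule misses this one
]

model_acc = accuracy(items, stub_llm_judge)
human_acc = 1.0  # human annotators serve as the reference
print(f"accuracy deficit: {human_acc - model_acc:.1%}")  # → 25.0%
```

The keyword rule misjudges the last item, which depends on implicit context (a performance review) rather than surface vocabulary, mirroring the gap the paper attributes to unmodeled power dynamics and role boundaries.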

📝 Abstract
With recent advances in Artificial Intelligence (AI) and Large Language Models (LLMs), the automation of daily tasks, such as automated writing, is attracting growing attention. Efforts have therefore focused on aligning LLMs with human values, yet humor, particularly the professional humor used in workplaces, has been largely neglected. To address this, we develop a dataset of professional humor statements along with features that determine the appropriateness of each statement. Our evaluation of five LLMs shows that they often struggle to judge the appropriateness of humor accurately.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' understanding of workplace humor appropriateness
Addressing neglect of professional humor in AI-human value alignment
Assessing LLMs' accuracy in judging context-appropriate humor
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed dataset of professional humor statements
Evaluated five LLMs on humor appropriateness
Identified LLMs' struggles with humor judgment