🤖 AI Summary
Large language models (LLMs) exhibit a previously overlooked deficiency in understanding humor in professional settings, a critical bottleneck for value alignment. Method: We introduce the first industry-oriented professional humor dataset, comprising humorous utterances annotated with multidimensional appropriateness labels, and propose a context-sensitive appropriateness evaluation framework that combines human annotation with zero-shot and few-shot automated assessment to systematically benchmark five state-of-the-art LLMs. Results: All models underperform human annotators significantly in judging humor appropriateness (average accuracy deficit of 28.6%), revealing fundamental gaps in modeling implicit workplace context, particularly power dynamics, role boundaries, and organizational norms. This work pioneers the integration of humor understanding into professional-domain LLM evaluation, establishing both a new dimension for value alignment assessment and a foundational benchmark resource for future research.
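The evaluation setup described above can be sketched as follows. This is a minimal, hypothetical illustration only: the prompt template, labels, and toy data below are invented for exposition and do not reproduce the paper's actual dataset, prompts, or models. It shows the general shape of a few-shot appropriateness judgment and how an "accuracy deficit" between humans and a model could be computed.

```python
# Hypothetical few-shot prompt for judging workplace-humor appropriateness.
# The example joke, context fields, and label set are illustrative assumptions.
FEW_SHOT_PROMPT = """\
Decide whether the workplace joke is APPROPRIATE or INAPPROPRIATE.

Joke: "I'd tell you a UDP joke, but you might not get it."
Context: casual team stand-up.
Label: APPROPRIATE

Joke: {joke}
Context: {context}
Label:"""

def accuracy(predictions, gold):
    """Fraction of predictions that match the gold (human-consensus) labels."""
    assert len(predictions) == len(gold)
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# Toy data (invented): if human annotators match the gold labels more often
# than a model does, the difference is the model's accuracy deficit.
human_preds = ["APPROPRIATE", "INAPPROPRIATE", "APPROPRIATE", "INAPPROPRIATE"]
model_preds = ["APPROPRIATE", "APPROPRIATE",   "APPROPRIATE", "APPROPRIATE"]
gold        = ["APPROPRIATE", "INAPPROPRIATE", "APPROPRIATE", "INAPPROPRIATE"]

deficit = accuracy(human_preds, gold) - accuracy(model_preds, gold)
```

In this toy run the humans score 1.0 and the model 0.5, giving a deficit of 0.5; the paper reports an average deficit of 28.6% across five LLMs on its real dataset.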
📝 Abstract
With recent advances in Artificial Intelligence (AI) and Large Language Models (LLMs), the automation of everyday tasks such as automated writing is attracting growing attention. Accordingly, considerable effort has gone into aligning LLMs with human values, yet humor, particularly the professional humor used in industrial workplaces, has been largely neglected. To address this gap, we develop a dataset of professional humor statements annotated with the features that determine each statement's appropriateness. Our evaluation of five LLMs shows that they often fail to judge the appropriateness of humor accurately.