How do Language Models Generate Slang: A Systematic Comparison between Human and Machine-Generated Slang Usages

📅 2025-09-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study asks whether large language models (LLMs) have captured structural knowledge about slang that aligns with human-attested usage, or whether they only reproduce surface-level patterns. Method: The authors propose an evaluation framework with three aspects (characteristics of the usages, creativity, and informativeness) and systematically compare slang generated by GPT-4o and Llama-3 against human-attested usages from the Online Slang Dictionary (OSD). Contribution/Results: The comparison reveals significant biases in how LLMs perceive slang. The models have captured substantial knowledge about the creative aspects of slang, but that knowledge does not align closely enough with human usage to support extrapolative tasks such as linguistic analysis.

📝 Abstract

Slang is a commonly used type of informal language that poses a daunting challenge to NLP systems. Recent advances in large language models (LLMs), however, have made the problem more approachable. While LLM agents are becoming more widely applied to intermediary tasks such as slang detection and slang interpretation, their generalizability and reliability depend heavily on whether these models have captured structural knowledge about slang that aligns well with human-attested slang usages. To answer this question, we contribute a systematic comparison between human and machine-generated slang usages. Our evaluative framework focuses on three core aspects: 1) Characteristics of the usages that reflect systematic biases in how machines perceive slang, 2) Creativity reflected by both lexical coinages and word reuses employed by the slang usages, and 3) Informativeness of the slang usages when used as gold-standard examples for model distillation. By comparing human-attested slang usages from the Online Slang Dictionary (OSD) and slang generated by GPT-4o and Llama-3, we find significant biases in how LLMs perceive slang. Our results suggest that while LLMs have captured significant knowledge about the creative aspects of slang, such knowledge does not align with humans sufficiently to enable LLMs to perform extrapolative tasks such as linguistic analyses.
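
The creativity aspect above distinguishes lexical coinages from word reuses. As a rough illustration only (this is not the authors' method), the following sketch labels a slang term as a reuse when it already appears in a reference vocabulary and as a coinage otherwise, then compares the proportions for two sets of terms. The function names, the vocabulary, and the term lists are hypothetical placeholders, not data from the paper.

```python
from collections import Counter


def creativity_type(term: str, vocabulary: set[str]) -> str:
    """Label a slang term as a 'reuse' of an existing word or a new 'coinage'.

    Assumption: membership in a reference vocabulary is a crude stand-in
    for "the word form already exists in the language".
    """
    return "reuse" if term.lower() in vocabulary else "coinage"


def creativity_profile(terms: list[str], vocabulary: set[str]) -> dict[str, float]:
    """Return the share of coinages and reuses among the given terms."""
    if not terms:
        return {"coinage": 0.0, "reuse": 0.0}
    counts = Counter(creativity_type(t, vocabulary) for t in terms)
    return {label: counts[label] / len(terms) for label in ("coinage", "reuse")}


# Toy placeholder data (hypothetical, for illustration only).
vocabulary = {"salty", "ghost", "fire", "tea"}
human_terms = ["salty", "ghost", "yeet", "finna"]
machine_terms = ["fire", "tea", "salty", "zorp"]

human_profile = creativity_profile(human_terms, vocabulary)
machine_profile = creativity_profile(machine_terms, vocabulary)
print("human:", human_profile)
print("machine:", machine_profile)
```

A gap between the two profiles would be one simple signal of the kind of systematic bias the abstract describes, although a real comparison would need a proper lexicon and far more data.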

Problem

Research questions and friction points this paper is trying to address.

Compare human and machine-generated slang usages systematically

Evaluate LLM biases in slang perception and creativity

Assess alignment between AI and human slang knowledge

Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic comparison of human and machine slang

Evaluative framework assessing creativity and biases

Informativeness measured by using slang usages as gold-standard examples for model distillation
