What LLMs Think When You Don't Tell Them What to Think About?

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a critical gap in large language model (LLM) behavioral analysis by moving beyond task-oriented prompting to examine intrinsic generative tendencies under minimally constrained conditions. Using extremely simple and neutral prompts, the authors systematically investigate the spontaneous behaviors of 16 prominent LLMs, employing semantic categorization, content depth assessment, and degeneration detection techniques. The analysis reveals pronounced family-level differences: GPT-OSS models exhibit a strong preference for programming and mathematics, Llama variants lean toward literary content, DeepSeek frequently generates religious texts, and Qwen models often output multiple-choice questions. Furthermore, distinct patterns emerge in technical depth and repetitive degeneration across models. To support reproducibility, the authors release a dataset comprising 256,000 generated samples alongside open-source code.

📝 Abstract
Characterizing the behavior of large language models (LLMs) across diverse settings is critical for reliable monitoring and AI safety. However, most existing analyses rely on topic- or task-specific prompts, which can substantially limit what can be observed. In this work, we study what LLMs generate from minimal, topic-neutral inputs and probe their near-unconstrained generative behavior. Despite the absence of explicit topics, model outputs cover a broad semantic space, and surprisingly, each model family exhibits strong and systematic topical preferences. GPT-OSS predominantly generates programming (27.1%) and mathematical content (24.6%), whereas Llama most frequently generates literary content (9.1%). DeepSeek often generates religious content, while Qwen frequently generates multiple-choice questions. Beyond topical preferences, we also observe differences in content specialization and depth: GPT-OSS often generates more technically advanced content (e.g., dynamic programming) compared with other models (e.g., basic Python). Furthermore, we find that the near-unconstrained generation often degenerates into repetitive phrases, revealing interesting behaviors unique to each model family. For instance, degenerate outputs from Llama include multiple URLs pointing to personal Facebook and Instagram accounts. We release the complete dataset of 256,000 samples from 16 LLMs, along with a reproducible codebase.
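The abstract mentions that near-unconstrained generation often degenerates into repetitive phrases. The paper's exact detection method isn't specified here, but a common proxy for this kind of degeneration is the fraction of repeated n-grams in the output. The sketch below (function names, the token-level n-gram choice, and the threshold are all illustrative assumptions, not the authors' method) shows the idea:

```python
from collections import Counter

def repeated_ngram_fraction(text: str, n: int = 4) -> float:
    """Fraction of whitespace-token n-grams that are repeats of an earlier n-gram.

    Values near 1.0 indicate text dominated by a few phrases looping
    over and over; values near 0.0 indicate mostly novel continuations.
    """
    tokens = text.split()
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    # Every occurrence of an n-gram beyond its first counts as a repeat.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)

def is_degenerate(text: str, n: int = 4, threshold: float = 0.5) -> bool:
    """Flag an output as degenerate if repeated n-grams dominate.

    The 0.5 threshold is an illustrative assumption; in practice it
    would be tuned against labeled examples.
    """
    return repeated_ngram_fraction(text, n) >= threshold
```

For example, a looping output such as `"follow me on facebook " * 20` scores close to 1.0, while a short varied sentence scores 0.0. A metric like this would only flag surface-level phrase repetition, not the more semantic failure modes the paper also reports (e.g. streams of social-media URLs).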
Problem

Research questions and friction points this paper is trying to address.

large language models
unconstrained generation
topical preferences
generative behavior
AI safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

unconstrained generation
topic-neutral prompts
model behavior characterization
topical preference
LLM degeneration