PromptPrint: Behavioral Biometrics Through Natural Language Prompting in LLMs

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether brief, task-oriented user prompts in interactions with large language models contain stable and identifiable identity signals. Grounded in the lexical stability hypothesis, the research demonstrates that identity cues are primarily encoded in surface-level word choices rather than abstract communicative intent, and uncovers a “uniqueness–consistency paradox” in stylistic features. By systematically comparing lexical and semantic representations, extracting stylistic metrics, and evaluating robustness through adversarial perturbations, the authors achieve high-accuracy identity recognition across 20,680 real-world prompts from 1,034 users. This work provides the first empirical validation that user prompts can serve as reliable behavioral biometrics.

📝 Abstract

Authorship attribution research has traditionally focused on long-form, expressive texts; however, interactions with large language models (LLMs) typically consist of brief, task-driven prompts. This raises a fundamental question: do such prompts contain a stable, distinctive signal that identifies their author? We introduce PromptPrint, a systematic study of prompt-based identity: the hypothesis that a user's habitual vocabulary, syntax, and discourse patterns form a learnable behavioral biometric. Using 20,680 real prompts from 1,034 users, we establish three key findings. First, lexical representations significantly outperform semantic encoders, supporting the "lexical stability hypothesis": identity is primarily encoded in surface-level word choice rather than abstract intent. Second, stylometric features exhibit a "uniqueness-consistency paradox": users are highly distinctive across the population, yet behaviorally inconsistent across contexts. Third, adversarial analysis reveals a clear vulnerability spectrum: identity signals are robust to minor lexical perturbations but degrade substantially under semantic paraphrasing. Overall, our results demonstrate strong identification performance at scale, establishing prompt-based identity as a viable behavioral biometric. This work introduces a new perspective on user modeling in LLM interactions, with important implications for security and privacy. Data and code will be released upon acceptance.
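To make the idea of a lexical representation concrete, the following is a minimal sketch of attribution from surface-level word choice. It is not the PromptPrint pipeline, whose features and classifier are not specified here. It assumes a simple bag of word unigrams and character trigrams compared by cosine similarity against per-user profiles, and the function names (`lexical_features`, `build_profiles`, `attribute`) and the toy prompts are hypothetical.

```python
import math
import re
from collections import Counter


def lexical_features(prompt, n=3):
    """Count word unigrams and character n-grams of one lowercased prompt."""
    text = prompt.lower()
    words = re.findall(r"[a-z0-9']+", text)
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    # Tag each key so a short word never collides with a character n-gram.
    return Counter(("w", w) for w in words) + Counter(("c", g) for g in grams)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(prompts_by_user):
    """Sum the feature counts of each user's known prompts into one profile."""
    profiles = {}
    for user, prompts in prompts_by_user.items():
        profile = Counter()
        for prompt in prompts:
            profile.update(lexical_features(prompt))
        profiles[user] = profile
    return profiles


def attribute(prompt, profiles):
    """Return the user whose profile is most similar to the prompt."""
    feats = lexical_features(prompt)
    return max(profiles, key=lambda user: cosine(feats, profiles[user]))


# Toy data (illustrative only, not drawn from the paper's dataset).
toy_prompts = {
    "user_a": ["pls fix this python func, thx",
               "pls write a regex for emails, thx"],
    "user_b": ["Could you kindly explain how recursion works?",
               "Could you kindly summarize this article?"],
}
profiles = build_profiles(toy_prompts)
predicted = attribute("pls sort this list, thx", profiles)
```

Because the profiles depend only on surface tokens, a paraphrase that keeps the intent but changes the wording shifts the feature vector, which is consistent with the vulnerability to semantic paraphrasing described in the abstract.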

Problem

Research questions and friction points this paper is trying to address.

authorship attribution

behavioral biometrics

large language models

prompt-based identity

stylometry

Innovation

Methods, ideas, or system contributions that make the work stand out.

behavioral biometrics

prompt-based identity

lexical stability