🤖 AI Summary
This paper challenges the longstanding view in large language model (LLM) research that prompting is mere "alchemy" or a pragmatic stopgap. It proposes a foundational reframing: LLMs are complex, opaque systems shaped by training rather than programming, and prompting is the core experimental paradigm of their behavioral science. Because natural language is the model's native interaction interface, prompt-based probing can be made reproducible and interpretable, as the history of prompting-driven discoveries such as few-shot learning, chain-of-thought reasoning, and constitutional AI illustrates. The principal contribution is the "prompting-as-scientific-inquiry" paradigm, which grants prompting theoretical legitimacy and methodological rigor, and offers a framework for the systematic characterization and empirical study of LLM capabilities alongside mechanistic interpretability.
📝 Abstract
Prompting is the primary method by which we study and control large language models. It is also one of the most powerful: nearly every major capability attributed to LLMs, including few-shot learning, chain-of-thought reasoning, and constitutional AI, was first unlocked through prompting. Yet prompting is rarely treated as science and is frequently dismissed as alchemy. We argue that this is a category error. If we treat LLMs as a new kind of complex and opaque organism that is trained rather than programmed, then prompting is not a workaround: it is behavioral science. Where mechanistic interpretability peers into the neural substrate, prompting probes the model through its native interface: language. We contend that prompting is not inferior, but rather a key component of the science of LLMs.