Can Large Language Models generalize analogy solving like people can?

📅 2024-11-04
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates whether large language models (LLMs) possess human-level cross-domain analogical transfer capability—a hallmark of higher-order relational reasoning. Method: Drawing on cognitive psychology paradigms, the authors construct a multi-level analogical reasoning benchmark spanning Latin letters, Greek letters, and abstract symbols. Using zero-shot and few-shot prompting, they systematically evaluate analogical transfer—both near (within-domain) and far (cross-symbol-system)—in children, adults, and state-of-the-art LLMs (e.g., GPT-4, Claude). Contribution/Results: Humans readily generalize across symbol systems, whereas LLMs suffer >40% accuracy drops on Greek-letter and abstract-symbol tasks—substantially underperforming both children and adults. This work provides the first systematic evidence that LLMs exhibit fundamental deficits in abstract relational modeling and brittle cross-symbol generalization in far-transfer analogical reasoning. It also introduces a scalable, cognitively inspired benchmark for rigorously assessing higher-order reasoning capabilities in foundation models.

📝 Abstract
When we solve an analogy we transfer information from a known context to a new one through abstract rules and relational similarity. In people, the ability to solve analogies such as "body : feet :: table : ?" emerges in childhood, and appears to transfer easily to other domains, such as the visual domain "( : ) :: < : ?". Recent research shows that large language models (LLMs) can solve various forms of analogies. However, can LLMs generalize analogy solving to new domains like people can? To investigate this, we had children, adults, and LLMs solve a series of letter-string analogies (e.g., a b : a c :: j k : ?) in the Latin alphabet, in a near transfer domain (Greek alphabet), and a far transfer domain (list of symbols). As expected, children and adults easily generalized their knowledge to unfamiliar domains, whereas LLMs did not. This key difference between human and AI performance is evidence that these LLMs still struggle with robust human-like analogical transfer.
Problem

Research questions and friction points this paper is trying to address.

Can LLMs generalize analogy solving across domains?
Comparison of human and LLM performance in analogy tasks
Investigation of LLMs' ability in near and far transfer domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs tested on analogy solving across domains
Comparison of human and AI analogy generalization
Letter-string analogies used in Latin, Greek, symbols
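The letter-string paradigm above can be illustrated with a minimal sketch (our own illustration, not the authors' code): the same abstract "successor" rule is instantiated in three symbol systems, so solving the Greek and symbol versions requires transferring the relation rather than memorizing surface patterns. The specific symbol list here is a hypothetical stand-in for the paper's far-transfer domain.

```python
# Sketch of letter-string analogy items of the form "x y : x z :: u v : ?",
# where z and the answer are the alphabet successors of y and v.
LATIN = list("abcdefghij")
GREEK = list("αβγδεζηθικ")      # near-transfer domain
SYMBOLS = list("!@#$%^&*()")    # hypothetical far-transfer symbol list

def successor_item(alphabet, src=0, tgt=5):
    """Build one analogy item and its answer from an ordered symbol list."""
    a, b = alphabet[src], alphabet[src + 1]
    c = alphabet[src + 2]                    # successor of b
    u, v = alphabet[tgt], alphabet[tgt + 1]
    answer = alphabet[tgt + 2]               # successor of v
    prompt = f"{a} {b} : {a} {c} :: {u} {v} : ?"
    return prompt, f"{u} {answer}"

for alphabet in (LATIN, GREEK, SYMBOLS):
    prompt, answer = successor_item(alphabet)
    print(prompt, "->", answer)
```

Because only the ordering of the symbol list matters, humans who learn the rule on Latin letters can apply it to any ordered list; the paper's finding is that LLMs largely cannot.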
Claire E. Stevenson
Psychological Methods, University of Amsterdam, the Netherlands
Alexandra Pafford
Psychological Methods, University of Amsterdam, the Netherlands
Han L. J. van der Maas
Psychological Methods, University of Amsterdam, the Netherlands
Melanie Mitchell
Professor, Santa Fe Institute
AI · Cognitive Science · Complex Systems