🤖 AI Summary
This work challenges the conventional assumption in AI evaluation that intelligence is linear and uniformly distributed across tasks, arguing instead that intelligence may manifest as atypical and inconsistent performance. The paper introduces the concepts of “familiar intelligence” and “strange intelligence,” proposing that AI capabilities are more accurately characterized by domain-specific, nonlinear patterns: excelling in certain tasks while underperforming in others, sometimes within the same domain. The authors reconceptualize general intelligence not as a single measurable quantity, but as the capacity to achieve a broad range of goals across a broad range of environments. This reframing bears directly on adversarial testing. An isolated failure on a seemingly simple task does not by itself show that a system lacks general intelligence, and strong performance on one type of task, such as an IQ test, does not warrant assumptions of broad capacity beyond that task domain.
📝 Abstract
We endorse and expand upon Susan Schneider's critique of the linear model of AI progress and introduce two novel concepts: "familiar intelligence" and "strange intelligence". AI intelligence is likely to be strange intelligence, defying familiar patterns of ability and inability, combining superhuman capacities in some domains with subhuman performance in other domains, and even within domains sometimes combining superhuman insight with surprising errors that few humans would make. We develop and defend a nonlinear model of intelligence on which "general intelligence" is not a unified capacity but instead the ability to achieve a broad range of goals in a broad range of environments, in a manner that defies nonarbitrary reduction to a single linear quantity. We conclude with implications for adversarial testing approaches to evaluating AI capacities. If AI is strange intelligence, we should expect that even the most capable systems will sometimes fail in seemingly obvious tasks. On a nonlinear model of AI intelligence, such errors on their own do not demonstrate a system's lack of outstanding general intelligence. Conversely, excellent performance on one type of task, such as an IQ test, cannot warrant assumptions of broad capacities beyond that task domain.