ARC Prize 2024: Technical Report

📅 2024-12-05
🤖 AI Summary
This report covers the ARC-AGI benchmark, which measures generalization on novel tasks rather than skill at tasks that can be prepared for in advance, and the ARC Prize 2024 competition built around it, whose target score is 85%. Method: The report surveys the top approaches from the competition, notably deep learning-guided program synthesis and test-time training, and reviews new open-source implementations. Contribution/Results: During the competition, the state-of-the-art score on the ARC-AGI private evaluation set rose from 33% to 55.5%, still short of the 85% target. The report also discusses the limitations of the ARC-AGI-1 dataset and shares key insights gained from the competition.

📝 Abstract
As of December 2024, the ARC-AGI benchmark is five years old and remains unbeaten. We believe it is currently the most important unsolved AI benchmark in the world because it seeks to measure generalization on novel tasks -- the essence of intelligence -- as opposed to skill at tasks that can be prepared for in advance. This year, we launched ARC Prize, a global competition to inspire new ideas and drive open progress towards AGI by reaching a target benchmark score of 85%. As a result, the state-of-the-art score on the ARC-AGI private evaluation set increased from 33% to 55.5%, propelled by several frontier AGI reasoning techniques including deep learning-guided program synthesis and test-time training. In this paper, we survey top approaches, review new open-source implementations, discuss the limitations of the ARC-AGI-1 dataset, and share key insights gained from the competition.

Problem

Research questions and friction points this paper is trying to address.

AI Performance
ARC-AGI Test
Task Handling Ability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Learning-Guided Program Synthesis
Test-Time Training
ARC-AGI Performance Enhancement