Synthetic Benchmarks Overstate Forward-Forward Scaling: Real-Data Limits of Layer-Local Training

📅 2026-06-04

🤖 AI Summary

This work investigates whether synthetic benchmarks overestimate the scalability of layer-wise local training methods such as the Forward-Forward (FF) algorithm. To address this, we introduce the DTG-FF framework, which establishes the first Forward-Forward baseline on ImageNet-100 at 224×224. Its components are dynamic-temperature goodness scoring, decoupled normalization, and multi-layer fusion, which allow local training without storing full-network activations. Under a shared recipe and backbone, DTG-FF reaches 91.8% accuracy on CIFAR-10 but trails an architecture-matched backpropagation baseline with deep supervision (BP-DeepSup) by 2.40 percentage points on CIFAR-10 and 5.93 on CIFAR-100. On ImageNet-100 it reaches only 49.4%, more than 25 percentage points below typical BP results above 75%. The advantage that DTG-FF shows on synthetic tasks as the class count grows therefore does not carry over to realistic image distributions. A systems audit on 8 GB hardware further finds that BP with gradient accumulation uses less memory than DTG-FF, so a memory-based case for FF at this scale is not supported.
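For readers unfamiliar with goodness-based local training, the following is a minimal sketch of the basic Forward-Forward layer objective described by Hinton (2022), where goodness is the sum of squared activations and each layer is trained to score positive inputs above a threshold and negative inputs below it. It is not the DTG-FF implementation: the function names, the threshold `theta`, and the fixed `temperature` scale are illustrative assumptions, and the paper's dynamic temperature schedule is not reproduced here.

```python
import math


def goodness(activations):
    """Goodness of one layer: the sum of its squared activations."""
    return sum(a * a for a in activations)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ff_layer_loss(pos_acts, neg_acts, theta=2.0, temperature=1.0):
    """Layer-local logistic loss on goodness.

    Low when positive goodness is above theta and negative goodness is
    below it. `temperature` is a hypothetical fixed scale (assumption),
    not the dynamic temperature used by DTG-FF.
    """
    g_pos = goodness(pos_acts)
    g_neg = goodness(neg_acts)
    loss_pos = softplus(-(g_pos - theta) / temperature)
    loss_neg = softplus((g_neg - theta) / temperature)
    return loss_pos + loss_neg


def length_normalize(activations, eps=1e-8):
    """Rescale activations to unit length before the next layer, so the
    next layer cannot simply reuse this layer's goodness."""
    norm = math.sqrt(goodness(activations)) + eps
    return [a / norm for a in activations]


# A well-separated layer has a low loss; the swapped case has a high one.
good = ff_layer_loss(pos_acts=[2.0, 2.0], neg_acts=[0.1, 0.1])
bad = ff_layer_loss(pos_acts=[0.1, 0.1], neg_acts=[2.0, 2.0])
```

Because the loss depends only on one layer's activations, each layer can be updated without keeping the activations of the others, which is the property behind the claim that FF avoids storing depth-wide activations.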

📝 Abstract

Forward-Forward (FF) learning [Hinton, 2022] replaces backpropagation with strictly layer-local goodness updates. Recent FF-CNN work has narrowed the gap to BP on 32x32 benchmarks, raising the question of whether layer-local training is becoming a viable alternative at realistic scale. To probe this rigorously, we develop DTG-FF -- dynamic temperature goodness, decoupled normalization, and multi-layer fusion -- as an instrument that sets FF-family state of the art across nine real-data benchmarks (91.8% CIFAR-10 and the first FF baseline at ImageNet-100 224x224), and use it to audit how far layer-local training actually scales. (1) Real-data scaling. Under identical recipe and backbone, an architecture-matched BP-DeepSup baseline beats DTG-FF by 2.40/5.93 pp on CIFAR-10/CIFAR-100, and the gap widens with class count. At 224x224 the same instrument reaches only 49.4% -- the first FF baseline at this scale, versus typical BP above 75% [Tian et al., 2020] -- exposing a real-data ceiling invisible at 32x32. (2) Synthetic vs. real K-conflict. DTG-FF increasingly outperforms BP as class count K grows on synthetic teacher-student tasks, yet on real images the FF-BP gap reverses sign and widens with K. A within-dataset CIFAR-100 coarse vs. fine probe isolates label-hierarchy from image distribution: synthetic K-sweeps confound output dimensionality with fine-grained discrimination difficulty and thereby overstate FF transferability. (3) Systems audit. FF can be implemented without storing depth-wide activations, but on commodity 8 GB hardware standard BP+gradient-accumulation reaches 4.18 GB / 157 imgs/s versus DTG-FF's 7.90 GB / 138 imgs/s, so a memory-based justification for FF at this scale is not supported under fair baselines.
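The headline comparisons in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative, the implied BP-DeepSup CIFAR-10 accuracy is derived rather than reported, and the ImageNet-100 gap is a lower bound against "typical BP above 75%", not an architecture-matched measurement.

```python
# Figures quoted in the abstract.
dtg_ff_cifar10 = 91.8          # DTG-FF accuracy on CIFAR-10 (%)
gap_cifar10_pp = 2.40          # BP-DeepSup minus DTG-FF on CIFAR-10 (pp)
dtg_ff_imagenet100 = 49.4      # DTG-FF accuracy on ImageNet-100 at 224x224 (%)
bp_imagenet100_floor = 75.0    # "typical BP above 75%" (lower bound)

bp_mem_gb, bp_imgs_per_s = 4.18, 157   # BP + gradient accumulation
ff_mem_gb, ff_imgs_per_s = 7.90, 138   # DTG-FF

# Derived quantities (not reported directly in the abstract).
implied_bp_cifar10 = round(dtg_ff_cifar10 + gap_cifar10_pp, 2)
imagenet_gap_floor_pp = round(bp_imagenet100_floor - dtg_ff_imagenet100, 2)
memory_ratio = ff_mem_gb / bp_mem_gb
throughput_ratio = ff_imgs_per_s / bp_imgs_per_s

print(f"Implied BP-DeepSup CIFAR-10 accuracy: {implied_bp_cifar10}%")
print(f"ImageNet-100 gap (lower bound): {imagenet_gap_floor_pp} pp")
print(f"DTG-FF / BP memory: {memory_ratio:.2f}x")
print(f"DTG-FF / BP throughput: {throughput_ratio:.2f}x")
```

On these numbers DTG-FF uses roughly 1.9 times the memory of the BP baseline at about 88% of its throughput, which is the basis for the abstract's conclusion that a memory argument for FF does not hold at this scale.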

Problem

Research questions and friction points this paper is trying to address.

Forward-Forward learning

layer-local training

real-data scaling

synthetic benchmarks

backpropagation alternative

Innovation

Methods, ideas, or system contributions that make the work stand out.

Forward-Forward learning

layer-local training

real-data benchmarking