🤖 AI Summary
This work addresses the limitations of traditional backpropagation through time (BPTT) for training recurrent neural networks, namely its computational non-locality, lack of spatial parallelism, and high energy consumption, which hinder efficient modelling of long-range temporal dependencies. The authors propose a novel approach that integrates temporal predictive coding (tPC) with an approximate real-time recurrent learning (RTRL) framework, enabling efficient spatiotemporal credit assignment while preserving local weight updates and parallelisability. In one of the first applications of tPC at this scale, the method is applied to machine translation with models of tens of millions of parameters, achieving a test perplexity of 7.62 with a 15-million-parameter model (compared to 7.49 with BPTT). The method performs comparably to BPTT across both synthetic and real-world benchmarks, supporting its scalability and effectiveness and its promise for energy-efficient sequential modelling.
📝 Abstract
Predictive Coding (PC) is a biologically inspired learning framework characterised by local, parallelisable operations, properties that enable energy-efficient implementation on neuromorphic hardware. Despite this, extending PC effectively to recurrent neural networks (RNNs) has been challenging, particularly for tasks involving long-range temporal dependencies. Backpropagation Through Time (BPTT) remains the dominant method for training RNNs, but its non-local computation, lack of spatial parallelism, and requirement to store extensive activation histories result in significant energy consumption. This work introduces a novel method combining Temporal Predictive Coding (tPC) with approximate Real-Time Recurrent Learning (RTRL), enabling effective spatio-temporal credit assignment. Results indicate that the proposed method can closely match the performance of BPTT on both synthetic benchmarks and real-world tasks. On a challenging machine translation task, with a 15-million parameter model, the proposed method achieves a test perplexity of 7.62 (vs. 7.49 for BPTT), marking one of the first applications of tPC to tasks of this scale. These findings demonstrate the potential of this method to learn complex temporal dependencies whilst retaining the local, parallelisable, and flexible properties of the original PC framework, paving the way for more energy-efficient learning systems.
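The key idea the abstract points to is RTRL-style online credit assignment: instead of unrolling the network through time as BPTT does, each weight carries a running eligibility trace approximating its influence on the current hidden state, so updates stay local in time. As a rough illustration under stated assumptions (a simple tanh RNN, a diagonal RTRL approximation, and a toy regression task; this is not the paper's exact tPC formulation), a minimal sketch might look like:

```python
import numpy as np

# Illustrative sketch only: a tiny RNN trained online with a diagonal
# (element-wise) approximation to Real-Time Recurrent Learning (RTRL).
# Network sizes, the toy task, and the diagonal approximation are
# assumptions for illustration, not the paper's tPC-RTRL algorithm.

rng = np.random.default_rng(0)
H, X = 8, 4                       # hidden and input sizes (arbitrary)
W = rng.normal(0, 0.3, (H, H))    # recurrent weights
U = rng.normal(0, 0.3, (H, X))    # input weights
V = rng.normal(0, 0.3, (1, H))    # linear readout

def step(h, x, trace_W, trace_U):
    """One forward step plus online update of eligibility traces.

    trace_W[i, j] approximates d h[i] / d W[i, j]. Full RTRL would
    propagate the complete Jacobian; the diagonal approximation keeps
    only the term where the influenced unit matches the weight's row.
    """
    h_new = np.tanh(W @ h + U @ x)
    d = 1.0 - h_new ** 2          # derivative of tanh at the new state
    trace_W = d[:, None] * (np.diag(W)[:, None] * trace_W + h[None, :])
    trace_U = d[:, None] * (np.diag(W)[:, None] * trace_U + x[None, :])
    return h_new, trace_W, trace_U

# Online loop on a toy task: track an exponential moving average of inputs.
lr = 0.01
h, tW, tU = np.zeros(H), np.zeros((H, H)), np.zeros((H, X))
target, losses = 0.0, []
for t in range(1000):
    x = rng.normal(size=X)
    target = 0.9 * target + 0.1 * x.mean()
    h, tW, tU = step(h, x, tW, tU)
    err = float(V @ h) - target
    losses.append(err ** 2)
    # Local updates: the instantaneous error, routed through the readout,
    # is combined with per-weight traces; no unrolling through time.
    dh = err * V[0]
    W -= lr * dh[:, None] * tW
    U -= lr * dh[:, None] * tU
    V -= lr * err * h[None, :]
```

Starting from zero traces, the trace after one step equals the exact one-step derivative; thereafter the diagonal approximation drops the cross-unit terms that full RTRL would carry, trading fidelity for O(H) memory per weight and purely local, online updates, the property the proposed method shares with PC.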