🤖 AI Summary
Existing local learning methods are largely confined to image classification and adapt poorly to other vision tasks such as object detection and super-resolution. Two limitations stand out: insufficient cross-scale feature communication and poor knowledge transfer across tasks. This work extends *supervised* local learning beyond classification to general vision tasks. It proposes the Memory-augmented Auxiliary Network (MAN), which integrates a long-term feature bank with a lightweight cross-scale fusion mechanism. MAN supports multiple tasks while substantially reducing GPU memory consumption. Evaluated on multiple standard benchmarks, including COCO, PASCAL VOC, and DIV2K, the approach achieves performance on par with end-to-end models at significantly lower memory overhead. These results provide empirical support for the scalability and effectiveness of the local learning paradigm across a broad range of vision applications.
📝 Abstract
Local learning offers an alternative to traditional end-to-end back-propagation in deep neural networks and significantly reduces GPU memory usage. While local learning has shown promise in image classification, its application to other visual tasks remains limited. This limitation arises primarily from two factors: 1) architectures tailored for classification often do not transfer to other tasks, so task-specific knowledge cannot be reused; 2) the absence of cross-scale feature communication degrades performance in tasks such as object detection and super-resolution. To address these challenges, we propose the Memory-augmented Auxiliary Network (MAN), which introduces a simplified design principle and incorporates a feature bank to enhance cross-task adaptability and communication. This work represents the first successful application of local learning methods beyond classification, demonstrating that MAN not only conserves GPU memory but also achieves performance on par with end-to-end approaches across multiple datasets for various visual tasks.
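For readers unfamiliar with the paradigm, the general idea of supervised local learning (not the MAN architecture itself, whose details are not given here) can be illustrated with a toy sketch. It assumes two scalar "blocks" and a hypothetical auxiliary head `a1`; every name and value below is illustrative. Each block is updated only from its own local loss, and the second block treats the first block's output as a constant, so no gradient crosses the block boundary.

```python
def train_local(epochs=200, lr=0.01):
    """Toy supervised local learning with two gradient-isolated scalar blocks.

    Block 1 computes h1 = w1 * x and is trained through a hypothetical
    auxiliary head a1 (local prediction a1 * h1).
    Block 2 computes y = w2 * h1 and is trained on the final prediction.
    All parameters and data are illustrative assumptions.
    """
    w1, a1, w2 = 0.5, 0.5, 0.5
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # targets follow t = 2 * x

    for _ in range(epochs):
        for x, t in data:
            # Block 1: forward pass and local squared-error loss via the auxiliary head.
            h1 = w1 * x
            err1 = a1 * h1 - t
            grad_w1 = 2.0 * err1 * a1 * x
            grad_a1 = 2.0 * err1 * h1

            # Block 2: h1 is treated as a constant input, so the loss of
            # block 2 never contributes a gradient to w1 or a1.
            err2 = w2 * h1 - t
            grad_w2 = 2.0 * err2 * h1

            # Each block applies only its own local gradients.
            w1 -= lr * grad_w1
            a1 -= lr * grad_a1
            w2 -= lr * grad_w2

    return w1, a1, w2


if __name__ == "__main__":
    w1, a1, w2 = train_local()
    print("auxiliary prediction scale:", w1 * a1)
    print("final prediction scale:", w1 * w2)
```

Because each block only needs its own activations to compute its update, the activations of earlier blocks do not have to be kept for a global backward pass, which is where the memory saving of local learning comes from. In an autograd framework the same isolation is typically obtained by detaching a block's output before passing it on.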