Activation Function Design Sustains Plasticity in Continual Learning

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
In continual learning, models suffer not only from catastrophic forgetting but also from plasticity degradation—the diminished capacity to adapt to new tasks—yet the role of nonlinear activation functions in this failure mode remains systematically unexplored. This paper identifies that the shape and saturation characteristics of the negative branch of activation functions critically influence plasticity. Building on this insight, we propose two lightweight, architecture-agnostic activations: Smooth-Leaky ReLU and Randomized Smooth-Leaky ReLU. Through rigorous theoretical analysis and cross-paradigm evaluation—including supervised class-incremental classification and non-stationary reinforcement learning on MuJoCo—we validate our approach. Leveraging a novel stress-testing protocol and a dedicated plasticity diagnostic toolkit, we demonstrate that our methods significantly mitigate plasticity loss without introducing additional parameters or requiring task-specific tuning. The improvements are consistent across diverse neural architectures and dynamic task sequences, yielding robust gains in long-term adaptive performance.

📝 Abstract
In independent, identically distributed (i.i.d.) training regimes, activation functions have been benchmarked extensively, and their differences often shrink once model size and optimization are tuned. In continual learning, however, the picture is different: beyond catastrophic forgetting, models can progressively lose the ability to adapt (referred to as loss of plasticity) and the role of the non-linearity in this failure mode remains underexplored. We show that activation choice is a primary, architecture-agnostic lever for mitigating plasticity loss. Building on a property-level analysis of negative-branch shape and saturation behavior, we introduce two drop-in nonlinearities (Smooth-Leaky and Randomized Smooth-Leaky) and evaluate them in two complementary settings: (i) supervised class-incremental benchmarks and (ii) reinforcement learning with non-stationary MuJoCo environments designed to induce controlled distribution and dynamics shifts. We also provide a simple stress protocol and diagnostics that link the shape of the activation to the adaptation under change. The takeaway is straightforward: thoughtful activation design offers a lightweight, domain-general way to sustain plasticity in continual learning without extra capacity or task-specific tuning.
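The abstract mentions "a simple stress protocol and diagnostics" for measuring adaptation under change, without specifying them here. A minimal illustrative stand-in (not the paper's actual protocol): retrain a small model on a sequence of freshly re-randomized targets and track how well it fits each new task. A plastic learner fits late tasks about as well as early ones; a learner losing plasticity shows rising final loss across the sequence.

```python
import numpy as np

def fit_task(w, X, y, lr=0.05, steps=200):
    """Run a few gradient-descent steps of linear regression;
    return the updated weights and the final mean-squared error."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w = w - lr * grad
    return w, float(np.mean((X @ w - y) ** 2))

# Illustrative stress loop: each "task" is a new random target
# function on the same inputs, and weights carry over between tasks.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
w = np.zeros(8)
final_losses = []
for task in range(5):
    y = X @ rng.normal(size=8)   # non-stationarity: targets change
    w, loss = fit_task(w, X, y)
    final_losses.append(loss)
```

For a linear probe like this the final losses stay flat; the diagnostic becomes interesting when `fit_task` wraps a deep network, where late-task losses can climb as plasticity degrades.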
Problem

Research questions and friction points this paper is trying to address.

Mitigating plasticity loss in continual learning through activation functions
Exploring the activation nonlinearity's underexplored role in loss of plasticity
Developing activation functions to sustain adaptation under distribution shifts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Designing activation functions to sustain plasticity
Introducing Smooth-Leaky and Randomized Smooth-Leaky nonlinearities
Providing diagnostics linking activation shape to adaptation
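The page does not give closed forms for the proposed activations, so the following is a hedged sketch: a smooth, non-saturating leaky nonlinearity built from a log-sum-exp soft maximum of `x` and `alpha * x` (matching the described properties of a smooth negative branch that never saturates), plus a randomized variant that samples the negative slope per call, in the spirit of RReLU. The parameterization (`alpha`, `beta`, the slope range) is an assumption, not the paper's specification.

```python
import numpy as np

def smooth_leaky(x, alpha=0.1, beta=10.0):
    """Smooth approximation of Leaky ReLU (hypothetical form):
    (1/beta) * logsumexp(beta*x, beta*alpha*x), a differentiable
    soft maximum of x and alpha*x with a non-saturating negative branch."""
    a, b = beta * x, beta * alpha * x
    m = np.maximum(a, b)  # shift for numerical stability
    return (m + np.log(np.exp(a - m) + np.exp(b - m))) / beta

def randomized_smooth_leaky(x, alpha_range=(0.05, 0.25), beta=10.0, rng=None):
    """Randomized variant: draw the negative slope uniformly per call,
    analogous to RReLU (again an assumption, not the paper's spec)."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = rng.uniform(*alpha_range)
    return smooth_leaky(x, alpha=alpha, beta=beta)
```

Far from zero the sketch recovers the two linear branches: `smooth_leaky(10.0)` is approximately `10.0` and `smooth_leaky(-10.0)` is approximately `-1.0` with `alpha=0.1`, while the transition at the origin is smooth rather than kinked.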