SG-Blend: Learning an Interpolation Between Improved Swish and GELU for Robust Neural Representations

📅 2025-05-29
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Existing activation functions (e.g., Swish, GELU) suffer from limited generalization and strong domain dependence. To address this, we propose SG-Blend: a learnable activation function that dynamically interpolates between a Symmetrized Swish (SSwish) and GELU. Its core innovations are (i) SSwish, a novel Swish variant designed to enhance symmetric, non-monotonic representational capacity, and (ii) an end-to-end differentiable interpolation mechanism that adaptively balances expressive power and gradient stability. SG-Blend adds negligible inference overhead and is plug-and-play. Extensive experiments across diverse NLP and CV benchmarks, covering multiple tasks and architectures, demonstrate that SG-Blend consistently outperforms mainstream baselines (e.g., Swish, GELU), achieving robust cross-modal generalization gains without architectural modifications.
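
Neither the summary nor the abstract states the exact functional form. One plausible reading of the interpolation, assuming a sigmoid-squashed convex combination (both the squashing and the SSwish parameterization below are assumptions, not the paper's stated definitions), is:

```latex
% Assumed form: \sigma squashes a raw learnable scalar \alpha into (0,1);
% SSwish_\beta denotes the paper's symmetrized Swish with its own learnable
% parameter(s), whose exact definition is not given in this summary.
\mathrm{SGBlend}(x) \;=\; \sigma(\alpha)\,\mathrm{SSwish}_{\beta}(x)
\;+\; \bigl(1-\sigma(\alpha)\bigr)\,\mathrm{GELU}(x)
```

Under this reading, every term is differentiable in both the input and the parameters, so α and β can be trained jointly with the network weights, which matches the "end-to-end differentiable interpolation" claim.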

📝 Abstract
The design of activation functions remains a pivotal component in optimizing deep neural networks. While prevailing choices like Swish and GELU demonstrate considerable efficacy, they often exhibit domain-specific optima. This work introduces SG-Blend, a novel activation function that blends our proposed SSwish, a first-order symmetric variant of Swish, with the established GELU through dynamic interpolation. By adaptively blending these constituent functions via learnable parameters, SG-Blend aims to harness their complementary strengths (SSwish's controlled non-monotonicity and symmetry; GELU's smooth, probabilistic profile) to achieve a more universally robust balance between model expressivity and gradient stability. We conduct comprehensive empirical evaluations across diverse modalities and architectures, showing performance improvements on all considered natural language and computer vision tasks and models. These results, achieved with negligible computational overhead, underscore SG-Blend's potential as a versatile, drop-in replacement that consistently outperforms strong contemporary baselines. The code is available at https://anonymous.4open.science/r/SGBlend-6CBC.
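
As a concrete illustration, here is a minimal PyTorch sketch of this blending scheme. It is an assumption-laden reconstruction, not the authors' code: the paper describes SSwish only as a "first-order symmetric variant of Swish", so the `sswish` method below falls back to plain Swish with a learnable slope, and the names `SGBlend`, `alpha`, and `beta` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SGBlend(nn.Module):
    """Hypothetical sketch of SG-Blend: a learnable interpolation between
    a symmetrized Swish (SSwish) and GELU. The exact SSwish formula is not
    given in this summary, so a Swish-like gate with a learnable slope
    `beta` stands in for it."""

    def __init__(self, alpha_init: float = 0.0, beta_init: float = 1.0):
        super().__init__()
        # Raw blend parameter; squashed through a sigmoid in forward() so
        # the mixing weight stays in (0, 1) and remains differentiable.
        self.alpha = nn.Parameter(torch.tensor(alpha_init))
        # Learnable slope for the SSwish placeholder.
        self.beta = nn.Parameter(torch.tensor(beta_init))

    def sswish(self, x: torch.Tensor) -> torch.Tensor:
        # Placeholder: plain Swish, x * sigmoid(beta * x). The paper's
        # SSwish is a symmetrized variant whose definition is not shown here.
        return x * torch.sigmoid(self.beta * x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.sigmoid(self.alpha)  # blend weight in (0, 1)
        return w * self.sswish(x) + (1.0 - w) * F.gelu(x)
```

With the sigmoid squashing, the output is always a convex combination of the two activations; at inference the learned scalars are fixed, which keeps the added cost negligible.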
Problem

Research questions and friction points this paper is trying to address.

Designing a robust activation function for deep neural networks
Combining strengths of Swish and GELU via dynamic interpolation
Improving model expressivity and gradient stability across domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic interpolation between SSwish and GELU
Learnable parameters for adaptive blending (see the usage sketch after this list)
Balances expressivity and gradient stability
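
To illustrate the claimed plug-and-play property, the following hypothetical snippet swaps `SGBlend` (the sketch above) into a small MLP where `nn.GELU` would normally sit:

```python
import torch
import torch.nn as nn

# Reuses the hypothetical SGBlend module sketched above.
mlp = nn.Sequential(
    nn.Linear(128, 256),
    SGBlend(),      # drop-in where nn.GELU() would normally go
    nn.Linear(256, 128),
)

x = torch.randn(4, 128)
y = mlp(x)          # alpha/beta receive gradients like any other weight
print(y.shape)      # torch.Size([4, 128])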