Enhancing Sequential Model Performance with Squared Sigmoid TanH (SST) Activation Under Data Constraints

📅 2024-02-14
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the difficulty RNN-based models (e.g., LSTM and GRU) have in modeling sparse patterns and capturing long-range temporal dependencies when trained on small sequential datasets, this paper proposes squared Sigmoid TanH (SST), a novel activation function. SST composes sigmoid and tanh nonlinearities and squares the result, amplifying the gap between strong and weak activations, improving gradient flow, and strengthening information filtering. Crucially, SST remains fully compatible with standard gating mechanisms and requires no architectural modifications. Evaluations on low-resource tasks—including sign language recognition, time-series classification, and regression—show that SST-powered LSTMs and GRUs consistently outperform baselines with standard activations, achieving substantial gains in test accuracy. These results support SST's effectiveness for nonlinear temporal representation learning under data-scarce conditions.

📝 Abstract
Activation functions enable neural networks to learn complex representations by introducing non-linearities. While feedforward models commonly use rectified linear units, sequential models like recurrent neural networks, long short-term memory (LSTM), and gated recurrent units (GRUs) still rely on Sigmoid and TanH activation functions. However, these classical activation functions often struggle to model sparse patterns and effectively capture temporal dependencies when trained on small sequential datasets. To address this limitation, we propose the squared Sigmoid TanH (SST) activation, specifically tailored to enhance the learning capability of sequential models under data constraints. SST applies mathematical squaring to amplify differences between strong and weak activations as signals propagate over time, facilitating improved gradient flow and information filtering. We evaluate SST-powered LSTMs and GRUs on diverse applications, such as sign language recognition, regression, and time-series classification tasks, where the dataset is limited. Our experiments demonstrate that SST models consistently outperform RNN-based models with baseline activations, exhibiting improved test accuracy.
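The abstract describes SST as composing sigmoid and tanh and then squaring the result. A minimal sketch of one plausible reading of that description — assuming `SST(x) = (sigmoid(x) * tanh(x))**2`; the paper defines the exact composition — shows how squaring widens the gap between weak and strong activations:

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def sst(x: float) -> float:
    # Assumed form of SST: compose sigmoid and tanh, then square.
    # The product composition here is an illustrative assumption,
    # not necessarily the paper's exact definition.
    return (sigmoid(x) * math.tanh(x)) ** 2

# Squaring pushes weak responses toward 0 while strong ones stay
# comparatively large, sharpening the strong/weak contrast.
weak = sst(0.5)    # small pre-activation -> heavily suppressed output
strong = sst(3.0)  # large pre-activation -> output stays near its maximum
```

Under this reading, `sst(0.5)` is roughly an order of magnitude smaller than `sst(3.0)`, whereas plain `tanh` keeps the two much closer, which is the "information filtering" effect the abstract refers to.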
Problem

Research questions and friction points this paper is trying to address.

Enhancing sequential model performance with limited data
Addressing sparse pattern modeling in small sequential datasets
Improving gradient flow in recurrent networks via activation functions
Innovation

Methods, ideas, or system contributions that make the work stand out.

SST activation enhances sequential models under data constraints
Mathematical squaring amplifies activation differences for better gradient flow
SST-powered LSTMs and GRUs outperform baseline models in limited data
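Because SST is a pointwise nonlinearity, it can be swapped into a recurrent cell without changing the cell's wiring. A toy scalar LSTM step illustrating this drop-in use — the `sst` form and the shared weight `w` are illustrative assumptions, not the paper's implementation:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sst(x: float) -> float:
    # Assumed SST form: compose sigmoid and tanh, then square (illustrative).
    return (sigmoid(x) * math.tanh(x)) ** 2

def lstm_step(x: float, h_prev: float, c_prev: float, w: float = 0.5):
    """One scalar LSTM step. Gates keep their usual sigmoid form; SST is
    substituted only where tanh normally shapes the candidate and the
    output, so the gating structure is untouched (single toy weight w)."""
    pre = w * x + w * h_prev
    f = sigmoid(pre)          # forget gate (unchanged)
    i = sigmoid(pre)          # input gate (unchanged)
    g = sst(pre)              # candidate state: SST in place of tanh
    o = sigmoid(pre)          # output gate (unchanged)
    c = f * c_prev + i * g    # new cell state
    h = o * sst(c)            # new hidden state: SST in place of tanh
    return h, c

h, c = lstm_step(1.0, 0.0, 0.0)
```

Since this `sst` maps into [0, 1], the cell and hidden states stay bounded just as with the baseline activations, which is why no further architectural changes are needed.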
B. Subramanian
Kyungpook National University, Daegu, South Korea
Rathinaraja Jeyaraj
University of Houston-Victoria, Texas, USA
Akhrorjon Akhmadjon Ugli Rakhmonov
Kyungpook National University, Daegu, South Korea
Jeonghong Kim
Kyungpook National University, Daegu, South Korea