🤖 AI Summary
This study addresses the trade-off between inference efficiency and accuracy in clickbait headline detection by proposing a lightweight hybrid approach that combines OpenAI semantic embeddings with six handcrafted heuristic features. The embeddings are reduced with PCA, and the resulting representation is evaluated with XGBoost, GraphSAGE, and GCN classifiers. The simplified feature design yields slightly lower F1 scores, but the graph-based models remain competitive while substantially reducing inference time. High ROC-AUC values indicate that the models discriminate well across varying classification thresholds. The key contribution is the combination of semantic embeddings, compact feature engineering, and efficient graph-based models for accurate and computationally tractable clickbait identification.
📝 Abstract
We propose a lightweight hybrid approach to clickbait detection that combines OpenAI semantic embeddings with six compact heuristic features capturing stylistic and informational cues. To improve efficiency, embeddings are reduced using PCA and evaluated with XGBoost, GraphSAGE, and GCN classifiers. While the simplified feature design yields slightly lower F1-scores, graph-based models achieve competitive performance with substantially reduced inference time. High ROC-AUC values further indicate strong discrimination capability, supporting reliable detection of clickbait headlines under varying decision thresholds.
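
As a rough illustration of how a hybrid input vector of this kind might be assembled, the sketch below appends six heuristic features to an embedding that has already been reduced. The abstract does not name the six features, so the ones used here (word count, question mark, exclamation mark, leading number, curiosity-word ratio, all-caps ratio) are hypothetical stand-ins, as are the names `heuristic_features`, `hybrid_vector`, and `CURIOSITY_WORDS`. The reduced embedding is passed in as a plain list and is assumed to come from a separate PCA step that is not shown.

```python
import re

# Hypothetical cue list: the six features used in the paper are not named
# in the abstract, so everything below is an illustrative assumption.
CURIOSITY_WORDS = {"this", "these", "you", "why", "how", "what", "secret", "shocking"}


def heuristic_features(headline: str) -> list:
    """Return six illustrative stylistic/informational cues for a headline."""
    words = re.findall(r"[A-Za-z0-9']+", headline)
    n_words = len(words)
    denom = max(n_words, 1)  # avoid division by zero on empty headlines
    lower = [w.lower() for w in words]
    return [
        float(n_words),                                      # length in words
        float(headline.rstrip().endswith("?")),              # phrased as a question
        float("!" in headline),                              # contains an exclamation mark
        float(bool(words) and words[0].isdigit()),           # starts with a number (listicle style)
        sum(w in CURIOSITY_WORDS for w in lower) / denom,    # share of curiosity words
        sum(w.isupper() and len(w) > 1 for w in words) / denom,  # share of ALL-CAPS words
    ]


def hybrid_vector(reduced_embedding, headline: str) -> list:
    """Concatenate an already-reduced embedding with the heuristic features."""
    return list(reduced_embedding) + heuristic_features(headline)


if __name__ == "__main__":
    # Placeholder two-dimensional "embedding"; real values would come from
    # an embedding model followed by PCA.
    print(hybrid_vector([0.12, -0.40], "10 Secrets You Won't BELIEVE!"))
```

A vector built this way could then be passed to any of the classifiers mentioned above. For the graph-based models, a graph over headlines would additionally have to be constructed, which the abstract does not describe.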