🤖 AI Summary
This work addresses the trade-off between the high computational cost of large language models (LLMs) and the limited generalization of conventional Transformer-based text classifiers. To this end, we propose LabelFusion, an end-to-end learnable vector-level fusion framework. LabelFusion jointly models embeddings from a lightweight classifier backbone (e.g., RoBERTa) and per-class scores elicited from LLMs (e.g., GPT, Gemini) via structured prompts, using a compact FusionMLP to learn the fusion weights dynamically and exposing a unified AutoFusionClassifier interface. Its core contribution is an embedding-score co-fusion mechanism that lets practitioners trade off accuracy against inference latency and deployment cost. Experiments on AG News and Reuters-21578 (10-class) show strong accuracy (92.4% and 92.3%, respectively) together with improved cross-domain robustness and practical deployability.
📝 Abstract
LabelFusion is a fusion ensemble for text classification that learns to combine a traditional transformer-based classifier (e.g., RoBERTa) with one or more Large Language Models (LLMs such as OpenAI GPT, Google Gemini, or DeepSeek) to deliver accurate and cost-aware predictions across multi-class and multi-label tasks. The package provides a simple high-level interface (AutoFusionClassifier) that trains the full pipeline end-to-end with minimal configuration, and a flexible API for advanced users. Under the hood, LabelFusion integrates vector signals from both sources by concatenating the ML backbone's embeddings with the LLM-derived per-class scores -- obtained through structured prompt-engineering strategies -- and feeds this joint representation into a compact multi-layer perceptron (FusionMLP) that produces the final prediction. This learned fusion approach captures complementary strengths of LLM reasoning and traditional transformer-based classifiers, yielding robust performance across domains -- 92.4% accuracy on AG News and 92.3% on 10-class Reuters-21578 topic classification -- while enabling practical trade-offs between accuracy, latency, and cost.
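The concatenate-then-MLP fusion described above can be sketched as follows. This is a minimal NumPy illustration, not LabelFusion's actual implementation: the embedding size (768), hidden width (64), 4-class setup (as in AG News), and the randomly initialized, untrained weights are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_CLASSES = 4   # e.g., the four AG News topics
EMB_DIM = 768     # typical RoBERTa-base embedding size (assumed)
HIDDEN = 64       # hypothetical hidden width for the fusion MLP


def fusion_mlp(embedding, llm_scores, W1, b1, W2, b2):
    """Concatenate the backbone embedding with LLM per-class scores,
    then apply a small two-layer MLP to produce class logits."""
    joint = np.concatenate([embedding, llm_scores])  # (EMB_DIM + NUM_CLASSES,)
    hidden = np.maximum(0.0, W1 @ joint + b1)        # ReLU layer
    return W2 @ hidden + b2                          # (NUM_CLASSES,) logits


# Untrained parameters, for shape illustration only.
W1 = rng.standard_normal((HIDDEN, EMB_DIM + NUM_CLASSES)) * 0.02
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((NUM_CLASSES, HIDDEN)) * 0.02
b2 = np.zeros(NUM_CLASSES)

embedding = rng.standard_normal(EMB_DIM)     # stand-in for a RoBERTa embedding
llm_scores = np.array([0.1, 0.7, 0.1, 0.1])  # stand-in for LLM-derived class scores

logits = fusion_mlp(embedding, llm_scores, W1, b1, W2, b2)
pred = int(np.argmax(logits))
print(logits.shape, pred)
```

In training, the two parameter matrices would be fit end-to-end on labeled data, which is how the fusion learns how much weight to give the LLM scores versus the backbone embedding for each class.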