An Ensemble Classification Approach in A Multi-Layered Large Language Model Framework for Disease Prediction

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Disease prediction from symptom descriptions in Arabic social media and online health platforms remains challenging due to linguistic complexity, data sparsity, and limited domain-specific resources. Method: We propose a hierarchical large language model (LLM) ensemble framework that integrates LLM-driven text preprocessing—including summarization, refinement, and Arabic medical named entity recognition—with fine-tuning of Arabic-specific Transformer models (CAMeLBERT, AraBERT, AsafayaBERT). The framework fuses representations from raw text and LLM-enhanced inputs via majority voting. Contribution/Results: Our work introduces a three-tier collaborative paradigm—LLM preprocessing, domain-adaptive fine-tuning, and multi-model ensemble—that improves robustness. Evaluated on Arabic medical text classification, the framework achieves 80.56% accuracy, demonstrating the effectiveness of multi-source representation fusion and ensemble strategies for disease prediction in low-resource languages.

📝 Abstract
Social telehealth has made remarkable progress in healthcare by allowing patients to post symptoms and participate in medical consultations remotely. Users frequently post symptoms on social media and online health platforms, creating a huge repository of medical data that can be leveraged for disease classification. Large language models (LLMs) such as LLAMA3 and GPT-3.5, along with transformer-based models like BERT, have demonstrated strong capabilities in processing complex medical text. In this study, we evaluate three Arabic medical text preprocessing methods, namely summarization, refinement, and Named Entity Recognition (NER), before applying fine-tuned Arabic transformer models (CAMeLBERT, AraBERT, and AsafayaBERT). To enhance robustness, we adopt a majority voting ensemble that combines predictions from original and preprocessed text representations. This approach achieved the best classification accuracy of 80.56%, showing its effectiveness in leveraging diverse text representations and model predictions to improve the understanding of medical texts. To the best of our knowledge, this is the first work that integrates LLM-based preprocessing with fine-tuned Arabic transformer models and ensemble learning for disease classification in Arabic social telehealth data.
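The abstract describes fusing predictions from the three fine-tuned models (and from different text representations) by majority vote. The paper's exact fusion scheme is not given beyond that; the sketch below is a minimal, generic illustration in which disease labels, model names, and the first-model tie-break rule are all assumptions, not details from the paper.

```python
from collections import Counter

def majority_vote(per_model_predictions):
    """Fuse per-sample label predictions from several models by majority vote.

    per_model_predictions: list of prediction lists, one list per model,
    all of equal length. Ties (no label appears more than once) fall back
    to the first model's prediction -- an assumed, not paper-specified, rule.
    """
    fused = []
    for labels in zip(*per_model_predictions):
        label, count = Counter(labels).most_common(1)[0]
        fused.append(label if count > 1 else labels[0])
    return fused

# Hypothetical outputs of the three fine-tuned models on four samples
camelbert = ["flu", "asthma", "diabetes", "flu"]
arabert = ["flu", "asthma", "flu", "cold"]
asafayabert = ["cold", "asthma", "diabetes", "flu"]

print(majority_vote([camelbert, arabert, asafayabert]))
# → ['flu', 'asthma', 'diabetes', 'flu']
```

In the paper's setup, each model would contribute one prediction per input (possibly one per text representation as well), and the fused label is the final disease class.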
Problem

Research questions and friction points this paper is trying to address.

Classifying diseases from Arabic social media medical texts
Evaluating preprocessing methods for Arabic medical NLP
Integrating ensemble learning with transformer models for accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-layered LLM framework with preprocessing
Majority voting ensemble for robust predictions
Fine-tuned Arabic transformers for medical classification