FT-ARM: Fine-Tuned Agentic Reflection Multimodal Language Model for Pressure Ulcer Severity Classification with Reasoning

📅 2025-10-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Clinical staging of pressure injuries (Stages I–IV) is highly subjective and depends on subtle visual distinctions, which leads to low inter-rater agreement; existing AI models (e.g., CNNs, ViTs) are also difficult to interpret. To address this, we propose FT-ARM, a multimodal large language model framework built upon LLaMA 3.2 90B that integrates a tailored vision encoder, fine-grained multimodal instruction tuning, and an agentic self-reflection mechanism designed to emulate clinicians' iterative diagnostic reassessment. Our approach reaches 85% classification accuracy on the PIID dataset (4 percentage points above prior CNN-based models), improves decision consistency, and generates clinically grounded natural-language explanations. It is designed and tested for live inference, reflecting real-time deployment conditions. The core innovation lies in bringing reflective reasoning into multimodal medical image understanding, combining diagnostic accuracy with model interpretability.

📝 Abstract

Pressure ulcers (PUs) are a serious and prevalent healthcare concern. Accurate classification of PU severity (Stages I-IV) is essential for proper treatment but remains challenging due to subtle visual distinctions and subjective interpretation, leading to variability among clinicians. Prior AI-based approaches using Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) achieved promising accuracy but offered limited interpretability. We present FT-ARM (Fine-Tuned Agentic Reflection Multimodal model), a fine-tuned multimodal large language model (MLLM) with an agentic self-reflection mechanism for pressure ulcer severity classification. Inspired by clinician-style diagnostic reassessment, FT-ARM iteratively refines its predictions by reasoning over visual features and encoded clinical knowledge from text, enhancing both accuracy and consistency. On the publicly available Pressure Injury Image Dataset (PIID), FT-ARM, fine-tuned from LLaMA 3.2 90B, achieved 85% accuracy in classifying PU stages I-IV, surpassing prior CNN-based models by 4 percentage points. Unlike earlier CNN/ViT studies that relied solely on offline evaluations, FT-ARM is designed and tested for live inference, reflecting real-time deployment conditions. Furthermore, it produces clinically grounded natural-language explanations, improving interpretability and trust. By integrating fine-tuning and reflective reasoning across multimodal inputs, FT-ARM advances the reliability, transparency, and clinical applicability of automated wound assessment systems, addressing the critical need for consistent and explainable PU staging to support improved patient care.

Problem

Research questions and friction points this paper is trying to address.

Classifying pressure ulcer severity (Stages I-IV) when existing AI models offer limited interpretability

Addressing subjective interpretation and variability in clinical PU staging assessments

Improving reliability and transparency of automated wound assessment systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned multimodal model with self-reflection mechanism

Iteratively refines predictions by reasoning over visual features and encoded clinical knowledge from text

Generates natural-language explanations for clinical interpretability
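
The self-reflection idea listed above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify prompts, stopping rules, or the model interface, so the `Model` callable, the prompt wording, the `max_rounds` parameter, and the "stop when the stage stabilizes" rule are all assumptions made for illustration. The model is passed in as a plain function so the loop can be run with a stub.

```python
from dataclasses import dataclass
from typing import Callable, List

STAGES = ("I", "II", "III", "IV")


@dataclass
class Assessment:
    stage: str      # predicted stage, expected to be one of STAGES
    rationale: str  # natural-language explanation for the prediction


# Hypothetical model interface (an assumption, not a real API):
# takes image bytes and a text prompt, returns an Assessment.
Model = Callable[[bytes, str], Assessment]


def _checked(assessment: Assessment) -> Assessment:
    """Reject predictions outside the four stages."""
    if assessment.stage not in STAGES:
        raise ValueError(f"unexpected stage: {assessment.stage!r}")
    return assessment


def classify_with_reflection(
    model: Model, image: bytes, max_rounds: int = 3
) -> List[Assessment]:
    """Return every assessment made; the last entry is the final answer."""
    first_prompt = "Classify the pressure ulcer stage (I-IV) and explain why."
    history = [_checked(model(image, first_prompt))]

    for _ in range(max_rounds - 1):
        prev = history[-1]
        # Assumed reflection prompt: feed the previous answer back for review.
        prompt = (
            f"Your previous answer was stage {prev.stage} because: "
            f"{prev.rationale}\n"
            "Re-examine the image against the staging criteria. "
            "Keep or revise the stage and explain."
        )
        revised = _checked(model(image, prompt))
        history.append(revised)
        if revised.stage == prev.stage:
            break  # assumed stopping rule: the prediction has stabilized

    return history
```

Returning the full history rather than only the final stage keeps each intermediate rationale available, which fits the stated goal of producing explanations alongside the classification.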
