🤖 AI Summary
Process mining in healthcare faces significant barriers, including high technical complexity, a lack of standardized methodologies, and scarce training resources, all of which hinder clinicians and researchers from interpreting and applying its analytical outputs. To address this, we propose HealthProcessAI, a multi-LLM generative AI framework for healthcare process mining. It wraps PM4Py and bupaR for foundational process discovery, conformance checking, and enhancement, and uses OpenRouter to orchestrate several LLMs (e.g., Claude, Gemini) for automated process map interpretation, natural language report generation, and assessment of result quality. Our key contribution is the use of LLMs to make process mining outputs interpretable across disciplines, validated on sepsis progression data in four proof-of-concept scenarios. In a consistency evaluation with LLMs acting as automated evaluators, Claude Sonnet-4 and Gemini 2.5-Pro scored highest (3.79/4.0 and 3.65/4.0, respectively). The framework is intended to lower adoption barriers and make medical process analysis more standardized and clinically accessible.
📝 Abstract
Process mining has emerged as a powerful analytical technique for understanding complex healthcare workflows. However, its application faces significant barriers, including technical complexity, a lack of standardized approaches, and limited access to practical training resources. We introduce HealthProcessAI, a GenAI framework designed to simplify process mining applications in healthcare and epidemiology by providing a comprehensive wrapper around existing Python (PM4Py) and R (bupaR) libraries. To address widespread unfamiliarity with process mining outputs, the framework integrates multiple Large Language Models (LLMs) for automated process map interpretation and report generation, translating technical analyses into outputs that clinicians, data scientists, and researchers can readily understand. We validated the framework using sepsis progression data as a proof-of-concept example and compared the outputs of five state-of-the-art LLMs accessed through the OpenRouter platform. The framework successfully processed the sepsis data across four proof-of-concept scenarios, demonstrating robust technical performance and the ability to generate reports through automated LLM analysis. An evaluation using five independent LLMs as automated evaluators revealed distinct model strengths: Claude Sonnet-4 and Gemini 2.5-Pro achieved the highest consistency scores (3.79/4.0 and 3.65/4.0, respectively). This combination of structured analytics and AI-driven interpretation represents a novel methodological advance in translating complex process mining results into potentially actionable insights for healthcare applications.
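To make the pipeline described above concrete, the following is a minimal sketch of its two conceptual steps: discovering a process map from an event log and turning that map into text an LLM could be asked to interpret. It is not the HealthProcessAI implementation. It uses plain Python rather than PM4Py or bupaR, the event log is a hypothetical toy example rather than the sepsis data, and the function names `directly_follows` and `build_prompt` are assumptions introduced here for illustration. No LLM is called; the sketch only builds the prompt.

```python
from collections import Counter, defaultdict

# Hypothetical toy event log: (case_id, activity, order within the case).
# Activity names are illustrative only, not taken from the paper's data.
events = [
    ("p1", "ER Registration", 1),
    ("p1", "ER Triage", 2),
    ("p1", "IV Antibiotics", 3),
    ("p2", "ER Registration", 1),
    ("p2", "ER Triage", 2),
    ("p2", "Admission", 3),
    ("p3", "ER Registration", 1),
    ("p3", "ER Triage", 2),
    ("p3", "IV Antibiotics", 3),
]


def directly_follows(events):
    """Count how often activity a is immediately followed by b within a case."""
    traces = defaultdict(list)
    # Group events by case, ordered by their position in the case.
    for case_id, activity, _order in sorted(events, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)
    dfg = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def build_prompt(dfg):
    """Serialize the directly-follows graph as text for an LLM to interpret."""
    # Most frequent transitions first; ties broken alphabetically.
    ordered = sorted(dfg.items(), key=lambda kv: (-kv[1], kv[0]))
    lines = [f"{a} -> {b}: observed {n} times" for (a, b), n in ordered]
    header = "Interpret this directly-follows graph for a clinical audience:"
    return header + "\n" + "\n".join(lines)


dfg = directly_follows(events)
prompt = build_prompt(dfg)
print(prompt)
```

In a multi-LLM setup like the one the abstract describes, the same prompt would be sent to each model and the resulting reports compared; how the framework actually formats its prompts and scores the reports is not specified in this section.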