🤖 AI Summary
This study addresses critical challenges in the clinical deployment of virtual physician systems, namely data privacy, ethical compliance, and local deployability, by proposing the first fully offline LLM-diffusion fusion framework. Methodologically, it applies parameter-efficient LoRA fine-tuning to a lightweight LLM for structured clinical dialogue generation, coupled with a latent-space conditional diffusion model for avatar synthesis; audio-visual synchronization is achieved via mel-spectrogram alignment. All components execute locally without external connectivity, ensuring zero data exfiltration, and interface securely with on-premises structured medical databases through isolated APIs. Contributions include: (1) an end-to-end privacy-preserving system for automated structured clinical interviews; (2) hardware-efficient deployment on low-cost edge devices; and (3) empirical validation of generalization and training stability on both real and synthetic medical dialogue datasets. Experiments demonstrate high system availability and strong potential for clinical application.
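The low-rank adaptation idea behind the fine-tuning described above can be sketched in a few lines. The following NumPy toy is illustrative only, not the authors' training code: the layer sizes, rank `r`, and scaling factor `alpha` are assumed values, and a real deployment would apply this inside a transformer via standard tooling rather than a standalone class.

```python
import numpy as np

rng = np.random.default_rng(0)

class LoRALinear:
    """Linear layer with a frozen weight W plus a trainable low-rank
    update B @ A (hypothetical minimal sketch; dimensions are illustrative)."""

    def __init__(self, d_in, d_out, r=4, alpha=8):
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)  # frozen pretrained weight
        self.A = rng.standard_normal((r, d_in)) * 0.01               # trainable down-projection
        self.B = np.zeros((d_out, r))                                # trainable up-projection, zero-init
        self.scale = alpha / r

    def forward(self, x):
        # Base path plus scaled low-rank path; because B is zero at init,
        # the adapted layer reproduces the pretrained layer exactly.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(d_in=16, d_out=8)
x = rng.standard_normal((2, 16))
assert np.allclose(layer.forward(x), x @ layer.W.T)  # identical before adaptation

# Trainable parameters: r * (d_in + d_out) = 4 * 24 = 96,
# versus d_out * d_in = 128 frozen — the source of LoRA's efficiency.
```

Only `A` and `B` would receive gradient updates during fine-tuning, which is why memory and compute costs stay low enough for the edge-device deployment the study targets.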
📝 Abstract
Recent advances in large language models have made it possible to achieve high conversational performance with substantially reduced computational demands, enabling practical on-site deployment in clinical environments. This progress allows AI systems that uphold strict data protection and patient privacy requirements to be integrated locally, yet their secure implementation in medicine requires careful consideration of ethical, regulatory, and technical constraints.
In this study, we introduce MedChat, a locally deployable virtual physician framework that integrates an LLM-based medical chatbot with a diffusion-driven avatar for automated and structured anamnesis. The chatbot was fine-tuned on a hybrid corpus of real and synthetically generated medical dialogues, while model efficiency was optimized via Low-Rank Adaptation (LoRA). A secure and isolated database interface was implemented to ensure complete separation between patient data and the inference process. The avatar component was realized through a conditional diffusion model operating in latent space, trained on video recordings of the researchers and synchronized with mel-frequency audio features for realistic speech and facial animation.
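Mel-frequency features of the kind used here for audio-visual synchronization can be computed roughly as follows. This NumPy sketch of a mel spectrogram is illustrative only; the sampling rate, FFT size, hop length, and number of mel bands are assumed values, not the paper's configuration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Frame the signal, window each frame, take the power spectrum,
    # then project onto the mel filterbank.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2   # (n_frames, n_fft//2 + 1)
    return mel_filterbank(n_mels, n_fft, sr) @ power.T  # (n_mels, n_frames)

# One second of a 440 Hz tone at 16 kHz yields a (40, 97) mel spectrogram.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
S = mel_spectrogram(tone)
print(S.shape)  # (40, 97)
```

In a lip-sync pipeline of this kind, frame-level mel features serve as the conditioning signal for the diffusion model, so each generated video frame is aligned with the corresponding slice of the spectrogram.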
Unlike existing cloud-based systems, this work demonstrates the feasibility of a fully offline, locally deployable LLM-diffusion framework for clinical anamnesis. Both the autoencoder and the diffusion network exhibited smooth convergence, and MedChat achieved stable fine-tuning with strong generalization to unseen data. The proposed system thus provides a privacy-preserving, resource-efficient foundation for AI-assisted clinical anamnesis, including in low-cost settings.