Iterative LLM-based improvement for French Clinical Interview Transcription and Speaker Diarization

📅 2026-02-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses automatic speech recognition (ASR) for French clinical dialogues, where word error rates often exceed 30% in spontaneous speech and speaker diarization is difficult. The authors propose a multi-pass LLM post-processing architecture, built on the Qwen3-Next-80B large language model, that alternates between Speaker Recognition and Word Recognition passes to improve both speaker attribution and transcription accuracy. Ablation studies cover four design choices: model selection, prompting strategy, pass ordering, and iteration depth. On suicide prevention telephone counseling recordings (n=18), Wilcoxon signed-rank tests confirm a significant reduction in Word Diarization Error Rate (WDER) (p<0.05). On preoperative awake neurosurgery consultations (n=10), performance remains stable, with zero output failures and a real-time factor (RTF) of 0.32, suggesting feasibility for offline clinical deployment.
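The two metrics named in this summary, WDER and real-time factor, can be illustrated with a minimal Python sketch. It assumes the commonly used definition in which WDER is the share of aligned words (correctly recognized or substituted) that carry the wrong speaker label; the summary does not say which variant the authors use. The word alignment is taken as given, and the function names and sample data are hypothetical.

```python
from typing import Iterable, Tuple

# One aligned word: (ref_word, ref_speaker, hyp_word, hyp_speaker).
# Insertions and deletions are assumed to be excluded by the aligner.
AlignedWord = Tuple[str, str, str, str]


def wder(aligned: Iterable[AlignedWord]) -> float:
    """Share of aligned words whose hypothesized speaker differs from the reference."""
    total = 0
    wrong_speaker = 0
    for _ref_word, ref_spk, _hyp_word, hyp_spk in aligned:
        total += 1
        if ref_spk != hyp_spk:
            wrong_speaker += 1
    if total == 0:
        raise ValueError("WDER is undefined without aligned words")
    return wrong_speaker / total


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# Hypothetical data: four aligned words, one attributed to the wrong speaker.
sample = [
    ("bonjour", "A", "bonjour", "A"),
    ("docteur", "A", "docteur", "B"),   # correct word, wrong speaker
    ("oui", "B", "oui", "B"),
    ("merci", "B", "mercy", "B"),       # substituted word, correct speaker
]
sample_wder = wder(sample)
sample_rtf = real_time_factor(processing_seconds=96.0, audio_seconds=300.0)
```

Under these assumptions, an RTF of 0.32 would mean a 300-second recording is processed in about 96 seconds.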
📝 Abstract
Automatic speech recognition for French medical conversations remains challenging, with word error rates often exceeding 30% in spontaneous clinical speech. This study proposes a multi-pass LLM post-processing architecture alternating between Speaker Recognition and Word Recognition passes to improve transcription accuracy and speaker attribution. Ablation studies on two French clinical datasets (suicide prevention telephone counseling and preoperative awake neurosurgery consultations) investigate four design choices: model selection, prompting strategy, pass ordering, and iteration depth. Using Qwen3-Next-80B, Wilcoxon signed-rank tests confirm significant WDER reductions on suicide prevention conversations (p<0.05, n=18), while maintaining stability on awake neurosurgery consultations (n=10), with zero output failures and acceptable computational cost (RTF 0.32), suggesting feasibility for offline clinical deployment.
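The alternating structure described in the abstract can be sketched roughly as below. This is not the authors' implementation: the two passes are plain callables standing in for LLM calls, the toy rules inside them are invented for illustration, and the fallback on empty output is an assumption about how output failures might be guarded against. The `speaker_first` and `iterations` parameters are hypothetical names for the pass-ordering and iteration-depth design choices.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]                      # (speaker_label, text)
Pass = Callable[[List[Turn]], List[Turn]]   # stand-in for one LLM post-processing call


def refine(turns: List[Turn], speaker_pass: Pass, word_pass: Pass,
           iterations: int = 2, speaker_first: bool = True) -> List[Turn]:
    """Alternate the two passes for a fixed number of iterations."""
    order = (speaker_pass, word_pass) if speaker_first else (word_pass, speaker_pass)
    current = list(turns)
    for _ in range(iterations):
        for run_pass in order:
            candidate = run_pass(current)
            # Assumed guard: keep the previous transcript if a pass returns nothing.
            if candidate:
                current = candidate
    return current


def toy_speaker_pass(turns: List[Turn]) -> List[Turn]:
    # Toy rule in place of an LLM: attribute questions to the clinician.
    return [("CLINICIAN" if text.endswith("?") else spk, text) for spk, text in turns]


def toy_word_pass(turns: List[Turn]) -> List[Turn]:
    # Toy rule in place of an LLM: fix one known misrecognized term.
    return [(spk, text.replace("neurochirugie", "neurochirurgie")) for spk, text in turns]


raw = [
    ("PATIENT", "Vous avez des questions sur la neurochirugie ?"),
    ("PATIENT", "Oui, quelques-unes."),
]
refined = refine(raw, toy_speaker_pass, toy_word_pass, iterations=1)
```

Passing the two passes in as callables keeps model selection and prompting strategy outside the loop, so each of the four design choices can be varied independently in an ablation.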
Problem

Research questions and friction points this paper is trying to address.

Automatic Speech Recognition
Speaker Diarization
Clinical Interviews
French Language
Word Error Rate
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based Post-processing
Speaker Diarization
Clinical Speech Transcription
Iterative Refinement
French Medical ASR