Mixed-Precision Information Bottlenecks for On-Device Trait-State Disentanglement in Bipolar Agitation Detection

📅 2026-05-04

🤖 AI Summary
This study addresses continuous vocal monitoring of agitation in bipolar disorder on edge devices, which requires disentangling stable speaker identity from dynamic emotional cues. The authors propose MP-IB, a framework that formulates mixed-precision quantization as an information bottleneck: an FP16 trait head preserves speaker identity, while an INT4 state head encodes agitation states. Combined with dynamic precision scheduling and multi-scale temporal fusion, this yields an 8× information asymmetry between the two heads without adversarial training. Evaluated on Bridge2AI-Voice, the method attains a correlation coefficient (ρ) of 0.117, outperforming the WavLM-Adapter, β-VAE, and hand-crafted prosody baselines. It also transfers zero-shot to CREMA-D with an AUC of 0.817, keeps identity leakage near random level, and runs with a 617 KB model size and 23.4 ms end-to-end latency, enabling real-time deployment on highly resource-constrained devices.
📝 Abstract
Continuous monitoring of bipolar disorder agitation via voice biomarkers requires disentangling stable speaker traits from volatile affective states on resource-constrained edge devices. We introduce MP-IB, the first framework to treat mixed-precision quantization as an information bottleneck for clinical trait-state separation. The core insight is that numerical precision itself controls capacity: an FP16 trait head (1,024 bits) encodes speaker identity, while an INT4 state head (128 bits) captures agitation, yielding 8× information asymmetry without adversarial training. We augment this with Dynamic Precision Scheduling and Multi-Scale Temporal Fusion. On Bridge2AI-Voice (N=833, 4 sessions/participant, strict speaker-independent CV), MP-IB achieves ρ = 0.117 (95% CI: [0.089, 0.145], p=0.003 vs. chance), outperforming a 94M-parameter WavLM-Adapter with in-domain SSL continuation (ρ = -0.042), β-VAE disentanglement (ρ = 0.089), and hand-crafted prosody (ρ = 0.031) by 2.8 to 15.9 points absolute. Zero-shot transfer to CREMA-D achieves AUC=0.817. Identity leakage is suppressed to near-random (EER=0.42, MIA-AUC=0.52). End-to-end latency is 23.4 ms with a 617 KB footprint, enabling real-time monitoring on sub-$20 devices.
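The eightfold asymmetry quoted in the abstract follows from a plain bit count. The sketch below reproduces that arithmetic under hypothetical head sizes: a 64-dimensional FP16 trait embedding and a 32-dimensional INT4 state embedding. These dimensions are not given in the abstract; they are assumptions chosen only because they match its 1,024-bit and 128-bit figures. The symmetric quantizer and its scale are likewise a generic illustration of 4-bit encoding, not the authors' scheme.

```python
# Assumed (hypothetical) head sizes; the abstract only states the bit totals.
TRAIT_DIMS = 64   # assumption: 64 dims x 16 bits = 1,024 bits
STATE_DIMS = 32   # assumption: 32 dims x 4 bits  =   128 bits


def capacity_bits(num_dims: int, bits_per_dim: int) -> int:
    """Raw storage capacity of an embedding, in bits."""
    return num_dims * bits_per_dim


def quantize_int4(values, scale):
    """Generic symmetric signed 4-bit quantization.

    Each value is divided by `scale`, rounded, and clamped to [-8, 7],
    so every dimension can take at most 16 distinct codes.
    """
    return [max(-8, min(7, round(v / scale))) for v in values]


def dequantize_int4(codes, scale):
    """Map 4-bit integer codes back to approximate real values."""
    return [c * scale for c in codes]


trait_bits = capacity_bits(TRAIT_DIMS, 16)
state_bits = capacity_bits(STATE_DIMS, 4)
asymmetry = trait_bits // state_bits

# Illustrative values and scale (both made up for this example).
codes = quantize_int4([0.0, 1.0, -1.0, 10.0], scale=0.25)
restored = dequantize_int4(codes, scale=0.25)

print(trait_bits, state_bits, asymmetry)
print(codes, restored)
```

Note that these bit counts are upper bounds on what each head can represent, not measurements of the information it actually carries: a 128-bit code can hold at most 128 bits regardless of what the network learns to store in it.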
Problem

Research questions and friction points this paper is trying to address.

trait-state disentanglement
bipolar agitation detection
on-device learning
voice biomarkers
edge computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

mixed-precision quantization
information bottleneck
trait-state disentanglement
on-device learning
voice biomarkers