AI Summary
To address the few-shot, open-set, and out-of-distribution challenges of attributing audio to its recording environment in criminal investigations, this paper formulates environment identification as a metric learning problem (the first such treatment in forensic acoustics). We propose a contrastive learning-based deep metric learning framework that extracts environment-discriminative acoustic features and incorporates a prototype-matching mechanism. This design generalizes robustly to unseen noise types, shifted reverberation characteristics, and varying microphone positions, and it adapts to novel case scenarios from a few reference samples without retraining. Evaluated on a multi-source real-world forensic audio benchmark, our method achieves significantly higher cross-domain recognition accuracy than supervised classification baselines, without requiring case-specific fine-tuning. It thus provides a deployable solution for judicial acoustic provenance analysis under low-quality, unconstrained recording conditions.
Abstract
Audio recordings may provide important evidence in criminal investigations. One such case is the forensic association of an audio recording with its recording location. For example, a voice message may be the only investigative cue for narrowing down the candidate sites of a crime. To date, several works have provided supervised classification tools for closed-set recording environment identification under relatively clean recording conditions. However, in forensic investigations, the candidate locations are case-specific. Thus, supervised learning techniques are not applicable without retraining a classifier on a sufficient number of training samples for each case and its respective candidate set. In addition, a forensic tool has to deal with audio material from uncontrolled sources with variable properties and quality. In this work, we therefore attempt a major step towards practical forensic application scenarios. We propose a representation learning framework called EnvId, short for environment identification. EnvId avoids case-specific retraining by modeling the task as a few-shot classification problem. We demonstrate that EnvId can handle forensically challenging material. It provides good-quality predictions even under unseen signal degradations, out-of-distribution reverberation characteristics, or recording position mismatches.
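To make the few-shot formulation concrete, the following is a minimal sketch of nearest-prototype classification over precomputed embeddings. It is an illustration under stated assumptions, not the EnvId implementation: the abstract does not specify the encoder, the similarity measure, or how reference samples are aggregated. The function names (`build_prototypes`, `classify`), the use of cosine similarity, the toy 3-dimensional embeddings, and the location labels are all hypothetical.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_prototypes(support_embeddings, support_labels):
    """Average the normalized reference embeddings of each candidate location."""
    grouped = defaultdict(list)
    for emb, label in zip(support_embeddings, support_labels):
        grouped[label].append(l2_normalize(emb))
    prototypes = {}
    for label, embs in grouped.items():
        mean = [sum(col) / len(embs) for col in zip(*embs)]
        prototypes[label] = l2_normalize(mean)
    return prototypes


def classify(query_embedding, prototypes):
    """Return the label with the highest cosine similarity, plus all scores."""
    q = l2_normalize(query_embedding)
    scores = {
        label: sum(a * b for a, b in zip(q, proto))
        for label, proto in prototypes.items()
    }
    return max(scores, key=scores.get), scores


# Toy embeddings standing in for the output of a trained encoder (assumption).
support = [[0.9, 0.1, 0.0], [1.0, 0.2, 0.1], [0.0, 0.8, 0.3], [0.1, 1.0, 0.2]]
labels = ["stairwell", "stairwell", "office", "office"]  # hypothetical sites

prototypes = build_prototypes(support, labels)
predicted, scores = classify([0.8, 0.3, 0.0], prototypes)
print(predicted, scores)
```

Because the candidate locations enter only through the reference embeddings, a new case would change `support` and `labels` but leave the encoder untouched, which is the property that lets a few-shot formulation avoid case-specific retraining.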