🤖 AI Summary
Current biometric authentication research largely overlooks the uncertainty inherent in error-rate estimation, which can distort performance comparisons. To address this, we propose BioQuake, a framework for quantifying performance uncertainty in biometric verification systems that integrates statistical inference with resampling techniques to establish empirical guidelines linking test set size to estimation reliability. We conduct a systematic reliability assessment across 62 biometric datasets used in research, spanning eight modalities, and find that many reported state-of-the-art results deviate significantly from actual error rates. We release BioQuake as an interactive web tool for computing error confidence intervals and comparing reliability across datasets. This work addresses a gap in biometric evaluation by introducing principled uncertainty modeling and giving the community reproducible, standardized reliability metrics for performance assessment.
📝 Abstract
Biometric authentication is increasingly popular for its convenience and accuracy. However, while recent advances focus on reducing errors and expanding modalities, the reliability of reported performance metrics is often overlooked. Understanding reliability is critical because it communicates how accurately reported error rates represent a system's actual performance, given the uncertainty in error-rate estimates derived from test data. Currently, there is no widely accepted standard for reporting these uncertainties, and biometric studies rarely provide reliability estimates, which limits comparability and interpretation. To address this gap, we introduce BioQuake, a measure to estimate uncertainty in biometric verification systems, and empirically validate it on four systems and three datasets. Based on BioQuake, we provide simple guidelines for estimating performance uncertainty and facilitating reliable reporting. Additionally, we apply BioQuake to analyze biometric recognition performance on 62 biometric datasets used in research across eight modalities: face, fingerprint, gait, iris, keystroke, eye movement, electroencephalogram (EEG), and electrocardiogram (ECG). Our analysis shows that reported state-of-the-art performance often deviates significantly from actual error rates, potentially leading to inaccurate conclusions. To support researchers and foster the development of more reliable biometric systems and datasets, we release BioQuake as an easy-to-use web tool for reliability calculations.