🤖 AI Summary
State estimators' self-assessed uncertainties (e.g., covariance matrices) are often unreliable due to noise or system model mismatch, compromising the safety of downstream decision-making. To address this, we propose a unified multi-metric credibility assessment framework integrating the normalized estimation error squared (NEES), the noncredibility index (NCI), the negative log-likelihood (NLL), and the energy score (ES), augmented by a novel energy-distance-based location test. Leveraging the asymmetric sensitivities of NLL and ES (NLL is most sensitive to overconfident, underdispersed covariances; ES to systematic estimation biases), the framework can tell apart distinct types of model misspecification. Evaluated across six canonical mismatch scenarios, it achieves 80–100% classification accuracy, substantially outperforming single-metric baselines, and provides an interpretable, reproducible paradigm for uncertainty calibration and fault attribution in state estimation.
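To make the metric portfolio concrete, here is a minimal Python sketch of the three per-run credibility metrics for a Gaussian state estimate. The function names, the Monte Carlo sample count, and the example values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from numpy.linalg import inv, slogdet

def nees(x_true, x_hat, P):
    # Normalized estimation error squared: e' P^{-1} e.
    # For a credible Gaussian estimator its mean equals dim(x).
    e = x_hat - x_true
    return float(e @ inv(P) @ e)

def nll(x_true, x_hat, P):
    # Negative log-likelihood of the true state under N(x_hat, P).
    # Heavily penalizes overconfident (underdispersed) covariances.
    d = x_hat.size
    e = x_hat - x_true
    _, logdet = slogdet(P)
    return float(0.5 * (e @ inv(P) @ e + logdet + d * np.log(2.0 * np.pi)))

def energy_score(x_true, x_hat, P, n_samples=256, seed=0):
    # Monte Carlo energy score of the Gaussian predictive N(x_hat, P):
    # ES = E||X - x_true|| - 0.5 E||X - X'||, X, X' i.i.d. samples.
    # Reacts more strongly to a biased mean than to covariance scaling.
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(x_hat, P, size=n_samples)
    Xp = rng.multivariate_normal(x_hat, P, size=n_samples)
    term1 = np.linalg.norm(X - x_true, axis=1).mean()
    term2 = np.linalg.norm(X - Xp, axis=1).mean()
    return float(term1 - 0.5 * term2)

# Example: an overconfident estimator (reported P smaller than warranted).
x_true = np.array([0.0, 0.0])
x_hat = np.array([0.3, -0.2])
P = 0.1 * np.eye(2)  # reported covariance
print(nees(x_true, x_hat, P), nll(x_true, x_hat, P), energy_score(x_true, x_hat, P))
```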
📝 Abstract
State estimators often provide self-assessed uncertainty measures, such as covariance matrices, whose reliability is critical for downstream tasks. However, these self-assessments can be misleading when underlying modeling assumptions are violated, e.g., by noise or system model mismatch. This letter addresses the problem of estimator credibility by introducing a unified, multi-metric evaluation framework. We construct a compact credibility portfolio that combines the traditional metrics, the Normalized Estimation Error Squared (NEES) and the Noncredibility Index (NCI), with proper scoring rules, namely the Negative Log-Likelihood (NLL) and the Energy Score (ES). Our key contributions are a novel energy-distance-based location test that robustly detects system model misspecification and a method that leverages the asymmetric sensitivities of NLL and ES to distinguish optimistic covariance scaling from systematic bias. Monte Carlo simulations across six distinct credibility scenarios demonstrate that the proposed method achieves high classification accuracy (80–100%), substantially outperforming single-metric baselines, which consistently fail to provide a complete and correct diagnosis. The framework offers a practical tool for turning patterns of credibility indicators into actionable diagnoses of model deficiencies.
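As a rough illustration of an energy-distance-based location test, the sketch below runs a two-sample permutation test comparing observed estimation errors against a zero-mean reference sample drawn with the reported covariance; a small p-value flags a systematic shift consistent with system model misspecification. Note that this generic two-sample test reacts to any distributional difference, whereas the paper's location test specifically targets the mean shift; the function names, reference construction, and permutation count are assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_distance(X, Y):
    # Two-sample energy distance (Szekely & Rizzo):
    # 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||, estimated from samples.
    return 2.0 * cdist(X, Y).mean() - cdist(X, X).mean() - cdist(Y, Y).mean()

def location_test(errors, P, n_ref=500, n_perm=300, seed=0):
    # Permutation test for a shift of the estimation errors away from zero.
    # Reference: a zero-mean Gaussian sample with the reported covariance
    # (an illustrative choice; the paper's reference may differ).
    rng = np.random.default_rng(seed)
    ref = rng.multivariate_normal(np.zeros(errors.shape[1]), P, size=n_ref)
    stat = energy_distance(errors, ref)
    pooled = np.vstack([errors, ref])
    n = len(errors)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if energy_distance(pooled[idx[:n]], pooled[idx[n:]]) >= stat:
            exceed += 1
    return stat, (exceed + 1) / (n_perm + 1)  # permutation p-value

# Example: errors with a systematic offset relative to the reported covariance.
rng = np.random.default_rng(1)
P = np.eye(2)
errors = rng.multivariate_normal([0.5, 0.0], P, size=100)
stat, p = location_test(errors, P)
print(f"energy distance = {stat:.3f}, p-value = {p:.3f}")  # small p => bias
```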