A Jensen-Shannon divergence based $k$-NN algorithm for missing value imputation in compositional data

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses missing value imputation in compositional datasets that may contain zeros, proposing a nonparametric approach that avoids distributional assumptions. The method selects the k nearest neighbors using the Jensen–Shannon divergence and imputes with the Fréchet mean, while allowing the hyperparameters to adapt to the pattern of missing values. Combining the Jensen–Shannon divergence with the Fréchet mean handles zero values naturally without imposing strong parametric constraints. In simulation studies based on real data, the proposed method outperforms competing algorithms across various settings in both imputation accuracy and computational efficiency.

📝 Abstract

A novel nonparametric method to impute missing values in compositional data is developed. The method is based on the $k$-NN algorithm, utilizes the Jensen-Shannon divergence, and employs the Fréchet mean to allow for more flexibility in the estimation process. As an extra feature, the hyper-parameters can adapt automatically to the pattern of missing values. Unlike restrictive parametric models, the proposed method makes no assumption about the structure of the data and, most importantly, it is applicable even when the compositional data contain zero values. Through simulation studies using real data, it is shown that the proposed algorithm outperforms competing algorithms across various settings, not only in terms of accuracy but also in terms of computational efficiency.
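The general idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions, since the abstract does not specify them: neighbors are drawn only from fully observed rows, the Jensen-Shannon divergence is computed on the jointly observed components after renormalizing them, and a power mean with a hypothetical parameter `alpha` stands in for the Fréchet mean. The function names `js_divergence`, `power_mean`, and `knn_impute` are illustrative only, and the adaptive choice of hyper-parameters is not shown.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two non-negative vectors.

    Both vectors are renormalized to sum to 1. Zero components are allowed,
    because terms with p_i = 0 contribute nothing to the sum.
    Assumes each vector has positive total mass.
    """
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # b = m is strictly positive wherever a is positive
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def power_mean(X, alpha=1.0):
    """Component-wise power mean of the rows of X, closed to sum to 1.

    Assumption: used here as a stand-in for the Fréchet mean; alpha must be > 0.
    alpha = 1 gives the arithmetic mean of the neighbors.
    """
    m = np.mean(X ** alpha, axis=0) ** (1.0 / alpha)
    return m / m.sum()

def knn_impute(X, k=3, alpha=1.0):
    """Impute NaN entries of each row from its k nearest complete rows."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    complete = ~np.isnan(X).any(axis=1)
    donors = X[complete]  # assumption: only complete rows serve as neighbors
    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[i])
        # Divergence on the sub-composition of jointly observed components.
        d = np.array([js_divergence(X[i, obs], donor[obs]) for donor in donors])
        nearest = donors[np.argsort(d)[:k]]
        m = power_mean(nearest, alpha)
        # Scale the neighbor mean so its observed part matches the row's
        # observed total, fill the missing parts, then close to sum to 1.
        scale = X[i, obs].sum() / m[obs].sum()
        out[i, ~obs] = m[~obs] * scale
        out[i] /= out[i].sum()
    return out
```

A call such as `knn_impute(X, k=3)` on an array whose rows are compositions with `np.nan` marking the missing parts returns a completed array whose rows sum to 1. Because the divergence never takes the logarithm of a zero component, rows containing zeros need no special treatment, which is the property the abstract emphasizes.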

Problem

Research questions and friction points this paper is trying to address.

missing value imputation

compositional data

zero values

nonparametric method

Innovation

Methods, ideas, or system contributions that make the work stand out.

Jensen-Shannon divergence

k-NN imputation

compositional data

Fréchet mean