🤖 AI Summary
Sampling biases in citizen science data distort the inferred structure of plant–pollinator bipartite networks.
Method: This paper introduces a fairness-aware latent-space representation method, an HSIC-regularized bipartite variational autoencoder (Bipartite VAE), that decorrelates latent embeddings from continuous sampling covariates (e.g., observer experience, time, location) during representation learning. It adapts the fairness framework used in sociology to ecological network modeling: the Hilbert–Schmidt Independence Criterion (HSIC) is added to the loss as a penalty that pushes the latent representations toward statistical independence from bias-inducing covariates, so that estimated interaction probabilities are less affected by the sampling process.
Results: Applied to the Spipoll dataset, a citizen science monitoring program of plant–pollinator interactions, the method changes the reconstructed network relative to an uncorrected model, reducing the influence of observer-related sampling covariates on the inferred interactions.
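To make the HSIC penalty concrete, here is a minimal sketch of the standard biased empirical HSIC estimator, trace(KHLH)/(n-1)^2, with Gaussian kernels. The function names (`rbf_kernel`, `hsic`), the kernel choice, and the fixed bandwidths are assumptions made for illustration; the summary does not state which kernels or bandwidths the paper uses.

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D covariate as n samples of dimension 1
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def hsic(z, s, sigma_z=1.0, sigma_s=1.0):
    """Biased empirical HSIC between embeddings z and a sampling covariate s.

    Values near zero suggest little dependence; larger values suggest that
    the embedding still carries information about the covariate.
    """
    n = np.asarray(z).shape[0]
    K = rbf_kernel(z, sigma_z)            # kernel on latent embeddings
    L = rbf_kernel(s, sigma_s)            # kernel on the covariate
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Illustration on synthetic data (not Spipoll): an embedding that copies the
# covariate scores higher than an embedding drawn independently of it.
rng = np.random.default_rng(0)
s = rng.normal(size=200)
z_dependent = s[:, None] + 0.1 * rng.normal(size=(200, 2))
z_independent = rng.normal(size=(200, 2))
hsic_dependent = hsic(z_dependent, s)
hsic_independent = hsic(z_independent, s)
```

In a training loop, a quantity of this kind would be computed on each batch and added to the loss with a weight, so that minimizing the loss also reduces the measured dependence.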
📝 Abstract
We propose a method to represent bipartite networks using graph embeddings, tailored to the challenges of studying ecological networks such as those linking plants and pollinators, where many covariates need to be accounted for, in particular to control for sampling bias. We adapt the variational graph auto-encoder approach to the bipartite case, which lets us generate embeddings in a latent space where the two sets of nodes are positioned according to their probability of connection. We translate the fairness framework commonly considered in sociology to ecology in order to address sampling bias. By incorporating the Hilbert-Schmidt independence criterion (HSIC) as an additional penalty term in the loss we optimize, we encourage the structure of the latent space to be independent of continuous variables related to the sampling process. Finally, we show how our approach can change our understanding of ecological networks when applied to the Spipoll data set, a citizen science monitoring program of plant-pollinator interactions to which many observers contribute, making it prone to sampling bias.
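As a rough illustration of how such a penalized objective could be assembled, the sketch below combines a reconstruction term, a KL term for each node set, and a weighted independence penalty. It assumes the sigmoid inner-product decoder of the standard variational graph auto-encoder, applied to row nodes (for example, observation sessions or plants) and column nodes (for example, pollinators); the abstract does not specify the decoder, so this choice, the names `connection_probs` and `penalized_loss`, and the weight `delta` are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def connection_probs(z_rows, z_cols):
    """Connection probability for every (row node, column node) pair.

    Assumption: inner-product decoder, as in the standard VGAE.
    """
    return sigmoid(z_rows @ z_cols.T)

def reconstruction_loss(B, probs, eps=1e-9):
    """Mean binary cross-entropy between incidence matrix B and probs."""
    return -np.mean(B * np.log(probs + eps) + (1 - B) * np.log(1 - probs + eps))

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), averaged over nodes."""
    return -0.5 * np.mean(np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=1))

def penalized_loss(B, mu_r, log_var_r, mu_c, log_var_c,
                   penalty=0.0, delta=1.0, rng=None):
    """Negative ELBO plus delta times an independence penalty (e.g. an HSIC value)."""
    rng = np.random.default_rng() if rng is None else rng
    # Reparameterization trick: sample embeddings for both node sets.
    z_r = mu_r + np.exp(0.5 * log_var_r) * rng.standard_normal(mu_r.shape)
    z_c = mu_c + np.exp(0.5 * log_var_c) * rng.standard_normal(mu_c.shape)
    probs = connection_probs(z_r, z_c)
    elbo_terms = (reconstruction_loss(B, probs)
                  + kl_to_standard_normal(mu_r, log_var_r)
                  + kl_to_standard_normal(mu_c, log_var_c))
    return elbo_terms + delta * penalty

# Toy example: 3 row nodes, 4 column nodes, 2 latent dimensions.
B = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)
mu_r, log_var_r = np.zeros((3, 2)), np.zeros((3, 2))
mu_c, log_var_c = np.zeros((4, 2)), np.zeros((4, 2))
base = penalized_loss(B, mu_r, log_var_r, mu_c, log_var_c,
                      rng=np.random.default_rng(1))
with_penalty = penalized_loss(B, mu_r, log_var_r, mu_c, log_var_c,
                              penalty=2.0, delta=0.5,
                              rng=np.random.default_rng(1))
```

In practice the encoder parameters would be learned with an automatic differentiation library, and `penalty` would be recomputed from the sampled embeddings and the sampling covariates at each step; here it is passed in as a number only to keep the sketch short.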