🤖 AI Summary
This study aims to make lunar crater identification more consistent, and inferences about Solar System history more reliable, by proposing a Bayesian nonparametric clustering method that combines annotations from multiple experts. The approach adds a "dysfunctional family" constraint to the Chinese restaurant process (CRP), yielding the dysfunctional family CRP (DFCRP), a model that incorporates annotator information and quantifies clustering uncertainty. The authors give guidance on hyperparameter specification and present a Gibbs sampler; in simulation experiments, the DFCRP outperforms the standard CRP. The method is also applied to real lunar crater data, where posterior draws of cluster assignments support downstream statistical inference.
📝 Abstract
Summaries of craters on terrestrial bodies, such as their number and size distribution, are essential for understanding the history of the Solar System. Identifying craters, however, has not been automated and thus relies on expert crater-counters marking static images. Robbins et al. (2014) (hereafter R14) showed that, contrary to previously held assumptions, the crater lists identified by different expert crater-counters vary widely. How best to combine identified crater lists across multiple experts for the purposes of learning about the Solar System is an open and consequential question. R14 combined identified crater lists via clustering, using a modification of the popular DBSCAN clustering method. Their approach, however, neither made use of all the available constraining information nor provided an estimate of clustering uncertainty. To address these shortcomings of the DBSCAN method, we present a novel clustering approach that can combine multiple lists of identified objects of interest from the same image. The key innovation is a dysfunctional family constraint added to the Chinese restaurant process (CRP), a Bayesian nonparametric clustering approach; the constraint naturally takes into account information about who identified each crater. The resulting dysfunctional family Chinese restaurant process (DFCRP) provides an estimate of clustering uncertainty. In this work, we provide guidance on hyperparameter specification, present a Gibbs sampler, and perform a simulation study comparing the performance of the DFCRP with that of the CRP. Finally, we apply the DFCRP to the crater identification problem of R14, compare the results, and demonstrate the types of analyses that can be performed with posterior draws of cluster assignments.
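
To make the idea of a constrained CRP concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract does not define the dysfunctional family constraint, so the sketch assumes one plausible reading: two marks made by the same expert can never share a cluster, since one expert would not mark the same crater twice. It draws cluster assignments from the resulting prior only, with no likelihood on crater position or size and no Gibbs sampler. The function name `sample_constrained_crp`, the concentration parameter `alpha`, and the annotator labels are all hypothetical.

```python
import random


def sample_constrained_crp(annotators, alpha=1.0, seed=0):
    """Draw one partition from a CRP prior with an assumed annotator constraint.

    annotators[i] is the label of the expert who made mark i.
    Assumption (not stated in the abstract): a cluster may contain at most
    one mark per expert.
    Returns a list where entry i is the cluster index of mark i.
    """
    rng = random.Random(seed)
    cluster_sizes = []    # number of marks currently in each cluster
    cluster_experts = []  # set of experts already present in each cluster
    assignment = []

    for expert in annotators:
        # Standard CRP weight for an existing cluster is its size; the assumed
        # constraint sets that weight to zero if this expert is already there.
        weights = [
            0.0 if expert in experts else float(size)
            for size, experts in zip(cluster_sizes, cluster_experts)
        ]
        # Opening a new cluster always has weight alpha.
        weights.append(alpha)

        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(0)
            cluster_experts.append(set())
        cluster_sizes[k] += 1
        cluster_experts[k].add(expert)
        assignment.append(k)

    return assignment


if __name__ == "__main__":
    marks = ["A", "A", "B", "B", "C", "A"]  # hypothetical expert labels
    print(sample_constrained_crp(marks, alpha=1.0, seed=1))
```

Repeating the draw with different seeds gives a collection of partitions; in a full model, the analogous collection of posterior draws is what would be summarized to express clustering uncertainty.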