🤖 AI Summary
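This study investigates how many mutually distinguishable synthetic identities can be generated such that a face verifier, at a fixed threshold, accepts same-identity pairs and rejects different-identity pairs. The problem is formalized through the embedding distributions that a generator and a normalized recognition map induce on the unit hypersphere, and is modeled as a spherical coding problem. In the deterministic view-invariant regime, capacity is characterized by a spherical code over the realizable set of embeddings and reduces to the classical spherical-code quantity under a full angular expressivity assumption. For stochastic identity generation, the authors introduce a centered model with a sufficient condition on the separation between identity centers; under full angular expressivity, this yields achievable lower bounds on capacity and a positive asymptotic lower bound on its exponential growth rate with embedding dimension. Separately, they define a prior-constrained random-code capacity, in which latent identities are sampled independently from a given prior, and derive high-probability lower bounds for it. The analysis combines spherical coding theory, concentration inequalities, and geometric analysis of embeddings. Under a stronger full-cap-support model, they obtain a converse and an exact spherical-code characterization.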
📝 Abstract
We study how many synthetic identities can be generated so that a face verifier declares same-identity pairs as matches and different-identity pairs as non-matches at a fixed threshold $τ$. We formalize this question for a generative face-recognition pipeline consisting of a generator followed by a normalized recognition map with outputs on the unit hypersphere. We define the capacity of distinguishable identity generation as the largest number of latent identities whose induced embedding distributions satisfy prescribed same-identity and different-identity verification constraints. In the deterministic view-invariant regime, we show that this capacity is characterized by a spherical-code problem over the realizable set of embeddings, and reduces to the classical spherical-code quantity under a full angular expressivity assumption. For stochastic identity generation, we introduce a centered model and derive a sufficient admissibility condition in which the required separation between identity centers is $\arccos(τ)+2ρ$, where $ρ$ is a within-identity concentration radius. Under full angular expressivity, this yields spherical-code-based achievable lower bounds and a positive asymptotic lower bound on the exponential growth rate with embedding dimension. We also introduce a prior-constrained random-code capacity, in which latent identities are sampled independently from a given prior, and derive high-probability lower bounds in terms of pairwise separation-failure probabilities of the induced identity centers. Under a stronger full-cap-support model, we obtain a converse and an exact spherical-code characterization.
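The separation condition $\arccos(τ)+2ρ$ can be illustrated with a small numerical sketch. The code below is a toy under stated assumptions, not the paper's construction: it assumes the verifier declares a match when cosine similarity is at least $τ$, treats $ρ$ as a hard angular radius around each identity center (the abstract only calls it a concentration radius), and works on the unit circle rather than a high-dimensional hypersphere. All function names are hypothetical.

```python
import math
import random


def required_center_separation(tau: float, rho: float) -> float:
    """Angular separation between identity centers: arccos(tau) + 2 * rho."""
    return math.acos(tau) + 2.0 * rho


def is_match(theta_a: float, theta_b: float, tau: float) -> bool:
    # Assumed verifier rule: match iff cosine similarity >= tau.
    # On the unit circle, the cosine similarity is cos of the angle difference.
    return math.cos(theta_a - theta_b) >= tau


def sample_view(center: float, rho: float, rng: random.Random) -> float:
    # Assumption: every view of an identity lies within angle rho of its center.
    return center + rng.uniform(-rho, rho)


def toy_circle_count(tau: float, rho: float) -> int:
    """Number of centers that fit on a circle at the required spacing (toy 2D case)."""
    return math.floor(2.0 * math.pi / required_center_separation(tau, rho))


def check_pair(tau: float, rho: float, margin: float = 1e-6,
               trials: int = 10_000, seed: int = 0) -> bool:
    """Empirically check both verification constraints for two identities."""
    rng = random.Random(seed)
    center_a = 0.0
    center_b = required_center_separation(tau, rho) + margin
    for _ in range(trials):
        a1 = sample_view(center_a, rho, rng)
        a2 = sample_view(center_a, rho, rng)
        b1 = sample_view(center_b, rho, rng)
        if not is_match(a1, a2, tau):   # same identity must match
            return False
        if is_match(a1, b1, tau):       # different identities must not match
            return False
    return True


if __name__ == "__main__":
    tau, rho = 0.5, 0.1
    print("required separation (rad):", required_center_separation(tau, rho))
    print("constraints hold:", check_pair(tau, rho))
    print("toy circle count:", toy_circle_count(tau, rho))
```

The reasoning behind the sketch is a triangle-inequality argument: two views of different identities can each sit up to $ρ$ closer to the other center, so their angle is at least the center separation minus $2ρ$, which stays above $\arccos(τ)$. Under the same hard-radius assumption, same-identity views are at most $2ρ$ apart, so they match whenever $\cos(2ρ) \ge τ$; this second condition is an inference from the toy model rather than a statement from the abstract.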