🤖 AI Summary
To address the difficulty of assessing multiple risk and utility measures at once when choosing among anonymization or synthetic data methods, this paper systematically compares six visualization approaches: heatmaps, dot plots, composite scatterplots, parallel coordinate plots, radial profile charts, and PCA-based biplots. Methodologically, it introduces blockwise PCA to build composite scatterplots and joint PCA to build biplots, so that disclosure risks (e.g., re-identification, attribute inference) and utility metrics (e.g., statistical fidelity, modeling performance) can be examined together along with the relationships among the measures. Pareto-optimal methods are identified systematically within every visualization, which highlights the anonymization schemes that are not dominated in the multidimensional risk–utility space. The paper argues that this multivariate view supports a more informed, interpretable selection of methods for privacy-preserving data publishing.
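As a rough illustration of the composite scatterplot idea, the sketch below assumes that "blockwise" means running a separate PCA on the block of risk measures and on the block of utility measures, then using each block's first principal component as a composite score. That reading, the helper names (`standardize`, `first_pc_scores`), and the toy numbers are all assumptions made for illustration; they are not taken from the paper.

```python
import math

def standardize(rows):
    """Column-wise z-scores (population std); constant columns become zeros."""
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        out_cols.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*out_cols)]

def first_pc_scores(rows, iters=200):
    """Scores of each row on the first principal component of one block.

    Uses power iteration on the block's correlation matrix. The all-equal
    start vector works when the measures in the block are mostly positively
    correlated, which is assumed here.
    """
    z = standardize(rows)
    n, p = len(z), len(z[0])
    corr = [[sum(z[i][a] * z[i][b] for i in range(n)) / n for b in range(p)]
            for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # The sign of a principal component is arbitrary; orient it so that
    # larger raw values give larger scores.
    if sum(v) < 0:
        v = [-x for x in v]
    return [sum(z[i][a] * v[a] for a in range(p)) for i in range(n)]

# Hypothetical toy values: one row per anonymization method,
# one column per measure. These are NOT results from the paper.
methods = ["m1", "m2", "m3"]
risk_block = [[0.9, 0.8], [0.5, 0.4], [0.1, 0.2]]
utility_block = [[0.95, 0.90], [0.80, 0.85], [0.60, 0.55]]

risk_score = first_pc_scores(risk_block)
utility_score = first_pc_scores(utility_block)
for name, r, u in zip(methods, risk_score, utility_score):
    print(f"{name}: composite risk {r:+.2f}, composite utility {u:+.2f}")
```

Plotting `risk_score` against `utility_score` would give one point per method, which is the two-dimensional view a composite scatterplot is meant to provide.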
📝 Abstract
Anonymizing microdata requires balancing the reduction of disclosure risk with the preservation of data utility. Traditional evaluations often rely on single measures or two-dimensional risk-utility (R-U) maps, but real-world assessments involve multiple, often correlated, indicators of both risk and utility. Pairwise comparisons of these measures can be inefficient and incomplete. We therefore systematically compare six visualization approaches for simultaneous evaluation of multiple risk and utility measures: heatmaps, dot plots, composite scatterplots, parallel coordinate plots, radial profile charts, and PCA-based biplots. We introduce blockwise PCA for composite scatterplots and joint PCA for biplots that simultaneously reveal method performance and measure interrelationships. Through systematic identification of Pareto-optimal methods in all approaches, we demonstrate how multivariate visualization supports a more informed selection of anonymization methods.
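The identification of Pareto-optimal methods can be sketched as a dominance check over all measures. The example assumes that lower values are better for every risk measure and higher values are better for every utility measure; the function name `pareto_optimal`, the method labels, and the numbers are hypothetical and serve only to illustrate the concept.

```python
def dominates(a, b):
    """True if method a is at least as good as b on every measure
    and strictly better on at least one.

    Each method is a pair (risks, utilities): lower risk is better,
    higher utility is better.
    """
    risks_a, utils_a = a
    risks_b, utils_b = b
    no_worse = (all(x <= y for x, y in zip(risks_a, risks_b))
                and all(x >= y for x, y in zip(utils_a, utils_b)))
    strictly_better = (any(x < y for x, y in zip(risks_a, risks_b))
                       or any(x > y for x, y in zip(utils_a, utils_b)))
    return no_worse and strictly_better

def pareto_optimal(methods):
    """Return the names of methods that no other method dominates."""
    return [
        name for name, measures in methods.items()
        if not any(dominates(other, measures)
                   for other_name, other in methods.items()
                   if other_name != name)
    ]

# Hypothetical toy values, NOT results from the paper:
# name -> ((risk measures), (utility measures))
candidates = {
    "m1": ((0.30, 0.25), (0.80, 0.75)),
    "m2": ((0.20, 0.30), (0.70, 0.72)),
    "m3": ((0.10, 0.10), (0.60, 0.65)),
    "m4": ((0.35, 0.30), (0.75, 0.70)),  # worse than m1 on every measure
}

front = pareto_optimal(candidates)
print(front)
```

In this toy data only `m4` is dominated (by `m1`), so the remaining three methods form the Pareto set that any of the visualizations could then highlight.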