PrivFusion: A Privacy-preserving Multi-Agent Framework for Harmonizing Distributed Datasets

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the challenge of aligning highly heterogeneous clinical data across institutions before federated learning, a step where manual coordination is costly and the data are privacy-sensitive. The authors propose PrivFusion, a privacy-preserving multi-agent framework for the pre-federation harmonization phase. Agents analyze each site's data locally, cluster semantically similar features across sites, and issue iterative transformation recommendations until the structured datasets align, without centralizing sensitive patient records. Experiments on four heterogeneous COVID-19 datasets indicate that the approach harmonizes multi-site data effectively and efficiently while substantially reducing manual curation effort.

📝 Abstract

The growing availability of clinical data has increased the use of machine learning, yet centralized data aggregation is often infeasible for sensitive health information. Federated Learning (FL) offers a distributed alternative, but its adoption is limited by substantial heterogeneity across institutional datasets, making harmonization a critical but frequently overlooked prerequisite for multi-site analytics. We introduce PrivFusion, a privacy-preserving multi-agent framework that automates the harmonization of structured datasets prior to federated training. PrivFusion uses agents to analyze local data, cluster semantically similar features across sites, and provide iterative transformation recommendations until alignment is achieved. Evaluation across four heterogeneous COVID-19 datasets demonstrates that PrivFusion effectively and efficiently harmonizes multi-site data while substantially reducing manual effort.
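The three steps named in the abstract (local analysis, cross-site feature clustering, iterative transformation recommendations) can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`FeatureSummary`, `summarize_site`, `cluster_features`, `recommend_renames`, `harmonize`) is hypothetical, and string similarity from Python's `difflib` is only a stand-in for whatever semantic analysis the PrivFusion agents perform. The one property it tries to mirror is that only column-level metadata, never patient rows, is compared across sites.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class FeatureSummary:
    """Column-level metadata a site is willing to share (no patient values)."""
    site: str
    name: str


def summarize_site(site, rows):
    """Local analysis: collect column names only; raw values are not returned."""
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    return [FeatureSummary(site, n) for n in names]


def normalize(name):
    """Lowercase and drop punctuation so 'patient_age' and 'PatientAge' compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def similarity(a, b):
    # Assumption: string similarity stands in for semantic similarity.
    return SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio()


def cluster_features(summaries, threshold=0.7):
    """Greedy clustering: join the first cluster whose seed feature is similar enough."""
    clusters = []
    for feat in summaries:
        for cluster in clusters:
            if similarity(feat, cluster[0]) >= threshold:
                cluster.append(feat)
                break
        else:
            clusters.append([feat])
    return clusters


def recommend_renames(clusters):
    """Recommend one canonical name per cluster, keyed by (site, original name)."""
    recs = {}
    for cluster in clusters:
        canonical = normalize(cluster[0].name)
        for feat in cluster:
            if feat.name != canonical:
                recs[(feat.site, feat.name)] = canonical
    return recs


def harmonize(site_rows, max_rounds=5):
    """Repeat summarize -> cluster -> recommend until no recommendation remains."""
    for _ in range(max_rounds):
        summaries = [s for site, rows in site_rows.items()
                     for s in summarize_site(site, rows)]
        recs = recommend_renames(cluster_features(summaries))
        if not recs:
            break
        # In a real deployment each site would apply its own renames locally.
        site_rows = {
            site: [{recs.get((site, k), k): v for k, v in row.items()} for row in rows]
            for site, rows in site_rows.items()
        }
    return site_rows


sites = {
    "site_a": [{"patient_age": 64, "SpO2": 93.0}],
    "site_b": [{"PatientAge": 58, "spo2": 95.5}],
}
harmonized = harmonize(sites)
print(sorted(harmonized["site_a"][0]), sorted(harmonized["site_b"][0]))
```

In this toy run both sites end up with the same two column names while their values stay untouched. Real clinical harmonization also has to reconcile units, encodings, and value ranges, which a rename-only sketch does not attempt.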

Problem

Research questions and friction points this paper is trying to address.

data harmonization

federated learning

privacy-preserving

clinical data

dataset heterogeneity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Privacy-preserving

Multi-agent framework

Data harmonization