Sharing Knowledge without Sharing Data: Stitches can improve ensembles of disjointly trained models

📅 2025-12-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
In data-silo scenarios such as healthcare, raw data cannot be shared across institutions, hindering collaborative model optimization. Method: This paper proposes an asynchronous model collaboration framework that requires neither data sharing nor synchronized training. Instead, pre-trained models are exchanged across sites, and lightweight “stitching layers” inserted at intermediate network levels enable feature alignment and model ensemble. For the first time, multi-objective optimization is formulated for asynchronous model fusion, jointly optimizing local performance recovery and cross-domain generalization. Results: Under strict zero-data-sharing constraints, the method significantly improves cross-institutional test accuracy while restoring each participant’s local performance to near that of centralized joint training. It consistently outperforms conventional ensemble and knowledge distillation approaches in both generalization and local fidelity.
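As a rough illustration (not the paper's implementation), a stitching layer can be thought of as a small learned map that aligns one model's intermediate features with another model's feature space, so that the first model's front half can feed the second model's back half. The sketch below uses toy linear "models" and fits the stitch by least squares; all dimensions, weights, and function names are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for two independently trained networks (hypothetical
# weights; the paper stitches real pre-trained networks). Each model has
# a front half producing intermediate features and a back half producing
# outputs.
d_in, d_a, d_b, d_out = 8, 6, 5, 3
W_a1 = rng.normal(size=(d_in, d_a))   # model A, early layers
W_b1 = rng.normal(size=(d_in, d_b))   # model B, early layers
W_b2 = rng.normal(size=(d_b, d_out))  # model B, later layers

def front_a(x):
    return np.tanh(x @ W_a1)

def front_b(x):
    return np.tanh(x @ W_b1)

def back_b(h):
    return h @ W_b2

# Fit the stitching layer: a linear map S taking A's intermediate
# features into B's feature space, trained on one party's local data
# only (no raw data is exchanged between parties).
X = rng.normal(size=(200, d_in))
H_a, H_b = front_a(X), front_b(X)
S, *_ = np.linalg.lstsq(H_a, H_b, rcond=None)

# The stitched model: A's front half -> stitching layer -> B's back half.
def stitched(x):
    return back_b(front_a(x) @ S)

print(stitched(X[:2]).shape)  # (2, 3)
```

In the paper's setting the stitch is a lightweight trainable layer inserted at a well-chosen intermediate depth of real networks; the least-squares fit here merely stands in for that training step.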

📝 Abstract
Deep learning has been shown to be very capable at performing many real-world tasks. However, this performance often depends on the availability of large and varied datasets. In some settings, such as the medical domain, data is fragmented across parties and cannot be readily shared. While federated learning addresses this situation, it requires the parties to synchronously train a single model together, exchanging information about model weights. We investigate how asynchronous collaboration, where only already trained models are shared (e.g. as part of a publication), affects performance, and propose stitching as a method for combining models. Taking a multi-objective perspective, where performance on each party's data is viewed independently, we find that a model trained solely on a single party's data performs on that party's data about as well as a model trained on the merged data, while its performance on the other parties' data is notably worse. Moreover, while an ensemble of such individually trained networks generalizes better, performance on each party's own dataset suffers. We find that combining intermediate representations of individually trained models with a well-placed pair of stitching layers allows this performance to recover to a competitive degree while maintaining improved generalization, showing that asynchronous collaboration can yield competitive results.
Problem

Research questions and friction points this paper is trying to address.

How to improve model performance when sensitive data cannot be shared across parties.
How to enable asynchronous collaboration by combining pre-trained models with stitching layers.
How to improve generalization while maintaining performance on each party's own dataset.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stitching layers combine intermediate model representations
Asynchronous collaboration merges pre-trained disjoint models
Multi-objective approach maintains performance across fragmented datasets
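The multi-objective perspective means each candidate (a locally trained model, a plain ensemble, a stitched model) is scored separately on every party's data, and candidates are compared by Pareto dominance rather than a single averaged accuracy. A minimal sketch, with made-up accuracy numbers that merely mirror the qualitative findings above:

```python
def dominates(a, b):
    """a Pareto-dominates b if it is >= on every party's data and
    strictly > on at least one (higher accuracy is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# Hypothetical per-party accuracy vectors:
# (accuracy on party 1's data, accuracy on party 2's data)
candidates = {
    "local_only": (0.92, 0.61),  # strong locally, weak elsewhere
    "ensemble":   (0.85, 0.84),  # generalizes better, local drop
    "stitched":   (0.91, 0.85),  # recovers local performance too
}

# Keep only candidates not dominated by any other candidate.
pareto = [n for n, a in candidates.items()
          if not any(dominates(b, a) for m, b in candidates.items() if m != n)]
print(pareto)  # ['local_only', 'stitched']
```

Under these illustrative numbers the stitched model dominates the plain ensemble (better on both parties), while the purely local model remains on the Pareto front only because of its edge on its own data.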