Membership Inference Attacks Expose Participation Privacy in ECG Foundation Encoders

📅 2026-04-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study examines the membership inference privacy risks of self-supervised pre-trained electrocardiogram (ECG) foundation encoders. It presents the first systematic evaluation of participation privacy leakage under realistic attacker interfaces, covering contrastive learning approaches (e.g., SimCLR, TS2Vec) and masked reconstruction methods (e.g., CNN- or Transformer-based MAE). The authors develop a multi-interface attack framework encompassing black-box score access, adaptive statistical aggregation across repeated queries, and geometric probing in embedding space, alongside a cross-dataset auditing protocol tailored to deployment environments. Their findings reveal significant privacy leakage for small or institution-specific cohorts, with contrastive encoders exhibiting the greatest vulnerability to membership exposure in embedding space. Notably, large-scale and diverse pre-training data substantially mitigates operational tail risk.
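The paper's concrete attack implementations are not reproduced here, but the embedding-access interface described above can be illustrated with a toy geometric probe: score a query ECG window by its distance to an attacker-collected reference population in the encoder's latent space, on the (assumed) heuristic that pretraining members sit closer to the learned representation manifold. The function name and the k-nearest-neighbor scoring rule below are illustrative assumptions, not the authors' method.

```python
import numpy as np

def knn_membership_score(query_emb: np.ndarray,
                         reference_embs: np.ndarray,
                         k: int = 5) -> float:
    """Toy geometric membership probe in embedding space.

    Scores a query embedding by its mean distance to its k nearest
    neighbors in an attacker-collected reference set; smaller distances
    are treated as more "member-like". This is a hypothetical heuristic
    standing in for the paper's embedding-geometry attacks.
    """
    # Euclidean distance from the query to every reference embedding
    dists = np.linalg.norm(reference_embs - query_emb, axis=1)
    # Negate so that larger scores mean "more likely a pretraining member"
    return float(-np.sort(dists)[:k].mean())
```

A realistic attacker would not threshold such per-window scores directly; they would pool them per subject and calibrate on known non-members, as sketched after the abstract below.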

📝 Abstract
Foundation-style ECG encoders pretrained with self-supervised learning are increasingly reused across tasks, institutions, and deployment contexts, often through model-as-a-service interfaces that expose scalar scores or latent representations. While such reuse improves data efficiency and generalization, it raises a participation privacy concern: can an adversary infer whether a specific individual or cohort contributed ECG data to pretraining, even when raw waveforms and diagnostic labels are never disclosed? In connected-health settings, training participation itself may reveal institutional affiliation, study enrollment, or sensitive health context. We present an implementation-grounded audit of membership inference attacks (MIAs) against modern self-supervised ECG foundation encoders, covering contrastive objectives (SimCLR, TS2Vec) and masked reconstruction objectives (CNN- and Transformer-based MAE). We evaluate three realistic attacker interfaces: (i) score-only black-box access to scalar outputs, (ii) adaptive learned attackers that aggregate subject-level statistics across repeated queries, and (iii) embedding-access attackers that probe latent representation geometry. Using a subject-centric protocol with window-to-subject aggregation and calibration at fixed false-positive rates under a cross-dataset auditing setting, we observe heterogeneous and objective-dependent participation leakage: leakage is most pronounced in small or institution-specific cohorts and, for contrastive encoders, can saturate in embedding space, while larger and more diverse datasets substantially attenuate operational tail risk. Overall, our results show that restricting access to raw signals or labels is insufficient to guarantee participation privacy, underscoring the need for deployment-aware auditing of reusable biosignal foundation encoders in connected-health systems.
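As a rough sketch of the subject-centric protocol the abstract describes, the snippet below aggregates per-window attack scores to one score per subject, then reports the true-positive rate at a fixed false-positive rate, calibrating the decision threshold on known non-member subjects. The function names, mean aggregation, and quantile-based calibration are assumptions for illustration; the paper's adaptive attackers may pool repeated-query statistics differently.

```python
import numpy as np

def windows_to_subject_scores(window_scores: np.ndarray,
                              subject_ids: np.ndarray):
    """Collapse per-window membership scores to one score per subject.

    Mean aggregation is an assumed choice; other subject-level
    statistics (max, trimmed mean, learned pooling) are possible.
    """
    subjects = np.unique(subject_ids)
    scores = np.array([window_scores[subject_ids == s].mean()
                       for s in subjects])
    return subjects, scores

def tpr_at_fixed_fpr(member_scores: np.ndarray,
                     nonmember_scores: np.ndarray,
                     fpr: float = 0.01) -> float:
    """True-positive rate at a fixed false-positive rate.

    The threshold is calibrated on non-member scores: flagging any
    subject above the (1 - fpr) quantile holds the false-positive
    rate near the target, so the returned value is the attack's
    TPR at that FPR.
    """
    threshold = np.quantile(nonmember_scores, 1.0 - fpr)
    return float((member_scores > threshold).mean())
```

Evaluating at low fixed FPR (e.g., 1%) is what surfaces the tail risk the abstract emphasizes: average-case metrics like AUC can look benign while a small cohort is still confidently exposed.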
Problem

Research questions and friction points this paper is trying to address.

Membership Inference Attacks
Participation Privacy
ECG Foundation Models
Self-Supervised Learning
Privacy Leakage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Membership Inference Attacks
ECG Foundation Models
Self-Supervised Learning
Participation Privacy
Latent Representation Leakage