Privacy-Aware Time Series Synthesis via Public Knowledge Distillation

📅 2025-11-01

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Problem: Balancing privacy preservation and synthetic data utility remains challenging, particularly for sensitive time-series data (e.g., in healthcare and finance).

Method: This paper proposes Pub2Priv, a framework that leverages publicly available, non-sensitive contextual metadata (e.g., weather, electricity prices) to guide the generation of private time-series data. A self-attention mechanism distills heterogeneous public knowledge into temporal and feature embeddings, which condition a diffusion-based generative model. The paper also proposes a new identifiability-based metric for evaluating privacy.

Contribution/Results: Experiments on multiple real-world datasets show that Pub2Priv outperforms state-of-the-art methods in the privacy-utility trade-off. It achieves high statistical fidelity and strong downstream task performance even under stringent differential privacy guarantees (ε < 1.5). The framework offers a scalable, verifiable approach to cross-domain secure data sharing.
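The summary describes self-attention distilling public metadata into temporal and feature embeddings that condition the diffusion model. Below is a minimal NumPy sketch of one plausible reading of that step, not the paper's implementation. It assumes a single attention head, random untrained projection weights, and a public-metadata matrix of shape (T, F). "Temporal" embeddings come from attention across time steps and "feature" embeddings from attention across covariates. The names `encode_public` and `self_attention` are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along `axis` for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the rows of `tokens`."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v


def encode_public(x_pub, d_model=8, seed=0):
    """Hypothetical encoder: public metadata (T, F) -> temporal (T, d), feature (F, d).

    Weights are random and untrained here; a real model would learn them
    jointly with the diffusion model that consumes these embeddings.
    """
    rng = np.random.default_rng(seed)
    t_len, n_feat = x_pub.shape
    # Temporal view: each time step is a token described by its F public features.
    w_t = [rng.normal(scale=n_feat ** -0.5, size=(n_feat, d_model)) for _ in range(3)]
    temporal_emb = self_attention(x_pub, *w_t)
    # Feature view: each public covariate is a token described by its length-T series.
    w_f = [rng.normal(scale=t_len ** -0.5, size=(t_len, d_model)) for _ in range(3)]
    feature_emb = self_attention(x_pub.T, *w_f)
    return temporal_emb, feature_emb


# Usage: 24 hourly steps of 3 public covariates (e.g. temperature, price, hour of day).
x_pub = np.random.default_rng(1).normal(size=(24, 3))
temporal_emb, feature_emb = encode_public(x_pub)
print(temporal_emb.shape, feature_emb.shape)  # (24, 8) (3, 8)
```

Keeping the two views separate lets a downstream generator attend to "what the public context looks like at each time step" and "how each public covariate evolves" independently. How Pub2Priv actually fuses them is not stated in this summary.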


📝 Abstract

Sharing sensitive time series data, such as patient records or investment accounts, in domains like finance, healthcare, and energy consumption is often restricted due to privacy concerns. Privacy-aware synthetic time series generation addresses this challenge by injecting noise during training, which inherently introduces a trade-off between privacy and utility. In many cases, sensitive sequences are correlated with publicly available, non-sensitive contextual metadata (e.g., household electricity consumption may be influenced by weather conditions and electricity prices). However, existing privacy-aware data generation methods often overlook this opportunity, resulting in suboptimal privacy-utility trade-offs. In this paper, we present Pub2Priv, a novel framework for generating private time series data by leveraging heterogeneous public knowledge. Our model employs a self-attention mechanism to encode public data into temporal and feature embeddings, which serve as conditional inputs to a diffusion model that generates synthetic private sequences. Additionally, we introduce a practical metric that assesses privacy by evaluating the identifiability of the synthetic data. Experimental results show that Pub2Priv consistently outperforms state-of-the-art benchmarks in the privacy-utility trade-off across the finance, energy, and commodity trading domains.
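The abstract attributes the privacy-utility trade-off to noise injected during training. This page does not say which mechanism Pub2Priv uses. A common choice for training deep generative models under differential privacy is DP-SGD, which combines per-example gradient clipping with Gaussian noise. The sketch below assumes that mechanism and shows only the gradient-privatization step. `dp_sgd_gradient` is a hypothetical helper. Turning `noise_multiplier` into an ε budget (such as the ε < 1.5 quoted above) requires a privacy accountant, which is not shown.

```python
import numpy as np


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Privatize a batch of per-example gradients, DP-SGD style (assumed mechanism).

    per_example_grads: array of shape (batch_size, n_params).
    Returns the noisy average gradient of shape (n_params,).
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Rescale each example's gradient so its L2 norm is at most clip_norm;
    # this bounds any single record's influence (the sensitivity).
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors
    # Add Gaussian noise with std noise_multiplier * clip_norm to the summed gradient.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]


grads = np.array([[3.0, 4.0], [0.3, 0.4]])  # L2 norms 5.0 and 0.5
# With noise switched off, only clipping acts: (0.6, 0.8) + (0.3, 0.4), averaged.
g_clip_only = dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=0.0,
                              rng=np.random.default_rng(0))
print(g_clip_only)  # approximately [0.45, 0.6]
# A typical private step adds noise on top of the clipped sum.
g_priv = dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=1.1,
                         rng=np.random.default_rng(0))
```

Larger `noise_multiplier` values give stronger privacy but noisier updates. That is exactly the utility cost which conditioning on public data is meant to offset.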

Problem


Generating private time series data using public contextual information

Improving privacy-utility trade-off in sensitive data synthesis

Addressing identifiability concerns in synthetic sequence generation

Innovation


Leveraging public knowledge to generate private time series

Using self-attention and diffusion models for synthesis

Introducing an identifiability metric to assess privacy
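This page does not define the identifiability metric. As an illustration only, the sketch below computes a nearest-neighbour identifiability score. It assumes that a real sequence counts as "identifiable" when some synthetic sequence lies closer to it than any other real sequence does. Both `identifiability_score` and this definition are assumptions, not the paper's metric. Lower scores indicate that the synthetic data reveals less about individual records.

```python
import numpy as np


def identifiability_score(real, synth):
    """Fraction of real sequences whose nearest synthetic sequence is closer
    than their nearest *other* real sequence (hypothetical definition).

    real: (n_real, ...), synth: (n_synth, ...); each record is flattened.
    """
    real = real.reshape(len(real), -1)
    synth = synth.reshape(len(synth), -1)
    # Pairwise Euclidean distances: real-to-real and real-to-synthetic.
    d_rr = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=-1)
    np.fill_diagonal(d_rr, np.inf)  # ignore each record's distance to itself
    d_rs = np.linalg.norm(real[:, None, :] - synth[None, :, :], axis=-1)
    nearest_real = d_rr.min(axis=1)
    nearest_synth = d_rs.min(axis=1)
    return float(np.mean(nearest_synth < nearest_real))


rng = np.random.default_rng(0)
real = rng.normal(size=(50, 24))                     # 50 real sequences of length 24
copied = real + 1e-3 * rng.normal(size=real.shape)   # near-copies: a privacy leak
fresh = rng.normal(size=(50, 24))                    # independent samples
score_copied = identifiability_score(real, copied)
score_fresh = identifiability_score(real, fresh)
print(score_copied)  # 1.0: every real record has a near-duplicate
print(score_fresh)   # much lower (about half, by symmetry)
```

The pairwise-distance matrices grow quadratically with the number of records. For large datasets, a nearest-neighbour index would be the usual replacement for the dense computation.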

🔎 Similar Papers

EnergyDiff: Universal Time-Series Energy Data Generation using Diffusion Models