Synopsis: Secure and private trend inference from encrypted semantic embeddings

📅 2025-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses privacy-preserving trend analysis on end-to-end encrypted (E2EE) messaging platforms such as WhatsApp, so that researchers and journalists can run exploratory or targeted analyses of misinformation and political messaging without access to raw messages. It proposes a hybrid framework that combines the local and central models of differential privacy (DP) and wraps the system in malicious-secure multi-party computation (MPC), so that the DP query interface is the only way to access the stored message embeddings. Queries operate on voluntarily donated 500-dimensional semantic embeddings rather than on message text. On a dataset of 34,024 Hindi-language WhatsApp messages, queries run in about 30 seconds and the fine-grained interface exceeds 94% accuracy on benchmark tasks.

📝 Abstract

WhatsApp and many other commonly used communication platforms guarantee end-to-end encryption (E2EE), which requires that service providers lack the cryptographic keys to read communications on their own platforms. WhatsApp's privacy-preserving design makes it difficult to study important phenomena like the spread of misinformation or political messaging, as users have a clear expectation and desire for privacy and little incentive to forfeit that privacy by handing over raw data to researchers, journalists, or other parties. We introduce Synopsis, a secure architecture for analyzing messaging trends in consensually donated E2EE messages using message embeddings. Because the system is intended to support investigative journalism workflows, Synopsis must facilitate both exploratory and targeted analyses -- a challenge for systems using differential privacy (DP), and, for different reasons, a challenge for private computation approaches based on cryptography. To meet these challenges, we combine techniques from the local and central DP models and wrap the system in malicious-secure multi-party computation to ensure the DP query architecture is the only way to access messages, preventing any party from directly viewing stored message embeddings. Evaluations on a dataset of Hindi-language WhatsApp messages (34,024 messages represented as 500-dimensional embeddings) demonstrate the efficiency and accuracy of our approach. Queries on this data run in about 30 seconds, and the accuracy of the fine-grained interface exceeds 94% on benchmark tasks.

Problem

Research questions and friction points this paper is trying to address.

Secure trend analysis from encrypted messaging platforms

Privacy-preserving inference without raw data access

Balancing exploratory and targeted analyses with differential privacy
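
One reason exploratory analysis is hard under differential privacy is that every query spends part of a finite privacy budget. The sketch below illustrates this with basic sequential composition, where the epsilons of successive queries add up. The abstract does not describe how Synopsis accounts for its budget, so the `PrivacyBudget` class and its method names are hypothetical and purely illustrative.

```python
class PrivacyBudget:
    """Hypothetical accountant using basic sequential composition.

    Assumption: each query is epsilon-DP on its own, so k queries
    together are at most (sum of epsilons)-DP.
    """

    def __init__(self, total_epsilon: float) -> None:
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> bool:
        """Reserve epsilon for one query; return False if it would overspend."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not reject
        # a query that exactly exhausts the budget.
        if self.spent + epsilon > self.total + 1e-12:
            return False
        self.spent += epsilon
        return True

    @property
    def remaining(self) -> float:
        """Budget still available for future queries."""
        return max(self.total - self.spent, 0.0)


if __name__ == "__main__":
    budget = PrivacyBudget(total_epsilon=1.0)
    for eps in (0.25, 0.25, 0.5, 0.1):
        allowed = budget.charge(eps)
        print(f"query with epsilon={eps}: allowed={allowed}, remaining={budget.remaining}")
```

Many broad exploratory queries exhaust such a budget quickly, which leaves little for the targeted follow-up queries an investigation usually needs.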

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines local and central differential privacy models

Uses malicious-secure multi-party computation

Analyzes encrypted messages via semantic embeddings
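
To make the central-DP side of this concrete, here is a minimal sketch of a noisy trend query: count the donated embeddings whose cosine similarity to a query embedding exceeds a threshold, then add Laplace noise. This is the textbook Laplace mechanism, not the paper's protocol; it runs in the clear, with no MPC and no local-DP step, and the function names, the threshold, and the one-embedding-per-contribution assumption are all illustrative.

```python
import math
import random
from typing import Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_count(embeddings, query, threshold: float) -> int:
    """Exact number of embeddings at least `threshold`-similar to `query`."""
    return sum(1 for e in embeddings if cosine(e, query) >= threshold)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_similar_count(embeddings, query, threshold: float, epsilon: float,
                        rng: Optional[random.Random] = None) -> float:
    """Epsilon-DP count via the Laplace mechanism.

    Assumption: each contribution adds exactly one embedding, so adding or
    removing one changes the count by at most 1 (sensitivity 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return similar_count(embeddings, query, threshold) + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Toy 2-dimensional data; the paper's embeddings are 500-dimensional.
    data = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(noisy_similar_count(data, [1.0, 0.0], threshold=0.8, epsilon=1.0))
```

A production system would also need a noise sampler that is robust to floating-point artifacts and a cryptographically secure randomness source; `random.Random` is used here only to keep the sketch short.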

