Beyond the Crawl: Unmasking Browser Fingerprinting in Real User Interactions

📅 2025-02-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Current automated tools fail to accurately emulate realistic human-browser interactions, introducing systematic bias into browser fingerprinting research. To address this, we conducted a 10-week empirical study with 30 real users, collecting browsing telemetry across 3,000 top-ranked websites. Our analysis revealed that conventional crawlers miss 45% of the fingerprinting websites encountered by real users, a previously undocumented gap. The gap stems mainly from three crawler limitations: crawlers cannot reach pages behind authentication, they are stopped by bot detection, and they do not perform the human interactions (e.g., mouse movements, focus events) that trigger some fingerprinting scripts. We also identified potential new fingerprinting vectors that appear in real user data but not in automated crawls. Building on these insights, we use federated learning to train fingerprinting detection models on real user data in a privacy-preserving way. Evaluated on real-world telemetry data, this approach achieves a 22% improvement in detection accuracy over baselines, demonstrating the critical role of authentic interaction data in making detection robust. Our work provides foundational empirical evidence for understanding modern fingerprinting mechanisms and advancing privacy-enhancing technologies.
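
As a rough illustration of the headline metric, assuming it is computed as the share of fingerprinting sites observed in real-user telemetry that the crawl never flagged, the gap reduces to a set difference. The function names, the per-site cause labels, and the domains below are hypothetical placeholders; the paper's exact definition and its method for attributing each miss to a cause may differ.

```python
from collections import Counter


def crawl_miss_rate(user_fp_sites: set[str], crawl_fp_sites: set[str]) -> float:
    """Share of fingerprinting sites seen in real-user telemetry that the crawl never flagged."""
    if not user_fp_sites:
        raise ValueError("user telemetry contains no fingerprinting sites")
    missed = user_fp_sites - crawl_fp_sites
    return len(missed) / len(user_fp_sites)


def miss_causes(missed: set[str], cause_of: dict[str, str]) -> Counter:
    """Tally missed sites by a hypothetical per-site cause label; unlabeled sites count as 'unknown'."""
    return Counter(cause_of.get(site, "unknown") for site in missed)


# Toy data with placeholder domains; these are NOT results from the study.
user_sites = {"shop.example", "bank.example", "news.example", "video.example"}
crawl_sites = {"news.example", "video.example", "blog.example"}

rate = crawl_miss_rate(user_sites, crawl_sites)
causes = miss_causes(
    user_sites - crawl_sites,
    {"bank.example": "authentication", "shop.example": "bot_detection"},
)
print(f"miss rate: {rate:.0%}")  # miss rate: 50%
print(dict(causes))
```

Note that sites flagged only by the crawl (here `blog.example`) do not affect this rate, since the metric is anchored on what real users encountered.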

πŸ“ Abstract
Browser fingerprinting is a pervasive online tracking technique increasingly used for profiling and targeted advertising. Prior research on the prevalence of fingerprinting has relied heavily on automated web crawls, which inherently struggle to replicate the nuances of human-computer interaction. This raises concerns about the accuracy of our current understanding of real-world fingerprinting deployments. To address this, we present a user study involving 30 participants over 10 weeks, capturing telemetry data from real browsing sessions across 3,000 top-ranked websites. Our evaluation reveals that automated crawls miss almost half (45%) of the fingerprinting websites encountered by real users. This discrepancy stems mainly from the crawlers' inability to access authentication-protected pages, circumvent bot detection, and trigger fingerprinting scripts activated by specific user interactions. We also identify potential new fingerprinting vectors present in real user data but absent from automated crawls. Finally, we evaluate the effectiveness of federated learning for training browser fingerprinting detection models on real user data, which yields better performance than models trained solely on automated crawl data.
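
The abstract does not specify the federated setup. As a minimal sketch, assuming the standard FedAvg algorithm (McMahan et al.) with a logistic-regression detector over hypothetical, pre-normalized per-script features, each participant trains on its own data and the server only averages the resulting weights. The feature names, datasets, and hyperparameters below are illustrative assumptions, not the paper's configuration.

```python
import math

# Weight vector layout: one weight per feature, followed by a bias term.
Weights = list[float]
Dataset = list[tuple[list[float], int]]  # (feature vector, label: 1 = fingerprinting script)


def predict(w: Weights, x: list[float]) -> float:
    """Logistic-regression probability that a script is fingerprinting."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]  # zip stops before the bias
    return 1.0 / (1.0 + math.exp(-z))


def local_train(w: Weights, data: Dataset, epochs: int, lr: float) -> Weights:
    """Plain SGD on one client's private data; only the updated weights are returned."""
    w = list(w)  # copy so the global model is not mutated
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y  # gradient of log-loss w.r.t. the logit
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
            w[-1] -= lr * err
    return w


def fed_avg(clients: list[Dataset], n_features: int, rounds: int = 20,
            epochs: int = 5, lr: float = 0.5) -> Weights:
    """FedAvg: average client models, weighting each by its number of local examples."""
    global_w = [0.0] * (n_features + 1)
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        updates = [(local_train(global_w, data, epochs, lr), len(data)) for data in clients]
        global_w = [
            sum(w[i] * n for w, n in updates) / total for i in range(n_features + 1)
        ]
    return global_w


# Two simulated participants with hypothetical features, e.g.
# [canvas API usage, WebGL API usage]. In a real deployment each
# dataset would stay on the participant's own device.
client_a: Dataset = [([1.0, 1.0], 1), ([0.0, 0.0], 0), ([0.9, 0.8], 1)]
client_b: Dataset = [([0.1, 0.0], 0), ([1.0, 0.7], 1), ([0.0, 0.2], 0)]

model = fed_avg([client_a, client_b], n_features=2)
print(round(predict(model, [1.0, 1.0]), 3), round(predict(model, [0.0, 0.0]), 3))
```

Weighting by local dataset size is the FedAvg default. Because only weight vectors are exchanged, raw browsing telemetry can remain on each participant's machine, which is what makes federated learning attractive for training on real-user fingerprinting data.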

Problem

Research questions and friction points this paper is trying to address.

Browser Fingerprinting
Human-Computer Interaction
Automated Tools

Innovation

Methods, ideas, or system contributions that make the work stand out.

Real User Behavior
Fingerprinting Techniques
Federated Learning Outperforms Crawl-Only Training