🤖 AI Summary
Real-time detection and mitigation of social engineering attacks (such as phishing, impersonation, and vishing) pose significant challenges due to their dynamic, context-sensitive nature and stringent privacy requirements.
Method: This paper proposes the first privacy-preserving AI-in-the-loop anti-fraud dialogue framework, integrating instruction-tuned large language models (LLMs), federated learning (FedAvg), and differential privacy to enable adjustable security thresholds and dynamic real-time moderation. A multi-layer safety mechanism is implemented via guardian models (e.g., LlamaGuard).
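The federated update described above can be sketched in a few lines. This is a minimal illustration of one FedAvg round with differential-privacy-style clipping and Gaussian noise, not the paper's actual implementation; the function names and the `clip_norm`/`noise_std` parameters are assumptions for illustration.

```python
import random

def clip(update, clip_norm):
    """Scale a client update so its L2 norm is at most clip_norm."""
    norm = sum(u * u for u in update) ** 0.5
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]

def fedavg_round(global_weights, client_updates,
                 clip_norm=1.0, noise_std=0.01, rng=None):
    """One FedAvg round: clip each client's update, average, add noise.

    Clipping bounds each client's contribution; the Gaussian noise on the
    aggregate is the standard differential-privacy mechanism. Raw client
    data never leaves the client -- only weight deltas are shared.
    """
    rng = rng or random.Random(0)
    clipped = [clip(u, clip_norm) for u in client_updates]
    n = len(clipped)
    avg = [sum(u[i] for u in clipped) / n
           for i in range(len(global_weights))]
    return [w + a + rng.gauss(0.0, noise_std)
            for w, a in zip(global_weights, avg)]
```

Running 30 such rounds corresponds to the "30 federated rounds" reported in the results; the noise scale trades privacy strength against update fidelity.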
Contribution/Results: It is the first work to jointly model real-time conversational intervention, distributed privacy protection, and adaptive security control. Experiments demonstrate fluent system responses (perplexity = 22.3), high user engagement (0.80), and strong privacy guarantees: PII leakage rate ≤ 0.0085 after 30 federated rounds. Both safety compliance and generative novelty remain stable under stringent privacy constraints.
📝 Abstract
Scams exploiting real-time social engineering -- such as phishing, impersonation, and phone fraud -- remain a persistent and evolving threat across digital platforms. Existing defenses are largely reactive, offering limited protection during active interactions. We propose a privacy-preserving, AI-in-the-loop framework that proactively detects and disrupts scam conversations in real time. The system combines instruction-tuned artificial intelligence with a safety-aware utility function that balances engagement with harm minimization, and employs federated learning to enable continual model updates without raw data sharing. Experimental evaluations show that the system produces fluent and engaging responses (perplexity as low as 22.3, engagement ≈ 0.80), while human studies confirm significant gains in realism, safety, and effectiveness over strong baselines. In federated settings, models trained with FedAvg sustain up to 30 rounds while preserving high engagement (≈ 0.80), strong relevance (≈ 0.74), and low PII leakage (≤ 0.0085). Even with differential privacy, novelty and safety remain stable, indicating that robust privacy can be achieved without sacrificing performance. The evaluation of guard models (LlamaGuard, LlamaGuard2/3, MD-Judge) shows a straightforward pattern: stricter moderation settings reduce the chance of exposing personal information, but they also limit how much the model engages in conversation. In contrast, more relaxed settings allow longer and richer interactions, which improve scam detection, but at the cost of higher privacy risk. To our knowledge, this is the first framework to unify real-time scam-baiting, federated privacy preservation, and calibrated safety moderation into a proactive defense paradigm.
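The safety-aware utility function and the strict-vs-relaxed moderation trade-off can be made concrete with a small sketch. Everything here is illustrative: the scoring functions, the harm weight `lam`, and the guard threshold `tau` are hypothetical stand-ins for the paper's adjustable security thresholds, with the guard model abstracted as a per-candidate harm score.

```python
def select_reply(candidates, engagement, harm, lam=0.5, tau=0.8):
    """Pick the reply maximizing engagement - lam * harm.

    Candidates whose guard-model harm score exceeds the moderation
    threshold tau are discarded outright. Lowering tau (stricter
    moderation) blocks more candidates, reducing privacy risk at the
    cost of engagement; raising it allows richer interactions.
    """
    safe = [c for c in candidates if harm(c) <= tau]
    if not safe:
        return None  # no compliant reply: fall back to a canned safe response
    return max(safe, key=lambda c: engagement(c) - lam * harm(c))
```

Sweeping `tau` reproduces the pattern reported for the guard models: a strict threshold forces the safest (often blandest) reply, while a relaxed one admits more engaging but riskier candidates.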