🤖 AI Summary
This study addresses the widespread problem of TLS man-in-the-middle (MitM) vulnerabilities in Android applications caused by improper certificate validation. Existing detection approaches make the problem harder to tackle because they suffer from low UI coverage, high analysis overhead, and difficulty identifying root causes. To overcome these limitations, this work proposes Okara, the first end-to-end framework leveraging foundation models for automated detection and attribution of TLS MitM Vulnerabilities (TMVs). The approach integrates an LLM-driven GUI agent for broader app exploration, dynamic instrumentation for runtime monitoring, and an LLM-based vulnerable-code classifier, together with a new taxonomy of TMVs. Evaluation on 37,349 real-world apps revealed 8,374 vulnerable apps (22.42%). Of the vulnerabilities, 41% originate from third-party libraries, and the median vulnerable lifespan exceeds 1,300 days. A large-scale responsible disclosure effort has been initiated.
📝 Abstract
Transport Layer Security (TLS) is fundamental to secure online communication, yet vulnerabilities in certificate validation that enable Man-in-the-Middle (MitM) attacks remain a pervasive threat in Android apps. Existing detection tools are hampered by low-coverage UI interaction, costly instrumentation, and a lack of scalable root-cause analysis. We present Okara, a framework that leverages foundation models to automate the detection and deep attribution of TLS MitM Vulnerabilities (TMVs). Okara's detection component, TMV-Hunter, employs foundation-model-driven GUI agents to achieve high-coverage app interaction, enabling efficient vulnerability discovery at scale. Deploying TMV-Hunter on 37,349 apps from Google Play and a third-party store revealed 8,374 (22.42%) vulnerable apps. Our measurements show that these vulnerabilities are widespread across all popularity levels, affect critical functionality such as authentication and code delivery, and are highly persistent, with a median vulnerable lifespan of over 1,300 days. Okara's attribution component, TMV-ORCA, combines dynamic instrumentation with a novel LLM-based classifier to locate and categorize vulnerable code according to a comprehensive new taxonomy. This analysis attributes 41% of vulnerabilities to third-party libraries and identifies recurring insecure patterns, such as empty trust managers and flawed hostname verification. We have initiated a large-scale responsible disclosure effort and will release our tools and datasets to support further research and mitigation.
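The abstract names "empty trust managers" and "flawed hostname verification" as recurring insecure patterns. The Java sketch below is a hypothetical illustration of what such code commonly looks like when written against the standard `javax.net.ssl` interfaces; it is not taken from the paper's dataset or from Okara/TMV-ORCA, and the names `TmvPatterns`, `EMPTY_TRUST_MANAGER`, `ALLOW_ALL_HOSTS`, `NAIVE_SUFFIX`, and `accepts` are invented for this example. It contrasts the insecure components with the platform default trust manager, using an empty certificate chain as a simple stand-in for a chain that cannot be validated.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

/** Hypothetical illustrations of two insecure TLS validation patterns. */
final class TmvPatterns {

    // INSECURE: an "empty" trust manager. The check methods have empty bodies,
    // so they never throw and every server certificate chain is accepted,
    // including one presented by a MitM attacker.
    static final X509TrustManager EMPTY_TRUST_MANAGER = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    };

    // INSECURE: a hostname verifier that accepts any host name.
    static final HostnameVerifier ALLOW_ALL_HOSTS = (hostname, session) -> true;

    // INSECURE: a suffix check without a leading dot, so a check meant for
    // "example.com" also passes "attacker-example.com".
    static final HostnameVerifier NAIVE_SUFFIX =
            (hostname, session) -> hostname.endsWith("example.com");

    // Baseline for comparison: the platform's default X509TrustManager,
    // backed by the default CA trust store.
    static X509TrustManager defaultTrustManager() {
        try {
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init((KeyStore) null); // null selects the default trust store
            for (TrustManager tm : tmf.getTrustManagers()) {
                if (tm instanceof X509TrustManager) {
                    return (X509TrustManager) tm;
                }
            }
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cannot load default trust manager", e);
        }
        throw new IllegalStateException("no X509TrustManager available");
    }

    // Returns true if tm returns normally for the chain, false if it rejects it.
    static boolean accepts(X509TrustManager tm, X509Certificate[] chain) {
        try {
            tm.checkServerTrusted(chain, "RSA");
            return true;
        } catch (CertificateException | RuntimeException e) {
            // OpenJDK's default manager rejects an empty chain with IllegalArgumentException.
            return false;
        }
    }

    public static void main(String[] args) {
        // An empty chain stands in for "a chain that cannot be validated".
        X509Certificate[] unverifiable = new X509Certificate[0];
        System.out.println("empty trust manager accepts: "
                + accepts(EMPTY_TRUST_MANAGER, unverifiable));
        System.out.println("default trust manager accepts: "
                + accepts(defaultTrustManager(), unverifiable));
        System.out.println("allow-all verifier accepts attacker.example: "
                + ALLOW_ALL_HOSTS.verify("attacker.example", null));
        System.out.println("naive suffix verifier accepts attacker-example.com: "
                + NAIVE_SUFFIX.verify("attacker-example.com", null));
    }
}
```

Running `main` on a standard JVM should show the empty trust manager and both verifiers accepting while the default trust manager rejects. On Android, the usual remedy is to keep the platform defaults and express custom trust requirements (for example, a private CA or certificate pinning) through Network Security Configuration rather than a hand-written `TrustManager` or `HostnameVerifier`.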