🤖 AI Summary
Automatically verifying the thread safety of Java classes remains challenging because of the complexity of concurrent semantics and the absence of a rigorous, verifiable definition grounded in the Java Memory Model (JMM).
Method: This paper introduces a formally verifiable definition of thread safety based on JMM’s data-race-freedom principle and constructs, for the first time, a fine-grained property set covering field accesses, synchronization primitives, and method invocations. These properties are encoded into CodeQL using precise semantic modeling and context-sensitive data-flow analysis.
Results: Evaluated on the top 1000 GitHub repositories (over 3.63 million Java classes), our approach scales well, with query running times under two minutes for repositories of up to 200 KLOC, and it detects thousands of thread-safety errors. A selection of the detected errors was submitted as pull requests, which developers received positively. The queries have been submitted to the main CodeQL repository and are in the process of becoming available as part of GitHub Actions.
📝 Abstract
In object-oriented languages, software developers rely on thread-safe classes to implement concurrent applications. However, determining whether a class is thread-safe is a challenging task. This paper presents a highly scalable method to analyze thread-safety in Java classes. We provide a definition of thread-safety for Java classes founded on the correctness principle of the Java memory model, data race freedom. We devise a set of properties for Java classes that are proven to ensure thread-safety. We encode these properties in the static analysis tool CodeQL to automatically analyze Java source code. We perform an evaluation on the top 1000 GitHub repositories. The evaluation comprises 3,632,865 Java classes, of which 1,992 classes from 71 repositories are annotated as @ThreadSafe. These repositories include highly popular software such as Apache Flink (24.6k stars), Facebook Fresco (17.1k stars), PrestoDB (16.2k stars), and gRPC (11.6k stars). Our queries detected thousands of thread-safety errors. The running time of our queries is below 2 minutes for repositories of up to 200k lines of code, 20k methods, 6000 fields, and 1200 classes. We have submitted a selection of the detected concurrency errors as PRs, and developers reacted positively to these PRs. We have submitted our CodeQL queries to the main CodeQL repository, and they are currently in the process of becoming available as part of GitHub Actions. The results demonstrate the applicability and scalability of our method to analyze thread-safety in real-world code bases.
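To make the notion of data race freedom concrete, the following is a minimal sketch of the kind of class a thread-safety check might flag. It is an illustration only and is not taken from the paper: the class names, the locally declared `@ThreadSafe` marker, and the `runSafe` helper are all hypothetical, and the paper's actual properties and CodeQL queries may differ.

```java
public class ThreadSafetySketch {

    /** Hypothetical marker standing in for annotations such as @ThreadSafe. */
    @interface ThreadSafe {}

    /** Annotated as thread-safe, yet get() reads the field without holding the lock. */
    @ThreadSafe
    static class RacyCounter {
        private int count;

        synchronized void increment() { count++; }

        // Unsynchronized read: races with the write in increment()
        // when both are called concurrently from different threads.
        int get() { return count; }
    }

    /** Every access to count is guarded by the same monitor, so no data race is possible. */
    @ThreadSafe
    static class SafeCounter {
        private int count;

        synchronized void increment() { count++; }

        synchronized int get() { return count; }
    }

    /** Increments a SafeCounter from several threads and returns the final value. */
    static int runSafe(int threads, int perThread) {
        SafeCounter counter = new SafeCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join() establishes happens-before with the worker's actions
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runSafe(4, 10_000));
    }
}
```

In `RacyCounter`, the write in `increment()` and the read in `get()` are not ordered by any synchronization when the two methods run concurrently, which is a data race under the Java memory model. `SafeCounter` removes the race by guarding every access to `count` with the same lock.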