OODTE: A Differential Testing Engine for the ONNX Optimizer

📅 2025-05-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

ONNX Optimizer lacks rigorous accuracy validation for optimized models. Method: We propose the first fine-grained differential testing framework tailored for ONNX optimizers, automatically comparing outputs of original and optimized models across multiple inputs to pinpoint specific optimization passes causing crashes, invalid graphs, or accuracy degradation. Our approach integrates ONNX graph traversal, cross-task benchmarks (classification, detection, segmentation, NLP), iterative precision deviation detection, and a transferable evaluation paradigm. Contribution/Results: Evaluated on 130 real-world ONNX models, we uncovered 15 defects—14 previously unreported—including runtime crashes or invalid graph generation in 9.2% of cases, and significant accuracy degradation in 30% of classification models. This work establishes a novel, systematic methodology for reliability verification of compiler-level model optimizers.

Technology Category

Application Category

📝 Abstract

With $700$ stars on GitHub and part of the official ONNX repository, the ONNX Optimizer consists of the standard method to apply graph-based optimizations on ONNX models. However, its ability to preserve model accuracy across optimizations, has not been rigorously explored. We propose OODTE, a utility to automatically and thoroughly assess the correctness of the ONNX Optimizer. OODTE follows a simple, yet effective differential testing and evaluation approach that can be easily adopted to other compiler optimizers. In particular, OODTE utilizes a number of ONNX models, then optimizes them and executes both the original and the optimized variants across a user-defined set of inputs, while automatically logging any issues with the optimization process. Finally, for successfully optimized models, OODTE compares the results, and, if any accuracy deviations are observed, it iteratively repeats the process for each pass of the ONNX Optimizer, to localize the root cause of the differences observed. Using OODTE, we sourced well-known $130$ models from the official ONNX Model Hub, used for a wide variety of tasks (classification, object detection, semantic segmentation, text summarization, question and answering, sentiment analysis) from the official ONNX model hub. We detected 15 issues, 14 of which were previously unknown, associated with optimizer crashes and accuracy deviations. We also observed $9.2$% of all model instances presenting issues leading into the crash of the optimizer, or the generation of an invalid model while using the primary optimizer strategies. In addition, $30$% of the classification models presented accuracy differences across the original and the optimized model variants, while $16.6$% of semantic segmentation and object detection models are also affected, at least to a limited extent.

Problem

Research questions and friction points this paper is trying to address.

Assessing ONNX Optimizer's accuracy preservation across optimizations

Detecting optimizer crashes and accuracy deviations in ONNX models

Localizing root causes of optimization issues in ONNX models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Differential testing for ONNX Optimizer correctness

Automated issue logging in optimization process

Iterative root cause analysis for accuracy deviations

🔎 Similar Papers

No similar papers found.

Authors to Follow