Hidden costs for inference with deep network on embedded system devices

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This study addresses the limitations of using multiply-accumulate operations (MACs) as the sole metric for evaluating the computational complexity of deep models on embedded devices, since MAC counts neglect non-computational overheads such as memory access and data movement. The authors systematically benchmark ten deep models on the CIFAR-100 image classification task across embedded platforms, comparing measured inference latency against theoretical MAC counts. Their analysis reveals a significant discrepancy between actual runtime performance and MAC-based predictions, demonstrating the inadequacy of relying exclusively on MACs to assess model efficiency in edge deployment scenarios. The findings show that auxiliary tensor operations substantially affect on-device inference performance, underscoring the need to jointly optimize both computational and non-computational factors during model deployment.
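The gap the summary describes starts with how theoretical MAC counts are computed in the first place. A minimal sketch of MAC counting for a convolutional layer (the layer shapes below are illustrative, not taken from the paper):

```python
def conv2d_macs(h, w, c_in, c_out, k, stride=1, pad=0):
    """Theoretical multiply-accumulate count for a 2D convolution.

    Each output pixel needs k*k*c_in MACs per output channel.
    Memory traffic, data movement, and auxiliary tensor operations
    are NOT captured here -- the blind spot the paper highlights.
    """
    out_h = (h + 2 * pad - k) // stride + 1
    out_w = (w + 2 * pad - k) // stride + 1
    return out_h * out_w * k * k * c_in * c_out

# A VGG-style first layer on a 32x32 CIFAR-100 image (illustrative shapes):
print(conv2d_macs(h=32, w=32, c_in=3, c_out=64, k=3, pad=1))  # → 1769472
```

Summing this quantity over all layers gives the MAC figure that the paper compares against measured latency; the mismatch between the two is the study's central finding.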

📝 Abstract
This study evaluates the inference performance of various deep learning models in an embedded system environment. In previous works, the number of Multiply-Accumulate operations is typically used to measure the computational load of a deep model. According to this study, however, this metric is insufficient for estimating inference time on embedded devices. This paper poses the question of what aspects are overlooked when computational load is expressed solely in terms of Multiply-Accumulate operations. In the experiments, an image classification task is performed on an embedded system device using the CIFAR-100 dataset, comparing and analyzing the inference times of ten deep models against the theoretically calculated Multiply-Accumulate operations for each model. The results highlight the importance of accounting for additional computations between tensors when optimizing deep learning models for real-time performance in embedded systems.
Problem

Research questions and friction points this paper is trying to address.

inference time
embedded systems
deep learning models
Multiply-Accumulate operations
computational load
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiply-Accumulate operations
inference time
embedded systems
deep learning optimization
tensor overhead