🤖 AI Summary
This study addresses the limitations of using multiply-accumulate operation (MAC) counts as the sole metric for evaluating the computational complexity of deep models on embedded devices, since MACs neglect non-computational overheads such as memory access and data movement. The authors systematically benchmark ten deep models on the CIFAR-100 image classification task on embedded platforms, comparing measured inference latency against theoretical MAC counts. Their analysis reveals a significant discrepancy between actual runtime performance and MAC-based predictions, demonstrating the inadequacy of relying exclusively on MACs to assess model efficiency in edge deployment scenarios. The findings show that auxiliary tensor operations substantially affect on-device inference performance, underscoring the need to jointly optimize computational and non-computational factors during model deployment.
📝 Abstract
This study evaluates the inference performance of various deep learning models in an embedded system environment. In previous works, the number of Multiply-Accumulate (MAC) operations has typically been used to measure the computational load of a deep model. This study shows, however, that this metric is of limited use for estimating inference time on embedded devices. The paper asks what aspects are overlooked when computational cost is expressed solely in terms of MAC operations. In the experiments, an image classification task is performed on an embedded device using the CIFAR-100 dataset, and the measured inference times of ten deep models are compared against each model's theoretically calculated MAC count. The results highlight the importance of accounting for additional tensor operations when optimizing deep learning models for real-time performance on embedded systems.
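To make the paper's central point concrete, the sketch below (not from the paper; layer shapes are illustrative and chosen by hand) counts theoretical MACs for a convolutional layer using the standard formula, then contrasts two layers with *identical* MAC counts but very different memory traffic, which is exactly the kind of non-computational cost a MAC-only metric hides.

```python
def conv2d_macs(in_c, out_c, k_h, k_w, out_h, out_w):
    """Theoretical MACs for a 2-D convolution: each of the
    out_h * out_w * out_c output elements needs k_h * k_w * in_c
    multiply-accumulates (bias ignored)."""
    return out_h * out_w * out_c * (k_h * k_w * in_c)

def conv2d_memory_elements(in_c, out_c, k_h, k_w, out_h, out_w):
    """Rough count of tensor elements moved (stride 1, 'same' padding
    assumed, so input spatial size equals output spatial size):
    input activations + weights + output activations."""
    inputs = out_h * out_w * in_c
    weights = out_c * in_c * k_h * k_w
    outputs = out_h * out_w * out_c
    return inputs + weights + outputs

# Layer A: 1x1 conv, 576 -> 64 channels, 32x32 feature map.
# Layer B: 3x3 conv,  64 -> 64 channels, 32x32 feature map.
a_macs = conv2d_macs(576, 64, 1, 1, 32, 32)
b_macs = conv2d_macs(64, 64, 3, 3, 32, 32)
assert a_macs == b_macs == 37_748_736  # identical compute cost

a_mem = conv2d_memory_elements(576, 64, 1, 1, 32, 32)
b_mem = conv2d_memory_elements(64, 64, 3, 3, 32, 32)
print(a_mem, b_mem)  # A moves ~4x more data than B despite equal MACs
```

Layer A must stream a 576-channel input activation tensor through memory, while layer B reads a 64-channel one, so on a bandwidth-limited embedded device A can be markedly slower even though a MAC-based estimate ranks them as equally expensive.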