MING: An Automated CNN-to-Edge MLIR HLS framework

📅 2026-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of deploying convolutional neural networks (CNNs) on edge devices, where existing MLIR-based high-level synthesis (HLS) frameworks struggle to simultaneously satisfy stringent hardware resource constraints and low-latency requirements. To bridge this gap, the authors propose an automated MLIR-driven HLS flow that integrates resource-aware optimization strategies tailored for edge deployment. By leveraging a streaming data architecture and fine-grained, resource-conscious buffer management, the approach enables efficient end-to-end mapping of CNNs onto FPGAs. Experimental results show that, compared to state-of-the-art frameworks, the proposed method achieves an average 15× speedup on standard CNN kernels with up to four layers, reaching up to 200× on single-layer kernels, while supporting large input dimensions and respecting the resource and energy-efficiency constraints typical of edge FPGA platforms.

📝 Abstract
Driven by the increasing demand for low-latency and real-time processing, machine learning applications are steadily migrating toward edge computing platforms, where Field-Programmable Gate Arrays (FPGAs) are widely adopted for their energy efficiency compared to CPUs and GPUs. To generate high-performance and low-power FPGA designs, several frameworks built upon High-Level Synthesis (HLS) vendor tools have been proposed, among which MLIR-based frameworks are gaining significant traction due to their extensibility and ease of use. However, existing state-of-the-art frameworks often overlook the stringent resource constraints of edge devices. To address this limitation, we propose MING, a Multi-Level Intermediate Representation (MLIR)-based framework that abstracts and automates the HLS design process. Within this framework, we adopt a streaming architecture with carefully managed buffers, specifically designed to handle resource constraints while ensuring low latency. Compared with recent frameworks, our approach achieves an average 15x speedup for standard Convolutional Neural Network (CNN) kernels with up to four layers, and up to 200x for single-layer kernels. For kernels with larger input sizes, MING is capable of generating efficient designs that respect hardware resource constraints, whereas state-of-the-art frameworks struggle to do so.
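The abstract's streaming architecture with carefully managed buffers can be illustrated by the classic line-buffer pattern that HLS tools map onto shift registers and BRAM. The sketch below is not MING's actual code (the paper's implementation is not shown here); it is a minimal standard-C++ model of a streamed 3x3 convolution in which only K-1 = 2 image rows are buffered on chip instead of the whole frame, which is how streaming designs bound memory use on resource-constrained edge FPGAs. The function name and queue-based interface are illustrative assumptions.

```cpp
#include <queue>
#include <vector>

// Streamed 3x3 convolution with line buffers (illustrative sketch).
// Pixels arrive one per "cycle" through an input queue; only two full
// image rows plus a 3x3 window are kept on chip, so buffer size grows
// with the image width W, not with the full H x W frame.
std::vector<int> stream_conv3x3(std::queue<int>& in, int H, int W,
                                const int kernel[3][3]) {
    std::vector<std::vector<int>> line(2, std::vector<int>(W, 0)); // 2 row buffers
    int window[3][3] = {};                                         // 3x3 sliding window
    std::vector<int> out;
    for (int r = 0; r < H; ++r) {
        for (int c = 0; c < W; ++c) {
            int px = in.front();
            in.pop();
            // Shift the window left by one column.
            for (int i = 0; i < 3; ++i)
                for (int j = 0; j < 2; ++j) window[i][j] = window[i][j + 1];
            // Refill the rightmost column: two buffered rows + current pixel.
            window[0][2] = line[0][c];
            window[1][2] = line[1][c];
            window[2][2] = px;
            // Rotate line buffers so the current pixel becomes history.
            line[0][c] = line[1][c];
            line[1][c] = px;
            // Emit once a full 3x3 window of real pixels is available.
            if (r >= 2 && c >= 2) {
                int acc = 0;
                for (int i = 0; i < 3; ++i)
                    for (int j = 0; j < 3; ++j) acc += window[i][j] * kernel[i][j];
                out.push_back(acc);
            }
        }
    }
    return out; // (H-2) x (W-2) valid outputs
}
```

In an actual HLS flow the queue would be a hardware FIFO (e.g. an `hls::stream`), the row buffers would be partitioned into block RAM, and the per-pixel loop body would be pipelined to one output per cycle; the fixed, width-proportional buffer footprint is what lets such designs fit the edge-FPGA resource budgets the paper targets.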
Problem

Research questions and friction points this paper is trying to address.

edge computing
resource constraints
FPGA
CNN
MLIR
Innovation

Methods, ideas, or system contributions that make the work stand out.

MLIR
HLS
FPGA
edge computing
streaming architecture