MARCO: Hardware-Aware Neural Architecture Search for Edge Devices with Multi-Agent Reinforcement Learning and Conformal Prediction Filtering

📅 2025-06-16
🤖 AI Summary
Neural architecture search (NAS) for resource-constrained edge devices faces the challenge of jointly optimizing network topology, quantization bit-widths, and hardware configurations under tight computational and energy budgets. Method: This paper proposes a hardware-aware NAS framework that integrates multi-agent reinforcement learning (MARL) with conformal prediction (CP). A statistically guaranteed early architecture pruning mechanism drastically reduces the search space, while a centralized-critic, decentralized-execution (CTDE) paradigm decouples hardware scheduling from quantization decisions. Hardware-accurate simulation modeling and hierarchical quantization are incorporated to bridge the software–hardware gap. Contribution/Results: On MNIST and CIFAR benchmarks, the framework achieves a 3–4× speedup in search time with <0.3% accuracy degradation. Physical deployment on the MAX78000 microcontroller demonstrates significantly reduced inference latency, and hardware simulation error remains below 5%.
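The "statistically guaranteed" early pruning can be pictured as split conformal prediction around a surrogate accuracy predictor: calibrate the predictor's absolute error on held-out architectures, then discard any candidate whose optimistic bound still misses an accuracy floor. A minimal sketch, assuming such a surrogate exists; the function names, miscoverage rate, and accuracy floor are illustrative, not the paper's code:

```python
import numpy as np

def conformal_quantile(residuals, alpha=0.1):
    """Split-conformal quantile of |true_acc - predicted_acc| on a calibration set.

    Under exchangeability, pred +/- q_hat covers the true accuracy
    with probability >= 1 - alpha (alpha = user-defined miscoverage rate).
    """
    n = len(residuals)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(residuals, level, method="higher"))

def keep_candidate(pred_acc, q_hat, acc_floor):
    # Prune only when even the conformal upper bound misses the floor,
    # so a high-quality design is discarded with probability <= alpha.
    # Survivors proceed to the costly partial-training / simulation stage.
    return pred_acc + q_hat >= acc_floor
```

Because the guarantee is one-sided here (only the upper bound gates pruning), the filter errs toward keeping borderline architectures, which matches the paper's goal of retaining high-quality designs with high probability.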

📝 Abstract
This paper introduces MARCO (Multi-Agent Reinforcement learning with Conformal Optimization), a novel hardware-aware framework for efficient neural architecture search (NAS) targeting resource-constrained edge devices. By significantly reducing search time while maintaining accuracy under strict hardware constraints, MARCO bridges the gap between automated DNN design and CAD for edge AI deployment. MARCO's core technical contribution lies in its combination of multi-agent reinforcement learning (MARL) with conformal prediction (CP) to accelerate the hardware/software co-design process for deploying deep neural networks. Unlike conventional once-for-all (OFA) supernet approaches that require extensive pretraining, MARCO decomposes the NAS task into a Hardware Configuration Agent (HCA) and a Quantization Agent (QA). The HCA optimizes high-level design parameters, while the QA determines per-layer bit-widths under strict memory and latency budgets using a shared reward signal within a centralized-critic, decentralized-execution (CTDE) paradigm. A key innovation is the integration of a calibrated CP surrogate model that provides statistical guarantees (with a user-defined miscoverage rate) to prune unpromising candidate architectures before incurring the high costs of partial training or hardware simulation. This early filtering drastically reduces the search space while ensuring that high-quality designs are retained with high probability. Extensive experiments on MNIST, CIFAR-10, and CIFAR-100 demonstrate that MARCO achieves a 3–4× reduction in total search time compared to an OFA baseline while maintaining near-baseline accuracy (within 0.3%). MARCO also reduces inference latency. Validation on a MAX78000 evaluation board confirms that simulator trends hold in practice, with simulator estimates deviating from measured values by less than 5%.
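The HCA/QA decomposition under CTDE can be illustrated as two decentralized actors coupled only through the shared reward seen by the centralized critic. In the toy sketch below, the action spaces, penalty weight, and 10 ms latency budget are invented for illustration and are not the paper's actual search space or reward shaping:

```python
import random

# Illustrative, tiny action spaces -- the paper's real spaces cover
# hardware configuration parameters and per-layer bit-widths.
HCA_ACTIONS = ["small_conv", "wide_conv", "depthwise"]  # hardware/topology choices
QA_ACTIONS = [2, 4, 8]                                  # candidate bit-widths

def decentralized_act(rng):
    # Execution is decentralized: each agent samples from its own policy
    # using only local information (uniform policies here for brevity).
    return rng.choice(HCA_ACTIONS), rng.choice(QA_ACTIONS)

def shared_reward(accuracy, latency_ms, latency_budget_ms=10.0, penalty_weight=0.1):
    # Training is centralized: one scalar reward, evaluated on the joint
    # outcome, pushes both agents toward accurate architectures that
    # respect the latency budget.
    overshoot = max(0.0, latency_ms - latency_budget_ms)
    return accuracy - penalty_weight * overshoot
```

The design point worth noting is that neither agent needs to observe the other's action at execution time; coordination emerges because the critic scores the joint (topology, bit-width) outcome.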
Problem

Research questions and friction points this paper is trying to address.

Efficient neural architecture search for edge devices
Reducing search time under hardware constraints
Accelerating hardware/software co-design for DNNs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent reinforcement learning for hardware-aware NAS
Conformal prediction for early architecture pruning
Decentralized execution with centralized critic paradigm
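One concrete constraint behind the per-layer bit-width decisions is weight memory. A minimal feasibility check, assuming weights dominate the footprint and ignoring the simulator's packing and alignment details (the function names and byte model are illustrative assumptions, not the paper's hardware model):

```python
def weight_memory_bytes(layer_param_counts, bitwidths):
    # Weight footprint of a mixed-precision network: each layer stores
    # param_count * bits / 8 bytes of quantized weights.
    assert len(layer_param_counts) == len(bitwidths)
    return sum(p * b // 8 for p, b in zip(layer_param_counts, bitwidths))

def fits_budget(layer_param_counts, bitwidths, budget_bytes):
    # A hard feasibility check a quantization choice must pass before a
    # candidate is worth partial training or hardware simulation.
    return weight_memory_bytes(layer_param_counts, bitwidths) <= budget_bytes
```

On a microcontroller-class target like the MAX78000, the budget would be set from the accelerator's on-chip weight memory; here it is left as a parameter.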
Authors
Arya Fayyazi — Research Assistant, University of Southern California (Machine Learning, Hardware/Software Co-optimization, EDA, AI Fairness, ML Compiler)
M. Kamal — University of Southern California, Los Angeles, California, US
M. Pedram — University of Southern California, Los Angeles, California, US