BIDENT: Heterogeneous Operator-level Mapping for Efficient Edge Inference

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing edge AI inference systems map entire models to a single processing unit, which leaves heterogeneous compute resources underused because operators differ in which unit suits them best. This work proposes a unified operator-level scheduling framework that assigns each operator to the most suitable processing unit (CPU, GPU, or NPU) based on profiled execution characteristics. By constructing a weighted execution graph and solving a shortest-path problem over it, the framework produces schedules that are optimal under its cost model for either latency or energy. It supports sequential execution, intra-model parallelism, and multi-model concurrency in a single formulation and does not rely on model-specific heuristics, which makes it model-agnostic. Experiments on an Intel Core Ultra SoC show up to 1.60× speedup with intra-model parallelism, a 3.42× geometric mean speedup across 190 concurrent multi-model combinations, and an average energy reduction of 48.2% under energy-aware scheduling in concurrent settings.

📝 Abstract

Modern edge System-on-Chips (SoCs) integrate heterogeneous processing units (PUs) such as CPUs, GPUs, and NPUs, yet current inference stacks map entire models to a single PU, leaving significant performance and energy efficiency on the table. This is exacerbated by emerging architectures such as state-space models (SSMs), Kolmogorov-Arnold networks (KANs), and multi-stage vision-language-action (VLA) pipelines, whose diverse operator characteristics are not uniformly suited to any single PU. We present BIDENT, a unified operator-level orchestration framework for heterogeneous edge inference that maps individual operators to the most suitable PU based on profiled execution characteristics. BIDENT formulates operator-to-PU assignment as a shortest-path problem over a weighted execution graph, enabling efficient and optimal scheduling under the cost model for both latency- and energy-minimization objectives. Unlike prior work relying on model-specific heuristics or coarse-grained partitioning, BIDENT is model-agnostic and jointly supports sequential execution, intra-model parallelism across independent operators, and multi-model concurrent scheduling in a single formulation. We implement BIDENT on an Intel Core Ultra SoC and evaluate it across 10 model families spanning CNNs, Transformers, SSMs, KANs, spiking networks, and multi-stage pipelines. BIDENT achieves up to 1.60x speedup via intra-model parallelism and a 3.42x geometric mean speedup across 190 multi-model combinations by utilizing otherwise idle compute. Sequential heterogeneous mapping yields more modest gains (up to 1.58x, 1.09x geometric mean), while energy-aware scheduling reduces energy consumption by 48.2% on average in concurrent settings. These results show that operator-level orchestration, not model-level mapping, is the key abstraction for fully exploiting heterogeneity in next-generation edge AI.
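To make the shortest-path formulation concrete, here is a minimal sketch for the simplest case only: a sequential chain of operators. It is not the authors' implementation. The function name `assign_operators`, the graph layout (one node per operator and PU pair, with edge weight equal to execution cost plus a transfer cost when the PU changes), and all cost values are assumptions made for illustration. The abstract does not specify how BIDENT builds its graph or how it extends it to intra-model parallelism and multi-model concurrency.

```python
import heapq


def assign_operators(exec_cost, transfer_cost):
    """Pick one PU per operator in a sequential chain via Dijkstra.

    exec_cost: list of {pu_name: cost}, one dict per operator, in order
               (hypothetical profiled latency or energy values).
    transfer_cost: {(src_pu, dst_pu): cost} for switching between PUs
               (hypothetical; same-PU transitions are treated as free).
    Returns (total_cost, [pu_name per operator]).
    """
    n = len(exec_cost)
    source, sink = (-1, None), (n, None)
    dist = {source: 0.0}
    prev = {}
    # The counter breaks ties so heap entries never compare node tuples.
    heap = [(0.0, 0, source)]
    counter = 1

    while heap:
        d, _, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        if node == sink:
            break
        i, pu = node
        nxt = i + 1
        if nxt == n:
            edges = [(sink, 0.0)]  # last operator connects to the sink
        else:
            edges = []
            for next_pu, cost in exec_cost[nxt].items():
                if pu is None or pu == next_pu:
                    hop = 0.0
                else:
                    hop = transfer_cost[(pu, next_pu)]
                edges.append(((nxt, next_pu), hop + cost))
        for target, weight in edges:
            nd = d + weight
            if nd < dist.get(target, float("inf")):
                dist[target] = nd
                prev[target] = node
                heapq.heappush(heap, (nd, counter, target))
                counter += 1

    # Walk predecessors back from the sink to recover the assignment.
    path = []
    node = prev[sink]
    while node != source:
        path.append(node[1])
        node = prev[node]
    path.reverse()
    return dist[sink], path


# Invented costs for three operators on three PUs.
pus = ["CPU", "GPU", "NPU"]
costs = [
    {"CPU": 4.0, "GPU": 2.0, "NPU": 1.0},
    {"CPU": 1.0, "GPU": 3.0, "NPU": 5.0},
    {"CPU": 4.0, "GPU": 2.0, "NPU": 1.0},
]
switch = {(a, b): 0.5 for a in pus for b in pus if a != b}
total, mapping = assign_operators(costs, switch)
print(total, mapping)
```

Swapping the latency values for energy values changes the objective without changing the search, which mirrors how the abstract describes one formulation serving both latency and energy minimization.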

Problem

Research questions and friction points this paper is trying to address.

heterogeneous edge inference

operator-level mapping

System-on-Chip

model heterogeneity

edge AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

operator-level scheduling

heterogeneous edge inference

model-agnostic orchestration