🤖 AI Summary
To address computational and energy constraints in real-time multi-label video classification on embedded devices, this paper proposes a context-aware dynamic modular inference framework. The method exploits label sparsity, temporal continuity, and co-occurrence patterns in video sequences to construct a lightweight pool of low-rank adapters (LoRA); it dynamically activates only the most semantically relevant subset of adapters per frame, thereby avoiding full-model switching and weight merging. The backbone network is shared across tasks, while adapters are composed on demand, significantly improving energy efficiency and scalability. Evaluated on the TAO dataset, the framework reduces energy consumption by 40% compared to strong baselines while increasing mean average precision (mAP) by 9 percentage points, achieving joint optimization of high accuracy and low power consumption.
📝 Abstract
Real-time multi-label video classification on embedded devices is constrained by limited compute and energy budgets. Yet video streams exhibit structural properties, such as label sparsity, temporal continuity, and label co-occurrence, that can be leveraged for more efficient inference. We introduce Polymorph, a context-aware framework that activates a minimal set of lightweight Low-Rank Adapters (LoRA) per frame. Each adapter specializes in a subset of classes derived from co-occurrence patterns and is implemented as a LoRA module over a shared backbone. At runtime, Polymorph dynamically selects and composes only the adapters needed to cover the active labels, avoiding full-model switching and weight merging. This modular strategy improves scalability while reducing latency and energy overhead. Polymorph achieves 40% lower energy consumption and improves mAP by 9 points over strong baselines on the TAO dataset. Polymorph is open source at https://github.com/inference-serving/polymorph/.
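The runtime step described above, covering a frame's active labels with as few adapters as possible, is essentially a set-cover problem. A minimal greedy sketch of that selection logic is shown below; the adapter names, label sets, and function are hypothetical illustrations, not the authors' actual implementation.

```python
# Hedged sketch: greedy set-cover style selection of LoRA adapters.
# Adapter names and label groupings are hypothetical examples.

def select_adapters(active_labels, adapter_pool):
    """Greedily pick adapters until the frame's active labels are covered.

    adapter_pool: dict mapping adapter name -> set of class labels it serves.
    Returns the list of selected adapter names, in selection order.
    """
    uncovered = set(active_labels)
    selected = []
    while uncovered:
        # Pick the adapter that covers the most still-uncovered labels.
        best = max(adapter_pool, key=lambda a: len(adapter_pool[a] & uncovered))
        gain = adapter_pool[best] & uncovered
        if not gain:  # remaining labels are covered by no adapter
            break
        selected.append(best)
        uncovered -= gain
    return selected

# Example pool built from (hypothetical) co-occurrence clusters:
pool = {
    "vehicles": {"car", "truck", "bus"},
    "people":   {"person", "bicycle"},
    "animals":  {"dog", "cat"},
}
print(select_adapters({"car", "person"}, pool))  # ['vehicles', 'people']
```

Only the selected adapters would then be applied on top of the shared backbone for that frame, so per-frame cost scales with the number of active label groups rather than the total number of classes.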