ECVL-ROUTER: Scenario-Aware Routing for Vision-Language Models

📅 2025-10-31
🤖 AI Summary
Vision-language models (VLMs) struggle to simultaneously achieve low latency, high output quality, and low energy consumption in edge–cloud collaborative inference. Method: The paper proposes a scenario-aware dynamic routing framework built on the first multi-dimensional scenario model, which explicitly captures per-query requirements on response speed, output quality, and energy efficiency. A lightweight, trainable router is trained on a purpose-built multimodal query dataset, together with an end-to-end differentiable multimodal quality-assessment module. The framework jointly optimizes local inference on compact edge models and on-demand invocation of large cloud models, enabling query-level adaptive model selection. Results: Experiments show that over 80% of queries are accurately routed to edge models, with task completion rate degrading by less than 10%, average latency reduced by 42%, and energy consumption reduced by 57%. The framework significantly outperforms static deployment and heuristic scheduling baselines.
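The paper does not ship code with this summary, but the query-level adaptive model selection it describes can be sketched as scenario-weighted utility maximization. Everything below is illustrative: the scenario weights, model names, and per-query estimates are hypothetical stand-ins for the router's learned predictions, not the paper's actual parameters.

```python
# Sketch of scenario-aware routing as utility maximization.
# Scenario profiles weight predicted quality against latency and
# energy penalties; all numbers here are made up for illustration.
SCENARIOS = {
    "fast_response": {"quality": 0.3, "latency": 0.5, "energy": 0.2},
    "high_quality":  {"quality": 0.7, "latency": 0.2, "energy": 0.1},
    "low_energy":    {"quality": 0.3, "latency": 0.2, "energy": 0.5},
}

def utility(estimate, weights):
    """Higher predicted quality raises utility; latency and energy lower it."""
    return (weights["quality"] * estimate["quality"]
            - weights["latency"] * estimate["latency"]
            - weights["energy"] * estimate["energy"])

def route(query_estimates, scenario):
    """Pick the model with the highest scenario-weighted utility.

    query_estimates maps model name -> predicted quality (0-1) and
    normalized latency/energy costs (0-1), as a trained router might emit.
    """
    weights = SCENARIOS[scenario]
    return max(query_estimates,
               key=lambda m: utility(query_estimates[m], weights))

# An easy query the edge model answers almost as well as the cloud model,
# and a hard query where the edge model falls well short.
easy_query = {
    "edge_vlm":  {"quality": 0.85, "latency": 0.1, "energy": 0.1},
    "cloud_vlm": {"quality": 0.95, "latency": 0.8, "energy": 0.9},
}
hard_query = {
    "edge_vlm":  {"quality": 0.40, "latency": 0.1, "energy": 0.1},
    "cloud_vlm": {"quality": 0.95, "latency": 0.8, "energy": 0.9},
}
```

Under this toy scoring, the easy query stays on the edge in every scenario, while the hard query escalates to the cloud only when the scenario prioritizes quality — the behavior the framework's "over 80% routed to edge" result suggests.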

📝 Abstract
Vision-Language Models (VLMs) excel in diverse multimodal tasks. However, user requirements vary across scenarios, which can be categorized into fast response, high-quality output, and low energy consumption. Relying solely on large models deployed in the cloud for all queries often leads to high latency and energy cost, while small models deployed on edge devices are capable of handling simpler tasks with low latency and energy cost. To fully leverage the strengths of both large and small models, we propose ECVL-ROUTER, the first scenario-aware routing framework for VLMs. Our approach introduces a new routing strategy and evaluation metrics that dynamically select the appropriate model for each query based on user requirements, maximizing overall utility. We also construct a multimodal response-quality dataset tailored for router training and validate the approach through extensive experiments. Results show that our approach successfully routes over 80% of queries to the small model while incurring less than 10% drop in problem-solving probability.
Problem

Research questions and friction points this paper is trying to address.

Optimizing model selection for varying user requirements
Balancing response speed, quality, and energy consumption trade-offs
Dynamically routing queries between large and small VLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scenario-aware routing for vision-language models
Dynamic model selection based on user requirements
Multimodal dataset for router training and validation
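The multimodal response-quality dataset for router training is described but not released alongside this summary. One plausible record layout, assuming each query is annotated with quality scores for both the edge and cloud models' answers, might look like the following; all field names and the tolerance threshold are hypothetical.

```python
# Hypothetical record layout for a response-quality dataset used to
# derive routing labels; field names and threshold are illustrative.
from dataclasses import dataclass

@dataclass
class RoutingExample:
    query_text: str      # user prompt
    image_path: str      # path to the associated image
    edge_quality: float  # scored quality of the edge model's answer (0-1)
    cloud_quality: float # scored quality of the cloud model's answer (0-1)

    def label(self, tolerance: float = 0.1) -> str:
        """Label the example 'edge' when the edge model's quality is
        within `tolerance` of the cloud model's; otherwise 'cloud'."""
        if self.cloud_quality - self.edge_quality <= tolerance:
            return "edge"
        return "cloud"

easy = RoutingExample("What color is the bus?", "bus.jpg", 0.92, 0.95)
hard = RoutingExample("Explain the chart's trend.", "chart.png", 0.40, 0.90)
```

A router trained on such labels learns to predict, from the query and image alone, whether the small model will suffice — which is what enables the reported sub-10% drop in task completion despite most traffic staying on the edge.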