🤖 AI Summary
Traditional cloud architectures struggle to process the massive, high-velocity data generated by AI workloads: storage, computation, and data-migration performance all suffer, and the architectures adapt poorly to heterogeneous devices and continually evolving AI models. To address this, we propose an active-storage-enhanced AI task-offloading architecture for the compute continuum, which integrates active storage into an edge–cloud collaborative framework so that AI inference tasks can execute close to where their data resides and be scheduled dynamically. By combining active storage, computational offloading, and distributed AI inference, the approach moves computation to the data, sidestepping the fundamental bottleneck of the conventional "move data to compute" paradigm. Experimental results show that the architecture significantly reduces inference latency and network-bandwidth consumption while improving system throughput and resource utilization, jointly optimizing energy efficiency and performance across diverse application scenarios.
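To make the proximity-based scheduling idea concrete, here is a minimal sketch that is not taken from the paper: the names (`Node`, `InferenceTask`, `estimated_latency`, `schedule`) and all numbers are illustrative assumptions. It models the core trade-off behind "move compute to data": for each candidate node, estimate end-to-end latency as data-transfer time plus compute time, and keep the task at the active storage node whenever shipping the data would cost more than a remote node's extra compute saves.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    compute_tops: float   # available compute (TOPS)
    link_mbps: float      # bandwidth from the data's storage node (Mb/s)
    holds_data: bool      # True if this node's active storage already holds the input

@dataclass
class InferenceTask:
    input_mb: float       # input data to move if the task is offloaded (MB)
    compute_gops: float   # estimated cost of one inference (GOPs)

def estimated_latency(task: InferenceTask, node: Node) -> float:
    """Rough end-to-end latency: transfer time (skipped when compute runs
    where the data already lives) plus compute time on the node."""
    transfer_s = 0.0 if node.holds_data else (task.input_mb * 8) / node.link_mbps
    compute_s = task.compute_gops / (node.compute_tops * 1000)  # TOPS -> GOPS
    return transfer_s + compute_s

def schedule(task: InferenceTask, nodes: list[Node]) -> Node:
    """Proximity-aware placement: pick the lowest estimated latency, so
    in-storage execution wins unless offloading's compute gain outweighs
    the cost of moving the data."""
    return min(nodes, key=lambda n: estimated_latency(task, n))

if __name__ == "__main__":
    nodes = [
        Node("active-storage", compute_tops=2.0,   link_mbps=0.0,    holds_data=True),
        Node("edge-server",    compute_tops=30.0,  link_mbps=1000.0, holds_data=False),
        Node("cloud",          compute_tops=200.0, link_mbps=100.0,  holds_data=False),
    ]
    data_heavy = InferenceTask(input_mb=500.0, compute_gops=50.0)
    compute_heavy = InferenceTask(input_mb=10.0, compute_gops=5000.0)
    print(schedule(data_heavy, nodes).name)     # active-storage: moving 500 MB costs more than local compute
    print(schedule(compute_heavy, nodes).name)  # edge-server: compute dominates, so offloading pays off
```

Under these assumed numbers, the data-heavy task stays at the active storage node while the compute-heavy one is offloaded to the edge, which is the qualitative behavior the summary attributes to the architecture; a real scheduler would also fold in energy, queueing, and model-placement costs.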