🤖 AI Summary
In large-scale AI training, datasets often exceed local storage capacity while GPU compute far outpaces I/O bandwidth, making I/O latency and energy consumption critical bottlenecks. This paper introduces the first machine learning I/O service that jointly optimizes end-to-end data-loading latency and I/O energy consumption. It adopts a service-oriented architecture, deploying lightweight daemons on storage nodes to handle sample serialization, batching, TCP-based streaming, and out-of-order prefetching, while integrating seamlessly with GPU-accelerated preprocessing (e.g., NVIDIA DALI) and adapting to variable-latency networks. Experiments across local disks, LANs, and WANs show that EMLIO achieves up to 8.6× higher I/O throughput and 10.9× lower energy consumption than state-of-the-art loaders, with minimal degradation in performance or energy efficiency as network distance increases.
📝 Abstract
Large-scale deep learning workloads increasingly suffer from I/O bottlenecks as datasets grow beyond local storage capacities and GPU compute outpaces network and disk latencies. While recent systems optimize data-loading time, they overlook the energy cost of I/O, a critical factor at large scale. We introduce EMLIO, an Efficient Machine Learning I/O service that jointly minimizes end-to-end data-loading latency T and I/O energy consumption E across variable-latency networked storage. EMLIO deploys a lightweight data-serving daemon on storage nodes that serializes and batches raw samples, streams them over TCP with out-of-order prefetching, and integrates seamlessly with GPU-accelerated (NVIDIA DALI) preprocessing on the client side. In exhaustive evaluations over local disk, LAN (0.05 ms and 10 ms RTT), and WAN (30 ms RTT) environments, EMLIO delivers up to 8.6× faster I/O and 10.9× lower energy use than state-of-the-art loaders, while maintaining constant performance and energy profiles irrespective of network distance. EMLIO's service-based architecture offers a scalable blueprint for energy-aware I/O in next-generation AI clouds.
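To make the daemon's serialize-batch-stream pipeline concrete, here is a minimal sketch of that pattern in Python. All names (`serve_samples`, `stream_batches`, the length-prefixed wire format) are illustrative assumptions for this sketch, not EMLIO's actual API, and a real service would add out-of-order prefetching on a background thread and hand decoded batches to GPU preprocessing.

```python
# Hypothetical sketch of a data-serving daemon: serialize samples into
# length-prefixed batches and stream them over TCP. Not EMLIO's real code.
import socket
import struct
import threading


def serve_samples(samples, host="127.0.0.1", batch_size=4):
    """Start a toy storage-node daemon; returns the port it listens on."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))  # ephemeral port
    srv.listen(1)
    port = srv.getsockname()[1]

    def handle():
        conn, _ = srv.accept()
        with conn:
            for i in range(0, len(samples), batch_size):
                batch = samples[i:i + batch_size]
                # Frame: [batch count][sample length + bytes]...
                conn.sendall(struct.pack("!I", len(batch)))
                for s in batch:
                    conn.sendall(struct.pack("!I", len(s)) + s)
            conn.sendall(struct.pack("!I", 0))  # end-of-stream marker
        srv.close()

    threading.Thread(target=handle, daemon=True).start()
    return port


def recv_exact(sock, n):
    """Read exactly n bytes from the socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("stream closed early")
        buf += chunk
    return buf


def stream_batches(host, port):
    """Client side: yield batches as they arrive off the wire."""
    with socket.create_connection((host, port)) as sock:
        while True:
            (count,) = struct.unpack("!I", recv_exact(sock, 4))
            if count == 0:
                break
            batch = []
            for _ in range(count):
                (size,) = struct.unpack("!I", recv_exact(sock, 4))
                batch.append(recv_exact(sock, size))
            yield batch
```

A consumer would simply iterate `stream_batches(host, port)`; batching amortizes per-sample syscall and framing overhead, which is one lever behind the latency and energy savings the paper reports.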