🤖 AI Summary
High-performance computing (HPC) centers face challenges in supporting containerized generative AI (GenAI) services and in ensuring cross-platform reproducibility. Method: The paper proposes a converged architecture integrating HPC and cloud-native technologies (Kubernetes orchestration, the vLLM inference server, multiple container runtimes such as Singularity and CRI-O, object storage, and vector databases) to enable coordinated deployment and execution of GenAI components across heterogeneous HPC environments. Contribution/Results: The approach bridges the traditional isolation between HPC and cloud-native ecosystems, enabling high-fidelity, cross-platform reproducibility of containerized AI workloads. A case study deploying a Llama-series model via vLLM demonstrates consistent deployment across both Kubernetes and HPC platforms, establishing a reusable deployment paradigm that strengthens HPC centers' capability to support large-model inference workloads.
📝 Abstract
Generative Artificial Intelligence (GenAI) applications are built from specialized components -- inference servers, object storage, vector and graph databases, and user interfaces -- interconnected via web-based APIs. While these components are typically containerized and deployed in cloud environments, such capabilities are still emerging at High-Performance Computing (HPC) centers. In this paper, we share our experience deploying GenAI workloads within an established HPC center and discuss the integration of HPC and cloud computing environments. We describe a converged computing architecture that integrates HPC and Kubernetes platforms to run containerized GenAI workloads, improving reproducibility. A case study illustrates the deployment of the Llama Large Language Model (LLM) with a containerized inference server (vLLM) on both Kubernetes and HPC platforms, using multiple container runtimes. Our experience highlights practical considerations and opportunities for the HPC container community, guiding future research and tool development.