🤖 AI Summary
To overcome technical bottlenecks in energy efficiency, scalability, and application readiness that have hindered exascale computing, this work designed and deployed Aurora, Argonne National Laboratory's first exascale system. Aurora pairs Intel Xeon CPU Max Series (Sapphire Rapids) processors, equipped with high-bandwidth memory (HBM), with Intel Data Center GPU Max Series (Ponte Vecchio) accelerators in a heterogeneous node architecture, integrated with the DAOS distributed object storage system and the HPE Slingshot 11 interconnect. System-level co-design and the oneAPI unified programming model enable deep hardware-software co-optimization. The project introduced an exascale-optimized node architecture tailored to scientific computing and established a full-stack software ecosystem, validated through early science applications. Aurora has achieved leading results on the HPL and HPCG benchmarks and enables large-scale simulations across multiple domains, including climate modeling, materials science, and particle physics, demonstrating substantial gains in real-world scientific throughput and exascale system readiness.
📝 Abstract
Aurora is Argonne National Laboratory's pioneering exascale supercomputer, designed to accelerate scientific discovery with cutting-edge architectural innovations. Key new technologies include the Intel™ Xeon™ CPU Max Series (code-named Sapphire Rapids) with support for High Bandwidth Memory (HBM), alongside the Intel™ Data Center GPU Max Series (code-named Ponte Vecchio) on each compute node. Aurora also integrates the Distributed Asynchronous Object Storage (DAOS), a novel exascale storage solution, and leverages Intel's oneAPI programming environment. This paper presents an in-depth exploration of Aurora's node architecture, the HPE Slingshot interconnect, the supporting software ecosystem, and DAOS. We provide insights into standard benchmark performance and application readiness efforts via Aurora's Early Science Program and the Exascale Computing Project.