PIUMA: Programmable Integrated Unified Memory Architecture

📅 2020-10-13
🤖 AI Summary
Problem: Conventional processors show low resource utilization and poor scalability on large-scale graph analytics. Method: As part of the DARPA HIVE program, Intel developed PIUMA, the Programmable Integrated Unified Memory Architecture. It combines many multi-threaded cores, fine-grained memory and network accesses, a globally shared address space, offload engines, and a tightly integrated optical interconnect. Co-packaged silicon photonics carry the on-chip mesh protocol directly onto the optical fabric, so all chips in a system behave like one large virtual die with very low socket-to-socket latency, even as the system scales to thousands of sockets. The architecture and software stack were co-designed using simulation tools and FPGA emulation. The prototype chip was built as a 316 mm² die in 7 nm FinFET CMOS, a 16-node system was constructed, and the silicon has successfully powered on. Results: Performance estimations project that a PIUMA node will outperform a conventional compute node by one to two orders of magnitude (10×–100×), and that PIUMA continues to scale across multiple nodes, which is difficult for conventional multi-node setups. Some aspects of the architecture will be incorporated into future Intel products.
📝 Abstract
High-performance, large-scale graph analytics are essential for analyzing relationships in big data sets in a timely manner. Conventional processor architectures suffer from inefficient resource usage and poor scaling on these workloads. To enable efficient and scalable graph analysis, Intel developed the Programmable Integrated Unified Memory Architecture (PIUMA) as part of the DARPA Hierarchical Identify Verify Exploit (HIVE) program. PIUMA consists of many multi-threaded cores, fine-grained memory and network accesses, a globally shared address space, powerful offload engines and a tightly integrated optical interconnection network. By utilizing co-packaged optical silicon photonics and extending the on-chip mesh protocol directly to the optical fabric, all PIUMA chips in a system are glued together into one large virtual die, which allows for extremely low socket-to-socket latencies even as the system scales to thousands of sockets. Performance estimations project that a PIUMA node will outperform a conventional compute node by one to two orders of magnitude. Furthermore, PIUMA continues to scale across multiple nodes, which is a challenge in conventional multi-node setups. This paper presents the PIUMA architecture and documents our experience in designing and building a prototype chip and its bring-up process. We summarize the methodology for our co-design of the architecture together with the software stack using simulation tools and FPGA emulation. These tools provided early performance estimations of realistic applications and allowed us to implement many optimizations across the hardware, compilers, libraries and applications. We built the PIUMA chip as a 316 mm² 7 nm FinFET CMOS die and constructed a 16-node system. PIUMA silicon has successfully powered on, demonstrating key aspects of the architecture, some of which will be incorporated into future Intel products.
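The abstract describes a globally shared address space in which any core can address memory on any socket, but it does not say how addresses are distributed across sockets. The sketch below assumes a hypothetical scheme: a flat byte address space interleaved round-robin across sockets in 8-byte blocks. The names `locate`, `to_global`, and `GlobalLocation`, the 8-byte block size, and the 16-socket example are illustrative assumptions, not PIUMA's documented mapping.

```python
from typing import NamedTuple


class GlobalLocation(NamedTuple):
    socket: int  # index of the chip that owns the data
    offset: int  # byte offset within that chip's local memory


def locate(global_addr: int, num_sockets: int, block_bytes: int = 8) -> GlobalLocation:
    """Map a flat global byte address to (socket, local offset).

    Hypothetical round-robin interleaving: consecutive blocks of
    `block_bytes` bytes go to sockets 0, 1, ..., num_sockets - 1, 0, ...
    """
    if global_addr < 0 or num_sockets <= 0 or block_bytes <= 0:
        raise ValueError("address must be non-negative; sizes must be positive")
    block, within = divmod(global_addr, block_bytes)
    local_block, socket = divmod(block, num_sockets)
    return GlobalLocation(socket, local_block * block_bytes + within)


def to_global(loc: GlobalLocation, num_sockets: int, block_bytes: int = 8) -> int:
    """Inverse of locate(): rebuild the flat global address."""
    local_block, within = divmod(loc.offset, block_bytes)
    block = local_block * num_sockets + loc.socket
    return block * block_bytes + within


if __name__ == "__main__":
    # Consecutive 8-byte words (0, 8, 16) land on different sockets;
    # address 130 wraps back around to socket 0.
    for addr in (0, 8, 16, 130):
        print(addr, locate(addr, num_sockets=16))
```

In a layout like this, neighbouring words live on different sockets, so most accesses from any single core are remote. A design built on a global address space therefore depends on low socket-to-socket latency, which the abstract attributes to the optical fabric.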
Problem

Research questions and friction points this paper is trying to address.

Efficient, scalable analytics on large-scale graphs.
Inefficient resource usage and poor scaling of conventional processor architectures on graph workloads.
Keeping socket-to-socket latency low as systems scale across many nodes, which is a challenge for conventional multi-node setups.
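A back-of-the-envelope way to see the resource-usage problem: graph kernels often gather small values (for example, an 8-byte vertex property) from effectively random addresses, while a conventional cache hierarchy moves whole cache lines, typically 64 bytes on current x86 processors. The sketch below computes the fraction of transferred bytes that are actually used. The 8-byte and 64-byte sizes are illustrative assumptions, and `useful_fraction` is a hypothetical helper that assumes every access is aligned and shares no cache line with any other access.

```python
def useful_fraction(access_bytes: int, transfer_bytes: int) -> float:
    """Fraction of moved bytes that the program actually uses.

    Assumes each random access is aligned to `transfer_bytes` and shares
    no transfer unit with any other access (no spatial reuse).
    """
    if access_bytes <= 0 or transfer_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Number of transfer units one access spans (ceiling division).
    units = -(-access_bytes // transfer_bytes)
    return access_bytes / (units * transfer_bytes)


if __name__ == "__main__":
    # 8-byte gathers through 64-byte cache lines: 1/8 of the traffic is useful.
    print(useful_fraction(8, 64))  # 0.125
    # 8-byte gathers with 8-byte memory accesses: no wasted bytes.
    print(useful_fraction(8, 8))   # 1.0
```

Under these assumptions, a core issuing random 8-byte reads wastes seven-eighths of the bandwidth it consumes; the abstract's emphasis on fine-grained memory accesses appears aimed at this kind of waste.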
Innovation

Methods, ideas, or system contributions that make the work stand out.

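Co-packaged silicon photonics, with the on-chip mesh protocol extended directly to the optical fabric, for low socket-to-socket latency
Many multi-threaded cores with fine-grained memory and network accesses in a globally shared address space
Hardware/software co-design methodology using simulation tools and FPGA emulation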
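Fine-grained remote accesses have high latency, so keeping the memory system busy requires many requests in flight at once. Little's law (requests in flight = throughput × latency) gives a rough count of how many hardware threads a core would need if each thread waits on a fixed number of outstanding loads. The bandwidth, latency, and access size below are illustrative assumptions, not PIUMA specifications, and `threads_needed` is a hypothetical helper.

```python
import math


def threads_needed(bandwidth_gb_s: float, latency_ns: float,
                   access_bytes: int, outstanding_per_thread: int = 1) -> int:
    """Threads required to sustain `bandwidth_gb_s` under Little's law.

    1 GB/s equals 1 byte/ns, so bandwidth * latency is the number of
    bytes that must be in flight at any moment.
    """
    if min(bandwidth_gb_s, latency_ns, access_bytes, outstanding_per_thread) <= 0:
        raise ValueError("all parameters must be positive")
    bytes_in_flight = bandwidth_gb_s * latency_ns
    requests_in_flight = bytes_in_flight / access_bytes
    return math.ceil(requests_in_flight / outstanding_per_thread)


if __name__ == "__main__":
    # Assumed: 8 GB/s per core, 100 ns latency, 8-byte accesses.
    print(threads_needed(8, 100, 8))                            # 100
    # Same target if each thread can keep 4 loads outstanding.
    print(threads_needed(8, 100, 8, outstanding_per_thread=4))  # 25
```

The arithmetic illustrates the design choice: rather than relying on a single thread to expose many independent misses, a multi-threaded core can cover memory latency by switching among many threads, each with only a few requests outstanding.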
S. Aananthakrishnan
Intel Corporation
Shamsul Abedin
Intel Corporation
Vincent Cave
Intel Corporation
Fabio Checconi
RSM, IBM Research
Kristof Du Bois
Intel Corporation
Stijn Eyerman
Intel Corporation
J. Fryman
Intel Corporation
W. Heirman
Intel Corporation
Jason Howard
Intel Corporation
Ibrahim Hur
Intel Corporation
Samkit Jain
Intel Corporation
Marek M. Landowski
Intel Corporation
Kevin Ma
Intel Corporation
Jarrod Nelson
Intel Corporation
Robert Pawlowski
Intel Corporation
Fabrizio Petrini
Intel Corporation
Sebastian Szkoda
Intel Corporation
Sanjaya Tayal
Intel Corporation
Jesmin Jahan Tithi
Intel Corporation, Stony Brook University
High Performance Computing, software-hardware co-design, Ethics in AI, Machine Learning, Machine Programming
Yves Vandriessche
Intel Corporation