Mapple: A Domain-Specific Language for Mapping Distributed Heterogeneous Parallel Programs

📅 2025-07-22
🤖 AI Summary
To address the complexity of mapping parallel programs onto distributed heterogeneous systems, and the low-level mapping interfaces of existing task-based systems, this paper proposes Mapple, a declarative domain-specific language (DSL) that separates mapping decisions from application logic. Mapple provides transformation primitives that resolve dimensionality mismatches between iteration spaces and processor spaces, including the key *decompose* primitive, which helps minimize communication volume. Mapple is implemented on top of the Legion runtime by translating Mapple mappers into Legion's low-level C++ mapping interface. Across nine applications (six matrix multiplication algorithms and three scientific computing workloads), Mapple reduces mapper code size by 14×, enables performance improvements of up to 1.34× over expert-written C++ mappers, and, with the *decompose* primitive, achieves up to 1.83× improvement over existing dimensionality-resolution heuristics.

📝 Abstract
Optimizing parallel programs for distributed heterogeneous systems remains a complex task, often requiring significant code modifications. Task-based programming systems improve modularity by separating performance decisions from core application logic, but their mapping interfaces are often too low-level. In this work, we introduce Mapple, a high-level, declarative programming interface for mapping distributed applications. Mapple provides transformation primitives to resolve dimensionality mismatches between iteration and processor spaces, including a key primitive, decompose, that helps minimize communication volume. We implement Mapple on top of the Legion runtime by translating Mapple mappers into its low-level C++ interface. Across nine applications, including six matrix multiplication algorithms and three scientific computing workloads, Mapple reduces mapper code size by 14X and enables performance improvements of up to 1.34X over expert-written C++ mappers. In addition, the decompose primitive achieves up to 1.83X improvement over existing dimensionality-resolution heuristics. These results demonstrate that Mapple simplifies the development of high-performance mappers for distributed applications.
Problem

Research questions and friction points this paper is trying to address.

Simplifying optimization of distributed heterogeneous parallel programs
Providing high-level declarative mapping for distributed applications
Reducing code size and improving performance of mappers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Declarative interface for distributed application mapping
Transformation primitives resolve dimensionality mismatches
Decompose primitive minimizes communication volume
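
The dimensionality-mismatch problem named above can be illustrated with a small sketch: a flat pool of processors has to be arranged into a grid with the same number of dimensions as the iteration space, and the shape of that grid determines how much data crosses block boundaries. The code below is not Mapple syntax and is not the paper's *decompose* algorithm. The function names (`factorizations`, `comm_volume`, `choose_grid`) are hypothetical, and the cost model (the boundary size of one block) is an assumption chosen only to show why the choice of grid matters.

```python
from math import prod


def factorizations(n, k):
    """Yield every ordered k-tuple of positive integers whose product is n."""
    if k == 1:
        yield (n,)
        return
    for d in range(1, n + 1):
        if n % d == 0:
            for rest in factorizations(n // d, k - 1):
                yield (d,) + rest


def comm_volume(extents, grid):
    """Assumed cost proxy: boundary size of one block of the iteration space.

    Each dimension i contributes the size of the block face perpendicular
    to it, i.e. the product of the block's other side lengths.
    """
    block = [e / g for e, g in zip(extents, grid)]
    return sum(
        prod(b for j, b in enumerate(block) if j != i)
        for i in range(len(block))
    )


def choose_grid(extents, num_procs):
    """Pick the processor grid (one factor per iteration dimension)
    that minimizes the assumed communication proxy."""
    candidates = factorizations(num_procs, len(extents))
    return min(candidates, key=lambda grid: comm_volume(extents, grid))


if __name__ == "__main__":
    # A square iteration space favors a square processor grid.
    print(choose_grid((1024, 1024), 16))   # (4, 4)
    # A tall, narrow iteration space favors splitting the long dimension.
    print(choose_grid((4096, 256), 16))    # (16, 1)
```

Under this proxy, a heuristic that always picks a near-square grid would choose (4, 4) for the 4096 x 256 case and produce blocks with roughly twice the boundary size of the (16, 1) choice, which is the kind of gap a shape-aware primitive is meant to close.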