FirecREST v2: lessons learned from redesigning an API for scalable HPC resource access

📅 2025-12-12

🤖 AI Summary

To address the low throughput, high latency, and loosely integrated security of proxy-based RESTful APIs for HPC under intensive I/O workloads, this paper proposes a novel API architecture tailored for high-performance computing. Methodologically, it introduces the first end-to-end performance modeling and bottleneck attribution framework; integrates security mechanisms (JWT/OAuth 2.1) into the stateless service design rather than applying them as post-hoc hardening; and implements asynchronous I/O, zero-copy data transfer, and load-aware routing in Rust. The reported results include a 100× throughput improvement and a reduction of P99 latency to the millisecond level, independent peer validation, and production deployment across multiple European supercomputing centers, supporting scientific workflows with over one thousand concurrent clients.

📝 Abstract

We introduce FirecREST v2, the next generation of our open-source RESTful API for programmatic access to HPC resources. FirecREST v2 delivers a 100x performance improvement over its predecessor. This paper explores the lessons learned from redesigning FirecREST from the ground up, with enhanced security and high throughput treated as core requirements. We give a detailed account of our systematic performance testing methodology, highlighting common bottlenecks in proxy-based APIs with intensive I/O operations, and present the key design and architectural changes that enabled these performance gains. Finally, we demonstrate the impact of these improvements, supported by independent peer validation, and discuss opportunities for further improvement.
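The two metrics this page refers to, throughput and tail (P99) latency, can be derived from raw request timings. The following is a minimal sketch of that arithmetic only; it is not the paper's testing methodology, and the function names `percentile` and `summarize` as well as the sample data are hypothetical.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in the range 1..100."""
    if not samples:
        raise ValueError("no samples")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(latencies_ms, wall_time_s):
    """Summarize one load-test run.

    latencies_ms: per-request latencies in milliseconds (hypothetical input).
    wall_time_s:  total wall-clock duration of the run in seconds.
    """
    return {
        "throughput_rps": len(latencies_ms) / wall_time_s,
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),
    }


if __name__ == "__main__":
    # Illustrative data: 200 requests completed in 4 seconds.
    run = summarize(list(range(1, 201)), wall_time_s=4.0)
    print(run)
```

Reporting P99 alongside the median matters for proxy-style services because a small share of slow I/O-bound requests can dominate the experience of concurrent clients while leaving the median unchanged.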

Problem

Research questions and friction points this paper is trying to address.

Redesigning an API for scalable HPC resource access

Integrating enhanced security and high throughput requirements

Addressing performance bottlenecks in proxy-based APIs with intensive I/O operations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Redesigned RESTful API for scalable HPC access

Integrated enhanced security and high throughput

Achieved 100x performance improvement over predecessor
