Timing and Memory Telemetry on GPUs for AI Governance

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of monitoring GPU usage in deployment settings where trusted telemetry mechanisms are absent, making it difficult to detect unauthorized AI training or policy violations. To this end, the authors propose a computation-behavior telemetry framework that operates without reliance on trusted firmware, secure enclaves, or vendor-provided counters. Leveraging intrinsic features of modern GPU architectures, the framework employs four verifiable and tamper-resistant computational primitives: proof-of-work-inspired parallel workloads, verifiable delay functions, tensor-core GEMM tests, and memory-resident hash bandwidth probes. These primitives generate timing- and memory-access-based observable signals. Experimental results demonstrate that timing deviations and memory access latencies effectively reflect GPU utilization, memory pressure, and execution patterns, thereby offering a practical foundation for AI governance through remote telemetry.

📝 Abstract
The rapid expansion of GPU-accelerated computing has enabled major advances in large-scale artificial intelligence (AI), while heightening concerns about how accelerators are observed or governed once deployed. Governance is essential to ensure that large-scale compute infrastructure is not silently repurposed for training models, circumventing usage policies, or operating outside legal oversight. Because current GPUs expose limited trusted telemetry and can be modified or virtualized by adversaries, we explore whether compute-based measurements can provide actionable signals of utilization when host and device are untrusted. We introduce a measurement framework that leverages architectural characteristics of modern GPUs to generate timing- and memory-based observables that correlate with compute activity. Our design draws on four complementary primitives: (1) a probabilistic, workload-driven mechanism inspired by Proof-of-Work (PoW) to expose parallel effort, (2) sequential, latency-sensitive workloads derived via Verifiable Delay Functions (VDFs) to characterize scalar execution pressure, (3) General Matrix Multiplication (GEMM)-based tensor-core measurements that reflect dense linear-algebra throughput, and (4) a VRAM-residency test that distinguishes on-device memory locality from off-chip access through bandwidth-dependent hashing. These primitives provide statistical and behavioral indicators of GPU engagement that remain observable even without trusted firmware, enclaves, or vendor-controlled counters. We evaluate their responses to contention, architectural alignment, memory pressure, and power overhead, showing that timing shifts and residency latencies reveal meaningful utilization patterns. Our results illustrate why compute-based telemetry can complement future accountability mechanisms by exposing architectural signals relevant to post-deployment GPU governance.
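The paper's four primitives run as on-GPU kernels; as a rough host-side illustration of the timing idea behind two of them (the VDF-style sequential probe and the bandwidth-dependent residency hash), here is a minimal Python sketch. Function names, parameters, and the use of SHA-256 here are our own illustrative assumptions, not the authors' implementation.

```python
# Hypothetical host-side sketch of two timing primitives described in the
# abstract. Names and parameters are illustrative, not the paper's API.
import hashlib
import time


def sequential_delay_probe(seed: bytes, iterations: int):
    """VDF-inspired probe: iterated hashing cannot be parallelized,
    so elapsed time tracks scalar (latency-sensitive) execution pressure."""
    start = time.perf_counter()
    digest = seed
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    return digest, time.perf_counter() - start


def residency_bandwidth_probe(buffer: bytes, chunk: int = 1 << 20):
    """Residency probe: hashing a large buffer is bandwidth-bound, so
    elapsed time distinguishes local (resident) from off-chip memory."""
    h = hashlib.sha256()
    start = time.perf_counter()
    for off in range(0, len(buffer), chunk):
        h.update(buffer[off:off + chunk])
    return h.digest(), time.perf_counter() - start


# Both probes return a verifiable digest plus a timing observable.
out, t_seq = sequential_delay_probe(b"challenge", 10_000)
dig, t_mem = residency_bandwidth_probe(bytes(8 << 20))
```

The verifier recomputes the digest to confirm the work was done, while the timing side-channel is the governance signal: contention or off-device memory placement inflates `t_seq` and `t_mem` beyond an architectural baseline.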
Problem

Research questions and friction points this paper is trying to address.

GPU governance
trusted telemetry
AI accountability
compute utilization monitoring
hardware observability
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU telemetry
compute-based observables
Verifiable Delay Functions
tensor-core measurement
memory residency
Saleh K. Monfared
Worcester Polytechnic Institute, USA
Fatemeh Ganji
Worcester Polytechnic Institute, USA
Dan Holcomb
University of Massachusetts, Amherst, USA
Shahin Tajik
Worcester Polytechnic Institute, USA