TOPLOC: A Locality Sensitive Hashing Scheme for Trustless Verifiable Inference

📅 2025-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the trust gap in LLM services, where users cannot verify whether the deployed model, prompt, or quantization precision matches the provider's claims, this paper proposes the first third-party-free, lightweight inference verification framework. Methodologically, it combines a compact locality-sensitive hashing (LSH) scheme over intermediate activations with a polynomial commitment encoding, and remains robust to hardware differences and algebraic reorderings, detecting tampering with the model identity, input prompt, or quantization precision with 100% accuracy (zero false positives, zero false negatives). Empirical evaluation on Llama-3.1-8B-Instruct shows that verification is substantially faster than the original inference, with a memory overhead of only 258 bytes per 32 tokens, three orders of magnitude smaller than storing the raw embeddings. The core contribution is the first end-to-end LLM inference verification scheme that simultaneously achieves high accuracy, low overhead, and strong robustness.
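The summary above describes hashing intermediate activations into a compact, tamper-evident commitment. The sketch below is a hypothetical illustration of that idea, not the paper's actual algorithm: it keeps only the top-k activation entries by magnitude as a locality-sensitive summary, quantizes them to tolerate small numeric drift across hardware, and folds the (index, value) pairs into a single polynomial evaluation over a prime field. The constants (`k`, the prime, the evaluation point) are arbitrary choices for the sketch.

```python
import numpy as np

def commit(activations: np.ndarray, k: int = 128, prime: int = (1 << 31) - 1) -> int:
    """Build a compact commitment from one step's hidden-state activations.

    Hypothetical sketch: keep the top-k entries by magnitude (a simple
    locality-sensitive summary), then fold the (index, value) pairs into
    one polynomial evaluation modulo a prime.
    """
    flat = activations.ravel()
    topk = np.argsort(np.abs(flat))[-k:]  # indices of the k largest-magnitude entries
    # Quantize values so the commitment tolerates tiny numeric drift
    # from different GPUs or algebraic reorderings.
    vals = np.round(flat[topk] * 256).astype(np.int64)
    acc = 0
    x = 31337  # fixed evaluation point (arbitrary for this sketch)
    for idx, v in zip(topk.tolist(), vals.tolist()):
        acc = (acc * x + idx + int(v)) % prime
    return acc

def verify(activations: np.ndarray, commitment: int, k: int = 128) -> bool:
    """Recompute the commitment from claimed activations and compare."""
    return commit(activations, k) == commitment
```

A verifier that re-runs the inference can recompute the commitment from its own activations and compare it against the provider's stored 258-byte commit, which is far cheaper than shipping the full activations.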

📝 Abstract
Large language models (LLMs) have proven to be very capable, but access to the best models currently relies on inference providers, which introduces trust challenges: how can we be sure that the provider is using the model configuration they claim? We propose TOPLOC, a novel method for verifiable inference that addresses this problem. TOPLOC leverages a compact locality sensitive hashing mechanism for intermediate activations, which can detect unauthorized modifications to models, prompts, or precision with 100% accuracy, achieving no false positives or negatives in our empirical evaluations. Our approach is robust across diverse hardware configurations, GPU types, and algebraic reorderings, which allows for validation speeds significantly faster than the original inference. By introducing a polynomial encoding scheme, TOPLOC reduces the memory overhead of the generated commits by 1000×, requiring only 258 bytes of storage per 32 new tokens compared to the 262 KB needed to store the token embeddings directly for Llama-3.1-8B-Instruct. Our method empowers users to verify LLM inference computations efficiently, fostering greater trust and transparency in open ecosystems and laying a foundation for decentralized and verifiable AI services.
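The 1000× figure in the abstract can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes Llama-3.1-8B's hidden size of 4096 and 2 bytes per value (bf16/fp16), which are not stated in the abstract itself:

```python
# Back-of-envelope check of the reported ~1000x storage reduction for
# Llama-3.1-8B-Instruct. Assumed: hidden size 4096, 2 bytes per value.
hidden_dim = 4096
bytes_per_value = 2
tokens = 32

raw = hidden_dim * bytes_per_value * tokens  # storing token embeddings directly
commit_size = 258                            # TOPLOC commit per 32 new tokens

print(raw)                 # 262144 bytes, i.e. the 262 KB cited in the abstract
print(raw // commit_size)  # 1016, roughly three orders of magnitude
```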
Problem

Research questions and friction points this paper is trying to address.

Language Model Verification
Trust Issues
Service Provider Authenticity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Locality Sensitive Hashing
Polynomial Encoding
Decentralized AI Verification
👥 Authors
Jack Min Ong (Prime Intellect)
Matthew Di Ferrante (Prime Intellect)
Aaron Pazdera (Prime Intellect)
Ryan Garner (Prime Intellect)
Sami Jaghouar (Prime Intellect)
Manveer Basra (Prime Intellect)
Johannes Hagemann (Prime Intellect)