A Multi-Dimensional Quality Scoring Framework for Decentralized LLM Inference with Proof of Quality

📅 2026-03-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the lack of lightweight, incentive-compatible mechanisms for multidimensional output quality evaluation in decentralized large language model inference networks. The authors propose a systematic quality scoring framework that decomposes output quality into multiple dimensions—namely, model and cost priors, structural quality, semantic quality, query-output alignment, and consistency/uncertainty. A calibrated, dimensionally filtered, and weighted fusion of these components yields a composite quality signal, which is integrated into a Proof-of-Quality (PoQ) incentive mechanism. Experimental results demonstrate that the calibrated composite score matches or exceeds the performance of the best individual evaluators and consensus-based baselines on question-answering and summarization tasks, while significantly enhancing system robustness under adversarial attacks. The study further reveals, for the first time, the task-dependent nature and potential negative correlations among multidimensional quality metrics.

📝 Abstract
Decentralized large language model (LLM) inference networks can pool heterogeneous compute to scale serving, but they require lightweight and incentive-compatible mechanisms to assess output quality. Prior work introduced cost-aware Proof of Quality (PoQ) and adaptive robust PoQ to allocate rewards under evaluator heterogeneity and adversarial behavior. In this paper, we focus on the quality signal itself and propose a multi-dimensional quality scoring framework that decomposes output quality into modular dimensions, including model and cost priors, structure quality, semantic quality, query-output alignment, and agreement/uncertainty. Using logged outputs from QA and summarization tasks, we systematically audit dimension reliability and show that seemingly reasonable dimensions can be task-dependent and even negatively correlated with reference quality without calibration. While the default composite underperforms a strong single semantic evaluator, ablations reveal that removing unreliable dimensions and re-normalizing weights yields a calibrated composite that matches or exceeds the best single-evaluator and consensus baselines. Finally, we integrate the composite score as a drop-in quality signal in PoQ and demonstrate complementary benefits with robust aggregation and adaptive trust weighting under adversarial evaluator attacks.
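The calibration step described in the abstract (drop unreliable dimensions, re-normalize the remaining weights, then take a weighted fusion) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the dimension names, weights, and scores below are hypothetical placeholders.

```python
def composite_score(scores: dict[str, float],
                    weights: dict[str, float],
                    reliable: set[str]) -> float:
    """Weighted fusion of per-dimension quality scores over the
    reliable dimensions only, with weights re-normalized to sum to 1."""
    kept = {d: w for d, w in weights.items() if d in reliable}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no reliable dimensions remain after filtering")
    return sum(scores[d] * (w / total) for d, w in kept.items())

# Illustrative example: five dimensions mirroring the paper's taxonomy;
# suppose the reliability audit flags "structure" as task-dependent
# and excludes it for this task. All numbers are made up.
weights = {"model_prior": 0.1, "structure": 0.2, "semantic": 0.4,
           "alignment": 0.2, "agreement": 0.1}
scores = {"model_prior": 0.6, "structure": 0.3, "semantic": 0.9,
          "alignment": 0.8, "agreement": 0.7}
reliable = {"model_prior", "semantic", "alignment", "agreement"}
q = composite_score(scores, weights, reliable)  # 0.8125
```

The filtering-then-renormalizing order matters: dropping a dimension without rescaling the surviving weights would deflate every composite score, while renormalization keeps the output on the same scale as the individual dimension scores.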
Problem

Research questions and friction points this paper is trying to address.

decentralized LLM inference
quality assessment
multi-dimensional scoring
evaluator heterogeneity
adversarial behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-dimensional quality scoring
Proof of Quality
decentralized LLM inference
quality calibration
adversarial robustness
Arther Tian (DGrid AI)
Alex Ding (DGrid AI)
Frank Chen (DGrid AI)
Simon Wu (DGrid AI)
Aaron Chan (Sahara AI)
Machine Learning · Large Language Models · AI Agents · Decentralized AI