🤖 AI Summary
This work addresses the lack of lightweight, incentive-compatible mechanisms for multidimensional output quality evaluation in decentralized large language model inference networks. The authors propose a systematic quality scoring framework that decomposes output quality into multiple dimensions—namely, model and cost priors, structural quality, semantic quality, query-output alignment, and consistency/uncertainty. A calibrated, dimensionally filtered, and weighted fusion of these components yields a composite quality signal, which is integrated into a Proof-of-Quality (PoQ) incentive mechanism. Experimental results demonstrate that the calibrated composite score matches or exceeds the performance of the best individual evaluators and consensus-based baselines on question-answering and summarization tasks, while significantly enhancing system robustness under adversarial attacks. The study further reveals, for the first time, the task-dependent nature and potential negative correlations among multidimensional quality metrics.
📝 Abstract
Decentralized large language model (LLM) inference networks can pool heterogeneous compute to scale serving, but they require lightweight and incentive-compatible mechanisms to assess output quality. Prior work introduced cost-aware Proof of Quality (PoQ) and adaptive robust PoQ to allocate rewards under evaluator heterogeneity and adversarial behavior. In this paper, we focus on the quality signal itself and propose a multi-dimensional quality scoring framework that decomposes output quality into modular dimensions, including model and cost priors, structural quality, semantic quality, query-output alignment, and agreement/uncertainty. Using logged outputs from QA and summarization tasks, we systematically audit dimension reliability and show that seemingly reasonable dimensions can be task-dependent and even negatively correlated with reference quality without calibration. While the default composite underperforms a strong single semantic evaluator, ablations reveal that removing unreliable dimensions and re-normalizing weights yields a calibrated composite that matches or exceeds the best single-evaluator and consensus baselines. Finally, we integrate the composite score as a drop-in quality signal in PoQ and demonstrate complementary benefits with robust aggregation and adaptive trust weighting under adversarial evaluator attacks.
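The calibration step described above, dropping unreliable dimensions and re-normalizing the remaining weights before fusing, can be sketched as follows. This is a minimal illustration under assumed names: the dimension labels, weights, and the `composite_score` helper are hypothetical, not the paper's implementation.

```python
def composite_score(scores, weights, reliable):
    """Weighted fusion of per-dimension quality scores after
    reliability filtering: unreliable dimensions are removed and
    the surviving weights are re-normalized to sum to 1."""
    kept = {d: w for d, w in weights.items() if d in reliable}
    total = sum(kept.values())
    norm = {d: w / total for d, w in kept.items()}  # re-normalized weights
    return sum(norm[d] * scores[d] for d in norm)

# Illustrative per-dimension scores in [0, 1] (values are made up).
scores = {
    "model_cost_prior": 0.70,
    "structural": 0.80,
    "semantic": 0.90,
    "query_alignment": 0.85,
    "agreement": 0.60,
}
weights = {d: 0.2 for d in scores}  # default uniform weights

# Ablation: suppose the agreement dimension proved unreliable
# for this task, so it is filtered out before fusion.
reliable = {"model_cost_prior", "structural", "semantic", "query_alignment"}
q = composite_score(scores, weights, reliable)
```

The key design point is that filtering happens before normalization, so the composite remains a convex combination of the retained dimensions and stays comparable across tasks with different reliable subsets.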