🤖 AI Summary
Black-box large language models (LLMs) are prone to hallucination because they cannot recognize the boundaries of their own knowledge, which limits their practical deployment. This work proposes a knowledge distillation–based framework that maps the internal knowledge state of a black-box LLM from its input queries, output responses, and token-level probabilities, enabling, for the first time, quantification and expression of knowledge boundaries using only API access. The study also introduces an adaptive alternative for scenarios where token probabilities are unavailable. Experimental results across multiple public benchmarks and mainstream black-box LLMs demonstrate that the proposed approach substantially outperforms existing baselines in both accuracy and recall. Notably, the alternative method achieves performance close to the primary approach and remains clearly superior to the existing baselines.
📝 Abstract
Large Language Models (LLMs) have achieved remarkable success; however, content generation distortion (hallucination) limits their practical applications. The core cause of hallucination lies in LLMs' lack of awareness of their stored internal knowledge, which prevents them from expressing their knowledge state on questions beyond their internal knowledge boundaries, as humans do. Existing research on knowledge boundary expression focuses primarily on white-box LLMs, leaving methods suitable for black-box LLMs, which offer only API access without revealing internal parameters, largely unexplored. Against this backdrop, this paper proposes LSCL (LLM-Supervised Confidence Learning), a deep-learning-based method for expressing the knowledge boundaries of black-box LLMs. Built on the knowledge distillation framework, the method designs a deep learning model that takes the input question, output answer, and token probabilities from a black-box LLM as inputs and constructs a mapping from these inputs to the model's internal knowledge state, enabling the quantification and expression of the black-box LLM's knowledge boundaries. Experiments conducted on diverse public datasets with multiple prominent black-box LLMs demonstrate that LSCL effectively helps black-box LLMs express their knowledge boundaries accurately, significantly outperforming existing baseline models on metrics such as accuracy and recall. Furthermore, for scenarios where a black-box LLM does not expose token probabilities, an adaptive alternative method is proposed; its performance is close to that of LSCL and surpasses the baseline models.
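To make the pipeline concrete, the following is a minimal, hypothetical sketch of the LLM-supervised setup the abstract describes: API-level token probabilities are summarized into features, and a small student model is trained on distilled labels (1 if the black-box LLM's answer was correct, i.e. the question lies inside its knowledge boundary; 0 otherwise). The feature set and the tiny logistic model here are illustrative stand-ins; the paper's actual method is a deep network over the question, answer, and token probabilities.

```python
import math

def confidence_features(token_probs):
    """Summarize a black-box LLM's per-token probabilities into a small
    feature vector (hypothetical features, not the paper's exact inputs)."""
    logs = [math.log(p) for p in token_probs]
    return [
        sum(logs) / len(logs),  # mean token log-probability
        min(token_probs),       # least-confident token in the answer
        sum(logs),              # total sequence log-likelihood
    ]

class LogisticConfidence:
    """Tiny logistic model standing in for the LSCL student network.
    Labels come from the black-box LLM itself (knowledge distillation):
    y=1 if the LLM answered correctly, y=0 if it hallucinated."""
    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y, epochs=200):
        # plain stochastic gradient descent on the log-loss
        for _ in range(epochs):
            for x, yi in zip(X, y):
                err = self.predict_proba(x) - yi
                self.b -= self.lr * err
                self.w = [wi - self.lr * err * xi
                          for wi, xi in zip(self.w, x)]

# Toy supervision: high-probability answers labeled "known" (correct),
# low-probability answers labeled "unknown" (hallucinated).
train = [
    ([0.90, 0.95, 0.92], 1),
    ([0.85, 0.90, 0.88], 1),
    ([0.20, 0.30, 0.25], 0),
    ([0.15, 0.25, 0.20], 0),
]
X = [confidence_features(p) for p, _ in train]
y = [label for _, label in train]

model = LogisticConfidence(n_features=3)
model.fit(X, y)
```

After training, `model.predict_proba(confidence_features(probs))` acts as a quantified knowledge-boundary signal: scores near 1 suggest the question lies inside the LLM's knowledge, scores near 0 suggest it should abstain. The adaptive alternative in the paper would drop the probability-derived features entirely for APIs that do not return them.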