🤖 AI Summary
This work addresses the lack of objective, quantifiable unsupervised methods for evaluating conversational engagement by proposing PMIScore, the first framework to incorporate pointwise mutual information (PMI) into this task. PMIScore constructs positive and negative dialogue pairs, leverages large language model embeddings, and efficiently estimates PMI via a dual form of mutual information, training a lightweight neural network for unsupervised engagement quantification. The resulting metric is interpretable, and evaluations on both synthetic and real-world dialogue datasets demonstrate the validity of the PMI estimation and its suitability as an engagement indicator.
📄 Abstract
High dialogue engagement is a crucial indicator of an effective conversation. A reliable measure of engagement could help benchmark large language models, enhance the effectiveness of human-computer interactions, or improve personal communication skills. However, quantifying engagement is challenging, since it is subjective and lacks a "gold standard". This paper proposes PMIScore, an efficient unsupervised approach to quantify dialogue engagement. It uses pointwise mutual information (PMI), which reflects how likely a response is to be generated conditioned on the conversation history. Thus, PMIScore offers a clear interpretation of engagement. As directly computing PMI is intractable due to the complexity of dialogues, PMIScore learns it through a dual form of the divergence. The algorithm involves generating positive and negative dialogue pairs, extracting their embeddings with large language models (LLMs), and training a small neural network with a mutual information loss function. We validated PMIScore on both synthetic and real-world datasets. Our results demonstrate the effectiveness of PMIScore in estimating PMI and the suitability of PMI as an engagement metric.
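The abstract mentions training a small network with a mutual information loss over positive (true history–response) and negative (mismatched) pairs. A minimal sketch of one common way to do this is the Donsker-Varadhan (DV) dual bound on KL divergence, as used in MINE-style estimators; the paper's exact objective and function names are not given here, so everything below (the `dv_mi_lower_bound` and `pmi_estimate` helpers, and the use of critic scores `T(h, r)`) is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def dv_mi_lower_bound(pos_scores, neg_scores):
    """Donsker-Varadhan lower bound on mutual information.

    pos_scores: critic outputs T(h, r) on matched (history, response) pairs,
                i.e. samples from the joint distribution.
    neg_scores: critic outputs on mismatched pairs, i.e. samples from the
                product of marginals.
    The bound is  E_joint[T] - log E_marginal[exp(T)];  maximizing it over
    the critic tightens the MI estimate (so -bound serves as a training loss).
    """
    return np.mean(pos_scores) - np.log(np.mean(np.exp(neg_scores)))

def pmi_estimate(score, neg_scores):
    """Per-pair PMI estimate from an (assumed) optimal critic.

    At the DV optimum the critic satisfies T(h, r) = log p(h, r)/(p(h)p(r)) + c,
    so subtracting the log-partition term log E[exp(T)] over negatives
    recovers an estimate of PMI(h, r) for a single pair.
    """
    return score - np.log(np.mean(np.exp(neg_scores)))
```

With a constant critic (pos_scores == neg_scores == c), the bound is exactly zero, which matches the fact that a critic carrying no information about the pairing cannot certify any mutual information; scores that separate matched from mismatched pairs yield a positive bound.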