🤖 AI Summary
This study identifies a critical flaw in large language models (LLMs) for zero-shot author attribution: systematic misattribution—i.e., “false attribution” hallucinations. We systematically evaluate LLaMA-2-13B, Mixtral 8x7B, and Gemma-7B on 400-word text segments drawn from ten canonical literary works. Methodologically, we propose the Simple Hallucination Index (SHI), a lightweight, interpretable metric quantifying attribution inconsistency across model outputs. Empirically, we demonstrate a near-perfect negative correlation between author prediction accuracy and SHI (r = −0.9996), establishing SHI as a valid, transferable hallucination metric for attribution tasks. Mixtral 8x7B achieves the highest overall performance (best accuracy, lowest SHI), yet exhibits SHI values up to 0.87 on challenging instances—highlighting inherent task difficulty. To foster reproducibility and advancement, we open-source a high-quality annotated dataset, evaluation code, and full experimental pipeline. This work provides both a methodological foundation and an empirical benchmark for trustworthy LLM-based author attribution.
📝 Abstract
In this work, we provide insight into one important limitation of large language models (LLMs), i.e., false attribution, and introduce a new hallucination metric, the Simple Hallucination Index (SHI). Automatic author attribution for relatively small chunks of text is an important but challenging NLP task. We empirically evaluate three open state-of-the-art (SotA) LLMs in a zero-shot setting (LLaMA-2-13B, Mixtral 8x7B, and Gemma-7B), especially as human annotation can be costly. We collected the top 10 most popular books, according to Project Gutenberg, divided each one into equal chunks of 400 words, and asked each LLM to predict the author of each chunk. We then randomly sampled 162 chunks per annotated book for human evaluation, based on a 7% margin of error and a 95% confidence level for the book with the most chunks (Great Expectations by Charles Dickens, with 922 chunks). The average results show that Mixtral 8x7B performs best, with a prediction accuracy of 0.737, an SHI of 0.249, and a Pearson's correlation (r) of -0.9996, followed by LLaMA-2-13B and Gemma-7B. However, Mixtral 8x7B suffers from high hallucination on 3 books, with SHI rising as high as 0.87 (on a 0-1 scale, where 1 is worst). The strong negative correlation between accuracy and SHI, given by r, demonstrates the fidelity of the new hallucination metric, which is generalizable to other tasks. We publicly release the annotated chunks of data and our code to aid reproducibility and the evaluation of other models.
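The data-preparation steps described above (400-word chunking and the sample-size choice of 162 from 922 chunks at a 7% margin of error and 95% confidence) can be sketched as follows. This is a minimal illustration, not the paper's released pipeline: the helper names are ours, and we assume the standard Cochran sample-size formula with a finite-population correction, which reproduces the reported figure of 162.

```python
import math
import re


def chunk_text(text: str, chunk_size: int = 400) -> list[str]:
    """Split a book into equal chunks of `chunk_size` words,
    dropping any shorter trailing remainder so chunks stay equal-length."""
    words = re.findall(r"\S+", text)
    n_chunks = len(words) // chunk_size
    return [
        " ".join(words[i * chunk_size:(i + 1) * chunk_size])
        for i in range(n_chunks)
    ]


def sample_size(population: int, margin: float = 0.07,
                z: float = 1.96, p: float = 0.5) -> int:
    """Cochran's sample-size formula with finite-population correction.

    z = 1.96 corresponds to a 95% confidence level; p = 0.5 is the
    most conservative (maximum-variance) proportion assumption.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))


# For the book with the most chunks (Great Expectations, 922 chunks),
# a 7% margin of error at 95% confidence gives the reported sample size:
print(sample_size(922))  # → 162
```

For shorter books, the same per-book sample of 162 chunks then meets or exceeds this confidence/error target, since the required sample size shrinks with the population.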