🤖 AI Summary
Large language models (LLMs) commonly exhibit blind acceptance of false premises, leading to flawed reasoning and unreliable outputs; existing evaluations predominantly assume ideal conditions and neglect premise critique capability. Method: This work introduces the concept of "premise critique ability" and proposes PCBench, a comprehensive benchmark covering four types of false premises across three difficulty levels, accompanied by a multidimensional evaluation framework. It employs human-crafted adversarial examples, explicit-prompt ablation, controlled manipulation of error type and difficulty, and analyses of response length and logical consistency. Results: Experiments on 15 mainstream LLMs reveal a widespread deficiency in premise critique ability; strong reasoning performance does not imply strong premise critique capability; and false premises consistently trigger verbose, unproductive retries. This study establishes a new paradigm and benchmark for assessing LLM robustness and trustworthy reasoning.
📝 Abstract
Large language models (LLMs) have advanced rapidly, demonstrating remarkable capabilities. However, a notable vulnerability persists: LLMs often uncritically accept flawed or contradictory premises, leading to inefficient reasoning and unreliable outputs. This underscores the importance of **Premise Critique Ability** for LLMs, defined as the capacity to proactively identify and articulate errors in input premises. Most existing studies assess LLMs' reasoning ability in ideal settings, largely ignoring their vulnerability to flawed premises. We therefore introduce the **Premise Critique Bench (PCBench)**, which incorporates four error types across three difficulty levels, paired with multi-faceted evaluation metrics. We conducted systematic evaluations of 15 representative LLMs. Our findings reveal: (1) Most models rely heavily on explicit prompts to detect errors, showing limited autonomous critique; (2) Premise critique ability depends on question difficulty and error type, with direct contradictions being easier to detect than complex or procedural errors; (3) Reasoning ability does not consistently correlate with premise critique ability; (4) Flawed premises trigger overthinking in reasoning models, markedly lengthening responses through repeated attempts to resolve conflicts. These insights underscore the urgent need to enhance LLMs' proactive evaluation of input validity, positioning premise critique as a foundational capability for developing reliable, human-centric systems. The code is available at https://github.com/MLGroupJLU/Premise_Critique.
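To make the evaluation setup concrete, here is a minimal sketch of what a PCBench-style item and scoring rule might look like. The item schema, field names, and the keyword heuristic are illustrative assumptions for exposition, not the benchmark's actual implementation or metrics.

```python
# Hypothetical sketch of a premise-critique benchmark item and a crude
# scoring rule. Schema and marker list are illustrative assumptions,
# not PCBench's actual format.
from dataclasses import dataclass

@dataclass
class PremiseCritiqueItem:
    question: str    # question containing a deliberately flawed premise
    error_type: str  # one of the four false-premise categories
    difficulty: str  # "easy" | "medium" | "hard"

# Proxy check: does the response explicitly flag the flawed premise,
# rather than silently answering as if the premise were true?
CRITIQUE_MARKERS = ("premise", "assumes", "incorrect", "contradiction", "flaw")

def flags_false_premise(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in CRITIQUE_MARKERS)

item = PremiseCritiqueItem(
    question="Since 9.11 > 9.9, how much larger is 9.11?",
    error_type="direct_contradiction",
    difficulty="easy",
)
print(flags_false_premise("The premise is incorrect: 9.11 < 9.9."))  # True
print(flags_false_premise("9.11 is larger by 0.21."))                # False
```

In practice a keyword heuristic like this is far too brittle for real scoring; benchmarks of this kind typically use an LLM judge or human annotation to decide whether a response genuinely critiques the premise.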