🤖 AI Summary
Computational cognitive modeling struggles to capture the full spectrum of human reasoning, including suboptimal and diverse inference patterns, a challenge termed the “full reasoning spectrum problem.”
Method: We propose a personified large language model (LLM) framework that integrates the Five-Factor Model of personality into prompt engineering. Personality trait weights are dynamically optimized via a genetic algorithm, and model training targets the empirical distribution of responses across the human population, not just majority-vote answers, on natural language inference tasks. We evaluate this framework on open-source LLMs, including Llama and Mistral.
Contribution/Results: Our approach significantly improves fidelity to human dual-process reasoning distributions: it achieves a 37% reduction in KL divergence over baseline models in fitting both intuitive (System 1) and deliberative (System 2) response patterns. Moreover, it outperforms GPT-series models on distributional prediction accuracy, establishing a higher-fidelity, human-centered cognitive modeling paradigm for LLMs.
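As an illustration of the metric named above, the following is a minimal sketch of how KL divergence between a human response distribution and a model's predicted distribution might be computed, along with the relative reduction against a baseline. The direction KL(human || model), the three-label setup, and all numbers are assumptions made for this example; they are not taken from the study.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same labels.

    Assumption: p is the human distribution and q the model's prediction.
    eps guards against division by zero when q assigns zero mass to a label.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline

# Hypothetical distributions over (entailment, neutral, contradiction).
human = [0.5, 0.3, 0.2]
baseline_model = [0.8, 0.1, 0.1]
personified_model = [0.6, 0.25, 0.15]

kl_base = kl_divergence(human, baseline_model)
kl_pers = kl_divergence(human, personified_model)
print(f"baseline KL: {kl_base:.4f}, personified KL: {kl_pers:.4f}")
print(f"relative reduction: {relative_reduction(kl_base, kl_pers):.1%}")
```

A lower value means the predicted distribution sits closer to the human one; a score of zero is reached only when the two distributions match on every label.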
📝 Abstract
In computational cognitive modeling, capturing the full spectrum of human judgment and decision-making processes, beyond just optimal behaviors, is a significant challenge. This study explores whether Large Language Models (LLMs) can emulate the breadth of human reasoning by predicting both intuitive, fast System 1 and deliberate, slow System 2 processes. We investigate the potential of AI to mimic diverse reasoning behaviors across a human population, addressing what we call the full reasoning spectrum problem. We designed reasoning tasks using a novel generalization of the Natural Language Inference (NLI) format to evaluate LLMs' ability to replicate human reasoning. The questions were crafted to elicit both System 1 and System 2 responses. Human responses were collected through crowd-sourcing and the entire distribution was modeled, rather than just the majority of the answers. We used personality-based prompting inspired by the Big Five personality model to elicit AI responses reflecting specific personality traits, capturing the diversity of human reasoning, and exploring how personality traits influence LLM outputs. Combined with genetic algorithms to optimize the weighting of these prompts, this method was tested alongside traditional machine learning models. The results show that LLMs can mimic human response distributions, with open-source models like Llama and Mistral outperforming proprietary GPT models. Personality-based prompting, especially when optimized with genetic algorithms, significantly enhanced LLMs' ability to predict human response distributions, suggesting that capturing suboptimal, naturalistic reasoning may require modeling techniques incorporating diverse reasoning styles and psychological profiles. The study concludes that personality-based prompting combined with genetic algorithms is promising for enhancing AI's human-ness in reasoning.
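The idea of weighting personality-based prompts with a genetic algorithm can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the study's implementation. It assumes each personality prompt yields a fixed response distribution over NLI labels, that the prediction is a weighted mixture of those distributions, and that fitness is the negative KL divergence to the human distribution. The function names, the per-trait distributions, and the hyperparameters are all hypothetical.

```python
import math
import random

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; p is assumed to be the human distribution."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def mixture(weights, prompt_dists):
    """Weighted average of the per-prompt response distributions."""
    n_labels = len(prompt_dists[0])
    return [sum(w * d[k] for w, d in zip(weights, prompt_dists)) for k in range(n_labels)]

def fitness(weights, prompt_dists, human):
    """Higher is better: negative KL between human and mixture distributions."""
    return -kl_divergence(human, mixture(weights, prompt_dists))

def evolve(prompt_dists, human, pop_size=30, generations=60, mutation=0.1, seed=0):
    """Toy genetic algorithm: elitist selection, averaging crossover, Gaussian mutation."""
    rng = random.Random(seed)
    n = len(prompt_dists)
    population = [normalize([rng.random() + 1e-9 for _ in range(n)]) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, prompt_dists, human), reverse=True)
        elite = population[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]
            child = [max(1e-9, x + rng.gauss(0, mutation)) for x in child]
            children.append(normalize(child))
        population = elite + children
    return max(population, key=lambda w: fitness(w, prompt_dists, human))

# Hypothetical response distributions over (entailment, neutral, contradiction),
# one per Big Five trait prompt; the values are invented for illustration.
prompt_dists = [
    [0.7, 0.2, 0.1],  # openness
    [0.2, 0.6, 0.2],  # conscientiousness
    [0.1, 0.2, 0.7],  # extraversion
    [0.5, 0.4, 0.1],  # agreeableness
    [0.3, 0.3, 0.4],  # neuroticism
]
human = [0.55, 0.30, 0.15]  # hypothetical crowd-sourced distribution

best = evolve(prompt_dists, human)
print("weights:", [round(w, 3) for w in best])
print("KL to human:", round(kl_divergence(human, mixture(best, prompt_dists)), 5))
```

Keeping the better half of each generation unchanged means the best candidate never gets worse from one generation to the next, which keeps this small example stable. A real system would obtain the per-prompt distributions by sampling an LLM rather than hard-coding them.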