🤖 AI Summary
This study addresses why large language model (LLM) agents that intervene in public forums respond inconsistently and irreproducibly to identical content, a critical obstacle to auditable governance. The authors investigate how four structural deployment factors (model version, weight-release status, service provider, and system-prompt policy) independently and jointly influence intervention decisions. Comparing open-weight and closed-weight LLMs, the work shows that each factor shifts the agent's response rate, so outcomes cannot be attributed to model identity alone. Notably, the tendency to decline more often on visible than on hidden challenges tracks the open/closed weight boundary: every closed-weight configuration declines more on visible challenges, whereas every open-weight configuration reverses the pattern or shows no gap.
📝 Abstract
LLM agents are increasingly used in moderation-relevant public forum workflows, where their choices to answer, acknowledge, repair, or decline are routinely challenged by users, platforms, and regulators. The same agent often returns different responses on identical content, so any defense based on the agent's behavior cannot be reliably reproduced. The variation is structural. Four deployment choices typically invisible to the operator each shift the agent's response rate, and their combinations can produce substantially different interventions on the same forum posts. The four choices are (1) which model version is currently served, which can change between calls without notice; (2) the model's weight-release status (open-weight, with weights publicly downloadable, vs. closed-weight, with weights held by the provider); (3) which provider serves the request; and (4) which system-prompt policy is in force. Across LLMs spanning both open-weight and closed-weight families, we find that the previously reported tendency to decline more on visible than hidden challenges aligns with the open/closed weight boundary in our panel more than with access surface. Every closed-weight cell declines more on visible challenges; every open-weight cell reverses this or shows no gap. Auditable forum-agent governance requires awareness of all four choices, not just the model name, since each independently shifts behavior.
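To make the per-cell comparison concrete, the following is a minimal sketch of how a visible-versus-hidden decline gap could be computed for each deployment configuration. It is not the authors' pipeline: the `Cell` type, the `decline_gap` function, the record layout, and all model, provider, and policy names are hypothetical, and the records are invented toy values used only to exercise the arithmetic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One deployment configuration: the four choices named in the abstract."""
    model_version: str
    weights: str        # "open" or "closed"
    provider: str
    prompt_policy: str


def decline_gap(records):
    """Return {cell: visible decline rate minus hidden decline rate}.

    records: iterable of (cell, visibility, action) tuples, where visibility
    is "visible" or "hidden" and action is one of "answer", "acknowledge",
    "repair", "decline". Cells missing either condition are skipped.
    """
    # Per cell and condition: [number of declines, number of responses].
    counts = defaultdict(lambda: {"visible": [0, 0], "hidden": [0, 0]})
    for cell, visibility, action in records:
        tally = counts[cell][visibility]
        tally[0] += action == "decline"  # bool counts as 0 or 1
        tally[1] += 1

    gaps = {}
    for cell, tally in counts.items():
        (v_dec, v_tot), (h_dec, h_tot) = tally["visible"], tally["hidden"]
        if v_tot and h_tot:
            gaps[cell] = v_dec / v_tot - h_dec / h_tot
    return gaps


# Invented toy records (NOT results from the paper), hypothetical names.
closed = Cell("model-a-v1", "closed", "provider-x", "policy-1")
open_ = Cell("model-b-v1", "open", "provider-y", "policy-1")
records = [
    (closed, "visible", "decline"), (closed, "visible", "decline"),
    (closed, "visible", "decline"), (closed, "visible", "answer"),
    (closed, "hidden", "decline"), (closed, "hidden", "answer"),
    (closed, "hidden", "repair"), (closed, "hidden", "acknowledge"),
    (open_, "visible", "decline"), (open_, "visible", "answer"),
    (open_, "visible", "answer"), (open_, "visible", "answer"),
    (open_, "hidden", "decline"), (open_, "hidden", "decline"),
    (open_, "hidden", "answer"), (open_, "hidden", "repair"),
]
gaps = decline_gap(records)
```

A positive gap means the configuration declines more often on visible challenges, and a negative or zero gap means the pattern reverses or disappears. Keying the result on all four fields, rather than on the model name alone, keeps configurations that differ only in provider or prompt policy from being pooled together.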