🤖 AI Summary
This study addresses the challenges the Human-Computer Interaction (HCI) community faces when evaluating systems research that integrates large language models (LLMs), challenges stemming from interdisciplinary normative differences, inconsistent review standards, and a lack of mutual trust. Through semi-structured interviews with 18 authors of LLM-integrated system papers, complemented by qualitative analysis and feedback from six expert HCI researchers, the work uncovers value conflicts between the HCI and ML/NLP communities over what counts as a contribution and what level of technical rigor is expected. It critically questions prescriptive mandates, such as requiring all prompts to be disclosed or mandating the use of open-source models, as overly rigid and context-insensitive. In response, the paper proposes a set of context-aware reporting and reviewing guidelines that offer a practical framework for authors, reviewers, and communities to foster fairer, more consistent, and HCI-aligned scholarly practices.
📝 Abstract
What should HCI scholars consider when reporting and reviewing papers that involve LLM-integrated systems? We interview 18 authors of LLM-integrated system papers on their authoring and reviewing experiences. We find that norms of trust-building between authors and reviewers appear to be eroded by the uncertainty of LLM behavior and hyperbolic rhetoric surrounding AI. Authors perceive that reviewers apply uniquely skeptical and inconsistent standards towards papers that report LLM-integrated systems, and mitigate mistrust by adding technical evaluations, justifying usage, and de-emphasizing LLM presence. Authors' views challenge blanket directives to report all prompts and use open models, arguing that prompt reporting is context-dependent and justifying proprietary model usage despite ethical concerns. Finally, some tensions in peer review appear to stem from clashes between the norms and values of HCI and ML/NLP communities, particularly around what constitutes a contribution and an appropriate level of technical rigor. Based on our findings and additional feedback from six expert HCI researchers, we present a set of guidelines and considerations for authors, reviewers, and HCI communities around reporting and reviewing papers that involve LLM-integrated systems.