Bridging HCI and AI Research for the Evaluation of Conversational SE Assistants

📅 2025-02-11
📈 Citations: 0
Influential citations: 0
📄 PDF
🤖 AI Summary
To address the misalignment with user needs, limited scalability, and weak interpretability of current evaluations of LLM-based conversational software engineering (SE) assistants, this position paper advocates an interdisciplinary evaluation approach that combines human-computer interaction (HCI) principles with AI-based automated assessment. Methodologically, it connects HCI-driven requirement analysis with automatic, LLM-oriented evaluation practices to enable assessment that is both scalable and aligned with developer needs. Key contributions include: (1) articulating six core human-centered requirements, and the associated challenges, for the automatic evaluation of conversational SE assistants; and (2) working towards a scalable, reproducible, and user-aligned evaluation framework that helps ensure these assistants are designed and deployed in line with user needs.

📝 Abstract
As Large Language Models (LLMs) are increasingly adopted in software engineering, recently in the form of conversational assistants, ensuring these technologies align with developers' needs is essential. The limitations of traditional human-centered methods for evaluating LLM-based tools at scale raise the need for automatic evaluation. In this paper, we advocate combining insights from human-computer interaction (HCI) and artificial intelligence (AI) research to enable human-centered automatic evaluation of LLM-based conversational SE assistants. We identify requirements for such evaluation and challenges down the road, working towards a framework that ensures these assistants are designed and deployed in line with user needs.
Problem

Research questions and friction points this paper is trying to address.

Evaluate LLM-based conversational assistants
Combine HCI and AI research insights
Ensure alignment with developers' needs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines HCI and AI insights
Human-centered automatic evaluation
Framework for user-aligned deployment