Empirical Study on the Characteristics and Evolution of AI-usage in GitHub Repositories: Evidence from Code Comments

📅 2026-06-04

🤖 AI Summary

This study addresses the lack of systematic understanding of how developers use and evolve AI-generated code in real-world projects. By analyzing 35,361 GitHub code comments that explicitly reference AI use, together with 12,996 subsequent commit messages, the authors construct a taxonomy of AI-assisted development activities. Combining open coding of 500 comments, annotation by two LLM-based classifiers, Dawid-Skene aggregation, and temporal analysis from December 2022 to March 2026, they find that AI-referencing comments shift over time from direct code generation toward knowledge and conceptual support and code enhancement. Developers primarily use large language models (LLMs) for code implementation, followed by code enhancement, debugging, documentation, and testing, while subsequent commits frequently involve refactoring and cleanup, feature integration and extension, and bug fixing. The findings suggest that AI tools are becoming embedded as collaborative support mechanisms whose outputs developers refine, extend, and correct over time.

📝 Abstract

Developers increasingly use AI tools such as ChatGPT, Copilot, and Claude in everyday software workflows, but prior studies often evaluate LLM outputs in isolation rather than examining how developers adapt them in real projects. We analyze 35,361 GitHub code comments that explicitly reference AI use and their associated code blocks. We first open-code 500 unique comments and code blocks to derive a taxonomy of AI-assisted development activities, then annotate the full dataset using two LLM-based classifiers and aggregate predictions with Dawid-Skene expectation-maximization. We also analyze 12,996 subsequent commit messages to study how AI-assisted code evolves after introduction, and examine temporal trends from December 2022 to March 2026. Our results show that developers primarily use LLMs for code implementation, followed by code enhancement, debugging, documentation, and testing. Subsequent commits frequently involve refactoring and cleanup, feature integration and extension, and bug fixing, indicating sustained human oversight in adapting AI-assisted code. Over time, AI-referencing comments shift from direct code generation toward knowledge and conceptual support and code enhancement. These findings suggest that AI tools are becoming embedded not only as code-generation aids, but also as collaborative support mechanisms whose outputs are refined, extended, and corrected by developers over time.
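The aggregation step mentioned in the abstract can be illustrated with a minimal sketch of Dawid-Skene expectation-maximization. This is not the authors' implementation: the function name `dawid_skene`, the vote-fraction initialisation, the additive `smoothing` constant, the fixed iteration count, and the toy category ids are all assumptions made for illustration. The idea is that each annotator (here, each LLM-based classifier) gets its own confusion matrix, and item labels are re-estimated as posteriors given those matrices.

```python
from math import exp, log


def dawid_skene(labels, n_classes, n_iter=50, smoothing=0.01):
    """Illustrative Dawid-Skene EM (assumed interface, not the paper's code).

    labels: one list per item, holding one class id (int) or None per annotator.
    Returns (posteriors, class_priors, confusion_matrices).
    """
    n_items = len(labels)
    n_annot = len(labels[0])

    # Initialise each item's posterior from its vote fractions.
    post = []
    for row in labels:
        counts = [0.0] * n_classes
        for lab in row:
            if lab is not None:
                counts[lab] += 1.0
        total = sum(counts)
        post.append([c / total for c in counts] if total
                    else [1.0 / n_classes] * n_classes)

    for _ in range(n_iter):
        # M-step: class priors and per-annotator confusion matrices.
        prior = [sum(p[k] for p in post) / n_items for k in range(n_classes)]
        conf = [[[smoothing] * n_classes for _ in range(n_classes)]
                for _ in range(n_annot)]
        for i, row in enumerate(labels):
            for j, lab in enumerate(row):
                if lab is not None:
                    for k in range(n_classes):
                        conf[j][k][lab] += post[i][k]
        for j in range(n_annot):
            for k in range(n_classes):
                row_sum = sum(conf[j][k])
                conf[j][k] = [v / row_sum for v in conf[j][k]]

        # E-step: posterior over the true class of each item (log space).
        for i, row in enumerate(labels):
            logp = [log(prior[k] + 1e-12) for k in range(n_classes)]
            for j, lab in enumerate(row):
                if lab is not None:
                    for k in range(n_classes):
                        logp[k] += log(conf[j][k][lab])
            top = max(logp)
            weights = [exp(v - top) for v in logp]
            norm = sum(weights)
            post[i] = [w / norm for w in weights]

    return post, prior, conf


# Toy input: two classifiers, three hypothetical category ids (0, 1, 2).
toy_labels = [[0, 0], [0, 0], [1, 1], [1, 1], [2, 2], [0, 1]]
toy_post, toy_prior, toy_conf = dawid_skene(toy_labels, n_classes=3)
for item, probs in zip(toy_labels, toy_post):
    print(item, [round(p, 3) for p in probs])
```

With only two annotators, the estimated confusion matrices mainly matter for items where the classifiers disagree (the last toy item); items with agreeing labels keep that label as the most probable class.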

Problem

Research questions and friction points this paper is trying to address.

AI usage

GitHub repositories

code comments

LLM integration

empirical study

Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-assisted development

code comments analysis

LLM-based classification