🤖 AI Summary
This study addresses the theory–practice gap in the industrial deployment of AI fairness testing. Through semi-structured interviews with 22 AI/ML practitioners and thematic coding analysis, it identifies, for the first time, five core bottlenecks: ambiguous fairness definitions, poor data quality, inconsistent fairness metrics, a lack of interoperable tools compatible with CI/CD pipelines, and insufficient engineering support. Grounded in an interdisciplinary perspective bridging software engineering and AI ethics, the work proposes practitioner-centered design principles for fairness testing frameworks. The findings provide empirically grounded guidelines and a conceptual foundation for developing lightweight, embeddable, pipeline-integrated fairness testing tools. By directly addressing real-world engineering constraints and stakeholder needs, this research helps narrow the gap between academic fairness research and industrial practice.
📝 Abstract
Software testing ensures that a system functions correctly, meets specified requirements, and maintains high quality. As artificial intelligence (AI) and machine learning (ML) technologies become integral to software systems, testing has evolved to address their unique complexities. A critical advancement in this space is fairness testing, which identifies and mitigates biases in AI applications to promote ethical and equitable outcomes. Despite extensive academic research on fairness testing, including test input generation, test oracle identification, and component testing, practical adoption remains limited. Industry practitioners often lack clear guidelines and effective tools for integrating fairness testing into real-world AI development. This study investigates how software professionals test AI-powered systems for fairness through interviews with 22 practitioners working on AI and ML projects. Our findings highlight a significant gap between theoretical fairness concepts and industry practice. While fairness definitions continue to evolve, they remain difficult for practitioners to interpret and apply. The absence of industry-aligned fairness testing tools further complicates adoption, necessitating research into practical, accessible solutions. Key challenges include data quality and diversity, time constraints, defining effective metrics, and ensuring model interoperability. These insights emphasize the need to bridge academic advancements with actionable strategies and tools, enabling practitioners to systematically address fairness in AI systems.
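To make the idea of a "lightweight, pipeline-integrated" fairness check concrete, here is a minimal illustrative sketch of one common fairness metric (demographic parity difference) written as a plain assertion that could run in a CI/CD test stage. The function name, the sample data, and the tolerance threshold are all hypothetical, not taken from the study; real metric choices and thresholds are context-dependent, which is exactly the difficulty the interviewed practitioners report.

```python
def demographic_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rates across groups.

    y_pred: iterable of 0/1 model predictions.
    group:  iterable of protected-attribute values, aligned with y_pred.
    """
    rates = {}
    for g in set(group):
        preds = [p for p, gr in zip(y_pred, group) if gr == g]
        rates[g] = sum(preds) / len(preds)
    vals = sorted(rates.values())
    return vals[-1] - vals[0]


# Hypothetical model outputs for applicants split by a binary protected attribute.
y_pred = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]
group = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

gap = demographic_parity_difference(y_pred, group)  # 0.6 vs 0.4 positive rate

THRESHOLD = 0.25  # illustrative tolerance; real limits are domain-specific
assert gap <= THRESHOLD, f"fairness regression: parity gap {gap:.2f}"
```

A check like this is deliberately trivial to embed (no framework dependencies, just an assertion), which mirrors the practitioners' request for tools that fit existing test pipelines rather than requiring a separate fairness workflow.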