AI Summary
Problem: Systematically evaluating the trade-off between fairness and effectiveness in machine learning models remains challenging. Method: This paper proposes a fairness benchmarking framework based on Software Product Line (SPL) engineering, implemented as a web-based low-code platform. Contribution/Results: It introduces the first SPL-based fairness benchmark, featuring an extended feature model and formal constraint mechanisms that ensure semantic correctness and experimental reproducibility. The framework automatically enumerates valid configurations, generates executable experiment pipelines, and computes Pareto-optimal trade-offs between effectiveness (e.g., accuracy) and fairness (e.g., demographic parity). Empirical evaluation shows that the framework achieves high expressiveness with zero runtime configuration errors, improving the reliability, customizability, and engineering rigor of fairness assessment.
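To make the Pareto computation concrete, here is a minimal sketch of selecting non-dominated (effectiveness, fairness) trade-offs. The function, the example configurations, and the pairing of accuracy with demographic-parity difference are illustrative assumptions, not MANILA's actual API.

```python
# Minimal sketch of Pareto-front selection over (effectiveness, fairness) scores.
# Assumption: effectiveness (e.g., accuracy) is maximized, while the fairness
# score is a disparity measure (e.g., demographic parity difference) to minimize.
from typing import List, Tuple

def pareto_front(results: List[Tuple[str, float, float]]) -> List[Tuple[str, float, float]]:
    """Return the configurations not dominated by any other.

    Each result is (config_name, effectiveness, disparity). One result dominates
    another if it is at least as good on both axes and strictly better on one.
    """
    front = []
    for name, eff, disp in results:
        dominated = any(
            e >= eff and d <= disp and (e > eff or d < disp)
            for _, e, d in results
        )
        if not dominated:
            front.append((name, eff, disp))
    return front

# Hypothetical benchmark output: (model + fairness method, accuracy, DP difference).
results = [
    ("logreg+reweighing", 0.81, 0.05),
    ("logreg+none",       0.84, 0.18),
    ("rf+exp-gradient",   0.83, 0.07),
    ("svm+none",          0.80, 0.19),  # dominated by logreg+none
]
print(pareto_front(results))
```

Returning the whole front rather than a single winner leaves the final choice of fairness-versus-effectiveness weighting to the user, which matches the trade-off framing above.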
Abstract
This paper presents MANILA, a web-based low-code application for benchmarking machine learning models and fairness-enhancing methods and selecting the one that achieves the best trade-off between fairness and effectiveness. It is grounded in an Extended Feature Model that captures a general fairness benchmarking workflow as a Software Product Line. The constraints defined among the features guide users toward experiments that do not lead to execution errors. We describe the architecture and implementation of MANILA and evaluate it in terms of expressiveness and correctness.
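To illustrate how feature-model constraints can rule out invalid experiments before execution, the sketch below enumerates a toy configuration space and filters it with one cross-tree constraint. All feature names and the constraint itself are invented for illustration and do not reproduce MANILA's Extended Feature Model.

```python
# Toy sketch: enumerate configurations of a small feature model and keep only
# those that satisfy a cross-tree constraint. Features and the constraint are
# illustrative assumptions; MANILA's Extended Feature Model is far richer.
from itertools import product

datasets = ["adult", "compas"]
models = ["logreg", "random_forest"]
fairness_methods = ["none", "reweighing", "exp_gradient"]
metrics = ["demographic_parity", "equalized_odds"]

def satisfies_constraints(config: dict) -> bool:
    # Invented cross-tree constraint: pretend "reweighing" requires a binary
    # sensitive attribute that only the "adult" dataset provides here.
    if config["fairness_method"] == "reweighing" and config["dataset"] != "adult":
        return False
    return True

valid = []
for ds, model, method, metric in product(datasets, models, fairness_methods, metrics):
    cfg = {"dataset": ds, "model": model, "fairness_method": method, "metric": metric}
    if satisfies_constraints(cfg):
        valid.append(cfg)

total = len(datasets) * len(models) * len(fairness_methods) * len(metrics)
print(f"{len(valid)} of {total} configurations are valid")
```

Filtering at configuration time, before any pipeline is generated, is what allows an approach like this to report zero runtime configuration errors: invalid experiments are never produced in the first place.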