VirnyFlow: A Design Space for Responsible Model Development

📅 2025-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address black-box automation, accountability deficits, and insufficient customization in real-world multi-objective AutoML problems, this paper introduces VirnyFlow, presented as the first accountability-oriented design space for ML model development. It enables data scientists to explicitly specify optimization objectives, configure experiments that span pipeline stages, and iteratively refine pipelines so that they stay aligned with real-world constraints. Technically, it integrates multi-objective Bayesian optimization, cost-aware multi-armed bandits, query optimization, and a distributed parallel architecture. Evaluated on five real-world benchmarks, the system significantly outperforms state-of-the-art AutoML methods in both optimization quality (measured by Pareto front coverage) and scalability (supporting concurrent scheduling of over one thousand components). The work positions this approach as a new paradigm for responsible, interpretable, and customizable ML pipeline development.

📝 Abstract

Developing machine learning (ML) models requires a deep understanding of real-world problems, which are inherently multi-objective. In this paper, we present VirnyFlow, the first design space for responsible model development, designed to assist data scientists in building ML pipelines that are tailored to the specific context of their problem. Unlike conventional AutoML frameworks, VirnyFlow enables users to define customized optimization criteria, perform comprehensive experimentation across pipeline stages, and iteratively refine models in alignment with real-world constraints. Our system integrates evaluation protocol definition, multi-objective Bayesian optimization, cost-aware multi-armed bandits, query optimization, and distributed parallelism into a unified architecture. We show that VirnyFlow significantly outperforms state-of-the-art AutoML systems in both optimization quality and scalability across five real-world benchmarks, offering a flexible, efficient, and responsible alternative to black-box automation in ML development.
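To make "multi-objective" concrete, the sketch below filters a set of candidate pipelines down to its Pareto front: the candidates that no other candidate beats on every objective at once. It is an illustration only and is not VirnyFlow's API. The objective names (`accuracy`, `fairness_gap`), the candidate values, and the helper functions `dominates` and `pareto_front` are all hypothetical.

```python
from typing import Dict, List, Union

Candidate = Dict[str, Union[str, float]]


def dominates(a: Candidate, b: Candidate, maximize: Dict[str, bool]) -> bool:
    """True if `a` is at least as good as `b` on every objective
    and strictly better on at least one."""
    strictly_better = False
    for objective, higher_is_better in maximize.items():
        x, y = float(a[objective]), float(b[objective])
        if not higher_is_better:
            # Flip the sign so that "larger is better" holds for every objective.
            x, y = -x, -y
        if x < y:
            return False
        if x > y:
            strictly_better = True
    return strictly_better


def pareto_front(candidates: List[Candidate], maximize: Dict[str, bool]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c, maximize) for other in candidates if other is not c)
    ]


# Hypothetical user-defined criteria: maximize accuracy, minimize a fairness gap.
objectives = {"accuracy": True, "fairness_gap": False}

# Made-up pipeline results, for illustration only.
pipelines = [
    {"name": "A", "accuracy": 0.90, "fairness_gap": 0.20},
    {"name": "B", "accuracy": 0.85, "fairness_gap": 0.05},
    {"name": "C", "accuracy": 0.84, "fairness_gap": 0.10},  # worse than B on both objectives
    {"name": "D", "accuracy": 0.90, "fairness_gap": 0.25},  # same accuracy as A, larger gap
]

front_names = [p["name"] for p in pareto_front(pipelines, objectives)]
print(front_names)  # ['A', 'B']
```

Neither A nor B dominates the other, since A is more accurate while B has the smaller gap. Choosing between them is the kind of context-specific trade-off that a single-metric AutoML objective hides.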

Problem

Research questions and friction points this paper is trying to address.

Designing responsible ML models for multi-objective real-world problems

Enabling customized optimization criteria in ML pipeline development

Outperforming AutoML systems in optimization quality and scalability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Customized optimization criteria for ML pipelines

Multi-objective Bayesian optimization integration

Cost-aware multi-armed bandits for efficiency
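The listing above names cost-aware multi-armed bandits without describing the variant used. As a rough illustration of the general idea, the sketch below runs a UCB-style bandit that ranks each arm by its optimistic reward estimate divided by its average cost, so cheap and promising options receive most of a fixed budget. This is a generic textbook-style heuristic, not the paper's algorithm. The function `cost_aware_ucb`, the arm names, and the reward and cost values are assumptions made for the example.

```python
import math
from typing import Callable, Dict, Tuple

# An arm is a zero-argument callable returning (reward, cost); cost must be > 0.
Arm = Callable[[], Tuple[float, float]]


def cost_aware_ucb(arms: Dict[str, Arm], budget: float, c: float = 1.0) -> Dict[str, int]:
    """Pull arms until the cost budget is spent and return the pull counts.

    Each arm is scored as (mean reward + exploration bonus) / mean cost.
    The final pull may overshoot the budget by at most that pull's cost.
    """
    counts = {name: 0 for name in arms}
    reward_sum = {name: 0.0 for name in arms}
    cost_sum = {name: 0.0 for name in arms}
    spent = 0.0
    total_pulls = 0

    def pull(name: str) -> None:
        nonlocal spent, total_pulls
        reward, cost = arms[name]()
        counts[name] += 1
        reward_sum[name] += reward
        cost_sum[name] += cost
        spent += cost
        total_pulls += 1

    def score(name: str) -> float:
        n = counts[name]
        mean_reward = reward_sum[name] / n
        mean_cost = cost_sum[name] / n
        bonus = c * math.sqrt(math.log(total_pulls) / n)
        return (mean_reward + bonus) / mean_cost

    # Initialization: try every arm once so that all means are defined.
    for name in arms:
        pull(name)

    while spent < budget:
        pull(max(arms, key=score))

    return counts


# Hypothetical arms, e.g. candidate pipeline configurations with a
# fixed (reward, cost) pair each to keep the example deterministic.
example_arms: Dict[str, Arm] = {
    "cheap_good": lambda: (0.8, 1.0),
    "expensive_good": lambda: (0.9, 5.0),
    "cheap_bad": lambda: (0.2, 1.0),
}

pull_counts = cost_aware_ucb(example_arms, budget=100.0)
print(pull_counts)
```

In this toy setup the `cheap_good` arm receives the most pulls: `expensive_good` earns a slightly higher reward but costs five times as much per pull, and `cheap_bad` is only revisited occasionally through the exploration bonus.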
