iGAiVA: Integrated Generative AI and Visual Analytics in a Machine Learning Workflow for Text Classification

📅 2024-09-24

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

To address data distribution imbalance and label scarcity arising from emerging classes in text classification, this paper proposes iGAiVA, a framework that maps four groups of machine learning tasks to four corresponding visual analytics (VA) views, establishing a closed-loop workflow of “defect identification → targeted synthesis → performance validation.” Integrating interactive VA, LLM-driven synthetic data generation, text embedding dimensionality reduction, cluster visualization, and model diagnostics, iGAiVA supports interpretable localization of data defects and on-demand augmentation. The paper demonstrates that targeted, defect-driven synthesis improves model accuracy. The iGAiVA software tool implements this workflow, combining generative AI and visual analytics for developing and improving text classification models.

📝 Abstract

In developing machine learning (ML) models for text classification, one common challenge is that the collected data is often not ideally distributed, especially when new classes are introduced in response to changes of data and tasks. In this paper, we present a solution for using visual analytics (VA) to guide the generation of synthetic data using large language models. As VA enables model developers to identify data-related deficiency, data synthesis can be targeted to address such deficiency. We discuss different types of data deficiency, describe different VA techniques for supporting their identification, and demonstrate the effectiveness of targeted data synthesis in improving model accuracy. In addition, we present a software tool, iGAiVA, which maps four groups of ML tasks into four VA views, integrating generative AI and VA into an ML workflow for developing and improving text classification models.
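The idea of targeting synthesis at an identified deficiency can be illustrated with a minimal sketch. This is not the iGAiVA implementation: plain class counts stand in for the visual inspection a developer would do in the VA views, and the function names (`synthesis_plan`, `build_prompt`), the balancing rule (top every class up to the size of the largest one), and the prompt wording are all assumptions made for illustration. No LLM is called; the sketch only builds the request text.

```python
from collections import Counter


def synthesis_plan(labels):
    """Return {class: number of synthetic examples needed}.

    Assumed rule (hypothetical, not from the paper): bring every class
    up to the size of the largest class.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items() if n < target}


def build_prompt(cls, seed_texts, n_needed):
    """Build a hypothetical LLM prompt asking for n_needed texts of one class."""
    examples = "\n".join(f"- {t}" for t in seed_texts)
    return (
        f"Write {n_needed} new, varied texts that belong to the class "
        f"'{cls}'. Existing examples:\n{examples}"
    )


if __name__ == "__main__":
    # Toy label list: "returns" is a newly introduced, under-represented class.
    labels = ["billing"] * 5 + ["shipping"] * 3 + ["returns"]
    plan = synthesis_plan(labels)
    for cls, n in plan.items():
        print(build_prompt(cls, ["(seed text for this class)"], n))
```

In a real workflow the plan would come from what the developer sees in the views (for example a sparse or overlapping cluster) rather than from raw counts, and the model would be retrained and re-evaluated after the synthetic data is added.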

Problem

Research questions and friction points this paper is trying to address.

Addressing non-ideal data distribution in text classification

Using visual analytics to guide synthetic data generation

Improving model accuracy via targeted data synthesis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual analytics guides synthetic data generation

Targeted data synthesis improves model accuracy

Integrated generative AI with VA in ML workflow
