iGAiVA: Integrated Generative AI and Visual Analytics in a Machine Learning Workflow for Text Classification

📅 2024-09-24
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
To address imbalanced data distributions and label scarcity caused by emerging classes in text classification, this paper proposes iGAiVA, a framework that maps four groups of machine learning tasks to four corresponding visual analytics (VA) views, establishing a closed-loop workflow of "defect identification → targeted synthesis → performance validation." By integrating interactive VA, LLM-driven synthetic data generation, dimensionality reduction of text embeddings, cluster visualization, and model diagnostics, iGAiVA supports interpretable localization of data defects and on-demand augmentation. Evaluated across multiple text classification scenarios, the method improves model accuracy, supporting the efficacy of defect-driven synthesis. The open-source toolkit iGAiVA implements this approach, offering a reproducible, interpretable, and interactive combination of generative AI and visual analytics for low-resource text classification.

๐Ÿ“ Abstract
In developing machine learning (ML) models for text classification, one common challenge is that the collected data is often not ideally distributed, especially when new classes are introduced in response to changes of data and tasks. In this paper, we present a solution for using visual analytics (VA) to guide the generation of synthetic data using large language models. As VA enables model developers to identify data-related deficiency, data synthesis can be targeted to address such deficiency. We discuss different types of data deficiency, describe different VA techniques for supporting their identification, and demonstrate the effectiveness of targeted data synthesis in improving model accuracy. In addition, we present a software tool, iGAiVA, which maps four groups of ML tasks into four VA views, integrating generative AI and VA into an ML workflow for developing and improving text classification models.
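The idea of targeting synthesis at an identified deficiency can be illustrated with a small sketch. It is not the iGAiVA implementation: the paper identifies deficiencies through interactive VA views and generates text with a large language model, whereas this example substitutes a simple class-count threshold for the visual step and stops at producing a per-class request plan. The function name `plan_targeted_synthesis`, the `target_ratio` parameter, and the example labels are all hypothetical.

```python
from collections import Counter


def plan_targeted_synthesis(labels, target_ratio=0.5):
    """Return how many synthetic examples to request per deficient class.

    Assumption for illustration only: a class is "deficient" when it has
    fewer examples than target_ratio times the size of the largest class.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    # Minimum number of examples every class should reach.
    target = int(max(counts.values()) * target_ratio)
    return {
        label: target - n
        for label, n in sorted(counts.items())
        if n < target
    }


# Hypothetical label distribution with two newly introduced, sparse classes.
labels = ["billing"] * 40 + ["shipping"] * 30 + ["returns"] * 6 + ["warranty"] * 2
plan = plan_targeted_synthesis(labels, target_ratio=0.5)
print(plan)
```

In a workflow like the one described, each entry in the resulting plan would become a request to a generative model for that many new examples of the class, followed by retraining and re-evaluation to check whether accuracy improved.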
Problem

Research questions and friction points this paper is trying to address.

Addressing non-ideal data distribution in text classification
Using visual analytics to guide synthetic data generation
Improving model accuracy via targeted data synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual analytics guides synthetic data generation
Targeted data synthesis improves model accuracy
Integrated generative AI with VA in ML workflow