Text2Data: Low-Resource Data Generation with Textual Control

📅 2024-02-08
🏛️ arXiv.org
📈 Citations: 0 (influential: 0)

🤖 AI Summary
Problem: Text-to-data generation in low-resource domains—such as molecular conformations, motion trajectories, and multivariate time series—is severely hindered by the scarcity of text annotations.
Method: We propose an unsupervised diffusion modeling framework: (i) a generic diffusion model pretrained solely on unlabeled data; (ii) a constraint-optimized controllable fine-tuning mechanism that preserves the original generative capacity while enabling fine-grained text conditioning; and (iii) a text–latent space alignment strategy to enhance semantic consistency.
Contribution/Results: To our knowledge, this is the first approach to achieve high-fidelity, text-driven generation across multiple low-resource multimodal domains without relying on labeled paired data. It significantly outperforms existing supervised and weakly supervised baselines on molecular conformation generation, human motion synthesis, and multivariate time-series forecasting, while effectively mitigating catastrophic forgetting. Our framework establishes a novel paradigm for cross-modal generation under fully unsupervised conditions.

📝 Abstract
Natural language serves as a common and straightforward signal for humans to interact seamlessly with machines. Recognizing the importance of this interface, the machine learning community is investing considerable effort in generating data that is semantically coherent with textual instructions. While strides have been made in text-to-data generation spanning image editing, audio synthesis, video creation, and beyond, low-resource areas characterized by expensive annotations or complex data structures, such as molecules, motion dynamics, and time series, often lack textual labels. This deficiency impedes supervised learning, thereby constraining the application of advanced generative models for text-to-data tasks. In response to these challenges in the low-resource scenario, we propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model. Subsequently, it undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting. Comprehensive experiments demonstrate that Text2Data is able to achieve enhanced performance regarding controllability across various modalities, including molecules, motions and time series, when compared to existing baselines.
Problem

Research questions and friction points this paper is trying to address.

Generating data with textual control in low-resource domains
Addressing the lack of textual labels for complex data structures
Reducing reliance on supervised learning where annotations are expensive
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised diffusion model that learns the underlying data distribution from unlabeled data
Constraint optimization-based controllable finetuning that counteracts catastrophic forgetting
Cross-modal application to molecules, motions, and time series
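The constrained fine-tuning objective can be sketched as a penalty-method relaxation: minimize the text-conditioned loss while penalizing any increase of the unconditional (pretraining) loss beyond its reference value, which is one way to preserve the original generative capacity. This is a hypothetical simplification; the function and parameter names below are illustrative, not taken from the paper.

```python
def constrained_finetune_loss(cond_loss, uncond_loss, ref_loss, lam=10.0):
    """Penalty relaxation of the constraint uncond_loss <= ref_loss.

    cond_loss:   diffusion loss under text conditioning (to minimize)
    uncond_loss: unconditional diffusion loss after fine-tuning
    ref_loss:    unconditional loss of the pretrained model (fixed reference)
    lam:         penalty weight on constraint violation
    """
    violation = max(0.0, uncond_loss - ref_loss)  # zero while constraint holds
    return cond_loss + lam * violation
```

While the unconditional loss stays at or below its pretrained reference, the penalty is inactive and training reduces to ordinary conditional fine-tuning; any regression beyond the reference is penalized, discouraging catastrophic forgetting.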
Authors
Shiyu Wang, Salesforce AI Research
Yihao Feng, Apple AIML
Tian Lan, Salesforce AI Research
Ning Yu, Salesforce AI Research
Yu Bai, Salesforce AI Research
Ran Xu, Salesforce AI Research
Huan Wang, Salesforce AI Research
Caiming Xiong, Salesforce Research
Silvio Savarese, Stanford University