🤖 AI Summary
Fine-tuning large language models (LLMs) for multi-domain reasoning remains hindered by reliance on large-scale, costly-to-annotate datasets and prohibitive computational overhead.
Method: We propose NanoFlux, a lightweight adversarial framework built on an attacker-defender dual-model architecture and supervised by a tool-augmented Judge model, which automatically generates high-quality, multi-step reasoning questions with explanatory annotations. The framework further incorporates embedding-based novelty filtering and multi-hop reasoning evaluation, enabling automated synthesis and intelligent curation of compact, high-fidelity training data ("small but precise").
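The attacker-defender-judge dynamic can be sketched as a simple generation loop. This is a hypothetical illustration under assumed interfaces (`attacker`, `defender`, `judge` stand in for the paper's LLM roles; the real NanoFlux pipeline is not specified here):

```python
# Hypothetical sketch of the NanoFlux adversarial loop (function names and the
# verdict schema are assumptions, not the authors' actual API). The Attacker
# proposes a question, the Defender attempts it, and a tool-augmented Judge
# decides whether the pair is kept as an annotated training example.
import itertools

def nanoflux_round(attacker, defender, judge, dataset, max_rounds=10):
    """Run one adversarial data-generation loop, keeping examples the
    Judge rates as high-quality multi-step reasoning problems."""
    for _ in range(max_rounds):
        question = attacker()                # Attacker synthesizes a candidate
        answer = defender(question)          # Defender attempts a solution
        verdict = judge(question, answer)    # Judge scores it (with tool use)
        if verdict["keep"]:
            dataset.append({"question": question,
                            "answer": answer,
                            "annotation": verdict["explanation"]})
    return dataset

# Toy stand-ins so the loop runs end-to-end (real roles would be LLM calls):
counter = itertools.count(1)
attacker = lambda: f"Compute the sum 1 + 2 + ... + {next(counter)}."
defender = lambda q: "Use the formula n(n+1)/2."
judge = lambda q, a: {"keep": True, "explanation": "multi-step arithmetic"}

data = nanoflux_round(attacker, defender, judge, [], max_rounds=3)
print(len(data))  # 3 kept examples
```

In the actual framework the two models alternate the Attacker and Defender roles between rounds, which this minimal sketch omits.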
Contribution/Results: Fine-tuning a 4B-parameter model on fewer than 200 generated samples yields +5.9%, +3.6%, and +16.6% improvements in mathematical, scientific, and medical reasoning accuracy, respectively, while reducing computational cost by 3-14x. Crucially, we empirically uncover a non-monotonic relationship between dataset characteristics and model performance, demonstrating the strong optimization potential of small, precisely targeted training datasets.
📝 Abstract
We present NanoFlux, a novel adversarial framework for generating targeted training data to improve LLM reasoning, where adversarially-generated datasets containing fewer than 200 examples outperform conventional fine-tuning approaches. The framework employs a competitive dynamic between models alternating as Attacker and Defender, supervised by a tool-augmented Judge, synthesizing multi-step questions with explanatory annotations that target specific reasoning capabilities. Fine-tuning a 4B-parameter model on NanoFlux-generated data yields performance gains across diverse domains compared to full-benchmark fine-tuning: +5.9% on mathematical reasoning (GSMHard), +3.6% on scientific reasoning (GenomeBench), and +16.6% on medical reasoning (MultiMedQA), while reducing computational requirements by 3-14x. Ablation studies reveal a non-monotonic relationship between dataset characteristics and model performance, uncovering domain-specific optimal points for question complexity and reasoning quality. NanoFlux automates training data generation through embedding-based novelty filtering, tool-augmented evaluation, and multi-hop reasoning, suggesting that future model improvements may lie in the intelligent synthesis of small, precisely targeted training datasets.
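The embedding-based novelty filtering mentioned above can be illustrated with a greedy cosine-similarity filter. This is a minimal sketch under assumptions (the function names, the greedy strategy, and the toy bag-of-characters encoder are illustrative; the paper's actual embedding model and threshold are not given here):

```python
# Hypothetical sketch of embedding-based novelty filtering: accept a candidate
# question only if its embedding is sufficiently dissimilar (cosine similarity
# below a threshold) from every previously accepted question.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def novelty_filter(candidates, embed, threshold=0.85):
    """Greedily retain candidates whose embeddings differ enough
    from all previously accepted ones."""
    accepted, accepted_vecs = [], []
    for question in candidates:
        vec = embed(question)
        if all(cosine(vec, v) < threshold for v in accepted_vecs):
            accepted.append(question)
            accepted_vecs.append(vec)
    return accepted

# Toy embedding: letter-count vector (stand-in for a real sentence encoder).
def toy_embed(text):
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

qs = ["What is 2+2?", "What is 2 + 2?", "Name the powerhouse of the cell."]
print(novelty_filter(qs, toy_embed, threshold=0.95))
# → ['What is 2+2?', 'Name the powerhouse of the cell.']
```

The near-duplicate phrasing of the second question is filtered out, while the semantically distinct third question passes; in the full pipeline this keeps the sub-200-example dataset compact without redundancy.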