🤖 AI Summary
This work investigates the zero-shot transferability of Generative Flow Networks (GFlowNets) across arithmetic reasoning tasks, using the Game of 24 as the source task and the Game of 42 as the target. Method: We systematically analyze GFlowNet generalization via combinatorial search-space modeling and a joint diversity–accuracy evaluation framework, complemented by controlled fine-tuning experiments on small- and medium-scale LLMs. Contribution/Results: We find that GFlowNets suffer a 37% drop in solution diversity and a 29% decline in accuracy under cross-task transfer, revealing a strong dependence on task-specific priors, particularly operator distributions and numeric constraints. This is the first systematic identification of flow-structure bottlenecks in symbolic reasoning transfer, challenging the assumption that generative reasoning models can be deployed directly across tasks. Fine-tuning the LLMs does not fundamentally alleviate this limitation. Our findings establish a critical empirical benchmark and a theoretical caution for the transferability of generative reasoning models.
📝 Abstract
Generating diverse solutions is key to human-like reasoning, yet autoregressive language models are optimized to produce a single accurate response, limiting creativity. GFlowNets instead treat solution generation as sampling from a flow network, promising greater diversity. Our case study examines their zero-shot transferability by fine-tuning small and medium-sized large language models on the Game of 24 and testing them on the Game of 42 dataset. Results reveal that GFlowNets struggle to maintain solution diversity and accuracy, highlighting key limitations in their cross-task generalization and the need for future research on improved transfer learning.
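For readers unfamiliar with the source task: the Game of 24 asks whether four given numbers can be combined with the four basic arithmetic operators (and parentheses) to reach 24, and a puzzle typically admits several distinct expressions. The brute-force sketch below (our illustration, not code from the paper) enumerates that combinatorial search space; the count of distinct valid expressions it returns is one simple way to make "solution diversity" concrete:

```python
from itertools import permutations, product

def solve(numbers, target=24.0, eps=1e-6):
    """Enumerate all Game-of-24 expressions over four numbers.

    Tries every ordering of the operands, every choice of three binary
    operators, and every parenthesization pattern, returning the set of
    distinct expressions that evaluate to the target.
    """
    ops = ["+", "-", "*", "/"]
    # All five parenthesization shapes for four operands a, b, c, d.
    patterns = [
        "(({a}{p}{b}){q}{c}){r}{d}",
        "({a}{p}({b}{q}{c})){r}{d}",
        "({a}{p}{b}){q}({c}{r}{d})",
        "{a}{p}(({b}{q}{c}){r}{d})",
        "{a}{p}({b}{q}({c}{r}{d}))",
    ]
    solutions = set()
    for a, b, c, d in set(permutations(numbers)):
        for p, q, r in product(ops, repeat=3):
            for pat in patterns:
                expr = pat.format(a=a, b=b, c=c, d=d, p=p, q=q, r=r)
                try:
                    if abs(eval(expr) - target) < eps:
                        solutions.add(expr)
                except ZeroDivisionError:
                    continue
    return solutions
```

For example, `solve([4, 1, 8, 7])` finds multiple distinct expressions such as `(8-4)*(7-1)`. Changing `target` to 42 gives the target task; note that the reachable-value distribution and useful operator mix shift with the target, which is exactly the kind of task-specific prior the paper argues GFlowNets overfit to.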