AI Summary
To address the inherent trade-off between capability and computational cost in large language model (LLM) deployment, this paper introduces Jan-nano, a lightweight 4B-parameter language model designed for efficient knowledge retrieval. Methodologically, it abandons conventional next-token-prediction supervised fine-tuning and instead proposes a task-driven, multi-stage RLVR (Reinforcement Learning with Verifiable Rewards) framework, paired with Model Context Protocol (MCP) tool integration and native support for a 128K context length. Jan-nano is fine-tuned from Qwen3-4B and achieves 83.2% accuracy on the SimpleQA benchmark while enabling efficient inference on a single consumer-grade GPU (e.g., an RTX 4090). This work represents the first effort to deeply embed end-to-end reinforcement learning into the knowledge-retrieval pipeline of a lightweight LLM, significantly lowering the deployment barrier for high-performance models.
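The key property of RLVR is that the reward comes from an automatic, programmatic check rather than a learned reward model. As a minimal sketch of that idea (the function names and the normalized exact-match rule are illustrative assumptions, not taken from the paper), a retrieval-QA reward might look like this:

```python
import string


def normalize(text: str) -> str:
    """Lowercase and strip punctuation/whitespace for lenient matching."""
    return text.lower().translate(
        str.maketrans("", "", string.punctuation)
    ).strip()


def verifiable_reward(model_answer: str, gold_answer: str) -> float:
    """Return 1.0 iff the model's final answer matches the reference
    after normalization, else 0.0. A binary, automatically checkable
    signal like this is what makes RLVR feasible without training a
    separate reward model (hypothetical sketch, not the paper's code)."""
    return 1.0 if normalize(model_answer) == normalize(gold_answer) else 0.0


# SimpleQA-style exact-match scoring
print(verifiable_reward("Paris.", "paris"))  # → 1.0
print(verifiable_reward("Lyon", "Paris"))    # → 0.0
```

During RL training, such a reward would be computed on each sampled rollout's final answer and fed to the policy-gradient update; the retrieval steps in between are optimized only through this end signal.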
Abstract
Most language models face a fundamental trade-off: powerful capabilities require substantial computational resources. We shatter this constraint with Jan-nano, a 4B-parameter language model that redefines efficiency through radical specialization: instead of trying to know everything, it masters the art of finding anything instantly. Fine-tuned from Qwen3-4B using our novel multi-stage RLVR system, which completely eliminates reliance on next-token-prediction training (SFT), Jan-nano achieves 83.2% on the SimpleQA benchmark with MCP integration while running on consumer hardware. With a 128K context length, Jan-nano proves that intelligence isn't about scale; it's about strategy.