🤖 AI Summary
To address the high computational overhead and privacy risks that hinder edge deployment of large language models (LLMs) and multimodal LLMs (MLLMs), this paper proposes a lightweight training paradigm tailored for efficient inference, achieving for the first time small-model performance on complex reasoning tasks comparable to that of large models. Methodologically, the authors integrate instruction tuning, chain-of-thought distillation, and multimodal alignment optimization into an end-to-end text–vision joint reasoning framework. The contributions are fourfold: (1) high-performance small language models (SLMs) and multimodal SLMs (MSLMs) with <1B parameters are released; (2) these models achieve state-of-the-art (SOTA) results on major reasoning benchmarks; (3) inference latency is reduced by 5×, enabling deployment on real-world edge devices; (4) computational resource consumption and data-privacy leakage risk are significantly reduced, jointly optimizing efficiency, security, and deployability.
📝 Abstract
Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have made significant advancements in reasoning capabilities. However, they still face challenges such as high computational demands and privacy concerns. This paper focuses on developing efficient Small Language Models (SLMs) and Multimodal Small Language Models (MSLMs) that retain competitive reasoning abilities. We introduce a novel training pipeline that enhances reasoning capabilities and facilitates deployment on edge devices, achieving state-of-the-art performance while minimizing development costs. InfiR aims to advance AI systems by improving reasoning, reducing adoption barriers, and addressing privacy concerns through smaller model sizes. Resources are available at https://github.com/Reallm-Labs/InfiR.