🤖 AI Summary
This paper addresses the high computational cost, substantial carbon emissions, and deployment challenges arising from the uncontrolled scaling of large language models (LLMs). The authors propose a systematic downscaling paradigm that prioritizes performance preservation. Methodologically, they integrate multiple model-compression techniques, including knowledge distillation, structured pruning, quantization-aware training, and task-adaptive sparsification, and establish a comprehensive green AI evaluation framework centered on sustainability metrics. The key contribution is the first formal articulation of three LLM evaluation axes (efficiency, environmental sustainability, and accessibility), challenging the prevailing “bigger-is-better” paradigm. Experiments across multiple benchmarks show that the approach reduces model parameter count by 60% and inference energy consumption by 75% while retaining over 95% of the original performance. These results demonstrate that performance-preserving downscaling is a viable, and in several respects superior, alternative to brute-force scaling.
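To make the distillation component of such a pipeline concrete, the sketch below shows a standard soft-target knowledge-distillation loss, assuming a PyTorch setup. The function name, temperature `T`, and mixing weight `alpha` are illustrative choices for this sketch, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft-target KL term (teacher guidance) with standard
    cross-entropy on hard labels. T and alpha are illustrative defaults,
    not values reported in the paper."""
    # Soften both distributions with temperature T; the T**2 factor keeps
    # the gradient magnitude of the KL term comparable to the hard-label term.
    soft_teacher = F.log_softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd_term = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean", log_target=True) * (T ** 2)
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term


# Example usage with random tensors standing in for real model outputs.
if __name__ == "__main__":
    batch, vocab = 4, 32000
    student_logits = torch.randn(batch, vocab, requires_grad=True)
    teacher_logits = torch.randn(batch, vocab)
    labels = torch.randint(0, vocab, (batch,))
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")
```

In a full downscaling pipeline of the kind the summary describes, the structured-pruning, quantization-aware-training, and sparsification steps would be layered around a training loop that minimizes a loss of this form; those specifics are the paper's own and are not reproduced here.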
📝 Abstract
We challenge the dominant focus on neural scaling laws and advocate for a paradigm shift toward downscaling in the development of large language models (LLMs). While scaling laws have provided critical insights into performance improvements through increasing model and dataset size, we emphasize the significant limitations of this approach, particularly in terms of computational inefficiency, environmental impact, and deployment constraints. To address these challenges, we propose a holistic framework for downscaling LLMs that seeks to maintain performance while drastically reducing resource demands. This paper outlines practical strategies for transitioning away from traditional scaling paradigms, advocating for a more sustainable, efficient, and accessible approach to LLM development.