ZeroLM: Data-Free Transformer Architecture Search for Language Models

📅 2025-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing zero-cost proxy methods exhibit poor performance in Transformer architecture ranking—often underperforming even simple parameter count baselines—and suffer from high search overhead, overfitting susceptibility, and modeling complexity. This paper proposes a training-free, gradient-free zero-cost method for Transformer architecture search. We introduce the first weight-statistics-based proxy metric and, for the first time, decouple Transformers into functional submodules, dynamically weighting their capacity contributions to overcome ranking performance bottlenecks. On the FlexiBERT benchmark, our method achieves a Spearman correlation of 0.76 and Kendall’s tau of 0.53 with ground-truth validation accuracy, while incurring near-zero search cost. It demonstrates strong cross-task robustness and significantly outperforms existing zero-cost approaches.

📝 Abstract
Neural architecture search (NAS) provides a systematic framework for automating the design of neural network architectures, yet its widespread adoption is hindered by prohibitive computational requirements. Existing zero-cost proxy methods, while reducing search overhead, demonstrate inadequate performance in architecture ranking tasks, particularly for Transformer-based models where they often underperform simple parameter counting metrics. Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity. This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics computation while decomposing Transformer architectures into functionally distinct sub-modules, thereby optimizing the balance of their contributions to overall performance. Our comprehensive evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark. The proposed method exhibits exceptional computational efficiency while maintaining robust performance across diverse NAS benchmark tasks, offering a practical solution for large-scale architecture search.
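Zero-cost proxies are judged by how well their scores *rank* architectures against ground-truth validation accuracy, which is what the Spearman's rho and Kendall's tau figures above measure. A minimal, tie-free sketch of both rank correlations (the proxy scores and accuracies below are made up for illustration, not taken from the paper):

```python
from itertools import combinations

def ranks(xs):
    """Rank values 1..n in ascending order (no tie handling, for brevity)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

def kendall(xs, ys):
    """Kendall's tau: (concordant - discordant pairs) / total pairs."""
    n = len(xs)
    s = sum(1 if (xs[i] - xs[j]) * (ys[i] - ys[j]) > 0 else -1
            for i, j in combinations(range(n), 2))
    return s / (n * (n - 1) / 2)

proxy = [3.1, 1.2, 2.7, 0.5]       # hypothetical zero-cost proxy scores
acc   = [0.82, 0.71, 0.84, 0.60]   # hypothetical validation accuracies

rho = spearman(proxy, acc)  # 0.8  (one swapped pair in the ranking)
tau = kendall(proxy, acc)   # 2/3  (5 concordant pairs, 1 discordant)
```

A proxy is useful for NAS exactly when these correlations are high: a practitioner can then pick the top-scoring architectures without training any of them.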
Problem

Research questions and friction points this paper is trying to address.

Automates neural architecture design with low computational cost
Improves ranking accuracy for Transformer-based model architectures
Balances sub-module contributions to enhance overall model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-cost proxy for efficient architecture ranking
Decomposes Transformers into functional sub-modules
Uses weight statistics to quantify model capacity
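The bullets above describe the core idea: score an architecture from weight statistics of its decoupled functional submodules, with no data, training, or gradients. The paper's exact statistic and its dynamic capacity weighting are not reproduced here; the following is only an illustrative sketch (the function names, the choice of standard deviation as the statistic, and the fixed submodule coefficients are all assumptions):

```python
import math
import statistics

def submodule_score(weights):
    """Toy capacity proxy for one submodule: parameter count scaled by
    the dispersion of its (randomly initialised) weights. The actual
    statistic used by ZeroLM may differ."""
    sigma = statistics.pstdev(weights)
    return len(weights) * math.log1p(sigma)

def architecture_score(submodules, coeffs):
    """Weighted sum over functional submodules (e.g. attention vs. FFN).
    `coeffs` stands in for the paper's dynamic capacity weighting."""
    return sum(coeffs[name] * submodule_score(w)
               for name, w in submodules.items())

# Hypothetical architecture: two submodules with illustrative weight values.
arch = {
    "attention": [0.02, -0.01, 0.03, -0.04, 0.01, 0.00],
    "ffn":       [0.10, -0.20, 0.15, -0.05],
}
score = architecture_score(arch, {"attention": 0.6, "ffn": 0.4})
```

Because the score depends only on the initialised weight tensors, candidate architectures can be ranked at near-zero cost, which is the "data-free" property the title refers to.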
Zhen-Song Chen
Associate Professor at School of Civil Engineering, Wuhan University
Large Language Models · Construction Management · Decision Support · Supply Chain Management
Hong-Wei Ding
School of Civil Engineering, Wuhan University, Wuhan 430072, China
Xian-Jia Wang
Economic and Management School, Wuhan University, Wuhan 430071, China
Witold Pedrycz