Browse publications on Google Scholar.
Resume (English only)
Academic Achievements
Selected publications include 'Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective', 'Quadratic Gating Functions in Mixture of Experts: A Statistical Insight', and 'Statistical Advantages of Perturbing Cosine Router in Sparse Mixture of Experts', among others, accepted to or under review at conferences such as ICLR and ICML.
Research Experience
Currently working on two primary research directions:
1. Efficient Training and Inference for Foundation Models: studying the statistical efficiency and training dynamics of Mixture of Experts (MoE) architectures to improve the scalability and performance of large foundation models.
2. Time Series Foundation Models: exploring the fundamental limits of, and methodologies for, scalable and generalizable models for time series analysis, with an emphasis on enhancing numerical reasoning capabilities.
Education
Prior to joining UT Austin, earned a Bachelor's degree in Electrical Engineering with a minor in Computer Engineering from the University of Tehran, Iran.
Background
A PhD student in the Electrical and Computer Engineering Department at the University of Texas at Austin, advised by Prof. Nhat Ho and Prof. Atlas Wang. Broadly interested in theoretical and practical aspects of modern machine learning, with a focus on understanding the fundamental principles of designing and training scalable and efficient foundation models.