Publications
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models (2024 arXiv)
Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking (2023 NeurIPS)
Noise is not the main factor behind the gap between SGD and Adam on transformers, but sign descent might be (2023 ICLR)
Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent (2021 AISTATS)
BackPACK: Packing more into backprop (2020 ICLR)
Research Experience
Post-doctoral researcher at INRIA Paris, focusing on optimization for machine learning.
Education
Received his PhD from UBC in 2024 under the supervision of Mark Schmidt; previously studied at EPFL with Martin Jaggi and interned at MPI with Philipp Hennig and at RIKEN with Emtiyaz Khan.
Background
Post-doctoral researcher in the Sierra team at INRIA Paris, advised by Francis Bach, working on optimization for machine learning.
Miscellany
Developed Tex2UTF8, a tool for converting LaTeX to UTF-8; created DSDL, an automated dataset downloader for LIBSVM datasets.