2024: Two papers accepted at NeurIPS 2024: TC-MDP (a new algorithm for Robust RL) and a theoretical paper on the optimal sample complexity of Robust MDPs (co-authored with Laixi).
2024: One paper accepted at ICML 2024 on Bandits with Variational Inference; one oral presentation at UAI 2024 on Robust MDP theory.
May 2024: Released three preprints — RRLS (a new benchmark for Robust RL), and two new algorithms: ExpectRL and TC-MDP.
January 2025: Contributed to Cohere’s AI model "Command A", which was fine-tuned using the theoretically grounded RLHF algorithms CoPG and SRPO.