Bilal Chughtai

Google Scholar ID: i-L98bwAAAAJ
Google DeepMind
AI Safety · Mechanistic Interpretability
Citations & Impact (all-time)
  • Citations: 355
  • h-index: 8
  • i10-index: 7
  • Publications: 10
  • Co-authors: 24
Academic Achievements
  • Book Summary: Zero to One
  • Detecting strategic deception using linear probes
  • Open problems in mechanistic interpretability
  • Activation space interpretability may be doomed
  • Understanding positional features in layer 0 SAEs
  • Unlearning via RMU is mostly shallow
  • Transformer circuit faithfulness metrics are not robust
  • Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
Research Experience
  • Research Engineer @ Google DeepMind
Background
  • Research interests include language model interpretability and AGI safety, with the aim of making the development of transformative AI go well for humanity.
Miscellany
  • Personal interests include writing; he shares these and other interests on his blog.