Bilal Chughtai
Google Scholar ID: i-L98bwAAAAJ
Google DeepMind
Research areas: AI Safety, Mechanistic Interpretability
Homepage
Google Scholar
Citations & Impact (all-time)
Citations: 355
h-index: 8
i10-index: 7
Publications: 10
Co-authors: 24
Contact
Email: brchughtaii@gmail.com
Twitter, GitHub, LinkedIn
Publications (5 items)
Building Production-Ready Probes For Gemini (2026), cited by 1
Difficulties with Evaluating a Deception Detector for AIs (2025), cited by 0
Detecting Strategic Deception Using Linear Probes (2025), cited by 0
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities (2025), cited by 0
Open Problems in Mechanistic Interpretability (2025), cited by 0
Resume (English only)
Academic Achievements
Book Summary: Zero to One
Detecting strategic deception using linear probes
Open problems in mechanistic interpretability
Activation space interpretability may be doomed
Understanding positional features in layer 0 SAEs
Unlearning via RMU is mostly shallow
Transformer circuit faithfulness metrics are not robust
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
Research Experience
Research Engineer @ Google DeepMind
Background
Research interests include language model interpretability and AGI safety, with the aim of making the development of transformative AI go well for humanity.
Miscellany
Personal interests include writing; he shares his other interests on his blog.
Co-authors (24 total)
Neel Nanda (Mechanistic Interpretability Team Lead, Google DeepMind)
Owain Evans (Affiliate, CHAI, UC Berkeley)
Jérémy Scheurer (Apollo Research)
Mikita Balesni (Research Scientist, Apollo Research)
Alexander Meinke (Apollo Research)
Lawrence Chan (PhD Student, UC Berkeley)
Nicholas Goldowsky-Dill (Research Scientist, Apollo Research)