Bilal Chughtai

Google Scholar ID: i-L98bwAAAAJ
Google DeepMind
AI Safety · Mechanistic Interpretability
Citations & Impact (all-time)
  • Citations: 355
  • h-index: 8
  • i10-index: 7
  • Publications: 10
  • Co-authors: 24
Academic Achievements
  • Book Summary: Zero to One
  • Detecting strategic deception using linear probes
  • Open problems in mechanistic interpretability
  • Activation space interpretability may be doomed
  • Understanding positional features in layer 0 SAEs
  • Unlearning via RMU is mostly shallow
  • Transformer circuit faithfulness metrics are not robust
  • Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
Research Experience
  • Research Engineer @ Google DeepMind
Background
  • Research interests include language model interpretability and AGI safety, with the aim of making the development of transformative AI go well for humanity.
Miscellany
  • Personal interests include writing; he shares these and other interests on his blog.