
Alex Mallen

Google Scholar ID: EZe6n8EAAAAJ
Redwood Research
Research interests: AI evaluations, scalable oversight, interpretability, AI alignment
Links: Homepage, Google Scholar
Citations & Impact (all-time)
Citations: 1,567
H-index: 8
i10-index: 7
Publications: 18
Co-authors: 6
Contact: GitHub
Publications (3 items)
Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment (2025). Citations: 0
Why Do Some Language Models Fake Alignment While Others Don't? (2025). Citations: 0
Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols? (2024). Citations: 0
Resume (English only)
Co-authors (6 total)
Hannaneh Hajishirzi, University of Washington; Allen AI
Daniel Khashabi, Johns Hopkins University
Akari Asai, Allen Institute for AI; Carnegie Mellon University
J. Nathan Kutz, Professor of Applied Mathematics & Electrical and Computer Engineering
