Alex Mallen
Redwood Research
Google Scholar ID: EZe6n8EAAAAJ
Research interests: AI evaluations, scalable oversight, interpretability, AI alignment
Links: Homepage, Google Scholar
Citations & Impact (all-time)
Citations: 1,567
H-index: 8
i10-index: 7
Publications: 18
Co-authors: 6
Contact
GitHub
Publications (3 listed)
Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment (2025), cited 0
Why Do Some Language Models Fake Alignment While Others Don't? (2025), cited 0
Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols? (2024), cited 0
Resume (English only)
Co-authors (6 total; 4 listed)
Hannaneh Hajishirzi (University of Washington; Allen AI)
Daniel Khashabi (Johns Hopkins University)
Akari Asai (Allen Institute for AI; Carnegie Mellon University)
J. Nathan Kutz (Professor of Applied Mathematics & Electrical and Computer Engineering)