
Alex Mallen

Google Scholar ID: EZe6n8EAAAAJ
Redwood Research
Research interests: AI evaluations, scalable oversight, interpretability, AI alignment
Links: Homepage, Google Scholar
Citations & Impact (all-time)
Citations: 1,567
H-index: 8
i10-index: 7
Publications: 18
Co-authors: 6
Contact: GitHub
Publications (3 items)
Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment (2025). Citations: 0
Why Do Some Language Models Fake Alignment While Others Don't? (2025). Citations: 0
Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols? (2024). Citations: 0
Resume (English only)
Co-authors (6 total)
Hannaneh Hajishirzi, University of Washington; Allen AI
Daniel Khashabi, Johns Hopkins University
Akari Asai, Allen Institute for AI; Carnegie Mellon University
J. Nathan Kutz, Professor of Applied Mathematics & Electrical and Computer Engineering
