Jan Betley
Scholar


Google Scholar ID: TT2YCN0AAAAJ
TruthfulAI
LLMs, AI safety
Citations & Impact (all-time)
Citations: 233
H-index: 7
i10-index: 5
Publications: 11
Co-authors: 0
Contact
No contact links provided.
Publications
7 items
Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers (2026). Cited: 0
Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs (2025). Cited: 0
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs (2025). Cited: 0
Subliminal Learning: Language models transmit behavioral traits via hidden signals in data (2025). Cited: 0
Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models (2025). Cited: 0
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs (2025). Cited: 0
Tell me about yourself: LLMs are aware of their learned behaviors (2025). Cited: 0
Co-authors
0 total (list not available)