Scholar

Jan Betley

Google Scholar ID: TT2YCN0AAAAJ

TruthfulAI

LLMs, AI safety

Citations & Impact

All-time

Citations: 233

H-index: 7

i10-index: 5

Publications: 11

Co-authors: 0

Publications

6 items

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs (2025). Cited: 0

School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs (2025). Cited: 0

Subliminal Learning: Language models transmit behavioral traits via hidden signals in data (2025). Cited: 0

Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models (2025). Cited: 0

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs (2025). Cited: 0

Tell me about yourself: LLMs are aware of their learned behaviors (2025). Cited: 0
