- Publication: Preference Learning with Lie Detectors can Induce Honesty or Evasion
- Publication: SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking
- Publication: Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients
- Publication: LMPriors: Pre-Trained Language Models as Task-Specific Priors
- Publication: IQ-Learn: Inverse soft-Q Learning for Imitation
Research Experience
- Research Scientist at FAR AI
- During his PhD, he studied a diverse range of topics including constrained reinforcement learning, variational inference, and autoregressive models
- Interned at the Center for Human-Compatible AI, the Future of Humanity Institute at Oxford University, and DeepMind
Education
- PhD in Computer Science, 2018-2024, Stanford University, advised by Stefano Ermon
- MEng in Computer Science, 2017, Cambridge University
- BA in Natural Sciences (Physics), 2016, Cambridge University, supervised by Carl E. Rasmussen
Background
A Research Scientist focusing on reducing catastrophic risks from advanced AI systems. Interests include deceptive behavior in LLMs, risk evaluation and elicitation, governance of frontier models, adversarial robustness, and probabilistic machine learning.
Miscellany
Contact via email: chris dot j dot cundy at gmail dot com.