Chris Cundy
Latest
The Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes
Auditing Games for Sandbagging
The Safety Gap Toolkit: Evaluating Hidden Dangers of Open-Source Models
Preference Learning with Lie Detectors can Induce Honesty or Evasion
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking
Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients
LMPriors: Pre-Trained Language Models as Task-Specific Priors
IQ-Learn: Inverse soft-Q Learning for Imitation
Scalable Variational Approaches for Bayesian Causal Discovery
Flexible Approximate Inference via Stratified Normalizing Flows
Exploring hierarchy-aware inverse reinforcement learning
Parallelizing linear recurrent neural nets over sequence length
Investigating Variational Gaussian Process State-Space Models