Insider Training: Data and Reward Poisoning Attacks with Data-Level Defences
Year: 2025 - 2026
Role: AI Researcher
Duration: ~7 months
Relevant Links: Preliminary preprint, Blogpost
Summary: Ongoing research into data and reward poisoning attacks and data-level defences
I’m researching data poisoning attacks on frontier models through RL (GRPO) and SFT fine-tuning experiments, and developing blue-team defences: InspectAI evaluations, statistical dataset analysis, and controllable oversight methods.
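To give a flavour of what "statistical dataset analysis" can mean as a data-level defence, here is a minimal, purely illustrative sketch: screening per-sample statistics (e.g. fine-tuning loss or a reward signal) for outliers using a median-absolute-deviation score. The scores, threshold, and framing are assumptions for illustration, not the project's actual method.

```python
import statistics

def flag_outliers(scores, z_thresh=3.5):
    """Flag indices whose modified z-score (based on the median absolute
    deviation) exceeds z_thresh -- a crude screen for samples whose
    statistics look anomalous enough to warrant manual review."""
    med = statistics.median(scores)
    mad = statistics.median(abs(s - med) for s in scores)
    if mad == 0:  # no spread: nothing can be distinguished
        return []
    return [i for i, s in enumerate(scores)
            if 0.6745 * abs(s - med) / mad > z_thresh]

# Hypothetical per-sample losses; index 5 is suspiciously easy for the model.
losses = [2.1, 2.3, 1.9, 2.0, 2.2, 0.1, 2.4, 2.2]
print(flag_outliers(losses))  # → [5]
```

A robust statistic (median/MAD) is used rather than mean/standard deviation so that a small number of poisoned samples cannot mask themselves by inflating the spread.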
This work is supervised by Mary Phuong (Google DeepMind). The project has received $120K+ in extended funding and will be submitted to a top AI conference (ICML 2026).
A preliminary preprint is linked above for anyone interested. Note that the work will change substantially before final submission.