Phantom Transfer: Data-level Defences are Insufficient Against Data Poisoning
Year: 2025 - 2026
Role: AI Researcher
Duration: ~7 months
Relevant Links: Preliminary preprint, First blogpost, Second blogpost
We studied data poisoning attacks on frontier models at the post-training stage, and developed evals, dataset-level defences, and audits to counter them.
This work was supervised by Mary Phuong (Google DeepMind). The project has received $120K+ in extended funding and was submitted to a top AI conference (ICML 2026).
A preliminary preprint is available via the links above for anyone interested. We also presented a poster at the ELLIS UnConference and wrote a blogpost about one part of the project on LessWrong.