Understanding Search in Transformers

Previously, I was at Conjecture working with Janus and Nicholas Kees on mechanistic interpretability for LLMs. As part of my thesis work and my role as a research lead for AI Safety Camp and UnSearch, we continued this work on toy models trained on spatial tasks1 (in particular, mazes).

For relevant outputs and ongoing updates, see github.com/understanding-search and unsearch.org.

Not directly related to mechanistic interpretability for transformers, but we also used maze-dataset in some work on implicit networks: arxiv.org/abs/2410.03020
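
Since maze-dataset underlies several of these projects, here is a minimal sketch of generating a small maze dataset with it. The names used (`MazeDatasetConfig`, `MazeDataset.from_config`, `LatticeMazeGenerators.gen_dfs`) follow the library's documented examples, but treat the exact signatures as assumptions that may differ between versions.

```python
# Minimal sketch: generate a small dataset of solved lattice mazes with maze-dataset
# (github.com/understanding-search/maze-dataset). Names follow the library's documented
# examples; exact signatures are assumptions and may vary by version.
from maze_dataset import MazeDataset, MazeDatasetConfig
from maze_dataset.generation import LatticeMazeGenerators

cfg = MazeDatasetConfig(
    name="demo",
    grid_n=5,                                  # 5x5 lattice of maze nodes
    n_mazes=100,                               # number of mazes to generate
    maze_ctor=LatticeMazeGenerators.gen_dfs,   # randomized depth-first search generator
)

# Build (or load from cache) the dataset; each element is a solved maze that can be
# tokenized into a sequence for training a small transformer on path-finding.
dataset = MazeDataset.from_config(cfg)
print(len(dataset))
```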


  1. See alignmentforum.org/posts/FDjTgDcGPc7B98AES/searching-for-search-4