Current
SPD
Stochastic Parameter Decomposition: decomposing a network's parameters into interpretable components
Attention Motifs
Understanding attention heads by organizing them according to the attention patterns they produce
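A minimal sketch of the organizing idea, assuming the simplest version: collect each head's attention pattern on a fixed probe input and cluster heads whose patterns look alike. The toy random-weight heads, the k-means clustering, and the cluster count are illustrative assumptions, not the project's actual method.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_heads, seq_len, d_head = 8, 16, 32

def attention_pattern(q, k):
    """softmax(Q K^T / sqrt(d_head)) for one head: a (seq_len, seq_len) pattern."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return scores / scores.sum(axis=-1, keepdims=True)

# Stand-in for patterns extracted from a real model on a fixed probe prompt.
patterns = np.stack([
    attention_pattern(rng.normal(size=(seq_len, d_head)),
                      rng.normal(size=(seq_len, d_head)))
    for _ in range(n_heads)
])

# Group heads whose patterns look alike (e.g. local, diffuse, previous-token-like).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    patterns.reshape(n_heads, -1))
for head, label in enumerate(labels):
    print(f"head {head}: motif cluster {label}")
```

In practice the patterns would come from a trained model, and a richer similarity measure (for example, matching against known motifs such as previous-token or induction patterns) could replace plain k-means.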
Cyborg Language Models
Replacing LLM components with explicitly programmed features for interpretability
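A rough sketch of the replacement step, assuming the simplest mechanism: a PyTorch forward hook that discards one module's learned activations and substitutes a hand-written feature function. The toy network, the chosen layer, and programmed_feature are hypothetical stand-ins for an actual LLM component.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 16),   # stand-in for the upstream part of an LLM block
    nn.ReLU(),
    nn.Linear(16, 8),   # stand-in for the downstream part
)

def programmed_feature(x):
    """Explicit, human-readable replacement: activates when the mean input is positive."""
    return torch.relu(x.mean(dim=-1, keepdim=True)).expand(-1, 16)

def swap_in_program(module, inputs, output):
    # Forward hook: ignore the learned activations and return the programmed ones.
    return programmed_feature(inputs[0])

handle = model[1].register_forward_hook(swap_in_program)
x = torch.randn(4, 8)
print("cyborg output shape:", model(x).shape)   # the rest of the network runs unchanged
handle.remove()
```

The point of the swap is that downstream behaviour can then be compared with and without the explicit program in place, so the programmed feature serves as a testable hypothesis about what the original component computes.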
Past research
Understanding Search in Transformers
Mechanistic interpretability of transformer networks trained on toy spatial tasks (mazes)
Inverse Scaling
Resistance to prompt injection may be an example of a task that exhibits inverse scaling in large transformer networks
CE-learn
Studying learning in the C. elegans nematode
RL-dreams
Value-function robustness for world-model RL, probed by giving the agent weird dreams
BNBP
Training Hodgkin-Huxley neural networks with SGD
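A hedged sketch of the training idea, assuming a much-simplified membrane model: a leak-only differentiable neuron simulated with explicit Euler, whose conductance is fitted to a target voltage trace by gradient descent. The full Hodgkin-Huxley gating variables are omitted, and the constants (time step, injected current, true g_leak) are illustrative.

```python
import torch
import torch.nn.functional as F

dt, steps = 0.1, 200
I_ext = 2.0                                   # constant injected current

def simulate(g_leak, E_leak=-54.0, C=1.0, V0=-65.0):
    """Explicit-Euler rollout of a leak-only membrane equation (differentiable)."""
    V, trace = torch.tensor(V0), []
    for _ in range(steps):
        V = V + dt * (I_ext - g_leak * (V - E_leak)) / C
        trace.append(V)
    return torch.stack(trace)

target = simulate(torch.tensor(0.3)).detach()      # "recorded" voltage trace to fit

raw_g = torch.tensor(-2.25, requires_grad=True)    # softplus(raw_g) is roughly 0.1
opt = torch.optim.Adam([raw_g], lr=0.05)
for step in range(300):
    opt.zero_grad()
    loss = F.mse_loss(simulate(F.softplus(raw_g)), target)
    loss.backward()                                # gradients flow through the ODE rollout
    opt.step()

print(f"fitted g_leak = {F.softplus(raw_g).item():.3f}  (target 0.3)")
```

The softplus reparameterization keeps the conductance positive during optimization; the same backpropagation-through-simulation pattern extends to models with gating dynamics.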
ML for medical imaging
Segmentation of CT images using a U-Net architecture
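A miniature sketch of the architecture, assuming a two-level U-Net with a single skip connection and a one-channel logit head; the channel widths, depth, and the random tensor standing in for a CT slice are illustrative assumptions.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = block(1, 16)                     # encoder at full resolution
        self.down = nn.MaxPool2d(2)
        self.mid = block(16, 32)                    # bottleneck at half resolution
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = block(32, 16)                    # 16 skip channels + 16 upsampled
        self.head = nn.Conv2d(16, 1, 1)             # per-pixel logit for the mask

    def forward(self, x):
        e = self.enc(x)
        m = self.mid(self.down(e))
        u = self.up(m)
        return self.head(self.dec(torch.cat([e, u], dim=1)))

ct_slice = torch.randn(1, 1, 64, 64)                # stand-in for a normalized CT slice
logits = TinyUNet()(ct_slice)
print(logits.shape)                                 # (1, 1, 64, 64) segmentation logits
```

A real pipeline would stack more encoder/decoder levels and train the logits against expert-drawn masks with a cross-entropy or Dice-style loss.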