@neev_parikh
tldr: you can RL qwen3 8b to fool gpt-4o into believing it's not doing a hidden side task (when it is). this is somewhat surprising given the capability gap between an 8b agent and gpt-4o, a relatively strong monitor https://t.co/TRefJQa8Rz