@emollick
So Qwen 3-235B with thinking seems good, but it isn't blowing away any of my weird frontier tests, and DeepSeek r1 did better on some of them. It did okay generating a p5js starship (though it had errors to correct), but it failed the Lem Test and couldn't produce a twigl shader in many attempts. https://t.co/bcdtTXq3HZ