@emollick
Deleted this post because Kevin is right - it isn’t clear that a better prompted LLM, or a consensus or pass experiment with multiple attempts, would not be able to solve more of these problems. Testing LLMs is challenging for these reasons. https://t.co/nM6wc4Rg7b