@ziv_ravid
Some more thoughts about the Yann interview: Even if LLMs work great, that's missing the point. Everyone's doing the same thing now: more scale, more data, longer CoT, RL tweaks. But the path that got us here was completely stochastic. Attention, transformers, scaling laws, RLHF: none of it was obvious. It came from people trying very different things. We wouldn't be here if everyone had agreed early on which direction to go. Assuming the next leap will come from all of us optimizing the same recipe is dangerous.

I agree that the situation today is way more complicated. Back then you could test a wild idea on a few GPUs. Now every serious bet costs millions in compute. That makes it harder to explore, which makes the convergence problem even worse.

But that's exactly why we need to be more intentional about funding diverse approaches, not less.