@arankomatsuzaki
@teortaxesTex @TheZvi As for context length, I think we'll get more sophisticated context management, which should have greater impact than brute-force extension of the attention span to something like 100M tokens. No need to store everything in the KV cache; instead, keep all the info organized and reachable in some way, with perhaps just the table of contents of the recipe kept in the KV cache. For example, we humans remember all the memorable GRPO papers w/o every fine detail, because we remember how to retrieve them (i.e., how to search the paper and which pages to go to for the relevant info, etc.). This and some Claude Code updates may be relevant: https://t.co/FKWvyoSRFs
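
A toy sketch of the idea above (all names here are hypothetical illustration, not any real model's API): keep only a compact table of contents "in context," store the full documents externally, and fetch details on demand via search.

```python
# Hypothetical sketch: table of contents in context, details retrieved on demand.

class ContextStore:
    """Keeps full documents outside the model's context window; only the
    short table of contents (titles) would sit in the KV cache."""

    def __init__(self):
        self.docs = {}  # title -> full text, stored externally

    def add(self, title, text):
        self.docs[title] = text

    def table_of_contents(self):
        # The compact view a model would actually keep in its limited context.
        return sorted(self.docs)

    def retrieve(self, query):
        # Crude keyword match standing in for "remembering how to find the paper".
        q = query.lower()
        return {t: body for t, body in self.docs.items()
                if q in t.lower() or q in body.lower()}


store = ContextStore()
store.add("GRPO paper", "Group Relative Policy Optimization: fine details here...")
store.add("Attention survey", "Long-context attention variants...")

print(store.table_of_contents())  # small enough to always keep in context
print(store.retrieve("GRPO"))     # full details fetched only when needed
```

The point of the sketch: the ToC is tiny and always resident, while the expensive detail lives outside attention entirely, mirroring how we recall where to look rather than every fine detail.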