@omarsar0
Given where the field is headed (agentic workflows with advanced tool and computer use), open-source code LLMs are going to be a big deal!

Great to see this new effort: OpenCoder, a fully open-source LLM specialized for code generation and understanding.

Main factors for building high-performing code LLMs:
- effective data cleaning with code-optimized heuristic rules for deduplication
- recall of relevant text corpus related to code
- high-quality synthetic data in both annealing and supervised fine-tuning stages

OpenCoder surpasses previous fully open models at the 6B+ parameter scale and releases not just the model weights but also the complete training pipeline, datasets, and protocols to enable reproducible research.