@omarsar0
Reasoning models are expensive. Not because the models are huge, but because they generate thousands of tokens just to think.

But what if smaller models could learn to reason efficiently?

This new paper compares training 12B models on reasoning traces from two frontier systems:

- DeepSeek-R1
- gpt-oss (OpenAI's open-weight reasoner)

The key finding: gpt-oss traces produce ~4x more efficient reasoning.

DeepSeek-R1 averages ~15,500 tokens per response. gpt-oss averages ~3,500. Yet accuracy stays nearly identical across benchmarks. Verbose reasoning doesn't mean better reasoning.

Why does this matter? Inference cost scales linearly with generated tokens. If your reasoning model produces 4x fewer tokens at the same accuracy, you cut inference costs by roughly 75%. That's a massive efficiency gain.

Interesting observation: the Nemotron base models already had DeepSeek-R1 traces in pretraining. Training loss on DeepSeek-R1 traces started low and stayed flat, while loss on gpt-oss traces started high and dropped gradually. That gap shows the model was learning something genuinely new, and it means you can distill reasoning capabilities from frontier models into smaller systems.

But the source matters: different reasoning styles produce different efficiency profiles.

(bookmark it)

Paper: arxiv.org/abs/2511.19333
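The cost arithmetic above can be sketched in a few lines. This is a back-of-the-envelope check, not from the paper: the per-token price is a made-up assumption, and the token averages are the rounded figures quoted in the post (with those exact numbers the gap is closer to ~4.4x, i.e. ~77% savings, which the post rounds to 4x / 75%).

```python
# Back-of-the-envelope inference-cost comparison.
# Token averages are the rounded figures quoted in the post;
# the per-token price is a purely hypothetical assumption.
PRICE_PER_TOKEN = 1e-6  # assumed: $1 per million output tokens

r1_tokens = 15_500   # avg tokens/response, DeepSeek-R1-style traces
oss_tokens = 3_500   # avg tokens/response, gpt-oss-style traces

ratio = r1_tokens / oss_tokens          # how many times more tokens R1 emits
savings = 1 - oss_tokens / r1_tokens    # fraction of cost saved at equal accuracy

print(f"DeepSeek-R1 emits {ratio:.1f}x more tokens per response")
print(f"Equal-accuracy cost savings: {savings:.0%}")
print(f"Cost/response: R1 ${r1_tokens * PRICE_PER_TOKEN:.4f} "
      f"vs gpt-oss ${oss_tokens * PRICE_PER_TOKEN:.4f}")
```

Because cost is linear in output tokens, the price assumption cancels out of the ratio and the savings fraction; only the absolute dollar figures depend on it.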