@BoWang87
Two major AI releases this week:

• Qwen3.5: new open-source small models
• GPT-5.4: newest frontier closed model

Most benchmarks compare math and coding. But the real test for frontier AI should be biology and healthcare. That's where mistakes actually matter.

So our team at @UHN ran them on EURORAD: 207 expert-validated radiology differential diagnosis cases.

Results:
• GPT-5.4: 92.2%
• Qwen3.5-27B: 85%
• Gemini 3.1 Pro: ~79%

A 27B open model that runs on a laptop is only about 7 points behind the most powerful AI model on earth, and it already beats Gemini on this benchmark. That gap is much smaller than people expected. And it matters.

For years, hospitals faced an impossible tradeoff:
• Frontier models → patient data leaves the hospital
• Local models → not good enough

That tradeoff may finally be ending. Qwen3.5-27B runs fully local: no API, no cloud, no patient data leaving the building (sketch at the end of this thread). HIPAA / PHIPA compliance becomes architecture, not paperwork.

An interesting detail: the 27B and 122B models score almost identically here. Scaling bigger didn't help much.

One caveat: with web-scale training, it's hard to completely rule out that frontier models like GPT-5.4 saw parts of the evaluation data.

Still, the signal is clear: small models are getting good enough for real clinical AI. And if we want to measure real AI progress, biology and healthcare should be the benchmark.

Huge credit to the team: @alifmunim @AlhusainAbdalla @JunMa_AI4Health @Omar_Ibr12 @oliviaamwei
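
For the "fully local, no API" point above, here is a minimal sketch of what that deployment could look like, assuming the open weights are published on Hugging Face. The model identifier and the example findings are hypothetical illustrations, not the evaluation setup from our EURORAD run.

```python
# Minimal sketch: fully-local inference with an open-weights model.
# Assumption: "Qwen/Qwen3.5-27B-Instruct" is a hypothetical Hugging Face ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen3.5-27B-Instruct"  # hypothetical identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)

# Differential-diagnosis style prompt. Weights and inference are both
# on-premise, so no patient data ever leaves the machine.
messages = [
    {"role": "system", "content": "You are a radiology assistant. Given "
     "imaging findings, return a ranked differential diagnosis."},
    {"role": "user", "content": "CT chest: peripheral ground-glass "
     "opacities with a crazy-paving pattern. Differential?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same script works air-gapped once the weights are cached locally, which is what makes compliance an architecture property rather than a data-sharing agreement.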