Your curated collection of saved posts and media
The founder of Cursor wrote a banger. This is a must-read.
Xcode 26.3 with Claude Agent & Codex hits the Mac App Store today! With advanced reasoning capabilities in Xcode, you can streamline workflows and build faster. And MCP support lets you easily connect other compatible agents. https://t.co/88NjaznE6E
My new favorite tmux dev layout features @opencode (with Kimi K2.5 running on @FireworksAI_HQ) on top and Claude Code on the bottom. I start almost all agent tasks with Kimi (so fast!), then ask Claude if I need a second opinion/more advanced stuff. Great combo! https://t.co/cUxfPgHFlW
How can graphs improve coding agents? Multi-agent systems can boost code generation, but fixed interaction topologies don't adapt to task difficulty. This research introduces AgentConductor, a system where an orchestrator agent uses RL to dynamically generate task-adapted interaction topologies based on inferred agent roles and difficulty levels. It has two key components: a topological density function that captures communication-aware characterizations of multi-agent interactions, and difficulty interval partitioning that prevents excessive pruning and provides precise topology control. Across five code datasets, AgentConductor achieves up to 14.6% improvement in pass@1 accuracy while reducing density by 13% and token costs by 68%. The great benefit of this approach is better performance with lower costs. Dynamic agent coordination is more efficient than static workflows for complex code generation. Paper: https://t.co/BypJZfU49q Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c
If you want to get started with Claude Cowork, look no further: I recorded this 1-hour session on how to use Cowork. It's powerful for knowledge work, like Claude Code is for coding. But I also use it for image generation with Skills. There's a nice guide to go along with it. https://t.co/u14Z2MemM9
Wait, what?! PewDiePie is using @axolotl_ai for his project! https://t.co/vnXeDfMzcc
pplx-embed is @perplexity_ai's new collection of state-of-the-art multilingual embedding models optimized for real-world, web-scale retrieval tasks! - Built on Qwen3 w/ diffusion-based pretraining and bidirectional attention - Available at 0.6B and 4B parameters w/ native INT8 quantization - pplx-embed-v1 for independent text embeddings - pplx-embed-context-v1 for document chunks in RAG - Validated on real-world search scenarios over tens of millions of documents - Permissive MIT License - Available on the @huggingface Hub, and supported on Text Embeddings Inference, Sentence Transformers, and Transformers.js
Want to bring open-source vision language models to the edge? Check out our @huggingface article on deploying NVIDIA Cosmos Reasoning 2B across the NVIDIA Jetson family with vLLM and a Live VLM WebUI. https://t.co/Tp0tZtjgRp https://t.co/tytkmCRJzx
Thanks @_akhaliq for featuring our work! Detailed thread can be found here https://t.co/8FgHQYCPht
Imagination Helps Visual Reasoning, But Not Yet in Latent Space Causal mediation analysis reveals latent visual reasoning in MLLMs fails: latent tokens ignore inputs and barely affect answers. CapImagine, a text-based alternative, teaches explicit imagination and significantly outperforms latent baselines.
Top AI Papers of The Week (Feb 24 - Mar 2) - A Very Big Video Reasoning Suite: 200 tasks, 1M+ video clips for video reasoning research - Does Your Reasoning Model Implicitly Know When to Stop Thinking? Introducing SAGE paradigm - AgentFly: Fine-tuning LLM agents without fine-tuning LLMs - Microsoft rStar2-Agent: 80.6% on AIME24 with just 14B parameters - From Blind Spots to Gains: Diagnostic-driven iterative training for LMMs - VibeVoice: Synthesizing 90-minute multi-speaker conversational speech - Alibaba MobilityBench: Benchmarking real-world route-planning agents - NVIDIA's data engineering strategies for scaling LLM terminal capabilities - VESPO: Variational sequence-level soft policy optimization for stable RL training - Beyond Pass@1: Self-play with variational problem synthesis sustains RLVR Find them below:
Thanks AK for reposting our work! Here are all the links for anyone who wants to check out more! Paper: https://t.co/6PajZXj6V0 Project Website: https://t.co/5VTiCqTDhN EvalKit: https://t.co/lxhyzMaI8j Cloud Infra: https://t.co/QNJRfOKQN3 Training Set: https://t.co/DlzLojQjsR Eval Set: https://t.co/Tzs2jAN99C Leaderboard: https://t.co/peZ1XkelYY Model: https://t.co/gFFJofrlNR

TranslateGemma 4B by @GoogleDeepMind now runs 100% in your browser on WebGPU with Transformers.js v4. 55 languages. No server. No data leaks. Works offline. A 4B parameter translation powerhouse, right in your browser. Try the demo: https://t.co/YgYskHqBRm
What happens when you make an LLM drive a car where the physics are real and actions can't be undone? I ported CARLA, the autonomous driving simulator, to OpenEnv and added training via TRL + HF Spaces. In 50 steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians. https://t.co/QR4FJS70h7
Marco built Reachy Phone Home so Reachy Mini can detect when you're on your phone, using @Ultralytics YOLO26 vision, and respond in real time with voice + motion. Built on Arm (Apple Mac / Raspberry Pi 5) with @huggingface + @pollenrobotics, it's now an award-winning project, earning an @NVIDIAGTC Golden Ticket. It's great to see our developers build and win in the open AI ecosystem. https://t.co/C8atY3fwLv
Impressive inference speed from Inception Labs' diffusion LLMs. Diffusion LLMs are a fascinating alternative to conventional autoregressive LLMs. Well done @StefanoErmon and team!
@idzikbartosz It's weird because logit softcap is not a standard feature you'll see in many LLMs, but somehow, in the specific state nanochat is in, I can't seem to remove it; everything I tried made the performance worse.
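For context on what's being discussed: logit soft-capping squashes logits smoothly into a bounded range via tanh (the mechanism popularized by Gemma 2). A minimal sketch, with the cap value of 15.0 chosen purely for illustration (nanochat's actual value and placement are not stated here):

```python
import math

def softcap(logits, cap=15.0):
    """Logit soft-capping: smoothly bounds each logit to (-cap, cap).
    Small logits pass through nearly unchanged; large ones saturate,
    which tames extreme values before the softmax/loss."""
    return [cap * math.tanh(x / cap) for x in logits]

# Small values are barely touched, large ones are squashed toward the cap:
print(softcap([0.5, 30.0, -100.0]))
```

Because tanh(x/cap) ≈ x/cap for small x, the operation is close to the identity near zero, so removing it changes the loss landscape mainly for large-magnitude logits.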
Even after the steep progress of the past 3 months, it remains that AI performance is tied to task familiarity. In domains that can be densely sampled (via programmatic generation + verification), performance is effectively unbounded, and will keep increasing from current levels. In novel, unfamiliar domains, performance remains low and further progress still requires new ideas, not just more data and compute.
For benchmarks that target novel tasks, a common form of benchmark hacking that arbitrages this gap is to generate a dense sampling of potential tasks by manually parameterizing the space and then brute-forcing it. Very expensive but it works. There's little you can do to restore benchmark validity here besides increasing the dimensionality of the task space.
By explicitly training on specific tasks, we ended up covering a very large area (in absolute terms) of the space of all possible tasks humans can do, but this large area only amounts to 0.00...01% of the total space. And that's why we still need general intelligence.
@mwcrutcher I don't have a shared expert in that figure, so that should be correct. Regarding routing details: yeah, covering those for all architectures would make for a nice future MoE article.
@mwcrutcher No worries, and thanks for the follow-up. I am not sure I am seeing the problem correctly. I.e., out of the 8 routed experts, are they *not* computing a (weighted) sum over them? Or do you mean the top-k expert selection + weighted sum should be shown in more detail?
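The top-k selection + weighted sum being discussed can be sketched as follows. This is a minimal, architecture-agnostic illustration; the renormalization choice (softmax over only the selected logits) varies between MoE implementations:

```python
import math

def topk_moe(router_logits, expert_outputs, k=2):
    """Minimal top-k MoE routing sketch: select the k experts with the
    highest router logits, softmax those logits into weights, and
    return the weighted sum of the selected experts' output vectors."""
    # Indices of the k largest router logits
    topk = sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over only the selected logits (one common convention)
    exps = [math.exp(router_logits[i]) for i in topk]
    weights = [e / sum(exps) for e in exps]
    # Weighted sum of the chosen experts' outputs, per dimension
    dim = len(expert_outputs[0])
    return [sum(w * expert_outputs[i][d] for w, i in zip(weights, topk))
            for d in range(dim)]
```

With 8 routed experts and k=2, only two expert outputs are computed and combined; the remaining six are skipped entirely, which is where the sparsity savings come from.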
@DnuLkjkjh In my experience, if the teacher model is too good and too different, it's a bit harder for the small student model to learn. Probably because it's too OOD. So it makes sense to first distill from medium-sized, more similar models before using data from larger teachers.