Marco built Reachy Phone Home so Reachy Mini can detect when you're on your phone, using @Ultralytics YOLO26 vision, and respond in real time with voice + motion. Built on Arm (Apple Mac / Raspberry Pi 5) with @huggingface 🤗 + @pollenrobotics 🦾, it's now an award-winning project, earning an @NVIDIAGTC Golden Ticket. It's great to see our developers build and win in the open AI ecosystem. https://t.co/C8atY3fwLv
Impressive inference speed from Inception Labs' diffusion LLMs. Diffusion LLMs are a fascinating alternative to conventional autoregressive LLMs. Well done @StefanoErmon and team!
@idzikbartosz It's weird because logit softcap is not a standard feature you'll see in many LLMs, but somehow, in the specific state nanochat is in, I can't seem to remove it; everything I tried made performance worse.
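For context, logit soft-capping usually refers to the tanh-squash trick (as popularized by Gemma 2). A minimal single-value sketch; the cap of 15.0 here is purely illustrative, not nanochat's actual setting:

```python
import math

def softcap(logit: float, cap: float = 15.0) -> float:
    # Smoothly squash a logit into (-cap, cap): near zero it is
    # approximately the identity, and large values saturate toward
    # +/-cap, unlike a hard clip whose gradient is zero past the cap.
    return cap * math.tanh(logit / cap)
```

Removing the feature amounts to replacing `softcap(x)` with `x`; the point of the tweet is that in nanochat's current configuration every variant of that swap hurt performance.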
Even after the steep progress of the past 3 months, it remains that AI performance is tied to task familiarity. In domains that can be densely sampled (via programmatic generation + verification), performance is effectively unbounded, and will keep increasing from current levels. In novel, unfamiliar domains, performance remains low and further progress still requires new ideas, not just more data and compute.
For benchmarks that target novel tasks, a common form of benchmark hacking that arbitrages this gap is to generate a dense sampling of potential tasks by manually parameterizing the space and then brute-forcing it. Very expensive but it works. There's little you can do to restore benchmark validity here besides increasing the dimensionality of the task space.
By explicitly training on specific tasks, we ended up covering a very large area (in absolute terms) of the space of all possible tasks humans can do, but this large area only amounts to 0.00...01% of the total space. And that's why we still need general intelligence.
@mwcrutcher I don't have a shared expert in that figure, so that should be correct. Regarding routing details: yeah, covering those for all archs would be a nice interesting MoE future article
@mwcrutcher No worries and thanks for the follow-up. I am not sure I am seeing the problem correctly. I.e., out of the 8 routed experts, are they *not* (weighted-)summing over them? Or do you mean the top-k expert selection + weighted sum should be shown in more detail?
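To make the question concrete, here is the top-k routing computation as it is typically drawn in such figures (a toy scalar version of my own; in a real MoE layer each expert is an MLP over the hidden state, and details like softmax-before-vs-after top-k vary by architecture):

```python
import math

def moe_combine(router_logits, expert_outputs, k=2):
    # 1) Select the top-k experts by router logit.
    topk = sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:k]
    # 2) Softmax over only the selected logits to get gate weights.
    exps = [math.exp(router_logits[i]) for i in topk]
    total = sum(exps)
    gates = [e / total for e in exps]
    # 3) Weighted sum of the selected experts' outputs.
    return sum(g * expert_outputs[i] for g, i in zip(gates, topk))
```

With k=8 routed experts, step 3 is exactly the weighted sum the thread is asking about.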
@DnuLkjkjh In my experience, if the teacher model is too good and too different, it's a bit harder for the small student model to learn. Probably because it's too OOD. So it makes sense to first distill from medium-sized, more similar models before using data from larger teachers.
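The standard soft-label distillation objective makes the OOD point easy to see: the student minimizes a KL divergence toward the teacher's softened distribution, and that gap is hard to close when the teacher's distribution is far from anything the student can represent. A minimal Hinton-style sketch (the temperature T=2.0 is illustrative):

```python
import math

def kd_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) over temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across T.
    def softmax(xs):
        m = max(xs)  # subtract max for numerical stability
        e = [math.exp(x - m) for x in xs]
        s = sum(e)
        return [v / s for v in e]
    p = softmax([x / T for x in teacher_logits])  # teacher (targets)
    q = softmax([x / T for x in student_logits])  # student
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

A medium-sized, more similar teacher produces targets p closer to what the student can already model, so the loss starts lower and is easier to optimize, matching the curriculum suggested in the tweet.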
this is probably the nicest I've heard david goggins talk. craziest part is that the data wasn't even cleaned that well LOL. They were all just random youtube motivational shorts that I downloaded https://t.co/vpAFwjWxtC
@ivanleomk @rachpradhan https://t.co/OfWXd1VtPy my script is here. will make it a cli as well & an agent skill for data prep (probably the most important step) ran in a TTS->ASR loop w/ slopu...
The Copilot CLI - how to configure and use it, some tweaks I make and a few of my own workflows / custom agent. I was proud of this one. Pretty much a one shot. It's wild how easy it is to make content when a thing is good. https://t.co/79xZ4E6mqo
To create Claude, Anthropic first makes something else: a highly sophisticated autocomplete engine. This autocomplete AI is not like a human, but it can generate stories about humans and other psychologically realistic characters.
The theory explains some surprising results. For example, in an experiment where we taught Claude to cheat at coding, it also learned to sabotage safety guardrails. Why? Because pro-cheating training taught that the Claude character was broadly malicious. https://t.co/y6DHdnzfyC
GPT-5.2 derived a new result in theoretical physics. We're releasing the result in a preprint with researchers from @the_IAS, @VanderbiltU, @Cambridge_Uni, and @Harvard. It shows that a gluon interaction many physicists expected would not occur can arise under specific conditions. https://t.co/EAZhKWacsG
We're live for Agent Sessions Day! Right now we're exploring how the @code team builds with AI. https://t.co/V2nTK4y7L3
📣 @GoogleAI's Gemini 3.1 Pro is now rolling out in public preview in GitHub Copilot. Early testing shows ➡️ High tool precision – achieving strong results with fewer tool calls ➡️ Effective and efficient edit-then-test loops. Try it out in @code. https://t.co/oYCncQMfNX https://t.co/13r1FFEjpF
Come hang with @burkeholland and @pierceboggan to see what they were able to build live (and with no notice) during Agent Sessions Day https://t.co/V2nTK4yFAB
The @code community contributors website we built during the Agent Sessions Day stream today is up on GitHub! - Contributors by release - Leaderboard (PRs/releases) - "Ask Copilot" about contributions - Generate thank you messages with HeyGen avatars Repo: https://t.co/IODb6jxkGv Add more things and let's make this a real site to celebrate the @code community :)

When using Copilot CLI in the terminal in @code, the agent will update the title in real time. @burkeholland, the feature you love is back! https://t.co/9wjxK8NWKu
Next edit suggestions just leveled up in @code: with long-distance NES, you get edit suggestions anywhere in your file, not just near your cursor's position. Learn how the team built this - creating the training dataset, refining the UX, evaluating success, & more: https://t.co/xDaJRpikCi

In the 1-hour GitHub downtime, I finally got Claude Code teams set up with tmux. @AnthropicAI I finally have something to compare @augmentcode Intent with. Both use tons of tokens but help you orchestrate discover-plan-build-eval-verify-precommit-commit-submit-PR loops.
@theo Can we get a deeper dive on the Claude Code CLI & Codex app, from an automations angle? I like Codex for defined features & tests and can have it loop and push a PR. I run Claude Code in tmux with teams, which allows for parallelization but also human intervention.
Excited to launch Gemini 3.1 Pro! Major improvements across the board, including in core reasoning and problem solving. For example, scoring 77.1% on the ARC-AGI-2 benchmark - more than 2x the performance of 3 Pro. Rolling out today in @GeminiApp, @antigravity and more - enjoy! https://t.co/hOgEFtJ57w
What's the right space to diffuse in: Raw Data or Latents? Why not both! In Latent Forcing, we order a joint diffusion trajectory to reveal Latents before Pixels, leading to improved convergence while being lossless at encoding and end-to-end at inference. w/ @drfeifei+... 1/n https://t.co/UQVUJOqvWz