Your curated collection of saved posts and media
TIL: There's a whole bunch of interesting skills in the oss codex repo: https://t.co/gNFHV3MD2j $skill-installer playwright-interactive (also /fast is sweeeeet, 1.5x codex makes a huge diff!) https://t.co/XTENPuZ9Ie

Someone just bypassed Apple's Neural Engine to train models. The Neural Engine inside every M-series Mac was designed for inference: run models, don't train them. No public API, no documentation, and certainly no backpropagation. A researcher reverse-engineered the private APIs anyway and built a transformer training loop that runs forward and backward passes directly on the ANE hardware.

The method bypasses CoreML entirely. Instead of using Apple's official tools, the project constructs programs in MIL (Model Intermediate Language), compiles them in-memory using undocumented `_ANEClient` APIs, and feeds data through IOSurface shared memory buffers. Weights get baked into the compiled programs as constants. Each training step dispatches six custom kernels: attention forward, feedforward forward, then four backward passes that compute gradients with respect to inputs. Weight gradients still run on the CPU using Accelerate's matrix libraries, but the heavy lifting (matrix multiplies, softmax, activation functions) happens on the ANE.

This makes three things possible that weren't before:
1. Training small models locally without burning through your battery
2. Fine-tuning on-device without sending data to a server or spinning up the GPU
3. Research into what the ANE hardware can actually do when you ignore Apple's guardrails

If this approach scales, the next wave of on-device AI stops being about running someone else's frozen model.
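The split the post describes (heavy matmuls for forward passes and input gradients on the accelerator, weight gradients on the CPU) can be sketched in plain NumPy. This is purely illustrative: the function names and the toy ReLU feedforward layer are mine, not the project's actual MIL kernels, and NumPy stands in for both devices.

```python
import numpy as np

def ffn_forward(x, W):
    """Feedforward pass (matmul + ReLU): the kind of op dispatched to the ANE."""
    return np.maximum(x @ W, 0.0)

def ffn_backward_input(x, W, dy):
    """Gradient w.r.t. the *input*: also an ANE-style kernel in the scheme above."""
    return (dy * ((x @ W) > 0)) @ W.T

def ffn_backward_weight(x, W, dy):
    """Gradient w.r.t. the *weights*: kept on the CPU in the project
    (via Accelerate), since weights are baked into the compiled ANE
    programs as constants and can't be updated there."""
    return x.T @ (dy * ((x @ W) > 0))

rng = np.random.default_rng(0)
x, W = rng.normal(size=(4, 8)), rng.normal(size=(8, 8))
dy = np.ones((4, 8))          # gradient of loss = sum(output)
dx = ffn_backward_input(x, W, dy)   # "accelerator" work
dW = ffn_backward_weight(x, W, dy)  # "CPU" work
```

The point of the split is that input gradients feed the next backward kernel (so they stay on-device), while weight gradients are only needed by the CPU-side optimizer step.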
Can AI agents agree? Communication is one of the biggest challenges in multi-agent systems. New research tests LLM-based agents on Byzantine consensus games: scenarios where agents must agree on a value even when some participants behave adversarially.

The main finding: valid agreement is unreliable even in fully benign settings, and degrades further as group size grows. Most failures come from convergence stalls and timeouts, not subtle value corruption.

Why does it matter? Multi-agent systems are being deployed in high-stakes coordination tasks. This paper is an early signal that reliable consensus is not an emergent property you can assume. It needs to be designed explicitly.

Paper: https://t.co/3fllhchiKX
Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
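As a toy illustration of the setting (not the protocol the paper evaluates), here is a single synchronous majority-vote round with one faulty participant. With 3 honest agents and 1 adversary, the honest majority still carries, but only by one vote; the margin shows how little slack these systems have.

```python
def byzantine_round(honest_values, byzantine_value):
    """One synchronous round: every agent broadcasts a value, each honest
    agent adopts the majority of what it received. Illustrative only."""
    received = honest_values + [byzantine_value]
    majority = max(set(received), key=received.count)
    return [majority for _ in honest_values]

# 3 honest agents hold value 1; the Byzantine agent pushes 0.
result = byzantine_round([1, 1, 1], 0)
```

Note the big simplification: this adversary sends the same value to everyone. Real Byzantine faults allow equivocation (different values to different agents), which is exactly where naive majority voting breaks down and why dedicated consensus protocols, and the paper's stress tests, exist.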
MCP is dead? What are your thoughts? I mostly use Skills and CLI lately. I still use a few MCP tools for orchestrating agents more efficiently. https://t.co/o6saSxNQ9s
@alex_prompter Without opening the paper, how did they gather the ground truth? My naive assumption is that if they were able to gather the ground truth, it's already out there somewhere.
Narrative violation. Cursor went $1B to $2B in 3 months. Claude Code went $0 to $2.5B in 8 months. Everyone in the tech/X bubble thinks people are wholesale ditching Cursor, but enterprise diffusion is glacial. Most of the world just got hold of it. https://t.co/7RBU7mvosz
If you need to split complex or composite documents into structured categories or sections, LlamaSplit is built for the job. With the intuitive UI, you can:
• Define a custom configuration for how your documents should be categorized
• Specify the exact sections or impact types you want extracted
• Run the job and explore the results through an interactive interface

In this walkthrough, @itsclelia demonstrates how to configure LlamaSplit to break down Environmental Impact Reports into clearly defined impact categories.

Watch the full video here, or get started right away with the docs (UI + code examples): https://t.co/kAMUqwOCDW
Building for the AI era means rethinking the stack from the ground up. Modular co-founder and CEO @clattner_llvm joined @shanselman on @Hanselminutes to talk about Mojo, heterogeneous compute, and why AI infrastructure demands new abstractions. Watch here: https://t.co/AKCJQEoKNJ
We just shipped Designs. Here's the problem it solves: most UI work fails because you don't know what it should look like until after your coding agent already built it wrong.

You describe a dashboard. The agent builds it. You realize the layout doesn't work. You prompt again. The agent rebuilds. Something else breaks. Three iterations later you're debugging CSS instead of shipping features.

Designs puts the iteration where it belongs: before a single line of code gets written. BrainGrid now generates actual UI designs for your requirements. You can iterate on them with the agent, annotate what needs to change, and select specific elements to tweak. Once you lock it in, that design becomes part of the requirement that gets handed to your coding tool. No more building the wrong UI three times because you couldn't visualize it from a text prompt.

It works with new apps and existing ones. If you're adding a feature to something you've already built, BrainGrid matches your existing app's look and feel so the new design doesn't feel bolted on. The designs get included in your Requirements doc when you fetch from CLI or MCP. Your coding agent knows exactly what to build.

This is the part most builders skip, and it's why UI work takes twice as long as it should. Now you can see it, fix it, and lock it before the agent touches your codebase.
Tackling a really gnarly issue, I played Codex 5.4 and Opus 4.6 side by side. Codex came back relatively fast with a diagnosis and solution that seemed feasible. Opus took a while, a long one, and came back with the correct root cause.
You can now run three frontier models at once and select your orchestrator model directly inside Perplexity Computer. Model Council automatically runs GPT-5.4, Claude Opus 4.6 and Gemini 3.1 Pro simultaneously. Three frontier models. One workflow. Best answer wins. https://t.co/40rPcXpr6s
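The fan-out-and-judge pattern behind a "model council" can be sketched in a few lines. Everything here is a hedged sketch: `ask(model, prompt)` is a hypothetical stand-in for real model API calls, not Perplexity's actual interface, and the judging prompt is mine.

```python
def ask(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real model API call."""
    raise NotImplementedError("replace with a real model call")

def model_council(prompt, models, orchestrator, ask=ask):
    """Fan a prompt out to several models, then have an orchestrator
    model compare the candidate answers and return the best one."""
    answers = {m: ask(m, prompt) for m in models}
    ballot = "\n\n".join(f"[{m}]\n{a}" for m, a in answers.items())
    return ask(
        orchestrator,
        f"Question: {prompt}\n\nCandidate answers:\n{ballot}\n\n"
        "Reply with the single best answer.",
    )
```

The design choice worth noting: the orchestrator sees all candidates side by side, so it acts as a comparative judge rather than answering from scratch, which is generally easier than generating a correct answer directly.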
It is increasingly clear that the constraint is going to be compute, just as the AI labs warned. The token requirements for agentic work are high, making frontier agents cost-efficient only for high-value tasks. There are tons of other opportunities waiting for lower costs.
Had early access to GPT-5.4 and Pro. They are very good. One fun illustration of progress, this is the same prompt I used in GPT-4 below (making a 3D space inspired by Piranesi) now in GPT-5.4 Pro. There were no errors, made in a single prompt plus one to "make it better." https://t.co/7Vgmc60SKc
To clarify: Gemini Deep Think is a really smart model, but it doesn't have access to the same tools as Claude or ChatGPT - it can't download files, cannot consistently run code on its own, cannot produce downloadable files, does not clearly show when it does web search, etc
Two major AI releases this week:
• Qwen3.5: new open-source small models
• GPT-5.4: newest frontier closed model

Most benchmarks compare math and coding. But the real test for frontier AI should be biology and healthcare. That's where mistakes actually matter.

So our team at @UHN ran them on EURORAD: 207 expert-validated radiology differential diagnosis cases.

Results:
GPT-5.4: 92.2%
Qwen3.5-27B: 85%
Gemini 3.1 Pro: ~79%

A 27B open model that runs on a laptop is only 7 points behind the most powerful AI model on earth, and already beating Gemini on this benchmark. That gap is much smaller than people expected. And it matters.

For years hospitals faced an impossible tradeoff: with frontier models, patient data leaves the hospital; local models were not good enough. That tradeoff may finally be ending.

Qwen3.5-27B runs fully local. No API. No cloud. No patient data leaving the building. HIPAA / PHIPA compliance becomes architecture, not paperwork.

Interesting detail: 27B and 122B score almost identically here. Scaling bigger didn't help much.

One caveat: with web-scale training, it's hard to completely rule out that frontier models like GPT-5.4 may have seen parts of evaluation datasets.

Still, the signal is clear: small models are getting good enough for real clinical AI. And if we want to measure real AI progress, biology and healthcare should be the benchmark.

Huge credit to the team @alifmunim @AlhusainAbdalla @JunMa_AI4Health @Omar_Ibr12 @oliviaamwei
@CFGeek @xeophon Third party audits detected an irregularity in a Pythia model that had gone undetected in 2 years. https://t.co/pXefsdzIkm
@CFGeek @xeophon Actually it was their second paper: https://t.co/7FO6BwP9Mz
WE WON THE @MistralAI LONDON HACKATHON! We made Mistralverse, here's our demo vid. @HarryStebbings who says the UK isn't shipping?? https://t.co/lVWr43XkNj
What if AI could see the world the way we do? That's the idea we bet our weekend on at the Mistral Worldwide Hackathon.

With @haaspierre_ and Arman Artola-Zanganeh, we built Port:Worlds, an open-source framework that lets anyone connect their Meta glasses to any AI system.

Let me take you back to Saturday morning. Before knowing it could work, we needed the hardware. So I ran to Rue de Rivoli and bought €500 Meta glasses on the spot. If that's not commitment, I don't know what is (a true bet).

We then built non-stop for 36 hours to make it usable. End-to-end: the glasses stream what you see, the AI makes sense of it, and it answers back through the glasses' speaker.

And suddenly, when we understood that it was going to work, the question changed. It was no longer "Is this doable?" It became "What can people build with this?"
- A plumber getting live assistance while repairing something.
- A technician repairing industrial machinery.
- A traveler exploring a new country.
- A visually impaired person navigating space.

At first, we were looking for the "right" use case. Then we realized something more interesting: if AI can share your perspective, continuously, the use cases are not ours to decide.

That's why Port:Worlds is fully open source. If you want to connect your Meta glasses, plug in your own models, customize with your own prompts, your own MCP, your Openclaw… you can.

Link to the open-source repo (you can contribute and give it a little star): https://t.co/UueLnkMZpM
Link to the demo video: https://t.co/qcTDqKGvax

Huge thanks to the organizing team of the hackathon, it was truly great. @Jthmas404
