Your curated collection of saved posts and media
One of the more interesting and thought-provoking research papers I've seen in a while. A system for reading and reimplementing NeRF papers, and it seems to work very well. Pretty easy to extrapolate from here to what CVPR 2027 papers will look like. https://t.co/gokzG27mIT https://t.co/jPpRESdKkd
The legendary Donald Knuth just witnessed something remarkable. 🔥🔥🔥 After 30 years, a problem he posed about Hamiltonian cycle decompositions in a 3-dimensional Cayley digraph has finally been cracked. And the solver? Not a PhD student. Not a research group. Not a math department.
zoe was burning 24M+ opus tokens/day monitoring agents that weren't running. replaced her cron with a 2-layer system: - bash pre-check, zero tokens when idle - webhook fires opus only when needed. ~95% token reduction and more reliable output. details below. (set up a cron to watch this performance, if it works well I'll double down on this event driven stack, seems like the future)
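The two-layer pattern in the post can be sketched in a few lines: a zero-cost local check runs on every cron tick, and the expensive model call fires only when there is actually something to look at. A minimal Python sketch, assuming a hypothetical `agents.json` state file that the agent runner maintains (the post's actual setup uses a bash pre-check plus a webhook):

```python
import json

def any_agents_running(state_file="agents.json"):
    # Layer 1: a local file read, so the idle path costs zero model tokens.
    # `agents.json` is a hypothetical state file the agent runner maintains.
    try:
        with open(state_file) as f:
            agents = json.load(f)
    except FileNotFoundError:
        return False
    return any(a.get("status") == "running" for a in agents)

def monitor(expensive_review):
    # Layer 2: only pay for the model when the cheap check says there is work.
    if not any_agents_running():
        return "idle: skipped model call"
    return expensive_review()
```

On an idle fleet the cron exits at layer 1 almost every tick, which is where the claimed ~95% token reduction would come from.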
i find it fucking hilarious how Apple "failing" at AI is now the exact reason they're about to win it: - watched everyone else burn $1.4T+ building models... then picked the winner (gemini) to use for... $1B - while everyone fights to grow users, apple flips a switch and 2.5 billion devices get AI siri tmrw. - $150B to splurge on the device / app layer. zero competition (because everyone's spent their cash). - while openAI charges $200/mo subscriptions, Apple lets you run models on-device (cheaper, faster, private, personal) - while openAI struggles to build an AI device, Apple just dropped 5 powered by the best AI chips for hand-held devices. they "lost" the model race because they didn't need to win it in the first place. greatest to (accidentally) ever do it.

nanochat now trains a GPT-2-capability model in just 2 hours on a single 8XH100 node (down from ~3 hours 1 month ago). Getting a lot closer to ~interactive! A bunch of tuning and features (fp8) went in but the biggest difference was a switch of the dataset from FineWeb-edu to NVIDIA ClimbMix (nice work NVIDIA!). I had tried Olmo, FineWeb, DCLM which all led to regressions; ClimbMix worked really well out of the box (to the point that I am slightly suspicious about goodharting, though reading the paper it seems ~ok). In other news, after trying a few approaches for how to set things up, I now have AI Agents iterating on nanochat automatically, so I'll just leave this running for a while, go relax a bit and enjoy the feeling of post-agi :). Visualized here as an example: 110 changes made over the last ~12 hours, bringing the validation loss so far from 0.862415 down to 0.858039 for a d12 model, at no cost to wall clock time. The agent works on a feature branch, tries out ideas, merges them when they work and iterates. Amusingly, over the last ~2 weeks I almost feel like I've iterated more on the "meta-setup" where I optimize and tune the agent flows than on the nanochat repo directly.
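The agent loop described at the end (work on a branch, try an idea, merge only when validation loss improves) is essentially greedy hill-climbing over the repo. A minimal sketch, where all four callables are hypothetical stand-ins for the real eval run and the git branch/merge operations:

```python
def agent_iterate(eval_loss, propose_change, apply_change, revert_change, steps=100):
    # Greedy merge-if-better loop: try an idea on a branch, keep it only if
    # validation loss improves, otherwise roll it back and try the next idea.
    best = eval_loss()
    for _ in range(steps):
        change = propose_change()
        apply_change(change)
        loss = eval_loss()
        if loss < best:
            best = loss            # "merge": the idea helped, keep it
        else:
            revert_change(change)  # discard regressions
    return best
```

The loss can only decrease monotonically under this policy, which matches the 0.862415 → 0.858039 trajectory in the post.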
TIL: There's a whole bunch of interesting skills in the oss codex repo: https://t.co/gNFHV3MD2j $skill-installer playwright-interactive (also /fast is sweeeeet, 1.5x codex makes a huge diff!) https://t.co/XTENPuZ9Ie

Still a little sick, here is 'TheraFlu' crystallized on a slide, under polarized light microscopy (PLM). https://t.co/fKT14EkGIC

My mom made some children's books! She did the stories when I was little, and tried a few times to illustrate them. Now, finally, with some nano banana help and plenty of hard work, she has them finished and out in the world. https://t.co/NaWukMCRe1
LTX-2.3 is here. For decades, creative software has been defined by its interface. We think the next era gets defined by the engine underneath. LTX-2.3 is a major engine upgrade: ✅ Sharper detail ✅ Stronger motion ✅ Cleaner audio ✅ Native vertical format https://t.co/mzw4iECfno
Clever strategy from LTX. Working with the community to try and build what they want, tons of rich feedback thanks to millions of users, now shipping software that you can run for free on top of the models, but charging for API use. https://t.co/0X3GgpJv3U

@Miles_Brundage @yonashav Good toast https://t.co/iF3yveZEn2

I've decided on a long-term goal for my bio side quest: I would like to make a version of this drink where the duckweed floating on the surface has been engineered to have an interesting flavor. And make it easy for others to dabble with genetic gastronomy too 🧬👨‍🍳 https://t.co/dOUPdpFqdf

A start: the red frond here has had the "RUBY" genes inserted, coding for red betalain pigment. Hopefully it's able to produce red daughter fronds. Lots more work to do from here :) https://t.co/smQCwpS5fj

Introducing the Qwen 3.5 Small Model Series: Qwen3.5-0.8B · Qwen3.5-2B · Qwen3.5-4B · Qwen3.5-9B ✨ More intelligence, less compute. These small models are built on the same Qwen3.5 foundation: native multimodal, improved architecture, scaled RL. • 0.8B / 2B: tiny, fast, great for edge devices • 4B: a surprisingly strong multimodal base for lightweight agents • 9B: compact, but already closing the gap with much larger models And yes, we're also releasing the Base models. We hope this better supports research, experimentation, and real-world industrial innovation. Hugging Face: https://t.co/wFMdX5pDjU ModelScope: https://t.co/9NGXcIdCWI

YES! Someone reverse-engineered Apple's Neural Engine and trained a neural network on it. Apple never allowed this. ANE is inference-only. No public API, no docs. They cracked it open anyway. Why it matters: • M4 ANE = 6.6 TFLOPS/W vs 0.08 for an A100 (80× more efficient) • "38 TOPS" is a lie - real throughput is 19 TFLOPS FP16 • Your Mac mini has this chip sitting mostly idle Translation: local AI inference that's faster AND uses almost no power. Still early research but the door is now open. https://t.co/qPwddSyV3f #AI #MachineLearning #AppleSilicon #LocalAI #OpenSource #ANE #CoreML #NPU #KCORES

A trillion-parameter model just made half its brain disappear. It got smarter. Yuan3.0 Ultra is a new open-source multimodal MoE model from Yuan Lab. 1010B total parameters, only 68.8B active at inference. It beat GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 on RAG benchmarks by wide margins. 67.4% on Docmatix vs GPT-4o's 56.8%. Here's what it unlocks: > Enterprise RAG with 68.2% avg accuracy across 10 retrieval tasks > Complex table understanding at 62.3% on MMTab > Text-to-SQL generation scoring 83.9% on Spider 1.0 > Multimodal doc analysis with a 64K context window The key innovation: Layer-Adaptive Expert Pruning (LAEP). During pretraining, expert token loads become wildly imbalanced. Some experts get 500x more tokens than others. LAEP prunes the underused ones layer by layer, cutting 33% of parameters while boosting training efficiency by 49%. They also refined "fast-thinking" RL. Correct answers with fewer reasoning steps get rewarded more. This cut output tokens by 14.38% while improving accuracy by 16.33%. The bigger signal here: MoE models are learning to self-compress during training, not after. If pruning becomes part of pretraining, the cost curve for trillion-scale models shifts dramatically.
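The pruning idea as described reduces to: per layer, rank experts by how many tokens they actually routed, and drop the least-used third. A rough sketch of that selection step, assuming a simple keep-top-fraction criterion (the paper's exact LAEP threshold may differ):

```python
def laep_prune(token_loads, keep_fraction=0.67):
    # token_loads: per-layer lists of routed-token counts, one entry per expert.
    # Returns per-layer boolean keep-masks: True = keep the expert.
    # keep_fraction=0.67 mirrors the post's "cutting 33% of parameters";
    # the ranking-by-load criterion here is an assumption.
    keep_masks = []
    for layer_loads in token_loads:
        n_keep = max(1, round(len(layer_loads) * keep_fraction))
        ranked = sorted(range(len(layer_loads)),
                        key=lambda i: layer_loads[i], reverse=True)
        kept = set(ranked[:n_keep])
        keep_masks.append([i in kept for i in range(len(layer_loads))])
    return keep_masks
```

Because the decision is made layer by layer, a layer where loads are wildly imbalanced (the 500x skew the post mentions) sheds its dead experts without forcing the same cut on healthier layers.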
> 385ms average tool selection. > 67 tools across 13 MCP servers. > 14.5GB memory footprint. > Zero network calls. LocalCowork is an AI agent that runs on a MacBook. Open source. 🧵 https://t.co/bnXupspSXc
We're introducing Cursor Automations to build always-on agents. https://t.co/uxgTbncJlM
Transformers just got a serious rival. Allen AI just open-sourced a 7B model that beats its own transformer. OLMo Hybrid mixes standard attention with linear RNN layers into one architecture. > Same accuracy, half the training data > Long-context jumps from 70.9% to 85.0% > Beats the pure transformer on every eval domain > Fully open: base, fine-tuned, and aligned versions The trick is a 3:1 pattern. Three recurrent layers handle most of the sequence processing cheaply. One attention layer then catches what the recurrent state missed. This cuts 75% of the expensive attention operations while keeping precision where it matters. Building long-context apps used to mean paying the full cost of attention across every layer. Now you can get better long-context performance with a leaner architecture, and the theory proving why it scales better is released alongside the weights. https://t.co/bxZ7ckAOq4
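The 3:1 interleaving is easy to picture as a layer plan: three cheap recurrent layers, then one full-attention layer, repeated. A sketch (layer-type names are illustrative, not OLMo's actual module names):

```python
def hybrid_layer_plan(num_layers,
                      pattern=("recurrent", "recurrent", "recurrent", "attention")):
    # Repeat the 3:1 recurrent/attention pattern across the whole stack.
    return [pattern[i % len(pattern)] for i in range(num_layers)]
```

For a 32-layer stack this yields 8 attention layers instead of 32, i.e. the 75% reduction in attention operations the post mentions, while the single attention layer per block catches what the recurrent state missed.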

@DiamondEyesFox @durov https://t.co/Drl94NfDOR
Don't overcomplicate your AI agents. As an example, here is a minimal and very capable agent for automated theorem proving. The prevailing approach to automated theorem proving involves complex, multi-component systems with heavy computational overhead. But does it need to be that complex? This research introduces a deliberately minimal agent architecture for formal theorem proving. It interfaces with Lean and demonstrates that a streamlined, pared-down approach can achieve competitive performance on proof generation benchmarks. It turns out that simplicity is a feature, not a limitation. By stripping away unnecessary complexity, the agent becomes more reproducible, efficient, and accessible. Sophisticated results don't require sophisticated infrastructure. Paper: https://t.co/3p5MfNQII4 Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
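A minimal prover agent of the kind described needs surprisingly little machinery: one callable that proposes a proof and one that checks it, with verifier feedback as the only state carried between attempts. A hedged sketch, where `propose` and `check` are hypothetical stand-ins for the model call and the Lean interface, not the paper's actual API:

```python
def minimal_prover(goal, propose, check, max_attempts=8):
    # propose(goal, feedback) -> candidate proof string
    # check(goal, proof) -> (ok: bool, feedback: error message or None)
    feedback = None
    for _ in range(max_attempts):
        proof = propose(goal, feedback)
        ok, feedback = check(goal, proof)
        if ok:
            return proof
    return None  # budget exhausted without a verified proof
```

The whole agent is a retry loop around a verifier, which is the paper's point: the sophistication lives in the model and the checker, not the scaffolding.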
Interesting research on how hierarchies spontaneously emerge in multi-agent systems. Start with a group of cooperative agents. There are no leaders and no structure. Just collaboration. What happens over time? Hierarchies form on their own. This new research looks at the dynamics of how initially flat, cooperative multi-agent systems naturally transition into hierarchical organizations. They identify the mechanisms and conditions that drive this structural shift. Why does it matter? Understanding hierarchy emergence is critical for designing multi-agent systems where organizational structure matters. Whether you're building agent swarms, collaborative AI teams, or simulating social systems, knowing when and why hierarchies form helps you design better systems or prevent unintended power structures. Paper: https://t.co/cKJKd59JU6 Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

Can AI agents agree? Communication is one of the biggest challenges in multi-agent systems. New research tests LLM-based agents on Byzantine consensus games, scenarios where agents must agree on a value even when some participants behave adversarially. The main finding: valid agreement is unreliable even in fully benign settings, and degrades further as group size grows. Most failures come from convergence stalls and timeouts, not subtle value corruption. Why does it matter? Multi-agent systems are being deployed in high-stakes coordination tasks. This paper is an early signal that reliable consensus is not an emergent property you can assume. It needs to be designed explicitly. Paper: https://t.co/3fllhchiKX Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
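The dominant failure mode the paper reports (stalls and timeouts rather than corrupted values) can be illustrated with a toy model: honest agents repeatedly adopt the plurality value, while Byzantine agents equivocate with a fresh bogus value every round. This is an illustrative sketch, not the paper's actual protocol:

```python
from collections import Counter

def run_consensus(honest_values, num_byzantine, quorum=2/3, max_rounds=10):
    # Honest agents converge toward the plurality value each round;
    # Byzantine agents vote a unique bogus value every round (equivocation).
    # If no value ever reaches the quorum, the run times out.
    n = len(honest_values) + num_byzantine
    values = list(honest_values)
    for rnd in range(1, max_rounds + 1):
        byz = [f"junk-{rnd}-{i}" for i in range(num_byzantine)]
        value, count = Counter(values + byz).most_common(1)[0]
        if count >= quorum * n:
            return ("agreed", value, rnd)
        values = [value] * len(values)  # honest agents adopt the plurality
    return ("timeout", None, max_rounds)
```

With few adversaries the honest majority converges and agreement succeeds; once Byzantine agents exceed the quorum slack, the honest votes alone can never reach the quorum and every run ends in a timeout, mirroring the convergence stalls the paper observes.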