Your curated collection of saved posts and media
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly well manually-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training by hand: you come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, and so on. This has been the bread and butter of my daily work for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approximately 700 changes autonomously is wild. It really looked at the sequence of experimental results and used that to plan the next experiments. It's not novel, ground-breaking "research" (yet), but all the adjustments are real: I hadn't found them manually before, and they stack up and actually improved nanochat.

Among the bigger findings:
- It noticed an oversight: my parameterless QK-norm didn't have a scale multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the value embeddings really like regularization, and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that the AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
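The workflow described above (propose a change, run a cheap experiment, keep it only if validation loss improves, and plan from the history) can be sketched as a simple accept-if-better loop. This is a minimal toy illustration, not the actual autoresearch agent: `run_experiment` stands in for a short proxy training run, and the knob names and optima are hypothetical.

```python
import random

def run_experiment(config):
    """Stand-in for a short proxy training run; returns a validation loss.
    Toy quadratic bowl around invented 'good' values for each knob."""
    return ((config["qk_scale"] - 1.4) ** 2
            + (config["adamw_beta2"] - 0.95) ** 2
            + (config["weight_decay"] - 0.1) ** 2)

def autoresearch(config, n_trials=200, seed=0):
    """Greedy loop: perturb one knob at a time, run the experiment,
    and adopt the change only if validation loss improves."""
    rng = random.Random(seed)
    best_loss = run_experiment(config)
    history = []  # the agent plans from this sequence of results
    for _ in range(n_trials):
        knob = rng.choice(list(config))
        candidate = dict(config)
        candidate[knob] = config[knob] + rng.gauss(0, 0.1)
        loss = run_experiment(candidate)
        history.append((knob, round(loss, 4)))
        if loss < best_loss:  # the change helped: keep it
            config, best_loss = candidate, loss
    return config, best_loss, history

tuned, loss, history = autoresearch(
    {"qk_scale": 1.0, "adamw_beta2": 0.999, "weight_decay": 0.0})
```

A real agent replaces the random perturbation with proposals conditioned on the experiment history, but the accept-if-better skeleton is the same.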
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. https://t.co/WAz8aIztKT

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale, of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.

More generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
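The "promote the most promising ideas to increasingly larger scales" step is essentially a successive-halving ladder: score every candidate at a cheap small scale, promote the top fraction to the next scale, and repeat. A toy sketch, with invented idea names and a made-up scoring function in which non-transferring ideas fade at larger depth:

```python
def evaluate(idea, depth):
    """Stand-in for a proxy training run at a given model depth (higher is
    better). The 'overfit' term models ideas whose benefit doesn't transfer:
    it costs more at larger depth."""
    return idea["effect"] - idea["overfit"] * depth

def promote(ideas, depths=(4, 12, 24), keep=0.5):
    """Ladder of scales: score every surviving idea at the current (cheap)
    depth, then promote only the top fraction to the next, pricier depth."""
    survivors = list(ideas)
    for depth in depths:
        survivors.sort(key=lambda i: evaluate(i, depth), reverse=True)
        survivors = survivors[:max(1, int(len(survivors) * keep))]
    return survivors

ideas = [
    {"name": "value_emb_reg", "effect": 1.3, "overfit": 0.0},  # transfers
    {"name": "lr_hack",       "effect": 1.5, "overfit": 0.1},  # fades at scale
    {"name": "init_tweak",    "effect": 0.5, "overfit": 0.0},
    {"name": "band_hack",     "effect": 2.0, "overfit": 0.2},  # small-scale only
]
winners = promote(ideas)
```

Here the swarm would generate and evaluate the ideas in parallel at each rung; the ladder is what keeps the expensive large-scale runs reserved for changes that already proved themselves cheaply.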

I built an LLM pricing comparison tool:
- Search 200+ models
- Input, output, blended cost
- 40+ benchmark scores
- Side-by-side model compare
https://t.co/SswDpoDwXX
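"Blended cost" is typically just a weighted average of input and output $/M-token prices. A one-liner sketch; the 3:1 input:output token mix is an assumed convention, not necessarily what this tool uses:

```python
def blended_cost(input_price, output_price, input_share=0.75):
    """Blended $/M tokens: weighted average of input and output price.
    input_share=0.75 assumes a 3:1 input:output token mix (a common
    convention for chat workloads, assumed here)."""
    return input_price * input_share + output_price * (1 - input_share)

# e.g. $3/M input, $15/M output -> $6/M blended at a 3:1 mix
print(blended_cost(3.0, 15.0))
```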
We are launching something new at JetBrains: please meet Air. It's a new Agentic Dev Environment built for working with agents from different vendors. More cool stuff is coming, stay tuned: @getsome_air https://t.co/X3pNdmOzFW
you know what hell yea https://t.co/mTYyoxakZy

Lots of buzz online about an upcoming major March heatwave for the American SW & California. And in this case, it does indeed appear increasingly likely that an extremely anomalous and even record-breaking heatwave may envelop much of the SW about a week from now. https://t.co/GByhbmJEZb
The bottom line: Treat agents like code, not chat interfaces. Design for failure, validate every boundary, and use explicit structure. Get our full guide on building reliable multi-agent systems here: https://t.co/yjrEEXUgwQ
Oracle is building yesterday's data centers with tomorrow's debt. Frontier labs like OpenAI want the newest chips, but Nvidia is shipping a new generation annually while data centers still take years to get up and running. That's a mismatch for the whole AI trade. Oracle, funding it with $100B in debt, may be the first to crack.
NEW: OpenAI and Google employees, including Google DeepMind Chief Scientist Jeff Dean, filed an amicus brief in support of Anthropic in its lawsuit against the US government. https://t.co/3lQrzlq8BE
Let it be noted that despite my contempt for LeCun's recurrent pattern of intellectual dishonesty, I mostly stood up for him re Zuck and Wang: https://t.co/RgtbMYwqpq
Good news! Ulysses Sequence Parallelism from the Snowflake AI Research and DeepSpeed teams has been integrated into @huggingface Trainer, Accelerate, and TRL. For extensive details please see this writeup: https://t.co/2xDWUk8p3V Thanks a lot to @krasul for helping make it happen, and to the others on the HF team who helped with the integration.

Tim Cook on how Steve Jobs believed that small teams could do amazing work. https://t.co/k7bMtFM6hs
Mark Cuban just described the sharpest divide in the modern economy. And most people are already on the wrong side of it.

Cuban: "There's two types of approaches to AI. Some people who use it so they don't have to learn anything, and some people who use it so they have the opportunity to learn everything."

Two sentences. The entire future of human capital compressed into a single binary.

The first group sees the most powerful knowledge infrastructure ever built and uses it to avoid thinking. They offload reasoning, skip the friction, and call it efficiency. What they're actually doing is hollowing out the one thing that can't be replicated: their own cognition.

Cuban: "AI is a tool, it's a way to learn, it's a democratization of knowledge."

For centuries, elite knowledge was locked behind institutions, geography, and capital. The right university. The right city. The right network. Entire generations of potential buried because the information was never accessible. That wall just came down permanently.

The second group understands what that actually means. Same tool. Compressing decades of learning into months. Entire disciplines on demand. Mental models that once required years of expensive education now available to anyone willing to ask the right questions.

The knowledge is democratized. The ambition is not. That's the divide Cuban is actually describing. Not technical literacy. Not access. Pure cognitive initiative.

The first group is outsourcing their mind. The second is expanding it. Atrophy doesn't announce itself. It just arrives.
OpenAI's IPO hopes are facing skepticism from investors, @AnitaRamaswamy explains: "They were still concerned about the current valuation." "OpenAI… doesn't project that it's going to be generating cash until at least 2030." https://t.co/7moB74b9M1
If you want AI code review but don't want to pay $25 per review (not a typo), check out Codex Review! It leverages frontier Codex models, finds complex issues, and is 100% usage-based. Most runs should cost ~$1 or less. https://t.co/43iF6rq8Xa
We built a neat tool that lets you convert a directory of PowerPoint files into clean, structured markdown that Claude Code / the agent SDK / any generalized agent wrapper can easily understand. The pptx skill in Claude Code is quite basic and doesn't have high-fidelity understanding of graphics/charts/tables. Our project Surreal Slides uses LlamaParse to convert presentations into clean structured data that you can put into a db (@SurrealDB) for simple retrieval, without having to take screenshots of the data on the fly. Thanks to @itsclelia for this project; check it out: https://t.co/Fj1PASv8IP
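For a sense of why this works at all: a .pptx file is just a zip of XML, with slide text stored in DrawingML `a:t` runs. A minimal stdlib-only sketch of the text-extraction half (this is not Surreal Slides or LlamaParse, which recover far more structure; `pptx_to_markdown` and its output format are invented here):

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for text runs inside slide XML
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"

def pptx_to_markdown(path_or_file):
    """Extract text runs from a .pptx (a zip of XML parts) into simple
    markdown, one '## Slide N' section per slide. Text-only: charts,
    tables, and graphics fidelity need a real parser."""
    md = []
    with zipfile.ZipFile(path_or_file) as z:
        slides = sorted(
            (n for n in z.namelist()
             if re.fullmatch(r"ppt/slides/slide\d+\.xml", n)),
            key=lambda n: int(re.search(r"\d+", n.rsplit("/", 1)[1]).group()))
        for i, name in enumerate(slides, 1):
            root = ET.fromstring(z.read(name))
            runs = [t.text for t in root.iter(f"{A_NS}t") if t.text]
            md.append(f"## Slide {i}\n\n" + "\n".join(runs))
    return "\n\n".join(md)
```

Running it over a directory is then just a loop over `*.pptx` paths, writing one `.md` per deck.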
I use this analogy a lot. This is what a room full of computers looked like in old times: https://t.co/Y4a93z76lw
ICYMI: As part of Codex for OSS, open-source maintainers can apply for API credits, six months of ChatGPT Pro with Codex, and Codex Security! Apply!! First batch rolling out soon! https://t.co/qcXXO4DLNt
PyTorch is heading to @NVIDIA #GTC26 in San Jose next week! Visit us at Booth #338 for:
- Helion kernel authoring demos
- ExecuTorch on-device inference
- Meet PyTorch core maintainers & experts
Plus talks, hands-on labs & a hackathon! https://t.co/Hn2DEgXXa5 https://t.co/YUK6iWvX3V
Perplexity Computer replaced $225K/yr in marketing tools in a single weekend. We built an AI marketing agent that scans hourly, manages budgets, detects fatigue, and coordinates several campaigns end to end. In one test run, it made 224 micro-optimizations to our ad stack. https://t.co/B0ueikpQyp
SHOCKING: Among Republican men under 50, 54% deny the Holocaust. We are so screwed. https://t.co/vt6ZmGsCoY
No way, a product that sucks doesn't work as expected! https://t.co/zSVKcLk7uT https://t.co/aPzF2tlNm1