Your curated collection of saved posts and media
Thanks for running our open-source work on current frontier models “The results are: the most capable models today (GPT-5.5 Pro) did outperform the best models from before (79/100 vs 69/100), but did not improve enough to be considered sufficient for reliable medical use.” Read full text and results below
A big problem with research studies on AI models is that given how long the peer review process is, the results are always out-of-date by the time the paper is published. This time, we have something better! The typical reaction to research results like this roughly goes "You'r
Day 1 of vibecoding https://t.co/n8ff35htEV
1/ On Training in Imagination - Dwarkesh's episode has a segment on dreaming as one of the next training paradigms. The idea is that a model learns mostly inside its own, by imagining what would happen, instead of trying out for real. We have a recent paper on exactly this 🥳🥳🥳
What does the next training paradigm look like? 0:00:00 - The big research bet the labs are making 0:02:12 - Grindability is just as important as verifiability 0:06:10 - Will RLVR alone generalize? 0:08:41 - Getting the learning back to the weights 0:15:22 - Dreaming 0:17:23 - W
Visualizing your dataset (especially large ones) in a low-dimensional embedding space can tell you a lot about the patterns and clusters in your dataset. We release a notebook showing how you can visualize your dataset using DINOv2 models by running it on your CPU. Yes! CPU!
Transformers are better at copying, while RNNs are better at modeling "meaning-bearing words: the nouns, verbs, & adjectives that say what a sentence is about"
Hybrid (transformer-RNN) models are fast becoming a serious alternative to the transformer, but a big question remains: how do they process tokens differently & how does this impact performance? We compared our transformer (Olmo 3) & hybrid (Olmo Hybrid) models to find
We open-sourced BrowserBC: A system that turns human browser trajectories into reusable agent skills. Just one recording is enough to generalize a skill. 🛠️ GitHub: [https://t.co/WP8mQGuJ6N] Here’s how it works.

GLM is the kind of model that revives serious interest in open source AI. It passes the blind test relative to the frontier models on the median production-grade knowledge worker task. It’s affordable to serve. And it is a sub-trillion-parameter model, meaning it has a lot of potential to go beyond matching the frontier at the median level of difficulty to also doing it for the long tail. Plenty to look forward to!
v14 Lite Release Notes:
- Distilled the intelligence from HW4 V14 into HW3. This allows HW3 to directly learn how to handle scenarios using HW4 V14 as a guide. This process unlocks the improvements that have been made to HW4 including Reinforcement Learning (RL) and offline models for HW3.
- Improved both proactive and reactive responsiveness across a wide variety of categories including navigation handling, merges and forks, pedestrian interactions, traffic lights, and vehicle cut-in scenarios.
- Improved general comfort in nominal scenarios through fewer false slowdowns, smoother steering and more consistent lane centering.
- Introduced parking, unparking, and reversing capabilities.
- Added Arrival Options for you to select where FSD should park: in a Parking Lot, on the Street, in a Driveway, or at the Curbside.
- Speed Profiles are now available at all times, to further customize driving style preference.
FSD v14 Lite is now rolling out to AI3 early-access customers. Based on the feedback, will roll out to more customers over the next few weeks. This build distills the driving behavior from AI4’s v14 series into both the camera and compute config of AI3. It includes destination op