Your curated collection of saved posts and media
I will have the best MiniMax compressions on the market. REAP is out, it's a v0, I started repruning the models around feedback (; I am going to quantise this to 4bit & GGUF https://t.co/7BlITpUkTq https://t.co/vorWncLAAb

Quick career update: joined @HuggingFace as a Research Engineer to make RL go brrrrr. P.S. I've been a huge fan of hf ever since I started working on ML and contributing to open source. I can't imagine what open-source AI would look like without HF. I've looked up to so many people here, and it's truly a pleasure to now work with them.
Let me go first: I think it would be 🤗 huggingface. It has completely democratized open source AI. I can't imagine dev/training/finetuning models and building datasets without hf libraries. Things would be so much more difficult without them if every model had different standards.
Both Starship and the Super Heavy Booster have successfully completed the static fire tests and are ready to take to the skies. Every test brings us one step closer to making humanity multi-planetary. "Engineering is the closest thing to magic that exists in the real world" - Elon Musk
A double launch today! We're releasing a paper analyzing the rapidly growing trend of "open-world evaluations" for measuring frontier AI capabilities. We're also launching a new project, CRUX (Collaborative Research for Updating AI eXpectations), an effort to regularly conduct such evaluations ourselves. I think open-world evals are the most important development in AI evaluation over the past year. Our paper explains why we need them, what they can and can't tell us, and how to do them well. In CRUX #1, we tasked an agent with building and publishing a simple iOS app to the Apple App Store. The paper has many "lessons from the trenches" from running this experiment. We hope you find it interesting! CRUX #2 will be about AI R&D automation. The core team is @sayashk, @PKirgis, @steverab, Andrew Schwartz, and me. We're delighted to have assembled an amazing group of collaborators, many of whom have conducted important open-world evaluations: @fly_upside_down, @RishiBommasani, @DubMagda, @ghadfield, @ahall_research, @sarahookr, @sethlazar, @snewmanpv, @DimitrisPapail, @shostekofsky, @hlntnr, and @CUdudec. Paper: https://t.co/M15jgh4PCP HTML version: https://t.co/iuVW7RAlr5 CRUX website: https://t.co/g937gpS65j
Benchmarks are saturated more quickly than ever. How should frontier AI evaluations evolve? In a new paper, we argue that the AI community is already converging on an answer: Open-world evaluations. They are long, messy, real-world tasks that would be impractical for benchmarks. https://t.co/CrvbEd9l7f
So excited to share that we're bringing Computer Use to Codex. Computer Use lets Codex see, click, and type into your Mac apps, with its own cursor. It's a magical feeling to have agents using your apps in the background, and still get to use your computer at the same time. https://t.co/wdgxiHAKyX
This tweet was sent by Codex via Computer Use https://t.co/UyxAw6MVj0
Biggest lesson from OpenClaw is that a good teammate doesn't start from scratch every time you check in. They remember what was decided, what's still open, and proactively help you. Today we launched heartbeats in Codex: automations that maintain context inside a single thread over time. Instead of each run starting fresh, Codex wakes up in the same conversation, with the history and context it needs already in place. You can also have it schedule its own next steps; just ask Codex. Think about the overhead that quietly accumulates every morning: scanning Slack channels, catching up on email, piecing together what moved overnight. With a heartbeat, you offload that once, and wake up to a brief already waiting in a pinned thread. If you want to try turning Codex into a chief of staff: connect Slack, Gmail, and Notion, and paste the following prompt into Codex: Please check @Slack @Gmail @Notion and write me a morning brief every weekday at 9am in this thread. I want you to collapse all the chaos at work into a single note every morning over some coffee ☕
Shocking result on my pelican benchmark this morning, I got a better pelican from a 21GB local Qwen3.6-35B-A3B running on my laptop than I did from the new Opus 4.7! Qwen on the left, Opus on the right https://t.co/kDlbnJv6YI

Today we're announcing Ternary Bonsai: Top intelligence at 1.58 bits. Using ternary weights {-1, 0, +1}, we built a family of models that are 9x smaller than their 16-bit counterparts while outperforming most models in their respective parameter classes on standard benchmarks. We're open-sourcing the models under the Apache 2.0 license in three sizes: 8B (1.75 GB), 4B (0.86 GB), and 1.7B (0.37 GB).
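A rough way to sanity-check the numbers in the Ternary Bonsai post: a weight with three possible values carries log2(3), about 1.58 bits, and the quoted file sizes imply an effective bits-per-weight figure. The sketch below is only an illustration. It assumes 1 GB = 10^9 bytes and that "8B" means exactly 8 x 10^9 parameters, and the five-weights-per-byte packing is a generic base-3 scheme chosen for the example, not the format the released models are known to use. The helper names (`pack_trits`, `unpack_trits`, `effective_bits`) are hypothetical.

```python
import math

# Information content of one ternary weight in {-1, 0, +1}.
BITS_PER_TRIT = math.log2(3)  # about 1.585


def pack_trits(weights):
    """Pack ternary weights into bytes, five per byte (3**5 = 243 fits in 256).

    Illustrative base-3 packing only; not the released models' storage format.
    """
    out = bytearray()
    for i in range(0, len(weights), 5):
        value = 0
        # Build the base-3 number so the first weight is the lowest digit.
        for w in reversed(weights[i:i + 5]):
            value = value * 3 + (w + 1)  # map -1, 0, +1 to 0, 1, 2
        out.append(value)
    return bytes(out)


def unpack_trits(data, count):
    """Inverse of pack_trits; `count` trims the padding in the last byte."""
    weights = []
    for byte in data:
        for _ in range(5):
            weights.append(byte % 3 - 1)
            byte //= 3
    return weights[:count]


def effective_bits(size_gb, params_billion):
    """Bits per parameter implied by a file size (assumes 1 GB = 1e9 bytes)."""
    return size_gb * 8 / params_billion


if __name__ == "__main__":
    sample = [-1, 0, 1, 1, 0, -1, 1]
    assert unpack_trits(pack_trits(sample), len(sample)) == sample
    # Sizes and parameter counts as quoted in the post.
    for name, size_gb, params in [("8B", 1.75, 8), ("4B", 0.86, 4), ("1.7B", 0.37, 1.7)]:
        bits = effective_bits(size_gb, params)
        print(f"{name}: {bits:.2f} bits/weight, {16 / bits:.1f}x smaller than 16-bit")
```

Under these assumptions the quoted sizes work out to roughly 1.7 to 1.75 bits per weight, which is consistent with the "9x smaller" claim once some overhead beyond the raw ternary weights is allowed for.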
one small step towards something super one of the most exciting things I've seen is 1) very good skill triggering 2) computer use works in the background so I can multitask 3) better pdf handling and connectors too! https://t.co/ZeBf7a5kQQ
With computer use on macOS, Codex can now use any app by seeing, clicking, and typing with its own cursor. It runs in the background without taking over your computer, working on tasks like frontend iteration, app testing, or any workflow that doesn't expose an API. https://t.co/iO9iubLZX9
You can now generate and iterate on images with gpt-image-1.5 in Codex to create frontend designs, mockups, game assets, and more without leaving your workflow. Usage is included with your ChatGPT account, no API key needed. https://t.co/ay17I3Nxoa
Automations can now run in the same thread, so Codex can pick up where it left off, with the original context intact. It can schedule future work and wake up automatically to continue long-term tasks, from landing open PRs to following up on tasks or staying on top of fast-moving conversations.
We've also added support for 90+ plugins in Codex, giving it more ways to gather context and take action across the tools you already use for docs, project management, code review, creative work, deployments, and more. https://t.co/IkmpDJwrLq
Opus 4.7 is live in Claude Code today! The model performs best if you treat it like an engineer you're delegating to, not a pair programmer you're guiding line by line. Here are three workflow shifts we recommend for this model 🧵 https://t.co/bD5JO1xDMS
Happy model launch day! Opus 4.7 is now available on all products and a significant step up from Opus 4.6. It's better at coding, computer use, finance, and general knowledge work. 🧵 I'll put the 5 things I find most interesting in thread! https://t.co/JEsw0a6Mrs
Claude Opus 4.7 is now the default orchestration model powering Computer. It's also available for Max subscribers on Perplexity web, iOS, and Android. https://t.co/aqQm1FKU5K
WendyOS is the foundation of our Physical AI operating system for NVIDIA Jetsons, and its progress has been shaped in no small part by the guidance of Ilies Chergui. I reached out to Ilies last winter, and he's since become both a great friend and a trusted advisor. Through WhatsApp chats and dinners, he's generously shared hard-won advice on what to do, what to avoid, and how to build this the right way. Proud to call this legend a friend!
Opus 4.7 feels more intelligent, agentic, and precise than 4.6. It took a few days for me to learn how to work with it effectively, to fully take advantage of its new capabilities. Will post a few more tips throughout the day, starting with this blog post: https://t.co/XQrH8P28yo
OpenAI's Codex Mac app adds three key features that go beyond agentic coding https://t.co/yvzpzJbsZN by @apollozac
Is Opus 4.7 good? I suggest you A/B test prompts between Codex and Claude for a while. Good time to mention this is easy to do in https://t.co/ImLyLY82pL https://t.co/Nb4Hr9lvh8

Codex just got a lot more powerful. Computer use, in-app browser, image generation and editing, 90+ new plugins to connect to everything, multi-terminal, SSH into devboxes, thread automations, rich document editing. Learns from experience and proactively suggests work. And a ton more.
NHTSA autonomous vehicle crash data has been updated through March 15, 2026, for AVs including Tesla Robotaxi. This includes unsupervised Tesla-driven robotaxis. • Waymo: 58 incidents • Zoox: 3 incidents • Tesla: 0 incidents Tesla has had 15 incidents in 10 months with an estimated 1,000,000+ miles driven on FSD. These miles include fully unsupervised driving as well. Many people claimed that Tesla had so many incidents with safety monitors and that it would therefore be worse when unsupervised. It turns out that wasn't the case at all, as incidents are dropping as Tesla does more testing and trains better models.

Codex for (almost) everything. It can now use apps on your Mac, connect to more of your tools, create images, learn from previous actions, remember how you like to work, and take on ongoing and repeatable tasks. https://t.co/UEEsYBDYfo
44 years ago IBM released the first personal computer, establishing the standard for modern personal computing. We've come a long way! https://t.co/hdfuGT6JWo
Today we're releasing Personal Computer. Personal Computer integrates with the Perplexity Mac App for secure orchestration across your local files, native apps, and browser. We're rolling this out to all Perplexity Max subscribers and everyone on the waitlist starting today. ht
Something is about to drop https://t.co/V7XNDyb70Z
"People aren't just building for humans anymore. They're building for agents." @Cloudflare shares how Cloudflare Sandbox SDK works with the OpenAI Agents SDK to help agents run code in secure environments while keeping sensitive data separate from execution. https://t.co/VE6YZR6WAG
You ever run a benchmark and end up with 40 log files, zero clarity, and a laptop that sounds like a jet engine? @runloopai + W&B Weave fixes this 🧵 https://t.co/K5hVq6RkfG
Even my Doria persona is in the weights now. https://t.co/d7zNQ9anTW