Your curated collection of saved posts and media
This might look like a game. But it's actually the beginning of autonomous coordination between AI agents.
You can actually interact with the world simulator directly in the browser. Here is a quick screen recording (8x speed) of me playing with it: real-time action-conditioned video prediction across rigid objects, deformable objects, rope, and object piles. Try it yourself (no install required): https://t.co/rqAmgzcB7F Huge kudos to my student @YXWangBot for making the interactive demo happen!
@honnibal TBH I think a big difference is my pre-LLM workflow was vastly different to nearly anyone else's. I hardly ever have to debug my code since it's extremely unlikely for me to write bugs using the notebook-based approach. So much faster debugging is only useful for 3rd-party code.
Another 15k-like post that is wrong about an AI paper's findings. And the community note undersells how wrong: the creativity paper measured 61 people (underpowered) and found NO drop in creativity at 30 days. The ChatGPT group was actually still (significantly!) higher at the end https://t.co/v1Z87oDCJI
@quasicoh What kinds of improvements would you want to see for codex here?
Perplexity is the most underrated AI tool for web search. I'm gonna integrate it into OpenClaw for SEO stuffs. Also, EXTREMELY cheap. https://t.co/WfC29oJ4Zn
OpenAI's massive Stargate data center canceled as firm can't reach terms with Oracle, operator struggles with reliability issues; Meta said to be interested in snatching excess capacity https://t.co/16mCMGJ7LI
Confirmed. Both Opus 4.6 (Claude Code) and 5.3 (codex) failed to do something that 5.4 (codex) was able to do.
The Top AI Papers of the Week (March 1 - March 8) - NeuroSkill - ParamMem - Numina-Lean-Agent - Bayesian Teaching for LLMs - Auton Agentic AI Framework - Theory of Mind in Multi-Agent LLMs - Why LLMs Form Geometric Representations Read on for more:
@CtrlAltDwayne My whole timeline was full of people complaining about codex degradation recently. For Claude Code I think it is way more likely that people did not realize that Anthropic changed the default thinking from high to medium for Opus.
Social scientists working with materials requiring digitization can only study what machines can read. In practice, that means printed Latin-script documents from well-funded archives. In a new working paper, I show that Vision Language Models used zero-shot outperform every existing OCR system across every script evaluated, and I propose a pipeline for deploying them on new collections. I apply it to six archival collections spanning 1.8 million pages across six countries for under $1,900.
I want to up the ante on this. If you have a large document collection, I will digitize it for you, for free (you pay for inference), on one condition: that we make the data publicly available immediately.
@honnibal The "much much faster" bit I'd caveat quite a lot though. The issues in "No Silver Bullet" are largely still true. So some bits are much much faster, but overall impact on end-to-end creation, maintenance, support, and feature addition is more limited than some believe.
@honnibal I've spent the last few years focused on the question of how to get the most utility out of LLMs, and created a whole company based on that, which I wouldn't be doing if I didn't think they were useful.
ChatGPT for Excel is here! GPT-5.4 is shockingly good at performing Excel manipulations. In particular, it's been impressive at handling work when thrown into complex existing spreadsheets. Available for Plus, Pro, Enterprise, Business, and Edu users! https://t.co/af0zuzBHpM
GPT-5.4 is really good at spreadsheets; a few finance people have finally said things to me like "huh I guess this AI thing is real"
i've been using gpt 5.4 for the past few weeks. in a sea of endless model drops and benchmark maxxing, this model is the first in a long time to be worth your time to try. honestly didn't expect openai to pull this off.
GPT-5.4 is great at coding, knowledge work, computer use, etc, and it's nice to see how much people are enjoying it. But it's also my favorite model to talk to! We have missed the mark on model personality for a while, so it feels extra good to be moving in the right direction.
AI is getting remarkably good at finding software vulnerabilities. Anthropic's system recently hacked the Firefox browser in a controlled test and uncovered numerous bugs. Tools that strengthen security could also make exploitation easier, depending on who uses them. https://t.co/1V7gdmcb9N @wsj @AnthropicAI
insane, check out this demo of codex using ableton. please keep sharing demos like this
@kristoph definitely. the current one is already 90% AI written I ain't writing all that