Your curated collection of saved posts and media
265 pages of everything you need to know about building AI agents. 5 things that stood out to me about this report: https://t.co/7LKSGvCHlj
Yann LeCun: "We're never going to get to human level AI by just training on text" https://t.co/ucDtggCwLO
Wow. Researchers secretly used an AI on Redditβs r/changemyview/ as part of an unauthorized experiment https://t.co/QKpZNI9Q3Z
The Art of Doing Science and Engineering: Learning to Learn by Richard Hamming only $1.99 for the Kindle version today: https://t.co/VMmyoj1Z61 https://t.co/JOzhXdO4J5
With first Claude and now Gemini playing Pokemon, I was thinking of doing my own game-playing experiment over the weekend. However, I quickly learned that it's very far from the VLA-style "pixels->plan" that I naively thought it was, and wanted to do myself. It's like 90%β¦ https://t.co/5cVrbmYArc
It gets worse. Following the rotating 'spiral' incident, a concerning reply appeared from CLAUDE itself to the simulated post: Translating the binary yielded: When I attempted to view more replies, the entire simulated Twitter staging site transformed into an very different⦠https://t.co/DpE5JcSBOZ

GPT-4o these days https://t.co/5p4FJXpJQ1
this seems pretty bad actually https://t.co/JGbmmyblqh
Official announcement: Qwen 3 this week. Reasoning and non-reasoning in one. https://t.co/4xPTnfRvON
The Sparse Frontier Efficient sparse attention methods are key to scale LLMs to long contexts. We conduct the largest-scale empirical analysis that answers: 1. π€π Are small dense models or large sparse models better? 2. βΎοΈWhat is the maximum permissible sparsity per task? 3.β¦ https://t.co/prWfrljmzQ
The new RLHF gave chatgpt a new job https://t.co/HJDsKKXuDg
For most people in most circumstances, developing the habit of asking o3 or Gemini 2.5 about anything that confuses or intrigues you is often good Specific curiosity (seeking out answers for things we don't know and following up on learnings) is a good trait & the models do well https://t.co/yqSJ8FimZ8
Here's a (memory-free) convo with GPT 4o to make this more concrete https://t.co/7Vmq4JI3rp

HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding? "we introduce HRScene, a novel unified benchmark for HRI understanding with rich scenes. HRScene incorporates 25 real-world datasets and 2 synthetic diagnostic datasets with resolutions ranging from⦠https://t.co/0o3bS1hceR
"o3, You are a consultant hired by the Dark Lord, analyze the org chart of Mordor. How would you improve it for today's changing Middle Earth" o3 does some actual humor: βOne Org to rule them all, One Org to find them, One Org to bring them all, And in the darkness, align them.β https://t.co/JxyaPGySeA

I think I might've accidentally jailbreak-ed Claude. I asked Claude to craft a conversation between two characters for a story I'm writing and they suddenly started reciting Claude's system instructions. https://t.co/fmbilFhR7n

Controlled trials keep finding that there are real benefits for doctors using LLMs to explain upcoming procedures & get informed consent. In this study, patients asking questions of ChatGPT-4 had lower levels of anxiety. (Doctor's vetted the answers, which were all "excellent") https://t.co/Vo5ROQJNt4

experimenting with meme driven marketing π€£ join our course so you wonβt be the person on the left! https://t.co/IafuxymqvR
I donβt mean to be a broken record but AI development could stop at the o3/Gemini 2.5 level and we would have a decade of major changes across entire professions & industries (medicine, law, education, codingβ¦) as we figure out how to actually use it. AI disruption is baked in. https://t.co/sLab2kczZx

o3's combined reasoning and tool use abilities enable it to be excellent at geoguessing be careful what pics you share online! (example from @simonw's recent blog post) https://t.co/RysJy5ilWI

On one hand the new GPT-4o isnβt doing as many emojis. On the other, it is slowly driving me insane by responding to everything like an overly enthusiastic 1990s teenager. https://t.co/OEK0HhZmH2

I am excited to open-source PipelineRL - a scalable async RL implementation with in-flight weight updates. Why wait until your bored GPUs finish all sequences? Just update the weights and continue inference! Code: https://t.co/AgEyxXb7Xi Blog: https://t.co/n4FRxiEcrr https://t.co/DkjI9Snz9g
haha 20k words in prob the last month https://t.co/8ax4Y4uYsN
Okay come on.. lmao https://t.co/L4d3sfiSi0
No more needle in a haystack. LLMs now retrieve over 1M tokensβno RAG, no retraining. Infinite Retrieval introduces InfiniRetri, a method that uses transformer attention to handle infinite-length contexts natively. No toolkits. No memory modules. https://t.co/JTxlJweASn
https://t.co/uktfew0HhF
AI Evals are the most important skill for AI PMs. But there are many misconceptions. A complete guide on AI Evals: π§΅ https://t.co/H7xOGPzzUH
There has been a lot of productive dialogue recently about what agents are, and the best way to build them: @AnthropicAI published Building Effective Agents, @dexhorthy went viral with his 12 Factor Agents, and @OpenAI released A Practical Guide To Building Agents. We've been⦠https://t.co/AfgH5KeRPJ
Thereβs been a lot of discussions on the best way to build agents: - @OpenAIβs general take is that increased model capabilities simplify the SDK - Others (incl. @Anthropicβs agent pattern guide + @dexhorthy) generally outline a more constrained approach. Weβve been asked by aβ¦ https://t.co/SWyMBDIFKb

God I wish @elonmusk would have taken this bet. https://t.co/WgzsDG5HwK
As a Gary Marcus, I would be very careful about what I say about LLMs and AGI in 2025 https://t.co/EMmbkoLaC3
Building LLM apps feels different, right? Their opaque nature makes predicting outputs and behavior really tough. π€ So how do you move beyond guesswork and actually improve your LLM-based products reliably? https://t.co/gx2oL40GLM