Your curated collection of saved posts and media
Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods "This work conducts a comprehensive analysis of inference-time scaling methods for both reasoning and non-reasoning models on challenging reasoning tasks." "Non-reasoning models… https://t.co/g8tqp3whNQ
o4-mini is the single greatest search engine that I have ever used. It can scan the entirety of PyTorch's GitHub to find the exact operation I am interested in https://t.co/tbC3jcKDCO
We're building Gemma as a community and developer-centric project 🤗 Please share your asks, feedback, and pain points. We announced PaliGemma 2 Mix, Gemma 3, Gemma 3 QAT, TxGemma, and DolphinGemma. And there's much more to come! We're learning, hearing, and improving. Let's go! https://t.co/niICRSnNLC
Utterly confusing, funny and sad https://t.co/wOMoh4MYd5
One of the most impressive AI demos I've seen. This is the future of customer service. Agents that can understand text, speech, images and even live video. Soon to be all open-source. https://t.co/kKoNTgTJ1T
OpenAI recently released a guide on building agents which contains some misguided takes. There's a lot of FUD, confusion, hype, and noise around agents. I wrote a blog on how to think about agent frameworks. Includes: Background Info - What is an agent? - What is hard about… https://t.co/7VKe9VMBad
Interesting how many specific instructions are given for making games. AI labs are optimizing for viral use cases. https://t.co/PWQdkuz0zw
Pydantic quietly dropped the most straightforward framework for building AI Agents. This ~28 liner builds an agent that can fetch URLs with a "fetch" MCP server. https://t.co/enGmnuOZqE
haha signed up for @elevenlabsio just to play with their conversational AI agent https://t.co/U1iET1d0mu
Voice agent simulations let employees perfect interactions hundreds of times with personalised feedback before meeting real customers. Using @elevenlabsio's new conversational AI, watch as I try my best to explain the in-flight menu to a fictional customer :) https://t.co/1fjIGUwrBH
Ilya Sutskever (OpenAI cofounder) top 30 must-read research papers. "If you really learn all of these, you'll know 90% of what matters today" https://t.co/VprtIapiFF
I'm releasing a set of slides I've used for various talks which lays out the architecture for agentic document workflows - how LLMs can parse, reason over, and act on PDFs, Excel etc. In general we're really excited about using AI agents to automate knowledge work over… https://t.co/yxWRY6z08t
I hear a lot of talk about zero-knowledge proofs from crypto folks and I had no idea what it was until I watched this very intuitive video, it's actually quite interesting! https://t.co/ale4fQUhNv
It's definitely worth reading this post for anyone using Claude Code. TIL that the word "ultrathink" will result in maximum thinking. Lots of other great tips in here https://t.co/ETS1eaqh5V https://t.co/yxUhR1tkXL
trained a nanoGPT? feeling behind before o4-mini? 🚨🚨 i'm open-sourcing beyond-nanoGPT, an internal codebase to help people go from LLM basics to research-level understanding. 🚨🚨 it contains thousands of lines of from-scratch, annotated pytorch implementing advanced… https://t.co/51165pg73q
You can now run 100B parameter models on your local CPU without GPUs. Microsoft finally open-sourced their 1-bit LLM inference framework called bitnet.cpp: > 6.17x faster inference > 82.2% less energy on CPUs > Supports Llama3, Falcon3, and BitNet models https://t.co/AGPOsUjlyB
We're excited to feature ZapGit - an all-in-one place for you to manage @github issues and PRs through a natural language client 🧑‍💻 Made possible by MCP (@zapier servers) and plugged into an agent workflow (@llama_index) 1. Choose the @github action and the repo 2. Agent… https://t.co/qkp50i2SOc
ChatGPT's new o3 model set a new IQ record, based on my site https://t.co/MKlEC93EK4 It got an IQ of 136! That's top 1% for humans. Thread: https://t.co/iHGpBlpRuD
rolling out @cursor_ai 0.49 here's what's new https://t.co/5WWwdOPL4f
Gemini 2.5 Flash is here! It's Google's first hybrid model, which allows you to turn thinking on or off. It has a new parameter, thinking_budget (i.e., max # of thinking tokens), to control quality, cost, and latency. Flash also leads in the price-to-performance ratio. https://t.co/QHqy1h2BOG
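A minimal sketch of how the thinking_budget knob from the post above might be set in a request body. The post only names the parameter; the camelCase field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) and the helper `build_request` are assumptions for illustration, so check them against the official API reference before relying on them.

```python
import json


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a JSON-style request body carrying a thinking budget.

    Field names are assumed, not taken from the post. Per the post,
    the budget is the max number of thinking tokens; a budget of 0
    corresponds to turning thinking off.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


if __name__ == "__main__":
    # Thinking off (cheapest, lowest latency) vs. a capped budget.
    print(json.dumps(build_request("Summarize this post.", 0), indent=2))
    print(json.dumps(build_request("Summarize this post.", 1024), indent=2))
```

A larger budget trades cost and latency for quality, which is the trade-off the post describes.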
A full-stack JavaScript web app using LlamaExtract to perform financial analysis! LlamaExtract, part of LlamaCloud, allows you to create agents by defining reusable schemas that precisely define what structured data you want extracted from complex documents. In this example,… https://t.co/AgvOLKk4Pd
Perplexity serves MoEs like post-trained versions of DeepSeek-v3. These models can be made to utilize GPUs efficiently in multi-node settings, achieving high throughput and low latency simultaneously, compared to single-node deployments. https://t.co/pZwOaRb0oZ
Announcing: Voice AI course and online community ... @swyx and I are hosting a month-long technical deep dive into Voice AI and Voice Agents. Our goals are to: ➡️ cover all the lessons we've learned over the last two years building realtime, conversational AI, ➡️ host fun… https://t.co/E68FivL4y0
I am frequently asked: "when will AI work well with spreadsheets?" Quietly, Google has come some way in realizing this vision, using AI at both the cell & sheet level (and running code). Here are examples of me taking fake startup financials I use in class and it spotting issues https://t.co/MkFqKnmMQK

sooo @aiDotEngineer now has an official MCP server :) (and a @jeremyphoward llms.txt) try adding this to your friendly neighborhood VS Code Fork: and then convert your codebase into a talk with natural language inside your IDE: happy to share that @threepointone and i vibe… https://t.co/HGufVdeRju

Dang it, I made an eval I thought I'd trounce LLMs at: identifying species in photos I've taken over the years, given ~5 plausible options. TIL 1) I don't know my Latin names as well as I thought, and 2) 4o apparently does. Writeup once I do the human baseline score + polish https://t.co/iEVirKt3Vz

Finally Cursor has notebook support! https://t.co/pNLRU8ZU7d
Multimodal RAG: Just use ColPali/DSE then pass your screenshots to the LLM This is the dream, but how well do LLMs read text contained in images? We wanted to know, so we tried a simple thing: do results change on evals when using screenshots rather than text as input? Yes. https://t.co/j23rObYcG0
o3 Pro on ARC-AGI Semi Private Eval Results ARC-AGI-1: * Low: 44%, $1.64/task * Medium: 57%, $3.18/task * High: 59%, $4.16/task ARC-AGI-2: * All reasoning efforts: <5%, $4-7/task Takeaways: * o3-pro in line with o3 performance * o3's new price sets the ARC-AGI-1 Frontier https://t.co/ihTP82ue4D
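A rough sketch of the arithmetic behind the ARC-AGI-1 numbers in the post above: cost per score point at each reasoning effort, and the marginal cost of each extra point when stepping up an effort level. The scores and per-task costs are copied from the post; the helper names are hypothetical, and treating cost per point as a useful metric is an assumption rather than something the post claims.

```python
# (score in %, cost in $ per task) for ARC-AGI-1, as quoted in the post.
ARC_AGI_1 = {
    "low": (44, 1.64),
    "medium": (57, 3.18),
    "high": (59, 4.16),
}


def cost_per_point(score: float, cost: float) -> float:
    """Dollars per task spent for each percentage point of score."""
    return cost / score


def marginal_costs(results: dict, order: list) -> list:
    """Extra dollars per extra point when moving up one effort level."""
    steps = []
    for lower, upper in zip(order, order[1:]):
        (s0, c0), (s1, c1) = results[lower], results[upper]
        steps.append((f"{lower}->{upper}", (c1 - c0) / (s1 - s0)))
    return steps


if __name__ == "__main__":
    for effort, (score, cost) in ARC_AGI_1.items():
        print(f"{effort}: ${cost_per_point(score, cost):.3f} per point")
    for step, dollars in marginal_costs(ARC_AGI_1, ["low", "medium", "high"]):
        print(f"{step}: ${dollars:.2f} per additional point")
```

On these figures, the step from medium to high buys 2 points for about $0.98 more per task, a much steeper price per point than the step from low to medium.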
That Altman essay… One thing you can definitely say about him and Dario is that they are making very bold, very testable predictions. We will know whether they are right or wrong in a remarkably short time https://t.co/4NQCIHrBSQ
This was less than almost every estimate I have seen: according to the latest Sam Altman post, the average ChatGPT query uses about the same amount of power as the average Google search in 2009 (the last time they released a per-search number)… 0.0003 kWh https://t.co/AgVQB7zkOu

As someone who was formally trained in applied statistics, this book legitimately changed my life. It's old now, but fundamentally it's the intellectual bridge between statistics and machine learning. And I crossed it. https://t.co/2YSfFbQ5gM