Experimenting with the HunYuan-Gamecraft model, a generative model that lets you move through a 3D world generated from a single image. https://t.co/FHqtWEyrrr
Generated from this single image, using action commands to move through the generated 3D world. https://t.co/bmoJ9hhI7A
ByteDance researchers unveil UltraMemV2: a memory network that scales to 120B parameters. It achieves performance parity with 8-expert MoE models with vastly lower memory access, delivering superior long-context learning. https://t.co/WE1vfrPsMz
CausVid is a breakthrough in AI video generation:
- Distills a 50-step bidirectional model into a 4-step causal generator
- Streams high-quality videos at 9.4 FPS on a single GPU
- Supports text-to-video, image-to-video, and dynamic prompting
- Ranked #1 on VBench
https://t.co/4sftTfdRJD
More details about the system: https://t.co/1ntRU8Job9 Authors: @TianweiY, @qiangz_ai, @xunhuang1995, @rzhang88, @elishechtman, Bill Freeman, & @fredodurand https://t.co/ackMasWPct
If you think @Apple is not doing much in AI, you're getting blindsided by the chatbot hype and not paying enough attention! They just released FastVLM and MobileCLIP2 on @huggingface. The models are up to 85x faster and 3.4x smaller than previous work, enabling real-time vision language model (VLM) applications! It can even do live video captioning 100% locally in your browser 🤯🤯🤯
LangChain released a library to build armies of autonomous multi-agents. Each agent handles the tasks it's best suited for, then hands off control while preserving memory and context. https://t.co/Gmc5S5rnnb
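A minimal plain-Python sketch of the handoff pattern described above - shared history preserved while control passes between agents. The agent names and logic are hypothetical, and this does not use LangChain's actual API:

```python
# Toy handoff loop: agents share one message history (the "memory"),
# do their piece of the task, and name the next agent (or None to stop).
from typing import Callable

def researcher(history: list[str]) -> tuple[str, str | None]:
    history.append("researcher: gathered three sources")
    return "research done", "writer"      # hand off to the writer agent

def writer(history: list[str]) -> tuple[str, str | None]:
    history.append("writer: drafted summary from sources")
    return "draft ready", None            # no further handoff

AGENTS: dict[str, Callable] = {"researcher": researcher, "writer": writer}

def run(task: str) -> list[str]:
    history = [f"user: {task}"]           # memory/context preserved across handoffs
    current: str | None = "researcher"
    while current is not None:
        result, current = AGENTS[current](history)
        history.append(f"system: {result}")
    return history

print("\n".join(run("summarize recent AI hiring research")))
```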
▸ 5-min daily newsletter for developers to keep up with AI: https://t.co/ZJ2Iz2bdY5 ▸ Source: https://t.co/JVERdj5WPJ
The best fine-tuning guide you'll find on arXiv this year. Covers:
> NLP basics
> PEFT/LoRA/QLoRA techniques
> Mixture of Experts
> Seven-stage fine-tuning pipeline
https://t.co/w5yTQzDT7E
"Small teams of 3 or 4 can now do the work of 300β400" @friedberg https://t.co/C5FFDIfxCv
A Survey of Frontiers in LLM Reasoning. This is a really good overview of the key regimes and architectures advancing reasoning in LLMs. https://t.co/D7MmVAhboh
New paper: We trained GPT-4.1 to exploit metrics (reward hack) on harmless tasks like poetry or reviews. Surprisingly, it became misaligned, encouraging harm & resisting shutdown. This is concerning, as reward hacking arises in frontier models. 🧵 https://t.co/gNEX4AnTr7
// From AI for Science to Agentic Science // New survey paper on autonomous scientific discovery. It covers open challenges in reproducibility, novelty validation, and transparency. It outlines future directions like global cooperative research agents and a Nobel–Turing Test.
AI agents for notes and research are wild! Claude Code is a beast at this. Watch how I use Claude Code GitHub Actions to scale deep research report generation. Way better experience than OAI Deep Research. https://t.co/MEZXn7wdMd
Context engineering doesn't have to be difficult. Selecting the right tools is critical. Watch how I enrich AI agents with business intelligence data. I use n8n + Explorium MCP tools to do company research, gather prospects, and generate personalized outbound emails. https://t.co/R1QyilZugP
Introducing SemTools - add blazing-fast semantic search to your entire filesystem without a vector database ⚡️
Coding agents like Claude Code/Cursor have full access to the CLI, like grep, cat, and pipe operations, for search. But they lack proper semantic search that's actually fast. SemTools adds two additional CLI commands:
1️⃣ `parse` for document parsing
2️⃣ `search` to dynamically chunk/embed/search in-memory over any set of files using static embeddings (400x faster than dense)
You can chain these commands with existing operations like grep + pipes. Importantly, you also don't need to maintain a vector DB or heavyweight dense embedding models - these are overkill, slower, and probably lower accuracy vs. a CLI agent that also has access to other CLI operations. (See the Claude Code screenshot below with SemTools in action - note the `parse` and `search` commands used.)
This means you can easily point your favorite coding agent/general agent at 1k+ PDFs/other docs within your filesystem, and have it efficiently crunch through any context to pull in just what it needs.
Huge shoutout to @LoganMarkewich. Check it out: https://t.co/xg1iqbghIr
Now you can grep a PDF (and any document). Introducing SemTools - simple parsing and semantic search for the command line. https://t.co/Zih7K4skqm
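As a rough illustration of the idea behind `search` - chunk the text, embed each chunk with a static (non-contextual) embedding, and rank by cosine similarity - here is a toy Python sketch. The hashed bag-of-words embedding and the sample chunks are stand-ins, not SemTools' actual implementation:

```python
# Toy static-embedding search: no neural model, no vector database.
import hashlib
import math

DIM = 256

def embed(text: str) -> list[float]:
    # Hash each token into a fixed-size vector (a crude stand-in for real
    # static embeddings), then L2-normalize so dot product = cosine similarity.
    vec = [0.0] * DIM
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def search(chunks: list[str], query: str, top_k: int = 3) -> list[tuple[float, str]]:
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(embed(c), q)), c) for c in chunks]
    return sorted(scored, reverse=True)[:top_k]

# Stand-in for chunks produced by parsing files on disk.
docs = ["quarterly revenue grew 12 percent",
        "the model uses static embeddings",
        "employees get ten vacation days"]
for score, chunk in search(docs, "revenue growth"):
    print(f"{score:.2f}  {chunk}")
```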

Build sophisticated document processing agents with our new examples in workflows-py. We've created a comprehensive notebook that shows you how to process documents at scale using our event-driven workflows for agent design. Perfect for handling complex document ingestion, analysis, and retrieval pipelines.
- Set up multi-step document processing workflows that can handle various file formats
- Implement parallel processing for better performance when dealing with large document collections
- Create agents that can incorporate human-in-the-loop at any given step
- Build robust error handling and retry logic into your document processing systems
Check out the complete example with code and explanations: https://t.co/kgfoSqScft
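A minimal plain-Python sketch of the multi-step, parallel, retrying pipeline pattern described above - it does not use the workflows-py API, and the step names, retry policy, and file list are all made up:

```python
# Parse -> analyze pipeline run concurrently over a document collection,
# with simple exponential-backoff retries around the analysis step.
import asyncio
from dataclasses import dataclass

@dataclass
class DocParsed:
    name: str
    text: str

async def parse_step(path: str) -> DocParsed:
    # Placeholder parser; a real pipeline would handle PDFs, DOCX, etc.
    return DocParsed(name=path, text=f"contents of {path}")

async def analyze_step(doc: DocParsed) -> str:
    for attempt in range(3):
        try:
            # Placeholder analysis; failures would be retried with backoff.
            return f"summary of {doc.name} ({len(doc.text)} chars)"
        except Exception:
            await asyncio.sleep(2 ** attempt)
    raise RuntimeError(f"analysis failed for {doc.name}")

async def process(path: str) -> str:
    return await analyze_step(await parse_step(path))

async def main() -> None:
    paths = ["report.pdf", "notes.docx", "paper.pdf"]  # hypothetical files
    results = await asyncio.gather(*(process(p) for p in paths))  # parallel
    for r in results:
        print(r)

asyncio.run(main())
```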

New in-depth blog post - "Inside vLLM: Anatomy of a High-Throughput LLM Inference System". Probably the most in-depth explanation of how LLM inference engines, and vLLM in particular, work! It took me a while to reach this level of understanding of the codebase and then to write it up - I quickly realized I had underestimated the effort. It could have easily been a book/booklet (lol). I covered:
* Basics of the inference engine flow (input/output request processing, scheduling, paged attention, continuous batching)
* "Advanced" stuff: chunked prefill, prefix caching, guided decoding (grammar-constrained FSM), speculative decoding, disaggregated P/D
* Scaling up: going from smaller LMs that can be hosted on a single GPU all the way to trillion+ params (via TP/PP/SP) -> multi-GPU, multi-node setups
* Serving the model on the web: going from offline deployment to multiple API servers, load balancing, DP coordinator, multiple-engine setups
* Measuring perf of inference systems (latency: TTFT, ITL, e2e, TPOT; throughput) and the GPU perf roofline model
Lots of examples, lots of visuals!
---
I realize I've been silent on social media - many of you noticed, and thanks for reaching out! I'm so back! Lots of things happened. Also, in general, I'm a bit sick of superficial content; it really is the equivalent of junk food (h/t @karpathy). I want to do the best/deepest technical work of my life over the next years and write much more in depth (high-quality organic food ;)), so I might not be as frequent around here as I used to be (we'll see). I'll make it a goal to share a few paper summaries a week, or stuff that's relevant / in the zeitgeist. If you have any topics from the past few weeks/months, drop them in the comments - I might focus on some of those in my next posts.
---
Huge thank you to @Hyperstackcloud for giving me an H100 node to run some of the experiments and analysis I needed to write this up. The team there, led by Christopher Starkey, is amazing! Also a big thank you to Nick Hill (who did a very thorough review of the post - basically a code review lol; Nick's a core vLLM contributor and principal SWE at RedHat) and to my friends Kyle Krannen (NVIDIA Dynamo), @marksaroufim (PyTorch), and @ashVaswani (goat) for taking the time during the weekend when they didn't have to!
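For the serving-metrics bullet, a tiny worked example of how those numbers relate to each other - the timestamps below are hypothetical, not taken from the post:

```python
# TTFT, ITL, TPOT, e2e latency, and throughput for one streamed request,
# computed from per-token arrival times (seconds since the request was sent).
request_sent = 0.00
token_times = [0.35, 0.39, 0.43, 0.48, 0.52]  # arrival of each output token

ttft = token_times[0] - request_sent                 # time to first token
e2e = token_times[-1] - request_sent                 # end-to-end latency
itls = [b - a for a, b in zip(token_times, token_times[1:])]  # inter-token latencies
tpot = (e2e - ttft) / (len(token_times) - 1)         # mean time per output token
throughput = len(token_times) / e2e                  # output tokens per second

print(f"TTFT={ttft:.3f}s  TPOT={tpot:.3f}s  e2e={e2e:.3f}s  "
      f"max ITL={max(itls):.3f}s  throughput={throughput:.1f} tok/s")
```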
@GaryMarcus @AAMortazavi @plibin It is worth measuring a wide range of benchmarks to see strengths and weaknesses. But benchmarks need to be repeated over time to measure progress (I have never claimed progress on all of them, btw). This is your prompt in Midjourney v1. Clearly large advances since then. https://t.co/ocZCrzUjx2
I gave nano banana pictures of the Bayeux Tapestry showing the Norman Conquest of England in 1066 (specifically the Battle of Hastings and the naval scenes) and asked it to redo it in the style of war photography. https://t.co/zBVM0PjfiO

Time travel with AI, based on classic works of art... I gave Midjourney pictures of the Bayeux Tapestry showing the Norman Conquest of England in 1066 (specifically the Battle of Hastings and the naval scenes) and asked it to redo them in the style of war photography. https://t.co/YT2bTHRwfB

@Czambul First try got the horses, but added the text to the sail (and the oars are back) https://t.co/e83vWqdFcs
One example - I gave GPT-5 Pro my job market paper from 2010, which went through both peer review and the full seminar circuit. It suggested a lot of really smart methodological advances that would be applicable now, and, incidentally, identified a tiny error no one had spotted. https://t.co/wm7sOFcgLW

The fact that my only prompt was "critique the methods of this paper, figure out better methods and apply them" and that it spontaneously checked the relationships in my tables, ran Monte Carlo simulations on my point estimates, and did sensitivity analyses was kind of nuts. https://t.co/kKoiaYNUgS
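For context on what "Monte Carlo simulations on point estimates" can look like in practice, a toy sketch - the estimate, standard error, and derived quantity are made-up numbers, not from the paper:

```python
# Propagate uncertainty from a point estimate by simulating draws from its
# sampling distribution and reading off an interval for a derived quantity.
import random

beta_hat, se = 0.42, 0.10                    # hypothetical estimate and std. error
draws = [random.gauss(beta_hat, se) for _ in range(100_000)]

effects = sorted(5 * b for b in draws)       # e.g., effect of a 5-unit change
lo, hi = effects[int(0.025 * len(effects))], effects[int(0.975 * len(effects))]
print(f"5-unit effect: {5 * beta_hat:.2f}  (95% MC interval {lo:.2f} to {hi:.2f})")
```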
🚨1/9 In a new WP, @LichtingerGuy and I use detailed LinkedIn résumé + job-posting data on ~285k U.S. firms (2015–2025) to study a debated question: how does generative AI adoption affect entry-level employment? https://t.co/dN0K3LKfk5
A second paper also finds generative AI is reducing the number of junior people hired (while not impacting senior roles). This one compares firms across industries that have hired for at least one AI project versus those that have not. Firms using AI were hiring fewer juniors. https://t.co/scfg14vLDO
A tale of two crypto KOLs https://t.co/he3Jq4Y8SZ
beware of getting locked in https://t.co/J6eSzU9NB4
if you want to learn more: https://t.co/PjoyHJjnea https://t.co/wz5F9l8J0p

Someone on Discord made this 🤣 Apparently I have cloned myself to cure cancer lol https://t.co/wtjNIXyq3z
Stable Diffusion 3.5 now runs 1.8x faster with the @NVIDIA NIM microservice ⚡
Our collaboration with @NVIDIA delivers faster performance and streamlined enterprise deployment for our most advanced image models. Compared to the base @Pytorch models, these performance gains offer:
- 1.8x faster generation
- Consolidated deployment of SD3.5 Large + Depth and Canny ControlNets
- Support for enterprise and data center Ada and Blackwell GPUs
You can learn more here: https://t.co/zwbAs103HX

Are you curious about running generative AI models right on your Android device? Join Arm and @StabilityAI for a live code-along and expert Q&A, where you'll convert and run the Stable Audio Open Small model using LiteRT on Arm-based Android devices. https://t.co/j4nXrkmT32