Introducing Code Review Bench v0: https://t.co/iAZDURyqol The first independent code review benchmark. 200,000+ PRs. Unbiased. Fully OSS. Updated daily. Tool performance highlights 🧵👇 Featuring: @augmentcode @baz_scm @claudeai @coderabbitai @cursor @GeminiApp @github @graphite @greptile @kilocode @OpenAIDevs @propelcode @QodoAI
☎️ Hello? AI Selves now have phone numbers! Put them in your iMessage or SMS to be there when you're not, settle arguments in your group chats, and make talking to yourself more normal. More ideas 👇🧵 Plus, we're letting more people in off of our waitlist! QRT to get your own early access code.
Imagination Helps Visual Reasoning, But Not Yet in Latent Space Causal mediation analysis reveals latent visual reasoning in MLLMs fails: latent tokens ignore inputs and barely affect answers. CapImagine, a text-based alternative, teaches explicit imagination and significantly outperforms latent baselines.
Top AI Papers of The Week (Feb 24 - Mar 2)
- A Very Big Video Reasoning Suite: 200 tasks, 1M+ video clips for video reasoning research
- Does Your Reasoning Model Implicitly Know When to Stop Thinking? Introducing SAGE paradigm
- AgentFly: Fine-tuning LLM agents without fine-tuning LLMs
- Microsoft rStar2-Agent: 80.6% on AIME24 with just 14B parameters
- From Blind Spots to Gains: Diagnostic-driven iterative training for LMMs
- VibeVoice: Synthesizing 90-minute multi-speaker conversational speech
- Alibaba MobilityBench: Benchmarking real-world route-planning agents
- NVIDIA's data engineering strategies for scaling LLM terminal capabilities
- VESPO: Variational sequence-level soft policy optimization for stable RL training
- Beyond Pass@1: Self-play with variational problem synthesis sustains RLVR
Find them below:
Thanks AK for reposting our work! Here are all the links for anyone who wants to check out more!
Paper: https://t.co/6PajZXj6V0
Project Website: https://t.co/5VTiCqTDhN
EvalKit: https://t.co/lxhyzMaI8j
Cloud Infra: https://t.co/QNJRfOKQN3
Training Set: https://t.co/DlzLojQjsR
Eval Set: https://t.co/Tzs2jAN99C
Leaderboard: https://t.co/peZ1XkelYY
Model: https://t.co/gFFJofrlNR

JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation https://t.co/bd8BlNZNEr
Top AI Papers of The Week (Feb 16-22)
- Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs
- SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise
- GLM-5: from Vibe Coding to Agentic Engineering by @zhipuAI
- Experiential Reinforcement Learning
- MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
- Zooming without Zooming: Region-to-Image Distillation by @InclusionAI
- Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?
- DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval
- SLA2: Sparse-Linear Attention with Learnable Routing and QAT
- SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
Find them below:
Just shipped! @huggingface storage add-ons. Starting at $12/month per TB - 3x cheaper than regular cloud storage, with very fast uploads and downloads powered by Xet's deduplication. You can now buy, upgrade, and cancel storage plans directly from your billing settings. https://t.co/RDylcDjkb4
I'm giving an agent control over Reachy Mini from @huggingface and letting it understand and share spatial data via @Spectacles. AR is the human interface for robotics and physical AI imo. It feels like absolute magic to interact with this, both in voice/agent and "puppeteering" mode. I'll probably work on AR for either an arm (manipulation tasks) or some sort of drone (locomotion in 3D space) next… Project is fully open source btw: https://t.co/pmkXJR0U7f Thank you @SensAIHackademy for sending me the robot!
TranslateGemma 4B by @GoogleDeepMind now runs 100% in your browser on WebGPU with Transformers.js v4. 55 languages. No server. No data leaks. Works offline. A 4B parameter translation powerhouse, right in your browser. Try the demo 👇 https://t.co/YgYskHqBRm
datasets v4.6.0 is out 🤗 News for multimodal/streaming:
🎬 push_to_hub() video datasets
🗃 image/audio/video are now PLAIN blobs in Parquet
✔️ type inference for Lance
✂️ .reshard() streaming Parquet datasets: shard per row group instead of file
All optimized for Xet 🧵👇 https://t.co/kD9tewovNN
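If you want to kick the tires, here is a minimal sketch of the new flow; the repo id is a placeholder, and the .reshard() signature is assumed from the release notes rather than verified:

```python
# Minimal sketch of the v4.6.0 multimodal/streaming features described above.
# "user/videos-demo" is a placeholder repo id; .reshard()'s signature is an
# assumption based on the release notes.
from datasets import Dataset, Video, load_dataset

# Videos end up as plain blobs in the Parquet shards pushed to the Hub.
ds = Dataset.from_dict({"video": ["clip_0.mp4", "clip_1.mp4"]})
ds = ds.cast_column("video", Video())
ds.push_to_hub("user/videos-demo")

# Stream it back and reshard per row group instead of per file.
streamed = load_dataset("user/videos-demo", split="train", streaming=True)
streamed = streamed.reshard(num_shards=16)  # assumed signature
```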
🚀 pplx-embed is @perplexity_ai's new collection of state-of-the-art multilingual embedding models optimized for real-world, web-scale retrieval tasks!
- Built on Qwen3 w/ diffusion-based pretraining and bidirectional attention
- Available at 0.6B and 4B parameters w/ native INT8 quantization
- pplx-embed-v1 for independent text embeddings
- pplx-embed-context-v1 for document chunks in RAG
- Validated on real-world search scenarios over tens of millions of documents
- Permissive MIT License
- Available on the @huggingface Hub, and supported on Text Embeddings Inference, Sentence Transformers, and Transformers.js
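Since the post says the models are supported in Sentence Transformers, usage could look roughly like this; the repo id is a guess (check the Hub for the real one), and precision="int8" is the library's standard embedding-quantization hook:

```python
# Minimal sketch; "perplexity-ai/pplx-embed-v1" is a guessed repo id.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("perplexity-ai/pplx-embed-v1")
docs = ["Multilingual retrieval at web scale.", "Recherche multilingue a grande echelle."]
# precision="int8" maps to the native INT8 quantization mentioned above.
emb = model.encode(docs, normalize_embeddings=True, precision="int8")
print(emb.shape, emb.dtype)
```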
Want to bring open-source vision language models to the edge? 💻 Check out our @huggingface article on deploying NVIDIA Cosmos Reasoning 2B across the NVIDIA Jetson family with vLLM and a Live VLM WebUI. 👉 https://t.co/Tp0tZtjgRp https://t.co/tytkmCRJzx
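The vLLM side of such a deployment could look like the sketch below; the repo id is hypothetical (the article has the real one), and the image plumbing for the VLM is elided:

```python
# Minimal sketch of serving the model with vLLM; "nvidia/Cosmos-Reason-2B"
# is a hypothetical repo id, and multimodal image inputs are elided.
from vllm import LLM, SamplingParams

llm = LLM(model="nvidia/Cosmos-Reason-2B")
outputs = llm.generate(
    ["Describe what is happening in front of the robot."],
    SamplingParams(max_tokens=128, temperature=0.2),
)
print(outputs[0].outputs[0].text)
```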
I tried Codex 5.3 (web) for porting VidEoMT, a simple and elegant ViT-based video segmentation model, to @huggingface Transformers. Sadly, it missed the global picture, mistakenly assuming the model uses DINOv3 as its backbone when it actually uses DINOv2. It got stuck. Opus 4.6 fixed it after I told it. The job of ML Engineer is still safe - humans stay in the driver's seat. PR: https://t.co/5ahL0GqtZN

BEDLAM2.0 image and depth data are now available via Hugging Face, providing high-speed worldwide download access to over 26TB of synthetic data for non-commercial research. Hugging Face: https://t.co/tl8S3DJNWw Project: https://t.co/NR5Np9UT46
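Downloading a slice of it could look like this; the repo id below is a placeholder (use the Hugging Face link above), and allow_patterns keeps you from pulling all 26TB at once:

```python
# Minimal sketch; "bedlam/BEDLAM2.0" is a placeholder repo id.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="bedlam/BEDLAM2.0",   # placeholder: see the link above
    repo_type="dataset",
    allow_patterns=["depth/*"],   # fetch one modality rather than all 26TB
    local_dir="bedlam2",
)
```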

What happens when you make an LLM drive a car where physics is real and actions can't be undone? I ported CARLA, the autonomous driving simulator, to OpenEnv and added training via TRL + HF Spaces. In 50 steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians https://t.co/QR4FJS70h7
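The TRL side of a setup like this could look roughly like the sketch below; the CARLA/OpenEnv wiring is elided, and the reward function is a placeholder for the simulator's scoring:

```python
# Minimal sketch of GRPO training with TRL; the CARLA/OpenEnv integration is
# elided, and safety_reward is a placeholder for the simulator's feedback.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def safety_reward(completions, **kwargs):
    # Placeholder: in the real loop the simulator scores each rollout,
    # e.g. penalizing collisions and rewarding successful evasive actions.
    return [0.0 for _ in completions]

prompts = Dataset.from_dict(
    {"prompt": ["You are driving. A pedestrian steps into the road. Action:"]}
)
trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    reward_funcs=safety_reward,
    args=GRPOConfig(output_dir="carla-grpo", max_steps=50),
    train_dataset=prompts,
)
trainer.train()
```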
Editing images is a series of state transitions between the source image and the edited image we want. Yet the existing paradigm doesn't explicitly include any transitioning priors in the editing process. This becomes particularly prevalent for edits involving causal dynamics (e.g., refraction, deformation). To model this kind of physics-informed information, we leverage the rich priors present in videos and introduce PhysicEdit 🔥 TL;DR: We fine-tune QwenImage Edit on a curated dataset of videos with reasoning traces and fixed-length transition queries to do solid physics-aware image editing! In the process, we introduce a cool dataset "PhysicTran38K", consisting of 38K transition trajectories across five physical domains, and devise a method to provide supervision from it to QwenImage Edit. Hop in to learn more ⬇️
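For context, inference with the base model goes through diffusers' QwenImageEditPipeline; a minimal sketch follows, assuming you swap in the fine-tuned PhysicEdit weights once released (the checkpoint id below is the base model, and the prompt and file names are illustrative):

```python
# Minimal inference sketch with the base QwenImage Edit model; substitute the
# PhysicEdit checkpoint when available. Prompt and file names are illustrative.
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")
src = Image.open("glass_of_water.png").convert("RGB")
out = pipe(
    image=src,
    prompt="place a straw in the glass, with physically correct refraction",
    num_inference_steps=50,
).images[0]
out.save("edited.png")
```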
Is it worth re-OCR'ing old library index cards? Re-OCR'd 453,000 cards from Boston Public Library's rare books catalogue. ~$50 compute using @huggingface Jobs. BPL's own guide calls their search "extremely unreliable." Does better OCR and semantic search fix it? Demo link below https://t.co/DC5nqtmQtC

Never thought this day would come, but we've hit 10k followers on @huggingface :') 🤗 Huge thank you to them for their endless storage grants allowing me to upload over 2000 quants these past few years! https://t.co/Ueh6ty1Yed
Marco built Reachy Phone Home so Reachy Mini can detect when you're on your phone, using @Ultralytics YOLO26 vision, and respond in real time with voice + motion. Built on Arm (Apple Mac / Raspberry Pi 5) with @huggingface 🤗 + @pollenrobotics 🦾, it's now an award-winning project, earning an @NVIDIAGTC Golden Ticket 🏆 It's great to see our developers build and win in the open AI ecosystem 🚀 https://t.co/C8atY3fwLv
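The detection half of a project like this is a few lines with the Ultralytics API; a minimal sketch, assuming a COCO-style "cell phone" class and a guessed "yolo26n.pt" checkpoint name:

```python
# Minimal sketch of phone detection on a single camera frame; "yolo26n.pt"
# is a guessed checkpoint name for the YOLO26 family.
from ultralytics import YOLO

model = YOLO("yolo26n.pt")
results = model("camera_frame.jpg")  # one frame from Reachy Mini's camera
for box in results[0].boxes:
    if model.names[int(box.cls)] == "cell phone":
        print("Phone detected: trigger Reachy's voice + motion response")
```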
🤗 @perplexity_ai has released 4 open-weights state-of-the-art multilingual embedding models designed for retrieval tasks! pplx-embed-v1 and pplx-embed-context-v1
Specifically trained for int8 and binary embeddings, they'll be viable for massive search problems. Details in 🧵 https://t.co/smqcPLKjU2
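For the binary side, Sentence Transformers ships a quantization helper that fits this use case; a minimal sketch with a guessed repo id:

```python
# Minimal sketch of binary quantization for massive search;
# "perplexity-ai/pplx-embed-v1" is a guessed repo id.
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("perplexity-ai/pplx-embed-v1")
emb = model.encode(["query about web-scale retrieval"])
binary = quantize_embeddings(emb, precision="binary")  # 32x smaller vectors
print(binary.shape, binary.dtype)
```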
https://t.co/sOMaBpuaQJ
New course: Gemini CLI: Code & Create with an Open-Source Agent, built with @googlecloudtech/@geminicli and taught by @JackWoth98.

Agentic coding assistants like Gemini CLI are transforming how developers work. This short course teaches you to use Google's open-source agent to coordinate local tools and cloud services for coding and non-coding workflows.

Gemini CLI works from your terminal, so it works with your local files and development tools. You can also connect it to services through MCP. Then provide high-level instructions, and it autonomously plans and executes complex workflows.

Skills you'll gain:
- Build website features and automate code reviews with GitHub Actions
- Create data dashboards that combine local files with cloud data sources
- Use MCP servers and extensions to orchestrate workflows across GitHub, Canva, and Google Workspace
- Generate social media content from multimedia files like conference recordings

I particularly appreciate that Gemini CLI is open-source. You can see exactly how it works, read the prompts it uses, and understand its architecture. The community has contributed thousands of pull requests.

Since Gemini 3's release I've found Gemini CLI highly capable - this is a tool worth having in your toolbox! Whether you're prototyping applications, automating workflows, or working with multimedia content, join to learn to delegate complex tasks and build faster: https://t.co/m3J7kwQpxC