Your curated collection of saved posts and media
We just launched Self-Hosted Voice AI: our Universal-Streaming model, deployed on your infrastructure, with the same performance developers already trust from our API. Self-hosting speech AI used to mean compromising on quality or paying a premium for the privilege. Not anymore.

Here's what this unlocks:
🔹 Co-locate your Voice AI stack where your traffic originates for optimized latency
🔹 Process all audio within your controlled perimeter for full data sovereignty
🔹 Deploy with Kubernetes, AWS ECS, or any container orchestration platform you're already using
🔹 Count usage toward your cloud provider's committed spend program

No self-hosting premium. Session-based pricing with volume discounts. For teams navigating strict compliance requirements, data residency mandates, or just wanting tighter control over their stack: this is built for you.
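Since a self-hosted deployment is just your own endpoint, a client can talk to it with ordinary WebSocket code. Below is a hypothetical Python sketch; the URL, auth, message format, and the "Terminate" control message are assumptions for illustration, not the documented self-hosted API.

```python
# Hypothetical sketch: streaming PCM16 audio frames to a self-hosted
# Universal-Streaming endpoint over WebSocket. URL and message schema
# are assumptions, not the documented API.
import asyncio
import json

import websockets

SELF_HOSTED_URL = "ws://voice-ai.internal.example.com/v3/ws"  # inside your perimeter

async def transcribe(chunks):
    """Send raw audio frames and print transcripts as they arrive."""
    async with websockets.connect(SELF_HOSTED_URL) as ws:
        async def send_audio():
            for chunk in chunks:                              # bytes of raw audio
                await ws.send(chunk)
            await ws.send(json.dumps({"type": "Terminate"}))  # assumed control msg

        async def read_transcripts():
            async for msg in ws:
                event = json.loads(msg)
                if "transcript" in event:                     # assumed response field
                    print(event["transcript"])

        await asyncio.gather(send_audio(), read_transcripts())
```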
⚡ Faster than Fast. Designed for Agentic AI. Introducing Xiaomi MiMo-V2-Flash, our new open-source MoE model: 309B total params, 15B active. Blazing speed meets frontier performance.

🔥 Highlights:
• Hybrid Attention: 5:1 interleaved 128-window SWA + Global | 256K context
• Performance: matches DeepSeek-V3.2 on general benchmarks, at a fraction of the latency
• SWE-Bench Verified: 73.4% | SWE-Bench Multilingual: 71.7%, new SOTA for open-source models
• Speed: 150 output tokens/s with Day-0 support from @lmsysorg

🤗 Model: https://t.co/4Etm0yZKTL
Blog Post: https://t.co/5zxmcDuB6o
Technical Report: https://t.co/crac1YTLYl
AI Studio: https://t.co/nSReUs6QgW
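For readers curious what a 5:1 interleaved SWA + global pattern means in practice, here is a minimal Python sketch of the attention mask, assuming a 128-token window and one global layer per five sliding-window layers; the exact layer ordering inside MiMo-V2-Flash is an assumption.

```python
# Minimal sketch of a 5:1 interleaved sliding-window / global attention pattern.
# Window size (128) is taken from the post; layer ordering is an assumption.
import torch

def attention_mask(seq_len: int, layer_idx: int, window: int = 128) -> torch.Tensor:
    """Boolean causal mask: every 6th layer is global, the rest use SWA."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions (seq_len, 1)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions   (1, seq_len)
    causal = j <= i                         # no attending to the future
    if layer_idx % 6 == 5:                  # one global layer per five SWA layers
        return causal
    return causal & (i - j < window)        # only keys inside the sliding window
```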

Just tried @bubblelab_ai and I'm actually blown away. Best way I can describe it is if @cursor_ai and @n8n_io had a baby.

This afternoon I built a full sales qualifier workflow in three prompts:
- defined a target segment (e.g. Shopify store owners)
- identified market leaders
- analyzed their social presence + inferred priorities
- extracted contact details
- generated outreach emails + sales call talking points
- compiled everything into a single Google Sheet

It got it right with zero errors.

As tools like this emerge, the hard part stops being building workflows and starts being understanding them. The advantage moves to people who deeply understand the domain and know what questions are worth asking.

Props to @Selinaliyy and the Bubble Lab team! This feels like a glimpse of what AI-native ops should look like.
We're going live in one hour! Tune in for a hands-on look at Gemini 3, Nano Banana Pro, Veo and how to create a full brand ecosystem from scratch. → https://t.co/GByTvMja87 https://t.co/691F2HEqPj
Today we're announcing cua-bench: a framework for benchmarking, training data, and RL environments for computer-use AI agents. Why? Current agents show 10x variance across minor UI changes. Here's how we're fixing it.
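To make the variance claim concrete, here is an illustrative harness (not the cua-bench API) that estimates how an agent's success rate spreads across minor UI perturbations of the same task; `run_episode` is a placeholder rollout.

```python
# Illustrative harness (not the cua-bench API): estimate how success rate
# varies when the same task is wrapped in minor UI perturbations.
import random
import statistics

def run_episode(agent, task, perturbation) -> bool:
    """Placeholder rollout; swap in a real environment step loop."""
    return random.random() < 0.5  # stub result

def ui_variance(agent, task, perturbations, trials: int = 20) -> dict:
    rates = []
    for p in perturbations:  # assumes two or more perturbations
        wins = sum(run_episode(agent, task, p) for _ in range(trials))
        rates.append(wins / trials)
    return {
        "per_ui_success": rates,
        "max_over_min": max(rates) / max(min(rates), 1e-9),  # the "10x" gap
        "stdev": statistics.stdev(rates),
    }
```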
Introducing SAM Audio, the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts. We're sharing SAM Audio with the community, along with a perception encoder model, benchmarks and research papers, to empower others to explore new forms of expression and build applications that were previously out of reach.

Learn more: https://t.co/FPnfv66UCP
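A sketch of what prompt-driven separation could look like in code. The `sam_audio` package, `SAMAudio` class, checkpoint id, and `separate` signature below are all assumptions for illustration, not Meta's published API.

```python
# Hypothetical usage sketch: package, class, checkpoint id, and method
# signature are assumptions for illustration, not Meta's actual API.
import torchaudio
from sam_audio import SAMAudio  # assumed entry point

model = SAMAudio.from_pretrained("facebook/sam-audio")  # assumed checkpoint id
mixture, sr = torchaudio.load("street_scene.wav")

# Text prompt: pull one named source out of the mixture.
dog_bark = model.separate(mixture, sample_rate=sr, text="a dog barking")

# Span prompt: isolate whatever dominates a time window (in seconds).
siren = model.separate(mixture, sample_rate=sr, span=(2.5, 6.0))

torchaudio.save("dog_bark.wav", dog_bark, sr)
```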
We just launched OpenHands Software Agent SDK on @ProductHunt! A smarter way to build agent-driven software: fast, flexible, and production-ready. Check it out + show some love! https://t.co/xekxMFGJtD
Today marks an important milestone in the history of @SimularAI, the autonomous computer company. Our open-source computer-use agent, Agent S3, scored 72.6% on the OSWorld benchmark, surpassing the human baseline (72.36%) for the first time ever. This milestone matters because it shows AI can now use computers the way humans do, and, in many cases, do it better. This is a glimpse of a future where work becomes faster, more accessible, and more empowered for everyone. #AI #Automation #Simular #AgenticAI #ComputerUse
Another banger paper from Apple.

View synthesis from a single image is impressive, but most methods are extremely slow. The default approach to high-quality novel view synthesis uses diffusion models. Iterative denoising produces compelling results, but latency can stretch into hundreds of seconds per scene. Real-world applications, like AR/VR headsets and interactive photo browsing, need instant 3D from a single photograph.

This new research from Apple introduces SHARP, a method that generates a complete 3D Gaussian representation from a single image in under one second on a standard GPU.

Architecture details: A neural network takes a single photograph and produces about 1.2 million 3D Gaussians in a single feedforward pass. The architecture builds on a pretrained depth backbone, but crucially unfreezes parts of it during training. A learned depth adjustment module resolves the inherent ambiguity of monocular depth estimation. A Gaussian decoder then refines all attributes: position, scale, rotation, color, and opacity.

Results: On ScanNet++, SHARP achieves 0.071 DISTS versus 0.090 for Gen3C, the previous best. That's a 21% improvement in perceptual quality. LPIPS drops from 0.227 to 0.154, a 32% reduction. The latency difference is more dramatic: SHARP runs in under 1 second, while Gen3C takes approximately 850 seconds, roughly a 1000x speedup. Once the 3D representation exists, rendering runs at over 100 frames per second at high resolution. The representation is metric with an absolute scale, so virtual cameras can be accurately coupled to physical headsets.

The method is not perfect. SHARP excels at nearby views corresponding to natural head motion and posture shifts. Diffusion-based methods handle faraway views better by hallucinating plausible content for regions with no overlap to the input.

Paper: https://t.co/JPoXOqqj2l
Learn to build effective AI Agents in my academy: https://t.co/JBU5beIoD0
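A schematic sketch of the feedforward shape the post describes: image in, per-pixel features from a partially unfrozen depth backbone, a learned depth adjustment, and a decoder emitting Gaussian attributes. Module names, feature dims, and shapes are illustrative assumptions, not the paper's code.

```python
# Schematic sketch of a SHARP-like feedforward pipeline. Module names,
# feature dims, and shapes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

GAUSSIAN_DIM = 3 + 3 + 4 + 3 + 1  # position, scale, rotation (quat), color, opacity

class SharpLike(nn.Module):
    def __init__(self, depth_backbone: nn.Module, feat_dim: int = 256):
        super().__init__()
        self.backbone = depth_backbone                     # pretrained, partially unfrozen
        self.depth_adjust = nn.Linear(feat_dim, feat_dim)  # metric-scale correction
        self.decoder = nn.Linear(feat_dim, GAUSSIAN_DIM)   # refines all attributes

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(image)              # (B, N, feat_dim), N ~ 1.2M Gaussians
        feats = feats + self.depth_adjust(feats)  # resolve monocular depth ambiguity
        return self.decoder(feats)                # (B, N, 14) Gaussian parameters
```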

Who leaking https://t.co/mhuBy8FQp8
EgoX: Generate immersive first-person video from any third-person clip A novel framework from KAIST AI & Seoul National University that leverages video diffusion models to transform a single exocentric video into a realistic egocentric view. See it in action! https://t.co/Vt3cPAdUL3
Time to follow https://t.co/dqWrV1R3t7 to get the notification!
🎮 Get a first look at Tencent HY World 1.5 (WorldPlay)! 🎮 Our newest world model with real-time interaction and long-term memory. It's going *open-source* tomorrow. https://t.co/zvMI3rCX7u
Introducing Nemotron-Cascade! We're thrilled to release Nemotron-Cascade, a family of general-purpose reasoning models trained with cascaded, domain-wise reinforcement learning (Cascade RL), delivering best-in-class performance across a wide range of benchmarks.

💻 Coding powerhouse
After RL, our 14B model:
• Surpasses DeepSeek-R1-0528 (671B) on LiveCodeBench v5/v6/Pro.
• Achieves silver-medal performance at IOI 2025 🥈.
• Reaches 43.1% pass@1 on SWE-Bench Verified, and 53.8% with test-time scaling.

🧠 What is Cascade RL?
Instead of mixing heterogeneous prompts across domains, Cascade RL trains sequentially, domain by domain, which reduces engineering complexity, mitigates heterogeneous verification latencies, and enables domain-specific curricula and tailored hyperparameter tuning.

✨ Key insight
Using RLHF for alignment as a pre-step dramatically boosts complex reasoning, far beyond preference optimization. Subsequent domain-wise RLVR stages rarely hurt the benchmark performance attained in earlier domains and may even improve it, as illustrated in the following figure.

🤗 Models & training data 🔥: https://t.co/wfVcAaMocA
Technical report with detailed training and data recipes: https://t.co/FdMINvB4yM
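A conceptual sketch of the Cascade RL loop as described: RLHF alignment first, then one RLVR stage per domain with its own curriculum and hyperparameters. The domain order and callables are placeholders, not NVIDIA's actual recipe.

```python
# Conceptual sketch of Cascade RL as described in the post: RLHF alignment first,
# then sequential domain-wise RLVR. Domain order and callables are placeholders.
DOMAINS = ["math", "code", "software_engineering", "general"]  # assumed order

def cascade_rl(policy, rlhf_step, rlvr_step, curricula, hparams):
    # Pre-step: RLHF alignment, which the post reports boosts later reasoning
    # far beyond preference optimization alone.
    policy = rlhf_step(policy)
    # Train one domain at a time instead of mixing heterogeneous prompts, so
    # each stage gets its own curriculum, verifier, and hyperparameters.
    for domain in DOMAINS:
        policy = rlvr_step(policy, prompts=curricula[domain], **hparams[domain])
    return policy
```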

Last year Molmo set SOTA on image benchmarks + pioneered image pointing. Millions of downloads later, Molmo 2 brings Molmo's grounded multimodal capabilities to video 🎥, and leads many open models on challenging industry video benchmarks. 🧵 https://t.co/uFs30b2DR3

Fine-tune Nemotron 3 Nano in TRL with coding agents like Claude Code, in Colab, locally, or on the Hub. To fine-tune, pick one of these tools:
- Combine HF skills with a coding agent like Claude Code.
- Use this Colab notebook.
- Train it on HF Jobs using the Hugging Face Hub.
- If you can, run this script on your own setup with uv.

This should get anyone started with fine-tuning, and this is the perfect model to start with.
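A minimal TRL sketch along those lines, runnable locally or via `uv run`. The Hub id `nvidia/Nemotron-3-Nano` is an assumption for illustration; check the model card for the actual checkpoint name. The dataset is a small public example from the TRL docs.

```python
# Minimal TRL fine-tuning sketch; the model id is an assumed placeholder.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # example SFT dataset

trainer = SFTTrainer(
    model="nvidia/Nemotron-3-Nano",  # assumed Hub id; replace with the real one
    train_dataset=dataset,
    args=SFTConfig(output_dir="nemotron3-nano-sft"),
)
trainer.train()
```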
New model from @Meituan_LongCat: LongCat-Video-Avatar 🔥 Audio-driven character animation with text, image, and video inputs, all in one!
✨ MIT license
✨ Audio → talking video (single & multi-person)
✨ Natural motion and lip sync
✨ Fewer repeats, stable identity
✨ Available on @huggingface
Introducing the Ndea podcast - Abstract Synthesis. Hear the stories behind interesting academic papers in the world of program synthesis. Episode 1 features @MarkSantolucito, @BarnardCollege/@Columbia, discussing his paper "Grammar Filtering for Syntax-Guided Synthesis". https://t.co/uJ1NVxU6rK
@youwouldntpost @Srirachachau Downloading the "driving during daytime" patch https://t.co/R6EmIolLDo
The Tesla Cybertruck just earned the Top Safety Pick+ award, scoring a perfect "Good" rating in literally every major crash category in the 2025 IIHS crash tests https://t.co/o5IRpmMqzg
Woman who joked about putting toilet cleaner and feces in food of "white MAGA family" identified as daughter of Virginia delegate https://t.co/LwDaOssHEO
The Woke Mind Virus in Academia https://t.co/ztXf1lLxL6
GPT-5.2 is our strongest model on the FrontierScience eval, showing clear gains on hard scientific tasks. But the benchmark also reveals a gap between strong performance on structured problems and the open-ended, iterative reasoning that real research requires. https://t.co/lZsZSXkOrj

Introducing ChatGPT Images, powered by our flagship new image generation model. - Stronger instruction following - Precise editing - Detail preservation - 4x faster than before Rolling out today in ChatGPT for all users, and in the API as GPT Image 1.5. https://t.co/NLNIPEYJnr
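Calling the new model from the API would look something like the standard Images API sketch below; the exact model string is inferred from the post ("GPT Image 1.5") and may differ in the docs.

```python
# Sketch of an Images API call; the model identifier is inferred from the post
# and may differ from the final documented name.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
result = client.images.generate(
    model="gpt-image-1.5",  # assumed identifier
    prompt="A watercolor fox reading a newspaper, soft morning light",
    size="1024x1024",
)
with open("fox.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))  # images return base64
```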
This map should be included in every history book... https://t.co/VyuLo90IEE
Quantum Dreaming 2025: When Dreams Become Parallel Reality Portals

Last year we asked: Are your dreams just imagination… or glimpses into alternate timelines?

This year, the answer is clearer than ever. 2025 brought breakthroughs that turned quantum dreaming from theory to lived experience:
• Neuralink's first 1000+ volunteers reported vivid "timeline bleed" dreams
• DMT + VR studies showed 87% of participants experienced consistent parallel-world memories
• Lucid dreamers using tDCS + galantamine now report 40-minute "visits" to stable alternate realities

Every déjà vu? A memory leak from a timeline where you chose differently. Every precognitive dream? Your mind tuning into a branch that's already happening.

2025 is the year we stopped calling them "just dreams." We started calling them evidence. Keep dreaming, explorer. One of them might be more real than this one.

#QuantumDreaming #ParallelRealities #2025Awakening #LucidDreaming #Multiverse

Grok Imagine prompt: Pastel color quantum dreaming
LlamaSplit automatically separates bundled documents into distinct sections so you don't have to manually split them anymore. Our new beta API uses AI to analyze page content and group consecutive pages by category - perfect for processing mixed document bundles that contain multiple distinct documents:
• Define categories with natural language descriptions and get back exact page ranges with confidence scores
🎯 Route different document types to appropriate agents
⚡ Scale beyond manual document separation
• Combine with LlamaExtract to run targeted data extraction on each separated segment

Unlike our existing Classify product that categorizes separate files, LlamaSplit looks inside a single document to find boundaries between different document types.

Try LlamaSplit in beta: https://t.co/cQqeZCGeww
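A hypothetical sketch of what a LlamaSplit call could look like. The endpoint path, payload fields, and response shape are guesses based on the post's description, not documented API names; see the link above for the real beta API.

```python
# Hypothetical sketch: endpoint path, payload fields, and response shape are
# guesses from the post's description, not documented LlamaSplit API names.
import json
import os

import requests

resp = requests.post(
    "https://api.cloud.llamaindex.ai/api/v1/split",  # assumed beta endpoint
    headers={"Authorization": f"Bearer {os.environ['LLAMA_CLOUD_API_KEY']}"},
    files={"file": open("bundle.pdf", "rb")},
    data={"categories": json.dumps([  # natural-language category descriptions
        {"name": "invoice", "description": "a billing document with line items"},
        {"name": "contract", "description": "a signed legal agreement"},
    ])},
)
for seg in resp.json()["segments"]:  # assumed response shape
    print(seg["category"], seg["pages"], seg["confidence"])
```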
🧜 The Little Mermaid gets her voice back. The Voice Control feature is now live in Kling VIDEO 2.6, and voice consistency is now resolved. Say goodbye to generic voices: create a custom voice, switch styles, and even sing, all perfectly matched to your characters.
Introducing YouBase by YouWare. The complete production backend for vibe coding, for just $20/month.

Auth, Database, Storage, Edge Functions. Deploy to your own domain. Zero configuration. No cloud credits. No usage fees. No surprises.

One prompt → full backend, live on your domain. Start building at the link in bio!
Multimodal LLMs (MLLMs) excel at reasoning, layout understanding, and planning, yet in diffusion-based generation they are often reduced to simple multimodal encoders. What if MLLMs could reason directly in latent space and guide diffusion generation with fine-grained, spatiotemporal control? 🤔

Introducing MetaCanvas 🎨, a lightweight framework that translates MLLM reasoning into structured spatiotemporal conditions for diffusion models. 🧵
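A conceptual sketch of the idea as summarized in the post: the MLLM plans structured spatiotemporal conditions that the diffusion model consumes. Every name and shape here is an illustrative assumption, not the MetaCanvas code.

```python
# Conceptual sketch only: an MLLM plans structured spatiotemporal conditions
# that a diffusion model consumes. All names/shapes are illustrative assumptions.
import torch

def plan_conditions(mllm, prompt: str, frames: int, h: int, w: int) -> torch.Tensor:
    """MLLM reasons in latent space, emitting a per-frame, per-region condition grid."""
    plan = mllm.reason(prompt)          # placeholder latent reasoning call
    return plan.view(frames, h, w, -1)  # spatiotemporal condition tensor

def generate(diffusion, mllm, prompt: str):
    cond = plan_conditions(mllm, prompt, frames=16, h=32, w=32)
    # Fine-grained control: the diffusion model is conditioned per frame and
    # region rather than on a single pooled text embedding.
    return diffusion.sample(prompt=prompt, spatiotemporal_condition=cond)
```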