Next edit suggestions just leveled up in @code: with long-distance NES, you get edit suggestions anywhere in your file, not just near your cursor's position. Learn how the team built this - creating the training dataset, refining the UX, evaluating success, & more: https://t.co/xDaJRpikCi

During GitHub's one-hour downtime, I finally got Claude Code teams set up with tmux. @AnthropicAI I finally have something to compare it with: @augmentcode Intent. Both use tons of tokens but help you orchestrate discover-plan-build-eval-verify-precommit-commit-submit-PR loops.
@theo Can we get a deeper dive on the Claude Code CLI & the Codex app, from an automations angle? I like Codex for defined features and tests and can have it loop and push a PR. I run Claude Code in tmux with teams, which allows for parallelization but also human intervention.
Excited to launch Gemini 3.1 Pro! Major improvements across the board including in core reasoning and problem solving. For example scoring 77.1% on the ARC-AGI-2 benchmark - more than 2x the performance of 3 Pro. Rolling out today in @GeminiApp, @antigravity and more - enjoy! https://t.co/hOgEFtJ57w
What's the right space to diffuse in: Raw Data or Latents? Why not both! In Latent Forcing, we order a joint diffusion trajectory to reveal Latents before Pixels, leading to improved convergence while being lossless at encoding and end-to-end at inference. w/ @drfeifei+... 1/n https://t.co/UQVUJOqvWz
Ollama can now launch Pi, a minimal coding agent which you can customize for your workflow:

ollama launch pi

You can even ask Pi to write extensions for itself https://t.co/hlUYnA3vl4
Train your @openclaw simply by talking to it. Meet OpenClaw-RL. Host your model on our RL server, and your LLM gets optimized automatically. Use it anywhere. Keep it private. Make it more personal every day. We have fully open sourced everything. Come in and have fun!
1/4 We are thrilled to announce that CoPaw is now open source! After an incredible wave of feedback, our team has completely overhauled the engine to give you full control over your personal AI partner.

Key Highlights:

Ultimate Model Freedom
- Local-First: Full native support for Ollama, llama.cpp, and MLX (Apple Silicon).
- Bring Your Own Model: Easily add/remove custom model providers or private API endpoints. Your data, your choice.

Smarter Long-Term Memory
- No more "amnesia." CoPaw remembers your preferences and tasks.
- New Local Mode: Use vector search without complex database installs; now fully compatible with Windows for an out-of-the-box experience.

Modular "Lego-Like" Architecture
- Skill Hub Integration: Import skills from community hubs like ClawHub with one command.
- Agentic Workflow: Modularized Prompts, Hooks, and Tools. Supports MCP (Model Context Protocol) hot-swapping; expand capabilities without restarting.

Proactive Multi-Channel Connection
- Connect to DingTalk, Feishu, Discord, iMessage, and more.
- A new standardized protocol makes it easier than ever to build your own channel plugins.
Alibaba has expanded its Qwen3.5 model family with 3 new models - the 27B model is a standout, scoring 42 on the Artificial Analysis Intelligence Index and matching open weights models 8-25x its size

@Alibaba_Qwen has expanded the Qwen3.5 family with three new models alongside the 397B flagship released earlier this month: the Qwen3.5 27B (Dense, scoring 42 on Intelligence Index), Qwen3.5 122B A10B (MoE, 42), and Qwen3.5 35B A3B (MoE, 37). The two MoE (Mixture-of-Experts) models only activate a fraction of their total parameters per forward pass (10B of 122B and ~3B of 35B respectively). The Intelligence Index is our synthesis metric incorporating 10 evaluations covering general reasoning, agentic tasks, coding, and scientific reasoning.

All models are Apache 2.0 licensed, natively support 262K context, and return to the unified thinking/non-thinking hybrid architecture of the original Qwen3, after Alibaba moved to separate Instruct and Reasoning checkpoints with the Qwen3 2507 updates.

Key benchmarking results for the reasoning variants:

➤ Qwen3.5 27B scores 42 on Intelligence Index and is the most intelligent model under 230B total parameters. The nearest model of similar size is GLM-4.7-Flash (31B total, 3B active), which scores 30. Open weights models of equivalent intelligence are 8-25x larger in terms of total parameters: MiniMax-M2.5 (230B, 42), DeepSeek V3.2 (685B, 42), and GLM-4.7 (357B, 42). In FP8 precision it takes ~27GB to store the model weights, while in 4-bit quantization the model runs on laptop-class hardware with 16GB+ of RAM.

➤ Qwen3.5 27B scores 1205 on GDPval-AA (Agentic Real-World Work Tasks), placing it alongside larger models. For context, MiniMax-M2.5 scores 1206, GLM-4.7 (Reasoning) scores 1200, and DeepSeek V3.2 (Reasoning) scores 1194. This is particularly notable for a 27B parameter model and suggests strong agentic capability for its size.
GDPval-AA tests models on real-world tasks across 44 occupations and 9 major industries.

➤ AA-Omniscience remains a relative weakness across the Qwen3.5 family, driven primarily by lower accuracy rather than hallucination rate. Qwen3.5 27B scores -42 on AA-Omniscience, comparable to MiniMax-M2.5 (-40) but behind DeepSeek V3.2 (-21) and GLM-4.7 (-35). Although Qwen3.5 27B's hallucination rate (80%) is lower than peers (GLM-4.7 90%, MiniMax 89%, DeepSeek 82%), its accuracy is also lower at 21% vs 34% for DeepSeek V3.2 and 29% for GLM-4.7. This is likely a consequence of model size - we have generally observed that models with more total parameters perform better on accuracy in AA-Omniscience, as broader knowledge recall benefits from larger parameter counts.

➤ Qwen3.5 27B is equivalently intelligent to Qwen3.5 122B A10B. The 122B A10B is a Mixture-of-Experts model that only activates 10B of its 122B total parameters per forward pass. The 27B model leads on GDPval-AA (1205 Elo vs 1145 Elo) and slightly on TerminalBench (+1.5 p.p.), while the 122B model leads on SciCode (+2.5 p.p.) and HLE (+1.2 p.p.) and has a lower hallucination rate (Omniscience -40 vs -42).

➤ Qwen3.5 35B A3B (Reasoning, 37) is the most intelligent model with ~3B active parameters, 7 points ahead of GLM-4.7-Flash (30). Other models in this ~3B active category include Qwen3 Coder Next (80B total, 28), Qwen3 Next 80B A3B (27), and NVIDIA Nemotron 3 Nano 30B A3B (24).

➤ Qwen3.5 27B used 98M output tokens to run the Intelligence Index, costing ~$299 via the Alibaba Cloud API. This is notably high token usage compared to models of similar intelligence: MiniMax-M2.5 (56M), DeepSeek V3.2 (61M), and even the larger Qwen3.5 397B (86M).

Other information:
➤ Context window: 262K tokens (extendable to 1M via YaRN)
➤ License: Apache 2.0
➤ API pricing (Alibaba Cloud): 397B: $0.60/$3.60, 122B: $0.40/$3.20, 27B: $0.30/$2.40, 35B A3B: $0.25/$2.00 per 1M input/output tokens
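The ~27GB FP8 figure above is simple arithmetic: at 8 bits per parameter, a 27B-parameter model needs roughly 27 GB for weights alone. A minimal sketch of that back-of-envelope calculation (weight storage only; it ignores KV cache, activations, and runtime overhead):

```python
# Approximate weight storage for a model at a given precision.
# Assumption: every parameter is stored at the same bit width,
# and 1 GB = 1e9 bytes.

def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Weights-only memory footprint in GB."""
    return params_billion * 1e9 * (bits_per_param / 8) / 1e9

print(weight_gb(27, 8))  # FP8: one byte per parameter -> 27.0 GB
print(weight_gb(27, 4))  # 4-bit quantization -> 13.5 GB, within 16GB+ RAM
```

The same function explains why the 122B A10B MoE still needs server-class memory despite only 10B active parameters: all 122B weights must be resident.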
High-performance browser control for AI agents. Pinchtab is a lightweight (12MB) Go binary that runs Chrome and exposes a plain HTTP API so any agent or script can navigate web pages, read text efficiently, click/type interactively, and persist sessions. Zero config, framework-agnostic, token-efficient.
#youtube George Hotz | Programming | how I actually use agentic coding | Agentic AI https://t.co/rrloPO5Vif
Use my tscribe tool to easily transcribe X or YouTube videos... Great way to get transcripts into your Claude Code sessions... Simply:

tscribe transcribe <video url>

then just dump it to stdout (lots more you can do too...):

tscribe dump
Mercury 2 is live. The world's first reasoning diffusion LLM - 5x faster than leading speed-optimized autoregressive models. Built for production: multi-step agents without delays, voice AI with tight latency budgets, instant coding feedback. Diffusion-based generation enables parallel refinement, not sequential tokens. Faster. More controllable. Dramatically lower inference cost. Available today on the Inception API. @dinabass has the story in @business.
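The "parallel refinement, not sequential tokens" claim is the core latency argument: an autoregressive decoder needs one forward pass per token, while a diffusion decoder refines every position at once over a handful of steps. A toy sketch of that contrast, under heavy assumptions (the "model" just copies a known target; real diffusion LLMs denoise token distributions, and nothing here reflects Mercury's actual API):

```python
# Sequential decoding vs. diffusion-style parallel refinement, as a
# step-count comparison on an 11-token "sequence".

TARGET = list("hello world")

def autoregressive(target):
    out, steps = [], 0
    for tok in target:          # one token emitted per forward pass
        out.append(tok)
        steps += 1
    return out, steps

def parallel_refinement(target, num_steps=3):
    out = ["_"] * len(target)   # start fully masked
    for step in range(num_steps):
        # each refinement step updates every position in parallel;
        # here we reveal an interleaved subset per step
        for i in range(step, len(target), num_steps):
            out[i] = target[i]
    return out, num_steps

ar_out, ar_steps = autoregressive(TARGET)
df_out, df_steps = parallel_refinement(TARGET)
print(ar_steps, df_steps)  # 11 passes vs 3 refinement steps
```

Both paths produce the same sequence; the speedup comes entirely from the step count being independent of sequence length.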
Introducing the Qwen3.5 Medium Model Series: Qwen3.5-Flash · Qwen3.5-35B-A3B · Qwen3.5-122B-A10B · Qwen3.5-27B

More intelligence, less compute.
- Qwen3.5-35B-A3B now surpasses Qwen3-235B-A22B-2507 and Qwen3-VL-235B-A22B, a reminder that better architecture, data quality, and RL can move intelligence forward, not just bigger parameter counts.
- Qwen3.5-122B-A10B and 27B continue narrowing the gap between medium-sized and frontier models, especially in more complex agent scenarios.
- Qwen3.5-Flash is the hosted production version aligned with 35B-A3B, featuring:
  - 1M context length by default
  - Official built-in tools

Hugging Face: https://t.co/wFMdX5pDjU
ModelScope: https://t.co/9NGXcIdCWI
Qwen3.5-Flash API: https://t.co/82ESSpaqAF

Try in Qwen Chat:
Flash: https://t.co/UkTL3JZxIK
27B: https://t.co/haKxG4lETy
35B-A3B: https://t.co/Oc1lYSTbwh
122B-A10B: https://t.co/hBMODXmh1o

Would love to hear what you build with it.

Most AI agents are amnesiacs. They solve a problem brilliantly, then forget everything the moment you close the terminal. Hermes flips this: it's an agent that writes down what it learns and gets better the longer you use it. It runs on a dedicated machine, remembers what it learns across sessions, and writes reusable skills when it solves hard problems. It accumulates knowledge like a human teammate.

When Hermes solves a complex task, it documents the approach as a searchable skill that it can load automatically next time. Think of it as procedural memory: the agent doesn't just remember facts, it remembers how to do things.

Because the agent lives on your server rather than running in a sandboxed cloud environment, it can:
1. Schedule tasks that run unattended (daily reports, nightly backups)
2. Maintain persistent state across messaging platforms
3. Spawn isolated subagents for parallel work
4. Access your actual filesystem and terminal with full context
5. Build up knowledge about your specific setup

This unlocks use cases that browser-based assistants can't touch: an agent that knows your codebase, remembers how you like things formatted, and can kick off multi-hour workflows while you sleep. If this pattern catches on, the AI assistant market splits into two categories: disposable chatbots that reset every session, and persistent agents that compound value over time.
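The "documents the approach as a searchable skill" pattern can be sketched in a few lines: persist each solved task to disk, then match later queries against names and tags. This is a hypothetical design for illustration, not Hermes' actual storage format; a real system would likely use embeddings rather than keyword matching.

```python
# Minimal "procedural memory": skills persisted as JSON files that
# survive across sessions and can be searched and reloaded.
import json
import tempfile
from pathlib import Path

class SkillStore:
    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, name: str, tags: list, steps: list) -> None:
        # write the approach down so the next session can reuse it
        path = self.root / (name + ".json")
        path.write_text(json.dumps({"name": name, "tags": tags, "steps": steps}))

    def search(self, query: str) -> list:
        # naive match: substring on name, exact match on tags
        hits = []
        for f in sorted(self.root.glob("*.json")):
            skill = json.loads(f.read_text())
            if query in skill["name"] or query in skill["tags"]:
                hits.append(skill)
        return hits

store = SkillStore(Path(tempfile.mkdtemp()))
store.save("rotate-logs", ["ops", "cron"],
           ["find logs older than 7 days", "gzip them", "update index"])
print(store.search("cron")[0]["name"])  # rotate-logs
```

Because the store is plain files on the agent's own machine, it composes naturally with the scheduled, unattended tasks listed above.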
Most language models only read forward. Perplexity just open-sourced 4 models that read text in both directions.

They used a technique from image generation to retrain Qwen3 so every word can see every other word in a passage. That changes how well a model understands meaning.

They built four models from this:
1. Two sizes: 0.6B and 4B parameters
2. Two types: standard search embeddings and context-aware embeddings

The context-aware version is the interesting one. It processes an entire document at once, so each small chunk "knows" what the full document is about. Standard embeddings treat each chunk in isolation.

> Tops benchmarks for models of similar size
> Works in multiple languages out of the box
> MIT licensed, free for commercial use

If you're building search over large document collections, you can now get document-level understanding without running a massive model. Small enough to actually deploy.
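The isolated-vs-context-aware distinction can be shown mechanically: encode the full document once, then pool token vectors per chunk, so every chunk embedding has "seen" the whole passage. The encoder below is a deterministic stand-in (a character-sum hash), not Perplexity's model; it exists only to make the dependence on context visible.

```python
# Contrast: chunks embedded in isolation vs. pooled from a single
# full-document encoding ("late chunking" style).

def encode_tokens(tokens):
    # stand-in bidirectional encoder: each token vector carries the
    # mean over ALL token ids, so it depends on the full input
    ids = [sum(map(ord, t)) % 100 for t in tokens]
    ctx = sum(ids) / len(ids)
    return [[float(i), ctx] for i in ids]

def mean_pool(vecs):
    return [sum(col) / len(col) for col in zip(*vecs)]

doc = "the bank raised interest rates".split()
chunks = [doc[:2], doc[2:]]

# isolated: each chunk encoded on its own, blind to the rest
isolated = [mean_pool(encode_tokens(c)) for c in chunks]

# context-aware: encode the full doc once, pool per chunk span
full = encode_tokens(doc)
contextual = [mean_pool(full[:2]), mean_pool(full[2:])]

# the context component differs: isolated chunks never saw the rest
print(isolated[0][1] == contextual[0][1])  # False
```

With a real bidirectional model the payoff is the same shape: "bank" in chunk one gets disambiguated by "interest rates" in chunk two.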
New research from Google DeepMind. What if LLMs could discover entirely new multi-agent learning algorithms?

Designing algorithms for multi-agent systems is hard. Classic approaches like PSRO and counterfactual regret minimization took years of expert effort to develop. Each new game-theoretic setting often demands its own specialized solution. But what if you could automate the discovery process itself?

This research uses LLMs to automatically generate novel multi-agent learning algorithms through iterative prompting and refinement. The LLM proposes algorithm pseudocode, which gets evaluated against game-theoretic benchmarks, and feedback drives the next iteration.

LLMs have absorbed enough algorithmic knowledge from training to serve as creative search engines over the space of possible algorithms. They generate candidates that humans wouldn't think to try. The discovered algorithms achieve competitive performance against established hand-crafted baselines across multiple game-theoretic domains.

This shifts algorithm design from manual expert craft to automated discovery. The same approach could generalize beyond games to any domain where we need novel optimization procedures.

Paper: https://t.co/9AeQYo2LFS
Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
Important survey on agentic memory systems. Memory is one of the most critical components of AI agents. It enables LLM agents to maintain state across long interactions, supporting long-horizon reasoning and personalization beyond fixed context windows. But the empirical foundations of these systems remain fragile.

This new survey presents a structured analysis of agentic memory from both architectural and system perspectives. The authors introduce a taxonomy based on four core memory structures and then systematically analyze the pain points limiting current systems.

What did they find? Existing benchmarks are underscaled and often saturated. Evaluation metrics are misaligned with semantic utility. Performance varies significantly across backbone models. And the latency and throughput overhead introduced by memory maintenance is frequently overlooked.

Current agentic memory systems often underperform their theoretical promise because evaluation and architecture are studied in isolation. As agents take on longer, more complex tasks, memory becomes the bottleneck. This survey clarifies where current systems fall short and outlines directions for more reliable evaluation and scalable memory design.

Paper: https://t.co/xNGTbVVhq9
Learn to build effective AI agents in our academy: https://t.co/LRnpZN7deE

New research from Google DeepMind. Really interesting paper on diffusion models.

Training good latents for diffusion models is harder than it looks. The standard approach uses a KL penalty borrowed from VAEs, with no principled way to control how much information actually lives in the latent space.

This new research introduces Unified Latents (UL), a framework that co-trains a diffusion prior on the latents. This provides a tight upper bound on latent bitrate and makes the reconstruction-generation tradeoff explicit and, most importantly, tunable.

On ImageNet-512, UL achieves FID 1.4 while requiring fewer training FLOPs than Stable Diffusion latents. On Kinetics-600, it sets a new state-of-the-art FVD of 1.3 for video generation.

The latent space is one of the most underappreciated design decisions in diffusion-based generation. UL gives practitioners a principled handle on it, for both images and video.

Paper: https://t.co/E1HCf9QzB4
Check out @ShenyuanGao's technical deep dive: https://t.co/DnEGLzGuJV
We have seen rapid progress in humanoid control - specialist robots can reliably generate agile, acrobatic, but preset motions. Our singular focus this year: putting generalist humanoids to do real work.

To progress toward this goal, we developed SONIC (https://t.co/zOZVraFuDV), a Behavior Foundation Model for real-time, whole-body motion generation that supports teleoperation and VLA inference for loco-manipulation. Today, we're open-sourcing SONIC on GitHub. We are excited to see what the community builds upon SONIC and to collectively push humanoid intelligence toward real-world deployment at scale.

Paper: https://t.co/DGBP7LAvuT
Code: https://t.co/WAZ1P13072

And @yukez 's announcement: https://t.co/38IhxYX1tZ