Your curated collection of saved posts and media

Showing 24 posts Β· last 30 days Β· by score
C
code
@code
πŸ“…
Feb 26, 2026
12d ago
πŸ†”62747526

Next edit suggestions just leveled up in @code: with long-distance NES, you get edit suggestions anywhere in your file, not just near your cursor's position. Learn how the team built this - creating the training dataset, refining the UX, evaluating success, & more: https://t.co/xDaJRpikCi

πŸ–ΌοΈ Media Γ—2
M
Mid0
@Mid0
πŸ“…
Feb 09, 2026
29d ago
πŸ†”21143720
⭐0.40

During GitHub's 1-hour downtime I finally got Claude Code teams set up with tmux. @AnthropicAI I finally have something to compare it with: @augmentcode Intent. Both use tons of tokens but help you orchestrate discover-plan-build-eval-verify-precommit-commit-submit-PR loops.

M
Mid0
@Mid0
πŸ“…
Feb 17, 2026
21d ago
πŸ†”96738977
⭐0.36

@theo Can we get a deeper dive on the Claude Code CLI & Codex app, from an automations angle? I like Codex for defined features & tests and can have it loop and push a PR. I run Claude Code in tmux with teams, which allows for parallelization but also human intervention.

D
demishassabis
@demishassabis
πŸ“…
Feb 19, 2026
19d ago
πŸ†”76177645

Excited to launch Gemini 3.1 Pro! Major improvements across the board, including in core reasoning and problem solving: for example, scoring 77.1% on the ARC-AGI-2 benchmark - more than 2x the performance of 3 Pro. Rolling out today in @GeminiApp, @antigravity and more - enjoy! https://t.co/hOgEFtJ57w

πŸ–ΌοΈ Media
B
BaadeAlan
@BaadeAlan
πŸ“…
Feb 17, 2026
21d ago
πŸ†”93052072

What's the right space to diffuse in: Raw Data or Latents? Why not both! In Latent Forcing, we order a joint diffusion trajectory to reveal Latents before Pixels, leading to improved convergence while being lossless at encoding and end-to-end at inference. w/ @drfeifei+... 1/n https://t.co/UQVUJOqvWz

πŸ–ΌοΈ Media
O
ollama
@ollama
πŸ“…
Feb 26, 2026
12d ago
πŸ†”42532961

Ollama can now launch Pi, a minimal coding agent which you can customize for your workflow:
ollama launch pi
You can even ask Pi to write extensions for itself https://t.co/hlUYnA3vl4

πŸ–ΌοΈ Media
Y
YinjieW2024
@YinjieW2024
πŸ“…
Feb 26, 2026
12d ago
πŸ†”03363837

Train your 🦞@openclaw simply by talking to it. Meet OpenClaw-RL. Host your model on our RL server, and your LLM gets optimized automatically. Use it anywhere. Keep it private. Make it more personal every day. We have fully open-sourced everything. Come in and have fun!

πŸ–ΌοΈ Media
A
Ali_TongyiLab
@Ali_TongyiLab
πŸ“…
Feb 28, 2026
10d ago
πŸ†”36473199

1/4 We are thrilled to announce that CoPaw is now open source! After an incredible wave of feedback, our team has completely overhauled the engine to give you full control over your personal AI partner.

Key Highlights:

Ultimate Model Freedom
β€’ Local-First: full native support for Ollama, llama.cpp, and MLX (Apple Silicon).
β€’ Bring Your Own Model: easily add/remove custom model providers or private API endpoints. Your data, your choice.

Smarter Long-Term Memory
β€’ No more "amnesia": CoPaw remembers your preferences and tasks.
β€’ New Local Mode: use vector search without complex database installs; now fully compatible with Windows for an out-of-the-box experience.

Modular "Lego-Like" Architecture
β€’ Skill Hub Integration: import skills from community hubs like ClawHub with one command.
β€’ Agentic Workflow: modularized Prompts, Hooks, and Tools. Supports MCP (Model Context Protocol) hot-swapping, so you can expand capabilities without restarting.

Proactive Multi-Channel Connection
β€’ Connect to DingTalk, Feishu, Discord, iMessage, and more.
β€’ A new standardized protocol makes it easier than ever to build your own channel plugins.

πŸ–ΌοΈ Media
A
ArtificialAnlys
@ArtificialAnlys
πŸ“…
Feb 27, 2026
11d ago
πŸ†”97777245

Alibaba has expanded its Qwen3.5 model family with 3 new models. The 27B model is a standout, scoring 42 on the Artificial Analysis Intelligence Index and matching open weights models 8-25x its size.

@Alibaba_Qwen has expanded the Qwen3.5 family with three new models alongside the 397B flagship released earlier this month: the Qwen3.5 27B (Dense, scoring 42 on Intelligence Index), Qwen3.5 122B A10B (MoE, 42), and Qwen3.5 35B A3B (MoE, 37). The two MoE (Mixture-of-Experts) models only activate a fraction of the total parameters per forward pass (10B of 122B and ~3B of 35B respectively). The Intelligence Index is our synthesis metric incorporating 10 evaluations covering general reasoning, agentic tasks, coding, and scientific reasoning.

All models are Apache 2.0 licensed, natively support 262K context, and return to the unified thinking/non-thinking hybrid architecture from the original Qwen3, after Alibaba moved to separate Instruct and Reasoning checkpoints with the Qwen3 2507 updates.

Key benchmarking results for the reasoning variants:

➀ Qwen3.5 27B scores 42 on Intelligence Index and is the most intelligent model under 230B. The nearest model of similar size is GLM-4.7-Flash (31B total, 3B active), which scores 30. Open weights models of equivalent intelligence are 8-25x larger in terms of total parameters: MiniMax-M2.5 (230B, 42), DeepSeek V3.2 (685B, 42), and GLM-4.7 (357B, 42). In FP8 precision it takes ~27GB to store the model weights, while in 4-bit quantization you can use laptop-class hardware with 16GB+ of RAM.

➀ Qwen3.5 27B scores 1205 on GDPval-AA (Agentic Real-World Work Tasks), placing it alongside larger models. For context, MiniMax-M2.5 scores 1206, GLM-4.7 (Reasoning) scores 1200, and DeepSeek V3.2 (Reasoning) scores 1194. This is particularly notable for a 27B-parameter model and suggests strong agentic capability for its size. GDPval-AA tests models on real-world tasks across 44 occupations and 9 major industries.

➀ AA-Omniscience remains a relative weakness across the Qwen3.5 family, driven primarily by lower accuracy rather than hallucination rate. Qwen3.5 27B scores -42 on AA-Omniscience, comparable to MiniMax-M2.5 (-40) but behind DeepSeek V3.2 (-21) and GLM-4.7 (-35). Although Qwen3.5 27B's hallucination rate (80%) is lower than peers' (GLM-4.7 90%, MiniMax 89%, DeepSeek 82%), its accuracy is also lower at 21% vs 34% for DeepSeek V3.2 and 29% for GLM-4.7. This is likely a consequence of model size: we have generally observed that models with more total parameters perform better on accuracy in AA-Omniscience, as broader knowledge recall benefits from larger parameter counts.

➀ Qwen3.5 27B is equivalently intelligent to Qwen3.5 122B A10B. The 122B A10B is a Mixture-of-Experts model that only activates 10B of its 122B total parameters per forward pass. The 27B model leads in GDPval-AA (1205 Elo vs 1145 Elo) and slightly on TerminalBench (+1.5 p.p.), while the 122B model leads on SciCode (+2.5 p.p.) and HLE (+1.2 p.p.), and has a lower hallucination rate (Omniscience -40 vs -42).

➀ Qwen3.5 35B A3B (Reasoning, 37) is the most intelligent model with ~3B active parameters, 7 points ahead of GLM-4.7-Flash (30). Other models in this ~3B-active category include Qwen3 Coder Next (80B total, 28), Qwen3 Next 80B A3B (27), and NVIDIA Nemotron 3 Nano 30B A3B (24).

➀ Qwen3.5 27B used 98M output tokens to run the Intelligence Index, costing ~$299 via the Alibaba Cloud API. This is notably high token usage compared to models at similar intelligence: MiniMax-M2.5 (56M), DeepSeek V3.2 (61M), and even the larger Qwen3.5 397B (86M).

Other information:
➀ Context window: 262K tokens (extendable to 1M via YaRN)
➀ License: Apache 2.0
➀ API pricing (Alibaba Cloud, per 1M input/output tokens): 397B: $0.60/$3.60 Β· 122B: $0.40/$3.20 Β· 27B: $0.30/$2.40 Β· 35B A3B: $0.25/$2.00

πŸ–ΌοΈ Media
πŸ”ai_fast_track retweeted
G
GitHub Projects Community
@GithubProjects
πŸ“…
Mar 01, 2026
9d ago
πŸ†”48494804
⭐0.34

High-performance browser control for AI agents. Pinchtab is a lightweight (12MB) Go binary that runs Chrome and exposes a plain HTTP API so any agent or script can navigate web pages, read text efficiently, click/type interactively, and persist sessions. Zero config, framework-agnostic, token-efficient.

❀️2,311
likes
πŸ”213
retweets
G
geohotarchive
@geohotarchive
πŸ“…
Feb 03, 2026
34d ago
πŸ†”49396917

#youtube George Hotz | Programming | how I actually use agentic coding | Agentic AI https://t.co/rrloPO5Vif

πŸ–ΌοΈ Media
J
johnrobinsn
@johnrobinsn
πŸ“…
Feb 08, 2026
30d ago
πŸ†”25045738

Use my tscribe tool to easily transcribe X or YouTube videos... Great way to get transcripts into your Claude Code sessions... Simply:
tscribe transcribe <video url>
then just dump it to stdout (lots more you can do too...):
tscribe dump

πŸ–ΌοΈ Media
_
_inception_ai
@_inception_ai
πŸ“…
Feb 24, 2026
14d ago
πŸ†”43409933

Mercury 2 is live. The world's first reasoning diffusion LLM – 5x faster than leading speed-optimized autoregressive models.

Built for production: multi-step agents without delays, voice AI with tight latency budgets, instant coding feedback.

Diffusion-based generation enables parallel refinement, not sequential tokens. Faster. More controllable. Dramatically lower inference cost.

Available today on the Inception API. @dinabass has the story in @business.

πŸ–ΌοΈ Media
A
Alibaba_Qwen
@Alibaba_Qwen
πŸ“…
Feb 24, 2026
14d ago
πŸ†”30188939

πŸš€ Introducing the Qwen 3.5 Medium Model Series: Qwen3.5-Flash Β· Qwen3.5-35B-A3B Β· Qwen3.5-122B-A10B Β· Qwen3.5-27B

✨ More intelligence, less compute.
β€’ Qwen3.5-35B-A3B now surpasses Qwen3-235B-A22B-2507 and Qwen3-VL-235B-A22B: a reminder that better architecture, data quality, and RL can move intelligence forward, not just bigger parameter counts.
β€’ Qwen3.5-122B-A10B and 27B continue narrowing the gap between medium-sized and frontier models, especially in more complex agent scenarios.
β€’ Qwen3.5-Flash is the hosted production version aligned with 35B-A3B, featuring:
– 1M context length by default
– Official built-in tools

πŸ”— Hugging Face: https://t.co/wFMdX5pDjU
πŸ”— ModelScope: https://t.co/9NGXcIdCWI
πŸ”— Qwen3.5-Flash API: https://t.co/82ESSpaqAF

Try in Qwen Chat πŸ‘‡
Flash: https://t.co/UkTL3JZxIK
27B: https://t.co/haKxG4lETy
35B-A3B: https://t.co/Oc1lYSTbwh
122B-A10B: https://t.co/hBMODXmh1o

Would love to hear what you build with it.

πŸ–ΌοΈ Media Γ—2
L
LiorOnAI
@LiorOnAI
πŸ“…
Feb 25, 2026
13d ago
πŸ†”45341310
⭐0.42

Most AI agents are amnesiacs. They solve a problem brilliantly, then forget everything the moment you close the terminal. Hermes flips this: it's an agent that writes down what it learns and gets better the longer you use it. It runs on a dedicated machine, remembers what it learns across sessions, and writes reusable skills when it solves hard problems. Instead of a stateless tool, you get something that accumulates knowledge like a human teammate.

When Hermes solves a complex task, it documents the approach as a searchable skill that it can load automatically next time. Think of it as procedural memory: the agent doesn't just remember facts, it remembers how to do things.

Because the agent lives on your server rather than running in a sandboxed cloud environment, it can:
1. Schedule tasks that run unattended (daily reports, nightly backups)
2. Maintain persistent state across messaging platforms
3. Spawn isolated subagents for parallel work
4. Access your actual filesystem and terminal with full context
5. Build up knowledge about your specific setup

This unlocks use cases that browser-based assistants can't touch: an agent that knows your codebase, remembers how you like things formatted, and can kick off multi-hour workflows while you sleep. If this pattern catches on, the AI assistant market splits into two categories: disposable chatbots that reset every session, and persistent agents that compound value over time.
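The "procedural memory" pattern in this post can be sketched in a few lines: solved tasks are recorded as skill documents and retrieved by relevance on the next similar task. This is my toy illustration; Hermes's actual storage and search are not described in the post, so the keyword-overlap ranking here is an assumption for demonstration only.

```python
# Minimal skill store: record solved tasks, retrieve them by keyword overlap.
from dataclasses import dataclass, field

@dataclass
class SkillStore:
    skills: list = field(default_factory=list)  # (title, body) pairs

    def record(self, title: str, body: str) -> None:
        """Document a solved task so future sessions can reuse the approach."""
        self.skills.append((title, body))

    def search(self, query: str):
        """Rank stored skills by shared words with the query; drop non-matches."""
        words = set(query.lower().split())
        scored = [(len(words & set((t + " " + b).lower().split())), t)
                  for t, b in self.skills]
        return [t for score, t in sorted(scored, reverse=True) if score > 0]

store = SkillStore()
store.record("rotate postgres credentials", "Steps: dump roles, update secrets")
store.record("nightly backup job", "Steps: cron entry, rsync to NAS")
print(store.search("backup the database nightly"))  # ['nightly backup job']
```

A real system would use embeddings rather than word overlap, but the loop is the same: solve, write down, retrieve, reuse.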

L
LiorOnAI
@LiorOnAI
πŸ“…
Feb 27, 2026
11d ago
πŸ†”52900129

Most language models only read forward. Perplexity just open-sourced 4 models that read text in both directions. They used a technique from image generation to retrain Qwen3 so every word can see every other word in a passage. That changes how well a model understands meaning.

They built four models from this:
1. Two sizes: 0.6B and 4B parameters
2. Two types: standard search embeddings and context-aware embeddings

The context-aware version is the interesting one. It processes an entire document at once, so each small chunk "knows" what the full document is about. Standard embeddings treat each chunk in isolation.

> Tops benchmarks for models of similar size
> Works in multiple languages out of the box
> MIT licensed, free for commercial use

If you're building search over large document collections, you can now get document-level understanding without running a massive model. Small enough to actually deploy.

πŸ–ΌοΈ Media
O
omarsar0
@omarsar0
πŸ“…
Feb 23, 2026
15d ago
πŸ†”40742150

New research from Google DeepMind. What if LLMs could discover entirely new multi-agent learning algorithms?

Designing algorithms for multi-agent systems is hard. Classic approaches like PSRO and counterfactual regret minimization took years of expert effort to develop. Each new game-theoretic setting often demands its own specialized solution. But what if you could automate the discovery process itself?

This research uses LLMs to automatically generate novel multi-agent learning algorithms through iterative prompting and refinement. The LLM proposes algorithm pseudocode, which gets evaluated against game-theoretic benchmarks, and feedback drives the next iteration.

LLMs have absorbed enough algorithmic knowledge from training to serve as creative search engines over the space of possible algorithms. They generate candidates that humans wouldn't think to try. The discovered algorithms achieve competitive performance against established hand-crafted baselines across multiple game-theoretic domains.

This shifts algorithm design from manual expert craft to automated discovery. The same approach could generalize beyond games to any domain where we need novel optimization procedures.

Paper: https://t.co/9AeQYo2LFS
Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

πŸ–ΌοΈ Media
πŸ”dair_ai retweeted
O
elvis
@omarsar0
πŸ“…
Feb 23, 2026
15d ago
πŸ†”40742150
⭐0.38

New research from Google DeepMind. What if LLMs could discover entirely new multi-agent learning algorithms? Designing algorithms for multi-agent systems is hard. Classic approaches like PSRO and counterfactual regret minimization took years of expert effort to develop. Each new game-theoretic setting often demands its own specialized solution. But what if you could automate the discovery process itself? This research uses LLMs to automatically generate novel multi-agent learning algorithms through iterative prompting and refinement. The LLM proposes algorithm pseudocode, which gets evaluated against game-theoretic benchmarks, and feedback drives the next iteration. LLMs have absorbed enough algorithmic knowledge from training to serve as creative search engines over the space of possible algorithms. They generate candidates that humans wouldn't think to try. The discovered algorithms achieve competitive performance against established hand-crafted baselines across multiple game-theoretic domains. This shifts algorithm design from manual expert craft to automated discovery. The same approach could generalize beyond games to any domain where we need novel optimization procedures. Paper: https://t.co/9AeQYo2LFS Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

❀️380
likes
πŸ”80
retweets
D
dair_ai
@dair_ai
πŸ“…
Feb 24, 2026
14d ago
πŸ†”40569951

Important survey on agentic memory systems.

Memory is one of the most critical components of AI agents. It enables LLM agents to maintain state across long interactions, supporting long-horizon reasoning and personalization beyond fixed context windows. But the empirical foundations of these systems remain fragile.

This new survey presents a structured analysis of agentic memory from both architectural and system perspectives. The authors introduce a taxonomy based on four core memory structures and then systematically analyze the pain points limiting current systems.

What did they find? Existing benchmarks are underscaled and often saturated. Evaluation metrics are misaligned with semantic utility. Performance varies significantly across backbone models. And the latency and throughput overhead introduced by memory maintenance is frequently overlooked.

Current agentic memory systems often underperform their theoretical promise because evaluation and architecture are studied in isolation. As agents take on longer, more complex tasks, memory becomes the bottleneck. This survey clarifies where current systems fall short and outlines directions for more reliable evaluation and scalable memory design.

Paper: https://t.co/xNGTbVVhq9
Learn to build effective AI agents in our academy: https://t.co/LRnpZN7deE

πŸ–ΌοΈ Media Γ—2
O
omarsar0
@omarsar0
πŸ“…
Feb 25, 2026
13d ago
πŸ†”46254107

New research from Google DeepMind. Really interesting paper on diffusion models.

Training good latents for diffusion models is harder than it looks. The standard approach uses a KL penalty borrowed from VAEs, with no principled way to control how much information actually lives in the latent space.

This new research introduces Unified Latents (UL), a framework that co-trains a diffusion prior on the latents. This provides a tight upper bound on latent bitrate and makes the reconstruction-generation tradeoff explicit and, most importantly, tunable.

On ImageNet-512, UL achieves FID 1.4 while requiring fewer training FLOPs than Stable Diffusion latents. On Kinetics-600, it sets a new state-of-the-art FVD of 1.3 for video generation.

The latent space is one of the most undertreated design decisions in diffusion-based generation. UL gives practitioners a principled handle on it, for both images and video.

Paper: https://t.co/E1HCf9QzB4

πŸ–ΌοΈ Media
πŸ”dair_ai retweeted
O
elvis
@omarsar0
πŸ“…
Feb 25, 2026
13d ago
πŸ†”46254107
⭐0.38

New research from Google DeepMind. Really interesting paper on diffusion models. Training good latents for diffusion models is harder than it looks. The standard approach uses a KL penalty borrowed from VAEs, with no principled way to control how much information actually lives in the latent space. This new research introduces Unified Latents (UL), a framework that co-trains a diffusion prior on the latents. This provides a tight upper bound on latent bitrate and makes the reconstruction-generation tradeoff explicit and, most importantly, tunable. On ImageNet-512, UL achieves FID 1.4 while requiring fewer training FLOPs than Stable Diffusion latents. On Kinetics-600, it sets a new state-of-the-art FVD of 1.3 for video generation. The latent space is one of the most undertreated design decisions in diffusion-based generation. UL gives practitioners a principled handle on it, for both images and video. Paper: https://t.co/E1HCf9QzB4

❀️159
likes
πŸ”34
retweets
D
DrJimFan
@DrJimFan
πŸ“…
Feb 20, 2026
18d ago
πŸ†”80910046
⭐0.34

Check out @ShenyuanGao's technical deep dive: https://t.co/DnEGLzGuJV

Y
yukez
@yukez
πŸ“…
Feb 20, 2026
18d ago
πŸ†”88857707

We have seen rapid progress in humanoid control: specialist robots can reliably generate agile, acrobatic, but preset motions. Our singular focus this year: putting generalist humanoids to do real work.

To progress toward this goal, we developed SONIC (https://t.co/zOZVraFuDV), a Behavior Foundation Model for real-time, whole-body motion generation that supports teleoperation and VLA inference for loco-manipulation. Today, we're open-sourcing SONIC on GitHub.

We are excited to see what the community builds upon SONIC and to collectively push humanoid intelligence toward real-world deployment at scale.

🌐 Paper: https://t.co/DGBP7LAvuT
πŸ“ƒ Code: https://t.co/WAZ1P13072

πŸ–ΌοΈ Media Γ—2
D
DrJimFan
@DrJimFan
πŸ“…
Feb 24, 2026
14d ago
πŸ†”66493831
⭐0.34

And @yukez's announcement: https://t.co/38IhxYX1tZ