Your curated collection of saved posts and media

Showing 32 posts · last 7 days · newest first
G
github
@github
📅
Apr 18, 2026
5d ago
🆔05856202

What happens when you build a GitHub CLI extension with Copilot CLI? P̶r̶o̶g̶r̶a̶m̶ ̶M̶a̶n̶a̶g̶e̶r̶ Dungeon Master @leereilly created one that turns your repo into a dungeon ... with procedurally generated levels and bugs that fight back. ⚔️ Here's how to enter the battle. ▶️ https://t.co/XAX9IRKAuK

🖼️ Media
H
HuggingPapers
@HuggingPapers
📅
Apr 18, 2026
5d ago
🆔41938424

NVIDIA releases Lyra 2.0 on Hugging Face A framework for generating persistent, explorable 3D worlds at scale by solving spatial forgetting and temporal drifting in long-horizon video generation. https://t.co/M9kYHhIJ6c

🖼️ Media
H
HamelHusain
@HamelHusain
📅
Apr 18, 2026
5d ago
🆔08912317
0.38

Lots of people asking what’s so good about computer use. Here’s 5 things that come to mind 1. operate Mac Apps without a great API: Slack, Google Sheets, Notes, IMessage without installing separate plugins. It instantly transforms all your apps into tools 2. If you need to operate your browser more visually it works really smoothly and fast (good for sites that are still human centric) 3. It uses its own cursor, keyboard etc so you can keep working. 4. Once you do any task once you can simply ask Codex to reflect on what it did and how it would accomplish the task next time with the benefit of hindsight and create a skill AND schedule an automation. It’s really nice that codex can just schedule and edit automations when asked! it’s very Claw like in this way. This last point is not computer use specific but is powerful when combined with computer use 5. The UI polish is insane: you get nice icons for any application you want to tag into computer use plus all the other built in new stuff like built in file viewer and browser so there is no context switching. So you can iterate really fast and not lose focus. Because of the polish it also feels nice and delightful to use.

@HamelHusain • Fri Apr 17 17:24

Seriously stop everything you are doing and use codex desktop app new computer use. Absolutely mind blowing

M
mark_k
@mark_k
📅
Apr 18, 2026
5d ago
🆔66165330
0.34

Both Gemini and Grok are underrated. I used to champion Gemini for a while, but lately I've been very happy with Grok, especially since the Grok 4.20 release with multi-agent. And now we have Grok 4.3!

@droidbuilds • Sat Apr 18 04:45

Which one is the most underrated here? https://t.co/zJzBMnfopY

G
GitTrend0x
@GitTrend0x
📅
Apr 18, 2026
5d ago
🆔06928613

Hermes 一丢 Agent,全网又卷出 5 个新进化体! Nous Research 的 hermes-agent(96k+ stars)底层太能打了:持久记忆 + 自动提炼技能 + 跨会话成长,社区直接当 DNA 疯狂 remix。 Atlas 上已经 90+ 项目,这次我挑了5 个最新最炸的进化体(全避开老面孔),AI 玩家看了会沉默,Agent 爱好者看了会原地 fork: 1️⃣ hermes-webui(https://t.co/EYpeGa1g4w) 浏览器/Web + 手机端 UI,把 Hermes 进化成“随时随地指挥”的 Web 版。暗黑风响应式,X 上直呼“手机党终于翻身了!” 2️⃣ hermes-dashboard(https://t.co/Y3insgftNF) 实时 Web 监控台 + 自动 Wiki:多 Agent 并行、工具调用、灵魂状态一屏看尽,还把 memory 自动转知识库。生产环境标配,Atlas 安全审查通过! 3️⃣ hermesclaw(https://t.co/WJzA56oSjv) 官方主 repo 重点推的 WeChat 桥接器,让 Hermes 和 OpenClaw 共用一个微信号,双向无缝。中文玩家狂喜:“微信生态直接被 Hermes 入侵!” 4️⃣ hermes-hudui(https://t.co/V3AZcUQCVK) 原 TUI HUD 的 Web 进化版,浏览器里实时看 Agent “在想啥”、持久记忆流动。灵魂监控 2.0,推特名场面:“终于看到 Agent 的内心戏了”。 5️⃣ awesome-hermes-agent(https://t.co/2QVFSb4Dlb) 社区维护的“进化树索引”:技能、插件、集成、教程全收录(已近 900 stars)。Atlas 列为 Guides 头牌,想 fork 就从这里抄作业。 🟢 为什么这些新项目这么爆? Hermes 从不给你黑箱,它给的是可 hack、可扩展、可自我改进的底层循环。你 fork 它,不是在用工具,而是在和 Agent 一起递归长大。 2026 年开源 Agent 的正确姿势,就是把 Hermes 当骨架,自己长肉!

@GitTrend0x • Fri Apr 17 14:10

Hermes 一丢 Agent,全网程序员集体进化了! Nous Research 扔出 hermes-agent(90k+ stars),核心就一个词:自我进化。它不是玩具,而是带持久记忆、自动提炼技能、跨会话成长的底层骨架。 结果?社区直接把它当 DNA,短短几周卷出 80+ 进化体,生态总星 10 万+。这才是开源的最高境界:一个 Agent,变成全世界的进化树。 我挑了 4 个正在 X 上刷屏的 “Hermes 系进化体”,AI 玩家看了会沉默,爱好者看了会狂喜: 1️⃣ Hermes Ecosystem Map / Hermes Atl

Media 1Media 2
+3 more
🖼️ Media
🔁SpirosMargaris retweeted
S
spark
@sparkjsdev
📅
Apr 14, 2026
9d ago
🆔82816449
0.32

Spark 2.0 is here! 🚀 We’re redefining what’s possible on the web with a streamable LoD system for 3D Gaussian Splatting. Built on Three.js, you can now stream massive 100M+ splat worlds to any device from mobile to VR using WebGL2. All open-source. Dive into the tech 👇 https://t.co/VOd6V0Wz1s

❤️2,010
likes
🔁314
retweets
G
GenAI_is_real
@GenAI_is_real
📅
Apr 18, 2026
5d ago
🆔33936633
0.42

as someone who works on making LLMs run faster and cheaper every day, i can confidently say the question of whether theyre conscious has exactly zero impact on whether theyre useful. we dont need our inference stack to be conscious, we need it to be correct, fast, and affordable. the consciousness debate is fascinating philosophy but its a distraction from the actual engineering problems that determine whether AI creates value. the gravity formula doesnt need to exert weight to help you build a bridge @Hesamation

@

🔁jxnlco retweeted
M
Mercor
@mercor_ai
📅
Apr 17, 2026
6d ago
🆔85510894
0.38

We ran @AnthropicAI Claude Opus 4.7 (High) on APEX-SWE, our benchmark for real-world software engineering work. It scores 41.3% pass@1, placing 2nd on the leaderboard. It is only 0.2% away from GPT 5.3 Codex (High). https://t.co/4U6pKtrvI0

❤️84
likes
🔁2
retweets
L
Lonely__MH
@Lonely__MH
📅
Apr 18, 2026
5d ago
🆔25978571

🚀我靠!Ollama 原生支持 Hermes Agent 了! 一行命令就搞定: ollama launch hermes 本地部署,简直爽歪歪😂 检测本地可以跑什么模型可以用 llmfit 或者在线检测👉 https://t.co/qVYlJomz2Q

@ollama • Fri Apr 17 23:26

ollama launch hermes Ollama 0.21 includes supports Hermes Agent, the self-improving AI agent built by @NousResearch.

Media 1
🖼️ Media
A
AnjneyMidha
@AnjneyMidha
📅
Apr 17, 2026
6d ago
🆔21556533
0.40

*New Lecture* Stanford @CS153Systems '26, Session 5 (Full Video) Unified Intelligence with Amit Jain (@gravicle) from @LumaLabsAI 01:32 Luma's Origin Story 05:33 Differentiable World Learning 06:36 From 3D Capture To Video 10:40 Dream Machine Flywheel 13:48 Inside The Luma Factory 23:04 Unified Models Explained 32:29 Future Architectures 34:02 Skills and Tools 41:04 Creativity and Exploration 43:03 Sora Shutdown 47:08 GANs Diffusion and Hybrids 51:19 Hollywood Business Model Reset 55:01 What Makes Video Models Useful

🔁random_walker retweeted
G
Gillian Hadfield
@ghadfield
📅
Apr 17, 2026
6d ago
🆔29570505
0.34

Glad to be a part of this initiative to develop open-world evaluations for AI. We need the ability to assess just how capable agents are becoming in order to anticipate and respond to the impact they can have on real world systems and transactions. An agent that can successfully act on the general instruction “build an app and get it posted in the App Store” is one that brings us closer to an economy of agents, with significant implications for how markets behave and need regulating https://t.co/JIJ7fSydiT

❤️6
likes
🔁3
retweets
🔁random_walker retweeted
P
Peter Kirgis
@PKirgis
📅
Apr 17, 2026
6d ago
🆔49231354
0.34

Yesterday, we announced CRUX, a project that aims to conduct regular “open-world evaluations,” where we will be testing the ability of AI agents to complete long-horizon tasks in messy, real-world environments. @sayashk's post dives into the details; here are a few of my own thoughts about why this is worth doing.

🔁2
retweets
P
PKirgis
@PKirgis
📅
Apr 17, 2026
6d ago
🆔49231354
0.36

Yesterday, we announced CRUX, a project that aims to conduct regular “open-world evaluations,” where we will be testing the ability of AI agents to complete long-horizon tasks in messy, real-world environments. @sayashk's post dives into the details; here are a few of my own thoughts about why this is worth doing.

@sayashk • Thu Apr 16 17:49

Benchmarks are saturated more quickly than ever. How should frontier AI evaluations evolve? In a new paper, we argue that the AI community is already converging on an answer: Open-world evaluations. They are long, messy, real-world tasks that would be impractical for benchmarks.

A
ashen_one
@ashen_one
📅
Apr 17, 2026
6d ago
🆔12381770

okok we officially have GLM 5.1 running on a 256gb mac studio with hermes agent next is linking it to hermes to see how good it is 🗣️ https://t.co/BQTlLiL3jm

Media 1
🖼️ Media
W
winglian
@winglian
📅
Apr 17, 2026
6d ago
🆔07519134

@togethercompute @realDanFu But looking at the reported metrics, the looped 770M model isn’t really close to the 1.3B model. https://t.co/NkFfNjj8eR

Media 1
🖼️ Media
🔁unknown_user retweeted
U
unknown_user
@unknown_user
📅
Apr 17, 2026
6d ago
🆔20431022
0.36

This paper makes a strong case for open-world evaluations as a complement to traditional benchmarks, particularly for realistic, long-horizon, open-ended settings! Glad the AISI SoE team could contribute to this effort.

❤️18
likes
🔁5
retweets
H
HedgieMarkets
@HedgieMarkets
📅
Apr 17, 2026
6d ago
🆔69546306

🦔Goldman Sachs reports that companies are blowing past their AI inference budgets by orders of magnitude, with inference costs in engineering now approaching 10% of total headcount costs and potentially reaching parity with salaries within several quarters. KPMG surveyed 2,100 senior leaders and found US companies plan to spend an average of $178 million on AI over the next 12 months, with Asia-Pacific firms budgeting $245 million and EMEA $157 million. The two reports together show companies are spending more than planned and intend to spend even more. My Take Inference costs approaching headcount parity is an extraordinary number that most finance teams did not model when they approved their AI strategies twelve months ago. The compute crunch, electrical component shortages, and GPU spot prices up 48% in two months are all flowing into corporate operating costs faster than anyone budgeted for, and Goldman's trajectory suggests it accelerates from here. What I find hard to reconcile is that $178 million average sitting alongside enterprise data showing eight in ten workers are either avoiding AI tools or not using them at all. Companies are committing to nine-figure inference budgets while their own employees aren't using what's already been deployed. I've watched this dynamic build all year and my honest read is that a significant portion of this spending is driven by competitive fear rather than demonstrated returns. Nobody wants to be the company that didn't invest in AI when everyone else did. That's how bubbles get funded, and at some point boards are going to demand a number that justifies it. Hedgie🤗

Media 1
🖼️ Media
M
Modular
@Modular
📅
Apr 17, 2026
6d ago
🆔83832691
0.40

TileTensor is Mojo's new tensor type for GPU kernels. The short version: fully static layouts = 8-byte runtime footprint = less register pressure. We saw 5% throughput gains on @AMD MI300X MHA just from the type change. Our Part 1 blog post covers the design and how it compares to CuTe. Part 2 will cover the Mojo internals that made it possible. https://t.co/0iaHqi0fda

🔁_akhaliq retweeted
W
Siyuan Hu
@who_s_yuan
📅
Apr 17, 2026
6d ago
🆔38134025

Thank you @_akhaliq for sharing our work! We also have a 24/7 live stream for GameWorld: https://t.co/eFfD4a437G. Watch the agents play games in real time.🕹️

Media 1
❤️2
likes
🔁1
retweets
🖼️ Media
🔁Sanemavcil retweeted
T
Tom Dörr
@tom_doerr
📅
Apr 17, 2026
6d ago
🆔18317218

Offline-first AI agent for Raspberry Pi https://t.co/iapUnKRhXI https://t.co/FtE8vK8kSu

Media 1
❤️69
likes
🔁9
retweets
🖼️ Media
🔁_akhaliq retweeted
V
Victor M
@victormustar
📅
Apr 17, 2026
6d ago
🆔46958899
0.32

Sharing my current setup to run Qwen3.6 locally in a good agentic setup (Pi + llama.cpp). Should give you a good overview of how good local agents are today: # Start llama.cpp server: llama-server \ -hf unsloth/Qwen3.6-35B-A3B-GGUF:Q4_K_XL \ --jinja \ --chat-template-kwargs '{"preserve_thinking":true}' \ --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0 # Configure Pi: { "providers": { "llama-cpp": { "baseUrl": "http://127.0.0.1:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "unsloth/Qwen3.6-35B-A3B-GGUF:Q4_K_XL" } ] } } }

❤️243
likes
🔁35
retweets
🔁SpirosMargaris retweeted
A
Antonio Vieira Santos
@AkwyZ
📅
Apr 15, 2026
8d ago
🆔87065813
0.38

The Strange Origin of AI’s ‘Reasoning’ Abilities https://t.co/lXyZw8U4u4 #TechNews @ArturHabant @elaniazito @IanLJones98 @CurieuxExplorer @Shi4Tech @enilev @Fabriziobustama @mvollmer1 @AnthonyRochand @JolaBurnett @lyakovet @debashis_dutta @3itcom @ahier @Analytics_699 @antgrasso @CathCervoni @chidambara09 @DigitalColmer @dinisguarda @DimitriHommel @EvanKirstel @FrRonconi @GlenGilmore @gvalan @HeinzVHoenen @ipfconline1 @jeancayeux @jorgecunha @kalydeoo @nafisalam @Nicochan33 @pierrepinna @PawlowskiMario @puneetsinghal22 @ralph_ohr @RLDI_Lamy @rshevlin @sarbjeetjohal @SpirosMargaris @StefanoDeCupis @tewoz @thomas_dettling @Ym78200 @aure79lien @jblefevre60

❤️18
likes
🔁16
retweets
W
winglian
@winglian
📅
Apr 17, 2026
6d ago
🆔90116000
0.32

@davidpwalter @genclone @__tinygrad__ Are the bugs on the tinygrad side or just your local implementation?

W
who_s_yuan
@who_s_yuan
📅
Apr 17, 2026
6d ago
🆔38134025

Thank you @_akhaliq for sharing our work! We also have a 24/7 live stream for GameWorld: https://t.co/eFfD4a437G. Watch the agents play games in real time.🕹️

@_akhaliq • Thu Apr 16 15:52

GameWorld Towards Standardized and Verifiable Evaluation of Multimodal Game Agents paper: https://t.co/IfbTgfNnSM https://t.co/gL3BURxzkV

Media 1
🖼️ Media
T
tetsuoai
@tetsuoai
📅
Apr 17, 2026
6d ago
🆔05260148

grok 4.3 beta can use an ubuntu shell and a persistent file layer to generate artifacts grok wrote python to encode the xai / grok logo into audio, i gave it the script back and had it render a spectrogram video from that signal, and use the grok_files tool to save the mp4 into the product's files layer i opened the file from the files panel and played it myself this is getting crazy

🖼️ Media
S
stevibe
@stevibe
📅
Apr 17, 2026
6d ago
🆔94658539

Introducing HermesAgent-20, a new Bench Pack for BenchLocal. 20 scenarios extracted straight from the Hermes Agent source code, run against a REAL Hermes instance. The actual workload you'd put your model through. Why I built BenchLocal in the first place: most benchmarks are too abstract. We use local LLMs for practical work, and finding the right model for YOUR task efficiently is the single most important thing, especially when you're constrained to what fits on your machine. BenchLocal is a framework: providers, models, side-by-side comparison, all in one UI. Bench Packs are the unit of testing: ToolCall-15 and BugFind-15 shipped first, and when I launched the BenchLocal 0.1.0, added StructOutput, ReasonMath, InstructFollow, DataExtract. Now, HermesAgent-20 is the newest. Bench Packs install like VS Code extensions. The SDK is open, write your own, share it, grow the ecosystem. Here's the goal: a community-built, practical evaluation layer for the local LLM space. Early numbers on HermesAgent-20: > GLM 5.1 — 85 > Gemma4 31B — 83 > Qwen3.5 27B — 79 > MiniMax M2.7 — 76 Upgrade to the latest BenchLocal to install HermesAgent-20 (SDK update required).

🖼️ Media
F
Flomerboy
@Flomerboy
📅
Apr 17, 2026
6d ago
🆔89252458
0.36

🧵 My tips for getting the best results out of Claude Design! I’m on the verticals team at Anthropic which means I serve 7 different products. Claude Design makes it possible! 1. Set up your design system and your core screens. An hour of setup and refinement here is worth it

@claudeai • Fri Apr 17 15:03

Introducing Claude Design by Anthropic Labs: make prototypes, slides, and one-pagers by talking to Claude. Powered by Claude Opus 4.7, our most capable vision model. Available in research preview on the Pro, Max, Team, and Enterprise plans, rolling out throughout the day. https:

🔁Sanemavcil retweeted
O
OpenClaw🦞
@openclaw
📅
Apr 16, 2026
7d ago
🆔02752638
0.34

OpenClaw 2026.4.15 🦞 🤖 Anthropic Opus 4.7 support 🗣️ Gemini TTS in bundled 🧠 Slimmer context + bounded memory reads 🔧 Codex transport self-heal, safer tool/media handling ✨ Pile of update/channel fixes Good boring release. https://t.co/jiLmr1Bxep

❤️1,606
likes
🔁155
retweets
W
WesRoth
@WesRoth
📅
Apr 17, 2026
6d ago
🆔06671566

Nous Research rolled out the "Tool Gateway" within the Nous Portal, offering a unified ecosystem for AI agent development and deployment. A single paid Nous Portal subscription now natively includes access to over 300 models alongside built-in tooling for web scraping, browser automation, image generation, text-to-speech, and a cloud terminal backend.

@NousResearch • Thu Apr 16 20:39

Tool Gateway is now live in Nous Portal. No separate accounts, no API key juggling. All you need is one subscription, and everything works. A paid Nous Portal subscription now includes access to 300+ models and a growing set of third-party tools. Launching with: → Web scraping

🖼️ Media
D
dair_ai
@dair_ai
📅
Apr 17, 2026
6d ago
🆔92880892

// Skill Learning for Autonomous Web Agents // Web agents can navigate a page, but ask them to repeat a checkout flow they already completed, and they start from scratch every time. This work introduces WebXSkill, a skill learning framework where web agents extract reusable skills from synthetic trajectories. Each skill pairs a parameterized action program with step-level natural language guidance, making it both executable by the runtime and interpretable by the agent. Two deployment modes let the agent either auto-execute skills as atomic tool calls (grounded mode) or follow them as step-by-step instructions while retaining autonomy to adapt (guided mode). Results: - On WebArena, WebXSkill improves task success rate by up to 9.8 points over baselines (69.5% vs 59.7%). - On WebVoyager, grounded mode reaches 86.1%, a 14.2-point gain over vanilla agents. Skills even transfer across environments: guided mode using only WebArena-extracted skills scores 85.1% on WebVoyager. Stronger models benefit more from grounded execution, while weaker models gain more from guided mode, suggesting the deployment strategy should match model capability. Paper: https://t.co/KAMYMLXywg Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

Media 1
🖼️ Media
Y
YuvrajS9886
@YuvrajS9886
📅
Apr 17, 2026
6d ago
🆔48226677
0.42

Training Qwen2.5-0.5B-Instruct on Reddit post summarization with GRPO on my 3x Mac Minis — trying combination of quality rewards with length penalty! Completed all of the following combination rewards! >METEOR + BLEU >BLEU + ROUGE-L >METEOR + ROUGE-L All the code and wandb charts in the comments --- Training Qwen2.5-0.5B-Instruct on Reddit post summarization with GRPO on my 3x Mac Minis — trying combination of quality rewards with length penalty! Completed all of the following combination rewards! >METEOR + BLEU >BLEU + ROUGE-L >METEOR + ROUGE-L All the code and wandb charts in the comments --- Setup: 3x Mac Minis in a cluster running MLX. One node drives training using GRPO, two push rollouts via vLLM. Trained two variants: → length penalty only (baseline) → length penalty + quality reward (BLEU, METEOR and/or ROUGE-L ) --- Eval: LLM-as-a-Judge (gpt-5) Used DeepEval to build a judge pipeline scoring each summary on 4 axes: → Faithfulness — no hallucinations vs. source → Coverage — key points captured → Conciseness — shorter, no redundancy → Clarity — readable on its own

A
Adam_Fish
@Adam_Fish
📅
Apr 17, 2026
6d ago
🆔01580794
0.38

Webflow's CMS API can't publish code blocks. Tables aren't in the API at all!? So I built a Playwright robot that clicks buttons in the Designer for us. In 2026, your API is your product. https://t.co/sA7Csu21uO

← PreviousPage 2 of 61Next →