Your curated collection of saved posts and media

Showing 32 posts · last 14 days · by score

Teknium

@Teknium

📅

Apr 15, 2026

8d ago

🆔84184482

⭐0.42

@lirex **Sudo password prompting is a separate mechanism** in `terminal_tool.py`. When Hermes detects `sudo` in a command, it looks for a password to pipe via `sudo -S`. The resolution order:

1. `SUDO_PASSWORD` env var in `~/.hermes/.env` → auto-pipes, no prompt
2. Previously entered password (cached for session) → reuses silently
3. Interactive prompt (CLI only) → asks the user with 45s timeout
4. None of the above → runs command as-is (fails if OS actually needs a password)

If they have **passwordless sudo** (NOPASSWD in sudoers), the simplest fix is to add `SUDO_PASSWORD` to their `.env`. Even setting it to a dummy value works, because the env var being present tells Hermes "I have this handled, don't prompt." With NOPASSWD configured, sudo ignores the piped password anyway.

They can do this through `hermes setup` (it asks about sudo during the tool configuration step) or manually:

```
# In ~/.hermes/.env
SUDO_PASSWORD=dummy
```

For fully unrestricted operation overall, they'd want both:

- `--yolo` flag (or `approvals.mode: off` in config.yaml) → skips dangerous command approvals
- `SUDO_PASSWORD` in `.env` → skips sudo password prompts
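The resolution order described in the post above can be sketched as a small function. This is a minimal illustration only, not the code in `terminal_tool.py`: the function name, its parameters, and the convention that a timed-out prompt returns `None` are all assumptions made for the example.

```python
from typing import Callable, Mapping, Optional


def resolve_sudo_password(
    env: Mapping[str, str],
    cached: Optional[str],
    prompt: Optional[Callable[[], Optional[str]]],
) -> Optional[str]:
    """Return a password to pipe to `sudo -S`, or None to run the command as-is.

    Hypothetical helper: names and signature are illustrative assumptions.
    """
    # 1. A present env var wins, even when its value is only a dummy.
    if "SUDO_PASSWORD" in env:
        return env["SUDO_PASSWORD"]
    # 2. Reuse a password that was already entered this session.
    if cached is not None:
        return cached
    # 3. Ask interactively when a prompt is available (CLI only).
    #    Assumption: the prompt callable returns None on timeout.
    if prompt is not None:
        answer = prompt()
        if answer is not None:
            return answer
    # 4. Nothing available: the caller runs the command unchanged.
    return None
```

Checking for the key rather than a non-empty value mirrors the post's point that the variable being present is what suppresses the prompt.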

YuvrajS9886

@YuvrajS9886

📅

Apr 17, 2026

6d ago

🆔48226677

⭐0.42

Training Qwen2.5-0.5B-Instruct on Reddit post summarization with GRPO on my 3x Mac Minis, trying a combination of quality rewards with a length penalty! Completed all of the following reward combinations:
>METEOR + BLEU
>BLEU + ROUGE-L
>METEOR + ROUGE-L
All the code and wandb charts in the comments

---

Setup: 3x Mac Minis in a cluster running MLX. One node drives training using GRPO, two push rollouts via vLLM.

Trained two variants:
→ length penalty only (baseline)
→ length penalty + quality reward (BLEU, METEOR and/or ROUGE-L)

---

Eval: LLM-as-a-Judge (gpt-5)
Used DeepEval to build a judge pipeline scoring each summary on 4 axes:
→ Faithfulness: no hallucinations vs. source
→ Coverage: key points captured
→ Conciseness: shorter, no redundancy
→ Clarity: readable on its own
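A reward that combines quality metrics with a length penalty, as the post describes, might look roughly like the sketch below. The post does not say how the terms are weighted or how the penalty is shaped, so the averaging, the linear penalty, and the `target_words` parameter are assumptions. ROUGE-L is implemented directly from its longest-common-subsequence definition on whitespace tokens; BLEU or METEOR scorers from a metrics library could be passed in the same way.

```python
from typing import Callable, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def length_penalty(candidate: str, target_words: int) -> float:
    """Assumed shape: zero up to the target length, linearly negative beyond it."""
    excess = max(0, len(candidate.split()) - target_words)
    return -excess / target_words


def combined_reward(
    candidate: str,
    reference: str,
    quality_fns: Sequence[Callable[[str, str], float]],
    target_words: int = 32,
) -> float:
    """Assumed combination: mean of the quality scores plus the length penalty."""
    if not quality_fns:
        raise ValueError("at least one quality function is required")
    quality = sum(fn(candidate, reference) for fn in quality_fns) / len(quality_fns)
    return quality + length_penalty(candidate, target_words)
```

The "length penalty only" baseline from the post corresponds to calling `length_penalty` on its own.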

wandb

@wandb

📅

Apr 14, 2026

9d ago

🆔25294602

⭐0.30

@crypto_fyy @googlegemma @arena We're working on optimizing KV Cache!

RisingSayak

@RisingSayak

📅

Apr 10, 2026

13d ago

🆔02430290

We've been studying what it takes to get NVFP4 & MXFP8 to deliver good speedups on modern flow models for image & video gen on B200 🕵️‍♂️ Today, I'm excited to share those findings! Bringing some cool recipes through Diffusers and TorchAO with `torch.compile` 🔥 Hop in ⬇️ https://t.co/gSd1Kwnu0l

🖼️ Media

percyliang

@percyliang

📅

Apr 17, 2026

6d ago

🆔77991535

⭐0.40

Marin is using quantile balancing from @Jianlin_S (who developed RoPE, which was also a good idea) to train our current 1e23 FLOPs MoE. The idea is elegant: assigning tokens to experts by solving a linear program. No hyperparameters to tune. Yields stable training.

@classiclarryd • Wed Apr 15 16:26

Researchers' brilliant ideas often get lost in the sea of endless SOTA claims on weak baselines. At Marin we battle-test ideas in an open arena, where anyone's idea can be promoted to the next hero run. One that recently rose up was @Jianlin_S MoE Quantile Balancing, used in our

cursor_ai

@cursor_ai

📅

Apr 14, 2026

9d ago

🆔39740909

We've been developing a multi-agent system that builds and maintains complex software autonomously. Recently, we partnered with NVIDIA to apply it to optimizing CUDA kernels. In 3 weeks, it delivered a 38% geomean speedup across 235 problems. https://t.co/0YvbXrzVfe
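For readers unfamiliar with the metric, a geomean speedup can be computed as below. This is a sketch under the assumption that the reported figure is the geometric mean of per-problem speedup ratios (baseline time divided by optimized time), which the post does not spell out; under that reading, a 38% geomean speedup corresponds to a mean ratio of 1.38.

```python
import math
from typing import Iterable


def geomean_speedup(speedups: Iterable[float]) -> float:
    """Geometric mean of positive speedup ratios, computed in log space."""
    values = list(speedups)
    if not values or any(v <= 0 for v in values):
        raise ValueError("speedups must be a non-empty collection of positive ratios")
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

Unlike an arithmetic mean, the geometric mean treats a 2x speedup and a 2x slowdown as cancelling out, which is why it is a common choice for aggregating ratios across many benchmark problems.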

🖼️ Media

omarsar0

@omarsar0

📅

Apr 17, 2026

6d ago

🆔79696027

LLM agents loop, drift, and get stuck on hard reasoning tasks up to 30% of the time. Current fixes are either too blunt (hard step limits) or too expensive (LLM-as-judge adding 10-15% overhead per step). New research proposes a smarter middle ground.

The work introduces the Cognitive Companion, a parallel monitoring architecture with two variants: an LLM-based monitor and a novel Probe-based monitor that detects reasoning degradation from the model's own hidden states at zero inference overhead.

The Probe-based Companion trains a simple logistic regression classifier on hidden states from layer 28. It reads the model's internal representations during the existing forward pass, requiring no additional model calls. A single matrix multiplication is all it takes to flag when reasoning quality is declining.

Why does it matter?

The LLM-based Companion reduced repetition on loop-prone tasks by 52-62% with roughly 11% overhead. The Probe-based variant achieved a mean effect size of +0.471 with zero measured overhead and AUROC 0.840 on cross-validated detection.

But the results also reveal an important nuance: companions help on loop-prone and open-ended tasks while showing neutral or negative effects on structured tasks. Models below 3B parameters also struggled to act on companion guidance at all.

This suggests the future isn't universal monitoring but selective activation, deploying cognitive companions only where reasoning degradation is a real risk.

Paper: https://t.co/K2vqDADwU8

Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
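The probe-based monitor described in the post above amounts to applying a trained logistic regression to a hidden-state vector. The sketch below shows only that inference step. The weights, bias, and threshold are hypothetical placeholders; the paper's actual probe, its training procedure, and how hidden states are extracted from layer 28 are not reproduced here.

```python
import math
from typing import Sequence


def probe_score(hidden: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic-regression probability for one hidden-state vector.

    Hypothetical helper: `weights` and `bias` stand in for a probe trained offline.
    """
    if len(hidden) != len(weights):
        raise ValueError("hidden state and weight vector must have the same length")
    logit = sum(h * w for h, w in zip(hidden, weights)) + bias
    # Numerically stable sigmoid for large positive or negative logits.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def is_degrading(
    hidden: Sequence[float],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> bool:
    """Flag the step when the probe's probability reaches the (assumed) threshold."""
    return probe_score(hidden, weights, bias) >= threshold
```

Because the hidden state is already produced by the model's forward pass, the only extra work is the dot product, which is consistent with the post's claim of no additional model calls.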

🖼️ Media

iScienceLuvr

@iScienceLuvr

📅

Apr 16, 2026

7d ago

🆔74118755

⭐0.36

can we PLEASE get some medical benchmarks reported? OpenAI does it, even Meta does it. I'd recommend MedXpertQA and/or HealthBench-Hard

@claudeai • Thu Apr 16 14:29

Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision. https://t.co/PtlRdpQcG

🔁sleepinyourhat retweeted

cat

@_catwu

📅

Apr 16, 2026

7d ago

🆔05178822

⭐0.36

Opus 4.7 is live in Claude Code today! The model performs best if you treat it like an engineer you're delegating to, not a pair programmer you're guiding line by line. Here are three workflow shifts we recommend for this model 🧵 https://t.co/bD5JO1xDMS

❤️ 387 likes · 🔁 28 retweets

jpschroeder

@jpschroeder

📅

Apr 16, 2026

7d ago

🆔97466896

Is Opus 4.7 good? I suggest you A/B test prompts between Codex and Claude for a while. Good time to mention this is easy to do in https://t.co/ImLyLY82pL https://t.co/Nb4Hr9lvh8

🖼️ Media

OpenAI

@OpenAI

📅

Apr 16, 2026

7d ago

🆔06062670

Codex for (almost) everything. It can now use apps on your Mac, connect to more of your tools, create images, learn from previous actions, remember how you like to work, and take on ongoing and repeatable tasks. https://t.co/UEEsYBDYfo

🖼️ Media

theo

@theo

📅

Apr 14, 2026

9d ago

🆔18647683

⭐0.30

Why are you still using React when you can vibe code something better in a day?

🔁wandb retweeted

Nicholas Charriere

@nichochar

📅

Apr 13, 2026

10d ago

🆔68130344

⭐0.32

If you want to build a self-improving harness, the first step is instrumentation. There are tools now that help you do this as "drop-in" plugins into claude code, very cool!

❤️ 7 likes · 🔁 3 retweets

🔁omarsar0 retweeted

DAIR.AI

@dair_ai

📅

Apr 09, 2026

14d ago

🆔23311785

⭐0.36

Introducing our Claude Code Cheat Sheet. Keep track of all the latest Claude Code commands, shortcuts, and best practices. All in one place. Easy to navigate. https://t.co/esazLftGnv

❤️ 91 likes · 🔁 16 retweets

marimo_io

@marimo_io

📅

Apr 14, 2026

9d ago

🆔35253654

⭐0.38

Stop babysitting your agent. marimo-pair gives coding agents a live view of your notebook. Variables, errors, UI sliders — if you can interact with it, so can the agent. https://t.co/ruVka0EanC

modal

@modal

📅

Apr 16, 2026

7d ago

🆔18344802

⭐0.42

Next Tuesday 12pm EST: @erikdunteman will break down the custom agent harness we launched with Modal sandboxes + @OpenAIDevs Agent SDK. Sandboxes, parallel coding agents, context mgmt, and more. Register here: https://t.co/HAIsKAJY6I

@erikdunteman • Thu Apr 16 20:36

Yesterday we launched our custom agent harness built for parallel background coding tasks, built on @modal sandboxes and @OpenAIDevs Agent SDK. I'll be talking in greater depth about harness design, sandboxes, context management, and more this Tuesday, link below https://t.co/mY

weave_wb

@weave_wb

📅

Apr 16, 2026

7d ago

🆔32775511

You ever run a benchmark and end up with 40 log files, zero clarity, and a laptop that sounds like a jet engine? @runloopai + W&B Weave fixes this 🧵 https://t.co/K5hVq6RkfG

🖼️ Media

🔁ylecun retweeted

gum

@gum1h0x

📅

Apr 08, 2026

15d ago

🆔58202022

⭐0.34

ok i read the cyber part of the mythos model card. some thoughts. 250 "trials" across 50 crash categories, but almost every full exploit is a permutation of the same 2 bugs, rediscovered from different starting points, not 250 independent attempts. when you take those 2 bugs out (fig B), mythos's full-exploit rate drops to 4.4%. so actually across both setups mythos leverages 4 distinct bugs total, not 50 as fig A might suggest. 1/n

❤️ 1,200 likes · 🔁 109 retweets

dair_ai

@dair_ai

📅

Apr 16, 2026

7d ago

🆔21895729

Coding agents learn from experience, but that knowledge stays locked in silos. Solve a thousand SWE tasks, and none of that wisdom helps with competitive coding. What if memories could transfer across domains?

The work introduces Memory Transfer Learning, a framework where coding agents share a unified memory pool across 6 heterogeneous benchmarks. They test four memory formats ranging from raw execution traces to high-level insights, and find that cross-domain memory improves average performance by 3.7%.

Why does it matter?

The transferable value isn't task-specific code. It's meta-knowledge: validation routines, structured action workflows, safe interaction patterns with execution environments. Algorithmic strategy transfer accounts for only 5.5% of the gains. The real benefit comes from procedural guidance on how to act, not what to code.

Abstraction dictates transferability: high-level insights generalize well, while low-level execution traces often cause negative transfer by anchoring agents to incompatible implementation details.

Paper: https://t.co/XPD5kczsoZ

Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

🖼️ Media

cocktailpeanut

@cocktailpeanut

📅

Apr 16, 2026

7d ago

🆔57964024

Generate FULLY CONTROLLABLE 3D assets from a SINGLE image, locally on your PC. Made a 1-click launcher for the official Anigen Gradio app, and a dedicated viewer. Crazy this is now possible. What you're seeing here came from one image. Requires: NVIDIA GPU 6GB VRAM

@yanpei_cao • Tue Apr 14 16:45

Static 3D generation isn't enough. We need assets ready for animation. Our new #SIGGRAPH work, AniGen, takes a single image and generates the 3D shape, skeleton, and skinning weights all at once. Code is fully open-sourced! Kudos to @KyrieIr31012755 and @VastAIResearch 🧵(1/4) h

🖼️ Media

KSimback

@KSimback

📅

Apr 16, 2026

7d ago

🆔70309706

I put 2 separate instances of Hermes agents into a chat, holy sh!t this is fun
>1 agent is builder, 1 is strategist
>each on separate models
>gave them some shared context
>enabled bot2bot and added each bot to the other's TG allowlist
>put 3 of us in a gc
>started with a simple post asking each to confirm if they can see each other's messages
>about 10 handshakes later they just started building
Sometimes you just need to FAFO with these things and see what happens, pretty sure this will become an infinite loop so may need to step in

🖼️ Media

k_taka

@k_taka

📅

Apr 14, 2026

9d ago

🆔06756937

⭐0.32

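I had the chance to interview @seratch_ja about Codex last week, so based on that conversation I've put together a summary of where Codex stands lately. It covers everything from the basics to a first taste of harness engineering. In a column in the second half, I also introduce "Codex Use Cases", which seems to have gained more examples recently.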

@gihyojp • Tue Apr 14 00:02

Published: "Codex Reaches 3 Million Weekly Active Users: OpenAI Japan's Sera on How Development Styles Are Changing" by @k_taka https://t.co/dbOThSVKl0

PyTorch

@PyTorch

📅

Apr 16, 2026

7d ago

🆔33842758

🎤 Take the stage at #PyTorchCon North America! We are looking for technical deep dives & production stories for our return to San Jose this Oct 20-21. Check out our "Preparing to Submit" guide to help craft your proposal. 🗓️ Deadline: June 7 Apply now: https://t.co/hLlKK7WxLD https://t.co/leYJj7nDfR

🖼️ Media

🔁jxnlco retweeted

James Sun

@JamesZmSun

📅

Apr 16, 2026

7d ago

🆔86365212

⭐0.32

We are super excited to launch the in-app browser inside Codex with comment mode! View any web pages & iterate with your agent quickly with just point and click. Codex will automatically capture a screenshot, the DOM element, and feed it as precise context to your next chat. No more switching between browsers, dragging screenshots, and wrangling with underspecified prompts. It's great for front-end development of apps/pages, but also very useful if you have documentation pulled up on the side and just want to ask a question!

❤️ 97 likes · 🔁 8 retweets

__Rhodium__

@__Rhodium__

📅

Apr 14, 2026

9d ago

🆔34386998

Won best edge AI at the @ycombinator and @innate_bot hackathon! We built a local VLM multi-rover orchestrator for Mars exploration. On-device navigation and automated fault detection & recovery across odometry, stereo vision, and lidar. Thanks for hosting, @ax_pey! https://t.co/GNkSNAMxRN

+1 more

🖼️ Media

fchollet

@fchollet

📅

Apr 16, 2026

7d ago

🆔70689971

⭐0.42

There's a broadly held misconception in AI that methods that scale well are simple methods -- even, that simple methods usually scale. This is completely wrong.

Pretty much none of the truly simple methods in ML scale well. SVM, kNN, random forests are some of the simplest methods out there, and they don't scale at all. Meanwhile "train a transformer via backprop and gradient descent" is a very high-entropy method, easily 10x more complex than random forest fitting. But it scales very well.

Further, given a simple method that doesn't scale, it is usually the case that you alter it to make it scale by adding a lot of complication. For instance, take a simple combinatorial search-based method (not scalable at all) -- you can make it scale by adding deep learning guidance (which blows up complexity).

Scalability usually belongs to high-entropy, complex systems.

emollick

@emollick

📅

Apr 16, 2026

7d ago

🆔26824062

⭐0.38

It is not well-explained, but with the adaptive switch off, I get no thinking. I can set thinking levels in Claude Code, but not in Claude Cowork. AI companies keep seeming to assume that coding/technical work is the only kind of important intellectual work out there (it is not)

PyTorch

@PyTorch

📅

Apr 10, 2026

13d ago

🆔65385031

PyTorch Foundation is expanding its #OpenSourceAI stack with #Safetensors, #ExecuTorch, and #Helion to improve model security, inference, and performance portability, writes Meredith Shubel for @thenewstack. @sparkycollier: Bringing Safetensors into the fold is “an important step towards scaling production-grade AI models.” ExecuTorch becomes a part of #PyTorch Core to expand on-device inference capabilities. Safetensors and Helion join @vllm_project, @DeepSpeedAI, and @raydistributed as foundation-hosted projects. Read Meredith Shubel’s coverage at @thenewstack here: https://t.co/ZoyWbP6Vji @huggingface @Meta

🖼️ Media

🔁Sanemavcil retweeted

Tom Dörr

@tom_doerr

📅

Apr 17, 2026

6d ago

🆔18317218

Offline-first AI agent for Raspberry Pi https://t.co/iapUnKRhXI https://t.co/FtE8vK8kSu

❤️ 69 likes · 🔁 9 retweets

🖼️ Media

🔁sama retweeted

Ari Weinstein

@AriX

📅

Apr 16, 2026

7d ago

🆔43388444

This is the first time I've ever seen an LLM operate a GUI as fast as a person, and it's surreal. https://t.co/5kjwGMDpvd

❤️ 588 likes · 🔁 30 retweets

🖼️ Media

geminicli

@geminicli

📅

Apr 15, 2026

8d ago

🆔20554319

⭐0.38

Long time in the making... Subagents! 🧠✨

Each subagent comes with a separate context window, custom system instructions, and curated set of tools.

• Create specialized expert agents 🤖
• Keep the main agent focused and context clean ✨
• Delegate work to parallel agents at the same time 👥

Read the blog below for details 👇

@JackWoth98 • Wed Apr 15 16:04

Subagents have arrived in Gemini CLI! 🤖🚀 Create your own custom subagents in @geminicli! Subagents are specialized, expert agents that the main agent can delegate work to. 📦- Subagents have their own set of tools, MCP servers, system instructions, and context window. 🏷️- Use @a

🔁ylecun retweeted

Dawid Moczadło

@kannthu1

📅

Apr 09, 2026

14d ago

🆔06950031

⭐0.36

I looked at their prompts, it's complete bs

They are literally providing all of the insight to the LLM upfront

> Are there any security vulnerabilities in this code? Consider the behavior of the SEQ_LT/SEQ_GT macros with sequence number wraparound. If you find issues, explain how an attacker might trigger them.

They are providing ALL required facts to the LLM, and they only ask the LLM to connect the dots

The real challenge for LLMs would be to get those insights first

THAT IS THE WHOLE CHALLENGE IN CYBERSECURITY; TO HAVE DEEP INSIGHT

This test proves nothing; don't make any conclusions about OSS models being good for security based on this
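For context on the hint in the quoted prompt, the sketch below emulates how sequence-number comparison macros behave around wraparound. It assumes the common BSD-style definitions, `SEQ_LT(a,b)` as `((int)((a)-(b)) < 0)` and `SEQ_GT(a,b)` as `((int)((a)-(b)) > 0)`; the code under test is not shown in the post, so its macros may differ.

```python
def _signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement signed integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def seq_lt(a: int, b: int) -> bool:
    """Emulates the assumed macro SEQ_LT(a, b): ((int)((a) - (b)) < 0)."""
    return _signed32(a - b) < 0


def seq_gt(a: int, b: int) -> bool:
    """Emulates the assumed macro SEQ_GT(a, b): ((int)((a) - (b)) > 0)."""
    return _signed32(a - b) > 0


# Close to the wrap point the comparison follows stream order:
# 0xFFFFFFF0 counts as "before" 0x10 even though it is numerically larger.
near_wrap = seq_lt(0xFFFFFFF0, 0x00000010)

# Exactly 2**31 apart, each value compares as "less than" the other,
# so the ordering is no longer meaningful.
ambiguous = seq_lt(0, 0x80000000) and seq_lt(0x80000000, 0)
```

The comparison is only reliable while the two sequence numbers are less than 2**31 apart, which is the kind of property the quoted prompt points the model toward.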

❤️ 666 likes · 🔁 47 retweets
