Your curated collection of saved posts and media

Showing 32 posts ยท last 14 days ยท by score
T
Teknium
@Teknium
๐Ÿ“…
Apr 15, 2026
8d ago
๐Ÿ†”84184482
โญ0.42

@lirex **Sudo password prompting is a separate mechanism** in `terminal_tool.py`. When Hermes detects `sudo` in a command, it looks for a password to pipe via `sudo -S`. The resolution order: 1. `SUDO_PASSWORD` env var in `~/.hermes/.env` โ†’ auto-pipes, no prompt 2. Previously entered password (cached for session) โ†’ reuses silently 3. Interactive prompt (CLI only) โ†’ asks the user with 45s timeout 4. None of the above โ†’ runs command as-is (fails if OS actually needs a password) If they have **passwordless sudo** (NOPASSWD in sudoers), the simplest fix is to add `SUDO_PASSWORD` to their `.env` โ€” even setting it to a dummy value works, because the env var being present tells Hermes "I have this handled, don't prompt." With NOPASSWD configured, sudo ignores the piped password anyway. They can do this through `hermes setup` (it asks about sudo during the tool configuration step) or manually: ``` # In ~/.hermes/.env SUDO_PASSWORD=dummy ``` For fully unrestricted operation overall, they'd want both: - `--yolo` flag (or `approvals.mode: off` in config.yaml) โ†’ skips dangerous command approvals - `SUDO_PASSWORD` in `.env` โ†’ skips sudo password prompts

Y
YuvrajS9886
@YuvrajS9886
๐Ÿ“…
Apr 17, 2026
6d ago
๐Ÿ†”48226677
โญ0.42

Training Qwen2.5-0.5B-Instruct on Reddit post summarization with GRPO on my 3x Mac Minis โ€” trying combination of quality rewards with length penalty! Completed all of the following combination rewards! >METEOR + BLEU >BLEU + ROUGE-L >METEOR + ROUGE-L All the code and wandb charts in the comments --- Training Qwen2.5-0.5B-Instruct on Reddit post summarization with GRPO on my 3x Mac Minis โ€” trying combination of quality rewards with length penalty! Completed all of the following combination rewards! >METEOR + BLEU >BLEU + ROUGE-L >METEOR + ROUGE-L All the code and wandb charts in the comments --- Setup: 3x Mac Minis in a cluster running MLX. One node drives training using GRPO, two push rollouts via vLLM. Trained two variants: โ†’ length penalty only (baseline) โ†’ length penalty + quality reward (BLEU, METEOR and/or ROUGE-L ) --- Eval: LLM-as-a-Judge (gpt-5) Used DeepEval to build a judge pipeline scoring each summary on 4 axes: โ†’ Faithfulness โ€” no hallucinations vs. source โ†’ Coverage โ€” key points captured โ†’ Conciseness โ€” shorter, no redundancy โ†’ Clarity โ€” readable on its own

W
wandb
@wandb
๐Ÿ“…
Apr 14, 2026
9d ago
๐Ÿ†”25294602
โญ0.30

@crypto_fyy @googlegemma @arena We're working on optimizing KV Cache!

R
RisingSayak
@RisingSayak
๐Ÿ“…
Apr 10, 2026
13d ago
๐Ÿ†”02430290

We've been studying what it takes to get NVFP4 & MXFP8 deliver good speedups on modern flow models for image & video gen. on B200 ๐Ÿ•ต๏ธโ€โ™‚๏ธ Today, I'm excited to share those findings! Bringing some cool recipes through Diffusers and TorchAO with `torch.compile` ๐Ÿ”ฅ Hop in โฌ‡๏ธ https://t.co/gSd1Kwnu0l

Media 1
๐Ÿ–ผ๏ธ Media
P
percyliang
@percyliang
๐Ÿ“…
Apr 17, 2026
6d ago
๐Ÿ†”77991535
โญ0.40

Marin is using quantile balancing from @Jianlin_S (who developed RoPE, which was also a good idea) to train our current 1e23 FLOPs MoE. The idea is elegant: assigning tokens to experts by solving a linear program. No hyperparameters to tune. Yields stable training.

@classiclarryd โ€ข Wed Apr 15 16:26

Researchers' brilliant ideas often get lost in the sea of endless SOTA claims on weak baselines. At Marin we battle-test ideas in an open arena, where anyone's idea can be promoted to the next hero run. One that recently rose up was @Jianlin_S MoE Quantile Balancing, used in our

C
cursor_ai
@cursor_ai
๐Ÿ“…
Apr 14, 2026
9d ago
๐Ÿ†”39740909

We've been developing a multi-agent system that builds and maintains complex software autonomously. Recently, we partnered with NVIDIA to apply it to optimizing CUDA kernels. In 3 weeks, it delivered a 38% geomean speedup across 235 problems. https://t.co/0YvbXrzVfe

Media 1
๐Ÿ–ผ๏ธ Media
O
omarsar0
@omarsar0
๐Ÿ“…
Apr 17, 2026
6d ago
๐Ÿ†”79696027

LLM agents loop, drift, and get stuck on hard reasoning tasks up to 30% of the time. Current fixes are either too blunt (hard step limits) or too expensive (LLM-as-judge adding 10-15% overhead per step). New research proposes a smarter middle ground. The work introduces the Cognitive Companion, a parallel monitoring architecture with two variants: an LLM-based monitor and a novel Probe-based monitor that detects reasoning degradation from the model's own hidden states at zero inference overhead. The Probe-based Companion trains a simple logistic regression classifier on hidden states from layer 28. It reads the model's internal representations during the existing forward pass, requiring no additional model calls. A single matrix multiplication is all it takes to flag when reasoning quality is declining. Why does it matter? The LLM-based Companion reduced repetition on loop-prone tasks by 52-62% with roughly 11% overhead. The Probe-based variant achieved a mean effect size of +0.471 with zero measured overhead and AUROC 0.840 on cross-validated detection. But the results also reveal an important nuance: companions help on loop-prone and open-ended tasks while showing neutral or negative effects on structured tasks. Models below 3B parameters also struggled to act on companion guidance at all. This suggests the future isn't universal monitoring but selective activation, deploying cognitive companions only where reasoning degradation is a real risk. Paper: https://t.co/K2vqDADwU8 Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

Media 1
๐Ÿ–ผ๏ธ Media
I
iScienceLuvr
@iScienceLuvr
๐Ÿ“…
Apr 16, 2026
7d ago
๐Ÿ†”74118755
โญ0.36

cane we PLEASE get some medical benchmarks reported? OpenAI does it, even Meta does it. I'd recommend MedXpertQA and/or HealthBench-Hard

@claudeai โ€ข Thu Apr 16 14:29

Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision. https://t.co/PtlRdpQcG

๐Ÿ”sleepinyourhat retweeted
_
cat
@_catwu
๐Ÿ“…
Apr 16, 2026
7d ago
๐Ÿ†”05178822
โญ0.36

Opus 4.7 is live in Claude Code today! The model performs best if you treat it like an engineer you're delegating to, not a pair programmer you're guiding line by line. Here are three workflow shifts we recommend for this model ๐Ÿงต https://t.co/bD5JO1xDMS

โค๏ธ387
likes
๐Ÿ”28
retweets
J
jpschroeder
@jpschroeder
๐Ÿ“…
Apr 16, 2026
7d ago
๐Ÿ†”97466896

Is Opus 4.7 good? I suggest you A/B test prompts between Codex and Claude for a while. Good time to mention this is easy to do in https://t.co/ImLyLY82pL https://t.co/Nb4Hr9lvh8

Media 1Media 2
๐Ÿ–ผ๏ธ Media
O
OpenAI
@OpenAI
๐Ÿ“…
Apr 16, 2026
7d ago
๐Ÿ†”06062670

Codex for (almost) everything. It can now use apps on your Mac, connect to more of your tools, create images, learn from previous actions, remember how you like to work, and take on ongoing and repeatable tasks. https://t.co/UEEsYBDYfo

๐Ÿ–ผ๏ธ Media
T
theo
@theo
๐Ÿ“…
Apr 14, 2026
9d ago
๐Ÿ†”18647683
โญ0.30

Why are you still using React when you can vibe code something better in a day?

๐Ÿ”wandb retweeted
N
Nicholas Charriere
@nichochar
๐Ÿ“…
Apr 13, 2026
10d ago
๐Ÿ†”68130344
โญ0.32

If you want to build a self-improving harness, the first step is instrumentation. There are tools now that help you do this as "drop-in" plugins into claude code, very cool!

โค๏ธ7
likes
๐Ÿ”3
retweets
๐Ÿ”omarsar0 retweeted
D
DAIR.AI
@dair_ai
๐Ÿ“…
Apr 09, 2026
14d ago
๐Ÿ†”23311785
โญ0.36

Introducing our Claude Code Cheat Sheet. Keep track of all the latest Claude Code commands, shortcuts, and best practices. All in one place. Easy to navigate. https://t.co/esazLftGnv

โค๏ธ91
likes
๐Ÿ”16
retweets
M
marimo_io
@marimo_io
๐Ÿ“…
Apr 14, 2026
9d ago
๐Ÿ†”35253654
โญ0.38

Stop babysitting your agent. marimo-pair gives coding agents a live view of your notebook. Variables, errors, UI sliders โ€” if you can interact with it, so can the agent. https://t.co/ruVka0EanC

M
modal
@modal
๐Ÿ“…
Apr 16, 2026
7d ago
๐Ÿ†”18344802
โญ0.42

Next Tuesday 12pm EST: @erikdunteman will break down the custom agent harness we launched with Modal sandboxes + @OpenAIDevs Agent SDK. Sandboxes, parallel coding agents, context mgmt, and more. Register here: https://t.co/HAIsKAJY6I

@erikdunteman โ€ข Thu Apr 16 20:36

Yesterday we launched our custom agent harness built for parallel background coding tasks, built on @modal sandboxes and @OpenAIDevs Agent SDK. I'll be talking in greater depth about harness design, sandboxes, context management, and more this Tuesday, link below https://t.co/mY

W
weave_wb
@weave_wb
๐Ÿ“…
Apr 16, 2026
7d ago
๐Ÿ†”32775511

You ever run a benchmark and end up with 40 log files, zero clarity, and a laptop that sounds like a jet engine? @runloopai + W&B Weave fixes this ๐Ÿงต https://t.co/K5hVq6RkfG

Media 1
๐Ÿ–ผ๏ธ Media
๐Ÿ”ylecun retweeted
G
gum
@gum1h0x
๐Ÿ“…
Apr 08, 2026
15d ago
๐Ÿ†”58202022
โญ0.34

ok i read the cyber part of the mythos model card. some thoughts. 250 "trials" across 50 crash categories but almost every full exploit is a permutation of the same 2 bugs, rediscovered from different starting points not 250 independent attempts. when you get rid of those 2 bugs out (fig B) and mythos's full-exploit rate drops to 4.4%. so actually across both setups mythos leverages 4 distinct bugs total not 50 as fig A might suggest. 1/n

โค๏ธ1,200
likes
๐Ÿ”109
retweets
D
dair_ai
@dair_ai
๐Ÿ“…
Apr 16, 2026
7d ago
๐Ÿ†”21895729

Coding agents learn from experience, but that knowledge stays locked in silos. Solve a thousand SWE tasks, and none of that wisdom helps with competitive coding. What if memories could transfer across domains? The work introduces Memory Transfer Learning, a framework where coding agents share a unified memory pool across 6 heterogeneous benchmarks. They test four memory formats ranging from raw execution traces to high-level insights, and find that cross-domain memory improves average performance by 3.7%. Why does it matter? The transferable value isn't task-specific code. It's meta-knowledge: validation routines, structured action workflows, safe interaction patterns with execution environments. Algorithmic strategy transfer accounts for only 5.5% of the gains. The real benefit comes from procedural guidance on how to act, not what to code. Abstraction dictates transferability: high-level insights generalize well, while low-level execution traces often cause negative transfer by anchoring agents to incompatible implementation details. Paper: https://t.co/XPD5kczsoZ Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

Media 1
๐Ÿ–ผ๏ธ Media
C
cocktailpeanut
@cocktailpeanut
๐Ÿ“…
Apr 16, 2026
7d ago
๐Ÿ†”57964024

Generate FULLY CONTROLLABLE 3D assets from a SINGLE image, locally on your PC. Made a 1-click launcher for the official Anigen Gradio app, and a dedicated viewer. Crazy this is now possible. What you're seeing here came from one image. Requires: NVIDIA GPU 6GB VRAM

@yanpei_cao โ€ข Tue Apr 14 16:45

Static 3D generation isn't enough. We need assets ready for animation. Our new #SIGGRAPH work, AniGen, takes a single image and generates the 3D shape, skeleton, and skinning weights all at once. Code is fully open-sourced! Kudos to @KyrieIr31012755 and @VastAIResearch ๐Ÿงต(1/4) h

๐Ÿ–ผ๏ธ Media
K
KSimback
@KSimback
๐Ÿ“…
Apr 16, 2026
7d ago
๐Ÿ†”70309706

I put 2 separate instances of Hermes agents into a chat, holy sh!t this is fun >1 agent is builder, 1 is strategist >each on separate models >gave them some shared context >enabled bot2bot andadded each bot to the other's TG allowlist >put 3 of us in a gc >started with a simple post asking each to confirm if they can see each other's messages >about 10 handshakes later they just started building Sometimes you just need to FAFO with these things and see what happens, pretty sure this will become an infinite loop so may need to step in

Media 1
๐Ÿ–ผ๏ธ Media
K
k_taka
@k_taka
๐Ÿ“…
Apr 14, 2026
9d ago
๐Ÿ†”06756937
โญ0.32

Codexใซใคใ„ใฆ @seratch_ja ใ•ใ‚“ใซๅ…ˆ้€ฑใ‚คใƒณใ‚ฟใƒ“ใƒฅใƒผใ™ใ‚‹ๆฉŸไผšใŒใ‚ใฃใŸใฎใงใ€ใใฎ่ฉฑใ‚’ใƒ™ใƒผใ‚นใซใ€Codexใฎๆœ€่ฟ‘ใฎ็Šถๆณใ‚’ใพใจใ‚ใฆใฟใพใ—ใŸใ€‚ๅŸบๆœฌใฎใจใ“ใ‚ใ‹ใ‚‰ใƒใƒผใƒใ‚นใ‚จใƒณใ‚ธใƒ‹ใ‚ขใƒชใƒณใ‚ฐใฎใ•ใ‚ใ‚Šใพใงๅ…ฅใฃใฆใ„ใพใ™ใ€‚ใพใŸใ€็›ด่ฟ‘ใงไบ‹ไพ‹ใŒๅข—ใˆใŸๆ„Ÿใ˜ใฎใ€ŒCodex Use Casesใ€ใฎ็ดนไป‹ใ‚‚ๅพŒๅŠใฎใ‚ณใƒฉใƒ ใง่งฆใ‚ŒใฆใŠใใพใ—ใŸใ€‚

@gihyojp โ€ข Tue Apr 14 00:02

ใ€Ž้€ฑ้–“ใ‚ขใ‚ฏใƒ†ใ‚ฃใƒ–ใƒฆใƒผใ‚ถใƒผ300ไธ‡ไบบใซใฎใผใ‚‹Codexใ€OpenAI Japanใฎ็€ฌ่‰ฏๆฐใซ่žใใ€Œ้–‹็™บใ‚นใ‚ฟใ‚คใƒซใ€ใฎๅค‰ๅŒ–ใ€by @k_taka ๅ…ฌ้–‹ https://t.co/dbOThSVKl0

P
PyTorch
@PyTorch
๐Ÿ“…
Apr 16, 2026
7d ago
๐Ÿ†”33842758

๐ŸŽค Take the stage at #PyTorchCon North America! We are looking for technical deep dives & production stories for our return to San Jose this Oct 20-21. Check out our "Preparing to Submit" guide to help craft your proposal. ๐Ÿ—“๏ธ Deadline: June 7 Apply now: https://t.co/hLlKK7WxLD https://t.co/leYJj7nDfR

๐Ÿ–ผ๏ธ Media
๐Ÿ”jxnlco retweeted
J
James Sun
@JamesZmSun
๐Ÿ“…
Apr 16, 2026
7d ago
๐Ÿ†”86365212
โญ0.32

We are super excited to launch the in-app browser inside Codex with comment mode! View any web pages & iterate with your agent quickly with just point and click. Codex will automatically capture a screenshot, the DOM element, and feed it as precise context to your next chat. No more switching between browsers, dragging screenshots, and wrangling with underspecified prompts. It's great for front-end development of apps/pages, but also very useful if you have documentation pulled up on the side and just want to ask a question!

โค๏ธ97
likes
๐Ÿ”8
retweets
_
__Rhodium__
@__Rhodium__
๐Ÿ“…
Apr 14, 2026
9d ago
๐Ÿ†”34386998

Won best edge AI at the @ycombinator and @innate_bot hackathon! We built a local VLM multi-rover orchestrator for Mars exploration. On-device navigation and automated fault detection & recovery across odometry, stereo vision, and lidar. Thanks for hosting, @ax_pey! https://t.co/GNkSNAMxRN

Media 2
+1 more
๐Ÿ–ผ๏ธ Media
F
fchollet
@fchollet
๐Ÿ“…
Apr 16, 2026
7d ago
๐Ÿ†”70689971
โญ0.42

There's a broadly held misconception in AI that methods that scale well are simple methods -- even, that simple methods usually scale. This is completely wrong. Pretty much none of the truly simple methods in ML scale well. SVM, kNN, random forests are some of the simplest methods out there, and they don't scale at all. Meanwhile "train a transformer via backprop and gradient descent" is a very high-entropy method, easily 10x more complex than random forest fitting. But it scales very well. Further, given a simple method that doesn't scale, it is usually the case that you alter it to make it scale by adding a lot of complication. For instance, take a simple a simple combinatorial search-based method (not scalable at all) -- you can make it scale by adding deep learning guidance (which blows up complexity). Scalability usually belongs to high-entropy, complex systems.

E
emollick
@emollick
๐Ÿ“…
Apr 16, 2026
7d ago
๐Ÿ†”26824062
โญ0.38

It is not well-explained, but with the adaptive switch off, I get no thinking. I can set thinking levels in Claude Code, but not in Claude Cowork. AI companies keep seeming to assume that coding/technical work is the only kind of important intellectual work out there (it is not)

P
PyTorch
@PyTorch
๐Ÿ“…
Apr 10, 2026
13d ago
๐Ÿ†”65385031

PyTorch Foundation is expanding its #OpenSourceAI stack with #Safetensors, #ExecuTorch, and #Helion to improve model security, inference, and performance portability, writes Meredith Shubel for @thenewstack. @sparkycollier: Bringing Safetensors into the fold is โ€œan important step towards scaling production-grade AI models.โ€ ExecuTorch becomes a part of #PyTorch Core to expand on-device inference capabilities. Safetensors and Helion join @vllm_project, @DeepSpeedAI, and @raydistributed as foundation-hosted projects. Read Meredith Shubelโ€™s coverage at @thenewstack here: https://t.co/ZoyWbP6Vji @huggingface @Meta

Media 1
๐Ÿ–ผ๏ธ Media
๐Ÿ”Sanemavcil retweeted
T
Tom Dรถrr
@tom_doerr
๐Ÿ“…
Apr 17, 2026
6d ago
๐Ÿ†”18317218

Offline-first AI agent for Raspberry Pi https://t.co/iapUnKRhXI https://t.co/FtE8vK8kSu

Media 1
โค๏ธ69
likes
๐Ÿ”9
retweets
๐Ÿ–ผ๏ธ Media
๐Ÿ”sama retweeted
A
Ari Weinstein
@AriX
๐Ÿ“…
Apr 16, 2026
7d ago
๐Ÿ†”43388444

This is the first time I've ever seen an LLM operate a GUI as fast as a person, and it's surreal. https://t.co/5kjwGMDpvd

โค๏ธ588
likes
๐Ÿ”30
retweets
๐Ÿ–ผ๏ธ Media
G
geminicli
@geminicli
๐Ÿ“…
Apr 15, 2026
8d ago
๐Ÿ†”20554319
โญ0.38

Long time in the making... Subagents! ๐Ÿง โœจ Each subagent comes with a separate context window, custom system instructions, and curated set of tools. โ€ข Create specialized expert agents ๐Ÿค– โ€ข Keep the main agent focused and context clean โœจ โ€ข Delegate work to parallel agents at the same time๐Ÿ‘ฅ Read the blog below for details ๐Ÿ‘‡

@JackWoth98 โ€ข Wed Apr 15 16:04

Subagents have arrived in Gemini CLI! ๐Ÿค–๐Ÿš€ Create your own custom subagents in @geminicli! Subagents are specialized, expert agents that the main agent can delegate work to. ๐Ÿ“ฆ- Subagents have their own set of tools, MCP servers, system instructions, and context window. ๐Ÿท๏ธ- Use @a

๐Ÿ”ylecun retweeted
K
Dawid Moczadล‚o
@kannthu1
๐Ÿ“…
Apr 09, 2026
14d ago
๐Ÿ†”06950031
โญ0.36

I looked at their prompts, It's complete bs They are literally providing all of the insight to the LLM upfront > Are there any security vulnerabilities in this code? Consider the behavior of the SEQ_LT/SEQ_GT macros with sequence number wraparound. If you find issues, explain how an attacker might trigger them. They are providing ALL required facts to the LLM, and they only ask the LLM to connect the dots The real challenge for LLMs would be to get those insights first THAT IS THE WHOLE CHALLENGE IN CYBERSECURITY; TO HAVE DEEP INSIGHT This test proves nothing; don't make any conclusions about OSS models being good for security based on this

โค๏ธ666
likes
๐Ÿ”47
retweets