Your curated collection of saved posts and media

Showing 32 posts ยท last 14 days ยท by score
R
rasbt
@rasbt
๐Ÿ“…
Jun 27, 2026
7d ago
๐Ÿ†”01463137

I put together a new article on setting up local coding agents with open-weight models. Everything runs 100% locally. I thought it might be useful putting this together because many people asked me about my setup in the past, and I thought it would also motivate people to get started tinkering with local models for serious work (yes, things got incredibly capable this year with better LLMs and better harnesses). So, here's a walkthrough of how to connect a local LLM to a local coding harness (could be Claude Code or Codex, which you may already be familiar with). I also included some assessment notes that are useful as a checklist to select between and consider certain LLMs over others: - Checking RAM usage at long contexts to see if the model is suitable for real work - Measuring prefill and decoding tok/sec to see whether it's fast enough to not be annoying - Making sure the model has sufficient tool-calling capabilities in theory - Assessing whether the model can solve some more challenging tasks when used in a coding harness. Of course, there are always more specialized tools that can squeeze a bit more performance out of things, but I hope this is a good starter kit that stays flexible; that is you can easily switch to newer models as they are released or even tap into cloud models in your familiar harness if the current ones are not sufficient enough for a given task.

Media 1
๐Ÿ–ผ๏ธ Media
๐Ÿ”ai_fast_track retweeted
J
Jiwei Li
@JiweiLi1
๐Ÿ“…
Jun 25, 2026
9d ago
๐Ÿ†”60712475
โญ0.34

Excited to share Ornith, our latest family of open-source models specialized for agentic coding. Ornith achieves SOTA performance among open-source models of comparable size on a variety of coding benchmarks (Terminal-Bench 2.1, SWE, NL2Repo, OpenClaw, SWE Atlas, etc) Feedback is deeply appreciated! ๐Ÿ“–Tech Blog: https://t.co/MiaaDExj9B ๐Ÿค—Huggingface: https://t.co/eDtzanc5Vp

โค๏ธ562
likes
๐Ÿ”42
retweets
๐Ÿ”ylecun retweeted
P
Photoroom
@photoroom_ML
๐Ÿ“…
Jun 12, 2026
22d ago
๐Ÿ†”70417811
โญ0.36

๐Ÿš€ Meet PRX Pixel. Our new open-source 7B text-to-image model that generates images directly in pixel space. After months of pretraining on hundreds of millions of images, supervised fine-tuning, and preference alignment, we're excited to share a first public preview. The weights are already available, and we're currently working on integrating the model directly into Diffusers ๐Ÿค—to make the model even easier to use. Test it yourself in the demo below. And as always, we'll be sharing the full story behind the model through a series of technical blog posts covering the entire training recipe. Link in the comments ๐Ÿ‘‡

โค๏ธ343
likes
๐Ÿ”52
retweets
J
JiweiLi1
@JiweiLi1
๐Ÿ“…
Jun 25, 2026
9d ago
๐Ÿ†”60712475

Excited to share Ornith, our latest family of open-source models specialized for agentic coding. Ornith achieves SOTA performance among open-source models of comparable size on a variety of coding benchmarks (Terminal-Bench 2.1, SWE, NL2Repo, OpenClaw, SWE Atlas, etc) Feedback is deeply appreciated! ๐Ÿ“–Tech Blog: https://t.co/MiaaDExj9B ๐Ÿค—Huggingface: https://t.co/eDtzanc5Vp

@ornith_ โ€ข Thu Jun 25 14:15

Aloha! ๐ŸŒบ Meet Ornith-1.0, a family of open-source LLMs specialized for agentic coding. Ornith-1.0 spans the full parameter sizes including 9B Dense, 31B Dense, 35B MoE, and 397B MoE. It achieves state-of-the-art performance among open-source models of comparable size on coding

Media 1
๐Ÿ–ผ๏ธ Media
D
dair_ai
@dair_ai
๐Ÿ“…
Jun 28, 2026
5d ago
๐Ÿ†”93564941

Why do RL runs on LLMs blow up even when the recipe looks right? GEOALIGN, from the Alibaba team behind Qwen, points at the rollouts. A handful of bad batches push the policy in incoherent directions, and most stability tuning just damps the symptom. This work curates rollouts by their geometry, removing the samples that make update directions conflict before they destabilize training. Why does it matter? If instability is largely a bad-batch problem, rollout curation is a lower-effort lever than another round of KL or clip tuning. You fix the data going into the update rather than fighting the optimizer. Paper: https://t.co/tUAYC57MVy Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

Media 1
๐Ÿ–ผ๏ธ Media
D
Designarena
@Designarena
๐Ÿ“…
Jul 02, 2026
1d ago
๐Ÿ†”66509130

BREAKING: Gemini Omni Flash by @GoogleDeepMind is 1st overall on Video Arena with an Elo of 1404. Gemini Omni Flash establishes a 101 point Elo gap over Seedance 2.0 Mini by @BytePlusGlobal in 2nd place, one of the largest leaps weโ€™ve ever seen on Video Arena. This establishes Google as the worldโ€™s leading video generation lab, with a leap of 7 positions from their Veo series. Congratulations to the @GoogleDeepMind team on this accomplishment!

Media 1
๐Ÿ–ผ๏ธ Media
๐Ÿ”random_walker retweeted
S
Sayash Kapoor
@sayashk
๐Ÿ“…
Jul 02, 2026
2d ago
๐Ÿ†”57204695
โญ0.34

Update on our long-horizon AI R&D evals: In April, we launched CRUX, a project to regularly run open-world evaluations. These long, messy, real-world tests of what AI agents can actually do. Our second evaluation is underway, and we ask: AI agents automate AI research? There is a lot of interest in studying AI research automation. But most of the systems built so far follow one of three patterns. 1) keep a human in the loop to guide the agent and course-correct along the way. 2) focus on narrow problems where ground truth is clear and progress is easy to verify, as in AutoResearch. 3) use scaffolds engineered for one specific type of research question, so strong results may say more about the scaffold than about the agent's general research ability. These efforts are helpful, but a lot of AI research is much broader. Success is not immediately clear or verifiable. Researchers need to test and reject promising hypotheses, backtrack, consider new or unconventional approaches, and do a lot more to make progress on answering research questions. In CRUX #2, we are trying to test whether agents can answer novel, open-ended AI research questions. - One major risk in such a task is contamination. We want the agent to have access to the internet and all the tools it needs to solve the task, so we can't use research questions from publicly available papers. At the same time, we want high quality papers to serve as the source of challenging research questions. - To address this, we partnered with AI researchers from UKAISI, UToronto, Princeton, and other institutions who have written high-quality papers that arenโ€™t yet public, so thereโ€™s no risk of contamination. - The authors pose open-ended research questions without giving away answers. The agent must produce a NeurIPS-quality paper and a reproducible codebase, which the authors of the papers then review. - We built a general-purpose scaffold on OpenClaw and Opus 4.8. (We would have loved to use Fable 5, but given the filters on AI R&D capabilities, we don't want to confound results.) - Agents get generous resource budgets set in consultation with the original authors, such as access to VMs, GPUs, and any other compute needed to answer the question. They also have $3,000 in API credits per paper. We evaluate them on week-long time horizons to make progress on answering the research question, far more than typical agent evals. - The agent needs to manage its own budget. It can track its spend and stay within its limits, and it can modify its scaffold and reasoning effort as it sees fit. - In addition to the final artifacts, such as the paper's code, we are also evaluating the agent's trajectories in depth. When we announced CRUX, we planned to conduct an open-world eval every month. Given the scope and ambition of this project, we have spent a lot more time making sure we are confident in our setup and results. That said, the early results we have are exciting, and we look forward to sharing them soon.

โค๏ธ47
likes
๐Ÿ”11
retweets
R
RekaAILabs
@RekaAILabs
๐Ÿ“…
Jun 30, 2026
4d ago
๐Ÿ†”33038475

๐ŸŽฎ๐Ÿ•น๏ธ๐Ÿ–ฅ๏ธ CS2-10k is now available on @huggingface ๐Ÿš€ 600,000+ egocentric gameplay videos. 10,000+ hours. Every frame paired with the exact keyboard, mouse, and 3D position data that produced it. If you're working on world models, action-conditioned video generation, or egocentric navigation, this is ready to download and use today.

๐Ÿ–ผ๏ธ Media
I
IBuzovskyi
@IBuzovskyi
๐Ÿ“…
Jun 30, 2026
3d ago
๐Ÿ†”10618322

HERMES AGENT NOW READS THE WEB UP TO 60X FASTER AND 49X CHEAPER. CLEAN CONTENT STRAIGHT TO THE AGENT. LARGE PAGES PAGED ON DEMAND. @NousResearch scraping backends used to return raw content that got processed redundantly before reaching the agent. that pipeline is gone. now: backends pass clean content directly. large pages save locally and page on demand. same quality. fraction of the time and cost. HOW WEB_EXTRACT HANDLES LARGE PAGES: size-driven processing. no wasted tokens. under 5,000 chars: โ†’ returned as-is. no LLM call. full markdown reaches the agent. 5,000 to 500,000 chars: โ†’ single-pass summary via auxiliary model. capped at ~5,000 chars of output. keeps quotes, code blocks, key facts. 500,000 to 2,000,000 chars: โ†’ chunked into 100K-char pieces. each chunk summarized in parallel. final synthesis: ~5,000 chars. over 2,000,000 chars: โ†’ refused with a hint to use web_crawl with focused extraction instructions. the summary is a content compressor, not a paraphraser. if summarization fails, Hermes falls back to the first ~5,000 chars of raw content. no useless error messages. ROUTE EXTRACTION TO A CHEAP MODEL: by default, web_extract uses your main model. on Opus that means every long page burns premium tokens on summarization. set in Desktop app, Dashboard, or config.yaml: auxiliary: web_extract: provider: openrouter model: google/gemini-3-flash-preview timeout: 360 extraction summaries on Gemini Flash. reasoning stays on your premium model. this alone cuts web research costs significantly. 8 BACKEND PROVIDERS: Firecrawl (default): search + extract + crawl. 500 free credits/month. SearXNG: free, self-hosted, search-only. no API key. Brave Search: 2,000 free queries/month. search-only. DDGS (DuckDuckGo): free, no key needed. search-only. Tavily: search + extract + crawl. 1,000 free searches/month. Exa: search + extract. 1,000 free searches/month. Parallel: search + extract. paid. xAI (Grok): search-only. LLM-generated results via Grok. search-only providers pair with Firecrawl/Tavily/Exa for extract capability. PER-CAPABILITY SPLIT: use different providers for search vs extract: SearXNG (free) for search. Firecrawl for extract. free searches. paid extraction only when needed. configure via hermes tools or config.yaml. FREE SELF-HOSTED SEARCH (SEARXNG): zero API costs. zero rate limits. privacy-respecting metasearch across 70+ engines. docker compose up -d set SEARXNG_URL in .env. enable JSON format in settings.yml. Hermes connects automatically. pair with Firecrawl for extract and you have search for free with paid extraction only on demand. NOUS PORTAL SUBSCRIBERS: web search and extract included through the Tool Gateway via managed Firecrawl. no API key needed. no separate billing. hermes setup --portal enables everything. WHEN YOU NEED RAW CONTENT: if the LLM summary drops important fields (structured data, tables, specific formatting): use browser_navigate + browser_snapshot instead. returns the live accessibility tree without auxiliary-model rewriting. full Hermes architecture deep-dive in the article ๐Ÿ‘‡

@IBuzovskyi โ€ข Sun Jun 21 09:38

https://t.co/VxyyeQCimO

๐Ÿ–ผ๏ธ Media
Y
yifannnwu
@yifannnwu
๐Ÿ“…
Jun 30, 2026
4d ago
๐Ÿ†”23050636

Introducing SWE-Together: a multi-turn benchmark built from real userโ€“agent coding sessions. Coding agents are often benchmarked like exam-takers: given the full spec up front, then graded on the final code. But real coding help is a conversation โ€” users clarify goals, add constraints, and correct course along the way. SWE-Together turns real coding work into a reproducible, verifiable benchmark: 109 repo-level tasks curated from 11,260 recorded sessions, replayed with a reactive LLM user simulator that preserves the original userโ€™s intent. We evaluate agents as collaborators, not just patch generators: final pass rate and how many user interventions were needed to get there. In this evaluation snapshot, claude-opus-4.8 currently leads among the 7 agents we tested โ€” achieving the highest pass rate while requiring the fewest user interventions. ๐Ÿ“„ Paper: https://t.co/Zp5BSPpLTJ ๐Ÿ’ป Code: https://t.co/NPgxCMLdHi ๐ŸŒ Website: https://t.co/BK50zRGReE

Media 1Media 2
๐Ÿ–ผ๏ธ Media
D
deredleritt3r
@deredleritt3r
๐Ÿ“…
Jun 30, 2026
3d ago
๐Ÿ†”76903821
โญ0.40

Notice that Sonnet 5 scores worse than Opus 4.8 on every single benchmark (except GDPval, on which it's 3 points higher - nothing material). This is in line with my suspicion that we have an unofficial moratorium on frontier model releases in the U.S. until the Fable 5/GPT-5.6 situation is resolved.

@claudeai โ€ข Tue Jun 30 18:00

Sonnet 5 is a substantial improvement over Sonnet 4.6 on reasoning, tool use, coding, and knowledge work. Its performance is close to Opus 4.8, at lower prices. https://t.co/VOISbk14Lk

S
sara_drag
@sara_drag
๐Ÿ“…
Jun 30, 2026
4d ago
๐Ÿ†”64815937

Is Muon as good as they say? We looked beyond training speed and found a hidden cost: Muon loses the simplicity bias of older optimizers like gradient descent โ€” and this matters for generalization. https://t.co/t85NhOtCsG

Media 1
๐Ÿ–ผ๏ธ Media
L
LiorOnAI
@LiorOnAI
๐Ÿ“…
Jul 01, 2026
2d ago
๐Ÿ†”67365167
โญ0.42

You now convert any LLM into a faster one without retraining from scratch. NVIDIA just did this to their 30B model. Here's the trick: 1. Duplicate the model into two copies 2. Freeze one copy, it just reads the prompt and remembers context 3. Train the other copy to write chunks of text at once instead of one word at a time 4. Run them together The frozen copy barely costs anything (it's already trained). The new copy only needed ~8% of the original training data to learn the new trick. Result: 2.4x faster generation, keeping ~99% of the original quality.

@NVIDIAAI โ€ข Wed Jul 01 19:00

We took a 30B model and split it in two to write tokens in parallel instead of one at a time. Introducing Nemotron-Labs-TwoTower: a diffusion language model from NVIDIA Research adapted from Nemotron-3-Nano-30B-A3B. Hereโ€™s how it works: one half holds the context, the other writ

S
SungjinAhn_
@SungjinAhn_
๐Ÿ“…
Jul 01, 2026
3d ago
๐Ÿ†”14237320

๐Ÿš€ We introduce Neural Theorizer (NEO) โ€” a new type of world model that learns to theorize the world from observation, without language or LLM supervision. Selected as an ICML 2026 oral presentation โ€” 0.7% of submitted papers. The paper asks: "What does it mean to understand the world and build a world model?" Todayโ€™s world models are often trained to predict the future: the next frame, next latent state, or next observation. But is prediction enough? We argue that a world model should be a theory-building system: one that discovers reusable primitives, composes them into executable explanations, and transfers those explanations to novel phenomena. NEO is our first step toward this vision โ€” a World Theory Model that learns explicit, compositional theories from raw observation. This work was led by my wonderful students: Doojin Baek*(@doojin_a_baek), Gyubin Lee* (@gyubin0521), Junyeob Baek (@JunyeobB), and Hosung Lee (@HosungLee_). For more details, take a look at the paper โ€” and if youโ€™re attending ICML, letโ€™s talk there! ๐Ÿ“„ arXiv: https://t.co/TGMXLLfzP7 ๐ŸŒ Project page: https://t.co/aLJywp8rfq

Media 2
๐Ÿ–ผ๏ธ Media
H
HuggingPapers
@HuggingPapers
๐Ÿ“…
Jun 30, 2026
4d ago
๐Ÿ†”89339131

Microsoft just released a new GUI agent on Hugging Face Sico-Evolution jumps from 39.8% to 82.9% Task Success Rate Outperforming GPT-5.4, Claude Opus 4.6, and Claude Opus 4.7 All from a 4B parameter model https://t.co/UNSCLF8VPT

Media 1
๐Ÿ–ผ๏ธ Media
๐Ÿ”HamelHusain retweeted
E
elie
@eliebakouch
๐Ÿ“…
Jun 23, 2026
11d ago
๐Ÿ†”01697382

every infra piece you need to know to do RL on GLM-5 https://t.co/pvevY6zYUD https://t.co/rhky5OvmMk

Media 1
โค๏ธ336
likes
๐Ÿ”31
retweets
๐Ÿ–ผ๏ธ Media
E
eliebakouch
@eliebakouch
๐Ÿ“…
Jun 23, 2026
11d ago
๐Ÿ†”01697382

every infra piece you need to know to do RL on GLM-5 https://t.co/pvevY6zYUD https://t.co/rhky5OvmMk

@PrimeIntellect โ€ข Tue Jun 23 02:15

Today we're releasing prime-rl v0.6.0 โ€” enabling RL at trillion-parameter MoE scale on agentic workloads at the highest efficiency. We've relentlessly optimized our RL infra. The result: GLM-5 on agentic SWE tasks at 131k context and sub-5-minute step time. https://t.co/Vg8LhLs

Media 1Media 2
๐Ÿ–ผ๏ธ Media
S
SergioPaniego
@SergioPaniego
๐Ÿ“…
Jun 29, 2026
5d ago
๐Ÿ†”47342920

one command and you have a private vllm server on HF infra point a coding agent straight at your own model, then spin it down when you're done blog (by @QGallouedec) belowโคต๏ธ https://t.co/F9i10NSOSG

Media 1
๐Ÿ–ผ๏ธ Media
L
llama_index
@llama_index
๐Ÿ“…
Jun 29, 2026
4d ago
๐Ÿ†”10826006

Semantic search alone doesn't cut it. Neither does brute-force grep. Agents need both. Today we're shipping the Retrieval Harness in LlamaParse Index: semantic search, server-side grep, and file-level navigation working together in a single agent reasoning loop. ๐Ÿฆ™๐ŸŒค๏ธ Grep a file, list what's in an index, read past a chunk boundary, run hybrid search with reranking โ€” all as native agent tools. Now in beta across all paid tiers. Full breakdown in the blog ๐Ÿ‘‡ Learn More: https://t.co/q86Lu6tdOI

๐Ÿ–ผ๏ธ Media
H
HamelHusain
@HamelHusain
๐Ÿ“…
Jul 01, 2026
3d ago
๐Ÿ†”83022614

Next up in the series is @GoAbiAryan on LLM inference optimization with a hands on exercise! Tomorrow 11am PT Sign up here: https://t.co/W07DVVBCLt recordings also sent to everyone who registers Abi is a legend when it comes to inference, highly recommend this one https://t.co/pGIoXk6LJF

@HamelHusain โ€ข Tue Jun 23 17:01

Tomorrow @sh_reya and I kick off this free AI product engineering mini-course. Topics covered over 12 talks: 1. Design/UX & Evals 2. Retrieval 3. When & how to use open models effectively With these legends: @TheZachMueller @bclavie @xeophon @GoAbiAryan @barrowjoseph @willccbb

Media 1
๐Ÿ–ผ๏ธ Media
V
vllm_project
@vllm_project
๐Ÿ“…
Jun 18, 2026
16d ago
๐Ÿ†”49885492
โญ0.40

Huge milestone from the @anyscalecompute + @googlecloud GKE teams ๐ŸŽŠ Ray Serve LLM provides up to 4.4x higher throughput on prefill-heavy workloads and 24x on decode-heavy workloads than previous versions. Three optimizations made this possible on the Ray Serve LLM + vLLM stack: โญ๏ธDirect streaming with a control-plane-only endpoint picker โญ๏ธ A new vLLM Ray V2 executor backend โญ๏ธHAProxy ingress for routing at the speed of C Ray's primitives for fault tolerance, observability, and portability across K8s and VMs are a great foundation as inference deployments get more complex. Congrats to the team! Try the new Ray V2 executor today in vLLM with --distributed-executor-backend ray.

@seiji_________ โ€ข Thu Jun 18 16:00

Today we are excited to announce, in partnership with the GKE team at Google Cloud (@googlecloud), a major milestone in Ray Serve LLMโ€™s production serving capability. Ray Serve LLM now matches high performance, rust-based routing frameworks such as vllm-router (@vllm_project) in

F
fchollet
@fchollet
๐Ÿ“…
Jun 23, 2026
10d ago
๐Ÿ†”55028265
โญ0.38

With agentic coding, complexity compounds in a mechanical way: unnecessary code ends up in the codebase, moves to the context window, degrades the model's reasoning abilities, leads to more unnecessary code (often to fix issues arising from the unnecessary code). It's exponential

G
giffmana
@giffmana
๐Ÿ“…
Feb 20, 2025
499d ago
๐Ÿ†”42036468

o3-mini-high figured out the issue with @SakanaAILabs CUDA kernels in 11s. It being 150x faster is a bug, the reality is 3x slower. I literally copy-pasted their CUDA code into o3-mini-high and asked "what's wrong with this cuda code". That's it! Proof: https://t.co/whmF5fvHVr Fig1: o3-mini's answer. Fig2: Their orig code is wrong in subtle way. The fact they run benchmarking TWICE with wildly different results should make them stop and think. Fig3: o3-mini's fix. Code is now correct. Benchmarking results are consistent. 3x slower.

Media 1Media 2
+2 more
๐Ÿ–ผ๏ธ Media
B
ben_burtenshaw
@ben_burtenshaw
๐Ÿ“…
Jul 02, 2026
2d ago
๐Ÿ†”75706032

the wildest part of this intelligence per watt paper (71.3% of chat queries could be local) is that the model is only a gpt-oss 20b. which is about a year old! compared to the current batch of small moe models (gemma 4, liquid LFM, Qwen-3.6, etc.) this is nothing. https://t.co/d4Oem5d35t

Media 1
๐Ÿ–ผ๏ธ Media
G
gerardsans
@gerardsans
๐Ÿ“…
Jun 21, 2026
13d ago
๐Ÿ†”02002516
โญ0.44

Original paper: https://t.co/oka6G5cnMB Refutations already in the literature: Gong et al. (2026) showed Patchscopes are unreliable: injected states overridden by model priors (faithfulness drops sharply). The โ€œlayer 6 beliefโ€ is just a partial vector sum the rest of the pass overwrites. https://t.co/Z2Dxtvgmnt The architecture is unchanged. The interpretive frame drifted. Time to review the math. Apply null hypotheses thoroughly. Leave anthropomorphic narratives behind for good.

H
HuggingPapers
@HuggingPapers
๐Ÿ“…
Jul 03, 2026
1d ago
๐Ÿ†”61346092

DART: one-shot VLA adaptation under environmental shifts Seoul National University researchers show that weight space arithmetic can isolate domain shifts from task knowledge, letting you adapt a robot policy to new cameras or embodiments with a single demonstration. https://t.co/oxsgrUH2eO

Media 1
๐Ÿ–ผ๏ธ Media
๐Ÿ”AravSrinivas retweeted
H
Bread๐Ÿž
@himself65
๐Ÿ“…
Jun 20, 2026
13d ago
๐Ÿ†”90570743
โญ0.34

GLM is pretty solid โ€” it gets about an 80% pass rate on our internal financial benchmark. By contrast, DeepSeek v4, Kimi and MiniMax are below 5%. (Considered Opus 4.8 as baseline & judger)

โค๏ธ72
likes
๐Ÿ”5
retweets
T
Teknium
@Teknium
๐Ÿ“…
Jun 23, 2026
11d ago
๐Ÿ†”69627944
โญ0.36

@bradmillscan @kstellana @NousResearch No, we do not believe in model routing. Anything that breaks the cache means you're paying 20x more.

๐Ÿ”ai_fast_track retweeted
A
Qwen
@Alibaba_Qwen
๐Ÿ“…
Jun 24, 2026
10d ago
๐Ÿ†”42719867
โญ0.34

๐Ÿ“ฃ๐Ÿ“ฃ Meet Qwen-AgentWorld โ€” a native language world model that simulates 7 agent environments (MCP, Search, Terminal, SWE, Web, OS, Android) within a single model. Environment modeling is the training objective from day one, not a post-hoc adaptation. ๐Ÿค” LLMs are trained to be better agents โ€” better at acting in environments. But nobody has trained them to model the environments themselves. ๐Ÿ—บ๏ธ Our roadmap: investigate how language world modeling can push the boundaries of general agent capabilities, along two routes: 1๏ธโƒฃ Build a foundation model for environment simulation โ€” outperforming Claude Opus 4.8 and GPT-5.4 on AgentWorldBench 2๏ธโƒฃ Investigate how world modeling enhances agent training: ๐Ÿ”ฌ Controllable Sim RL (agentic RL with LWM as environments) surpasses training in real environments ๐Ÿง  Learning to predict environments (LWM warm-up) makes agents stronger โ€” remarkably, even without any agent-specific training, this predictive knowledge transfers to agentic tasks with zero fine-tuning ๐Ÿ“‘ Paper: https://t.co/Jx2l5RKq71 ๐Ÿ“– Blog: https://t.co/7tVcKyhsx2 ๐Ÿ’ป GitHub: https://t.co/B5Lvb1UZCn ๐Ÿค— HuggingFace: https://t.co/Kw3QBL1TM5 ๐Ÿงฉ ModelScope: https://t.co/YBnGYgMWWI

โค๏ธ4,705
likes
๐Ÿ”783
retweets
H
hardmaru
@hardmaru
๐Ÿ“…
Jun 23, 2026
11d ago
๐Ÿ†”93144318

Sakana Fugu Technical Report https://t.co/6e6WuA8FVB Release Notes: https://t.co/7xWGpOicFN https://t.co/g2yaZvex35

Media 1
๐Ÿ–ผ๏ธ Media
N
NVIDIAAI
@NVIDIAAI
๐Ÿ“…
Jun 24, 2026
10d ago
๐Ÿ†”25418828

The rise of MoE models introduced new challenges in training, and @huggingface's Transformers v5 brought first-class support for solving them. Now, NeMo AutoModel builds on top of v5. Part of the NeMo framework for building models at scale, NeMo AutoModel brings optimizations to a broad set of model families through support for Expert Parallelism, DeepEP, and TransformerEngine kernels with a few lines of code. We found NeMo AutoModel brings a 3.4 to 3.7x higher training throughput for popular MoE models. You can read more here: https://t.co/TNlBsKWwrJ

Media 1Media 2
๐Ÿ–ผ๏ธ Media
๐Ÿ”jeremyphoward retweeted
D
Dmytro Dzhulgakov
@dzhulgakov
๐Ÿ“…
Jun 25, 2026
9d ago
๐Ÿ†”38384918

you may have heard that glm-5.2 at 280 token/s is cool, how about 318 and we still have room to go https://t.co/4g0dI6CEzd

Media 1
โค๏ธ662
likes
๐Ÿ”29
retweets
๐Ÿ–ผ๏ธ Media