Your curated collection of saved posts and media

Showing 10 posts · last 14 days · by score
s_batzoglou
@s_batzoglou
📅 Jul 03, 2026 · 21h ago
🆔 05156064

OK, Fable 5 is VERY strong in my first small benchmark test. I tested the following models on a reasoning task, induction. (Details are in my manuscript on arXiv, to appear at ICML.) I used 50 challenge problems to keep costs manageable. Fable 5 blows away the competition.

Caveat: it has a high rate of empty responses. At high thinking effort, it returns almost all empty responses (and bills max tokens). At medium, more than half come back empty. So I did two rounds on medium and then one on low effort, and reached 45/50 responses. (The whole task cost $188 for 50 problems.)

Regarding the GPT models: interestingly, GPT-5.5 is pathological in not returning answers. I ran two rounds on xhigh and two rounds on high. The completion rates were 9/50 and 17/50 respectively, and the number of correct answers is extremely low, much worse than GPT-5.4 and GPT-5.2. So I won't be running any more experiments with GPT-5.5 on this task. (It is strong on other tasks.)

Another note, on Grok models: the original, now unavailable Grok 4 is very strong, again with a low completion rate. I ran about 3-4 rounds to get 25/50. Grok 4.3 is much weaker in comparison (even weaker than Grok 4.1 fast) but returns answers more often.

Other notably strong performers are Gemini 3.5 Flash (way better than Gemini 3.1 Pro) and DeepSeek v4 Pro. But no model matches Fable 5. Great job, @anthropic!
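The completion and cost figures in this post reduce to simple ratios. Below is a minimal Python sketch that uses only the numbers quoted in the post; the labels and the helper names `completion_rate` and `cost_per_problem` are our own. "Completion" here just means a non-empty response, not a correct one, and the cost line assumes the $188 total applies to the 50-problem task as stated.

```python
# Completion counts quoted in the post: non-empty responses out of 50 problems.
# Labels are descriptive only; round counts are as reported, not re-derived.
completions = {
    "Fable 5 (2x medium + 1x low)": (45, 50),
    "GPT-5.5 (xhigh)": (9, 50),
    "GPT-5.5 (high)": (17, 50),
    "Grok 4 (about 3-4 rounds)": (25, 50),
}


def completion_rate(returned: int, total: int) -> float:
    """Fraction of problems that came back with a non-empty answer."""
    return returned / total


def cost_per_problem(total_cost_usd: float, n_problems: int) -> float:
    """Average spend per benchmark problem."""
    return total_cost_usd / n_problems


if __name__ == "__main__":
    for label, (returned, total) in completions.items():
        print(f"{label}: {returned}/{total} = {completion_rate(returned, total):.0%}")
    # $188 for the 50-problem task, as stated in the post.
    print(f"Cost per problem: ${cost_per_problem(188, 50):.2f}")
```

This prints completion rates of 90%, 18%, 34%, and 50%, and a cost of $3.76 per problem.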

HuggingPapers
@HuggingPapers
📅 Jun 22, 2026 · 12d ago
🆔 49977030

Ai2 just released TMax 27B on Hugging Face: a 27B terminal agent that hits 42.7% on Terminal Bench 2.0, rivaling models 40× its size. https://t.co/LfCksOXL9L

๐Ÿ”huggingface retweeted
NVIDIA AI
@NVIDIAAI
📅 Jul 01, 2026 · 3d ago
🆔 01480067
⭐ 0.38

We took a 30B model and split it in two to write tokens in parallel instead of one at a time. Introducing Nemotron-Labs-TwoTower: a diffusion language model from NVIDIA Research adapted from Nemotron-3-Nano-30B-A3B. Here’s how it works: one half holds the context, the other writes the tokens, with both reusing the pretrained model instead of training a new one from scratch. We found it kept 98.7% of the original model’s quality at 2.42× faster generation.

โค๏ธ2,550
likes
๐Ÿ”294
retweets
googlegemma
@googlegemma
📅 Jul 01, 2026 · 2d ago
🆔 88974274

“Agentic kernel optimization is the future of on-device inference” @xenovacom used Fable 5 to write kernels that pushed Gemma 4 to a massive 255 tok/s on WebGPU with an M4. He shared the demo, so you can try it in your browser! https://t.co/xPuh5OLGEt

๐Ÿ–ผ๏ธ Media
neilmovva
@neilmovva
📅 Jun 25, 2026 · 9d ago
🆔 13148747

Samir Menon @blintzbase and I are thrilled to announce Sail @sailresearchco! We build infrastructure for long-horizon agents: inference served at unbeatable prices per token for open models, plus sandboxes designed to run for days, weeks, or longer.

We've raised $80M, with our seed led by @Sequoia and our series A led by @KleinerPerkins. We're using this capital to build the most efficient infrastructure for long-horizon agents.

What makes agents so different? Unlike a human waiting at a keyboard (top priority: speed), agents need scale, reliability, and sustainable cost. Sail finds this efficiency everywhere in the stack: we carefully choose our chips, write custom inference engines, and run a global controller that fully utilizes every computer in our fleet. Tight integration from silicon to API lets Sail open up the cost/latency frontier to our customers: the most patient agents can now access 10x more intelligence per dollar.

We're excited to be working with great companies like @parallelweb, @detaildotdev, @Jackandjillai, and @quadrillion_ai to deploy long-horizon agents with trillions of tokens.

Our team is thoughtful in our engineering craft and relentlessly ambitious in our pursuit of peak performance. We previously trained at companies like NVIDIA, OpenAI, Google, and many trading firms. Now we're ready to do the work that will define our careers, in the most compute-intensive market of all time.

Welcome to the era of abundant intelligence. We can't wait to build with you!

๐Ÿ”johnrobinsn retweeted
Mahesh Venkitachalam
@mkvenkit
📅 Jun 27, 2026 · 7d ago
🆔 39741373
⭐ 0.32

Google’s Tensor Processing Unit (TPU) uses the systolic array architecture (an idea from 1978) to accelerate matrix multiplication with far less memory movement. Fun to build a small-scale version on an FPGA. Links to the original paper and TPU design: https://t.co/cEznMoForH

โค๏ธ713
likes
๐Ÿ”60
retweets
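To make the systolic dataflow concrete, here is a minimal Python sketch of a cycle-by-cycle systolic array computing C = A·B. It assumes a simple output-stationary design (each processing element, or PE, keeps one entry of C and accumulates into it), a common textbook variant that is not necessarily the dataflow used by the TPU or by the FPGA build linked in the post. The function name `systolic_matmul` is our own. The memory-movement point shows up directly: each element of A and B is read from the input matrices once, at the edge of the array, and is then passed from neighbour to neighbour.

```python
from typing import List, Optional

Matrix = List[List[float]]


def systolic_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Cycle-by-cycle simulation of an output-stationary systolic array.

    PE (i, j) owns C[i][j]. Values of A enter at the left edge and move one
    PE to the right per cycle; values of B enter at the top and move one PE
    down per cycle. Row i of A and column j of B are skewed by i and j
    cycles, so A[i][s] and B[s][j] meet in PE (i, j) on cycle s + i + j.
    """
    n, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions must match")
    m = len(B[0])

    acc = [[0.0] * m for _ in range(n)]  # stationary partial sums
    a_reg: List[List[Optional[float]]] = [[None] * m for _ in range(n)]
    b_reg: List[List[Optional[float]]] = [[None] * m for _ in range(n)]

    # The last multiply-accumulate happens on cycle (k-1) + (n-1) + (m-1).
    for t in range(n + m + k - 2):
        # Shift phase: iterate bottom-right to top-left so each PE reads its
        # neighbour's value from the previous cycle before it is overwritten.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j > 0:
                    a_reg[i][j] = a_reg[i][j - 1]
                else:
                    s = t - i  # skewed feed into row i (one memory read)
                    a_reg[i][j] = A[i][s] if 0 <= s < k else None
                if i > 0:
                    b_reg[i][j] = b_reg[i - 1][j]
                else:
                    s = t - j  # skewed feed into column j (one memory read)
                    b_reg[i][j] = B[s][j] if 0 <= s < k else None
        # Compute phase: every PE holding both operands does one MAC.
        for i in range(n):
            for j in range(m):
                a, b = a_reg[i][j], b_reg[i][j]
                if a is not None and b is not None:
                    acc[i][j] += a * b
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # [[19.0, 22.0], [43.0, 50.0]]
```

The result matches an ordinary matrix product; the skewed feeds are what let every PE work in parallel using only neighbour-to-neighbour wiring.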
SergioPaniego
@SergioPaniego
📅 Jun 29, 2026 · 5d ago
🆔 47342920

One command and you have a private vLLM server on HF infra. Point a coding agent straight at your own model, then spin it down when you're done. Blog (by @QGallouedec) below ⤵️ https://t.co/F9i10NSOSG

llama_index
@llama_index
📅 Jun 29, 2026 · 5d ago
🆔 10826006

Semantic search alone doesn't cut it. Neither does brute-force grep. Agents need both. Today we're shipping the Retrieval Harness in LlamaParse Index: semantic search, server-side grep, and file-level navigation working together in a single agent reasoning loop. 🦙🌤️ Grep a file, list what's in an index, read past a chunk boundary, run hybrid search with reranking: all as native agent tools. Now in beta across all paid tiers. Full breakdown in the blog 👇 Learn More: https://t.co/q86Lu6tdOI
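The post does not say how LlamaParse Index combines grep hits with semantic results. As one illustration of "hybrid search", here is a minimal Python sketch of reciprocal rank fusion (RRF), a common way to merge a keyword ranking with a semantic ranking. It is a swapped-in technique, not LlamaIndex's API, and it does not model the reranking step. The helper names, the sample documents, and the "semantic" ranking (a stand-in for an embedding search) are all hypothetical; `k = 60` is the constant commonly used with RRF.

```python
import re
from typing import Dict, List


def grep_rank(docs: Dict[str, str], pattern: str) -> List[str]:
    """Hypothetical grep-style ranking: order docs by regex match count, drop non-matches."""
    rx = re.compile(pattern)
    counts = {doc_id: len(rx.findall(text)) for doc_id, text in docs.items()}
    hits = [doc_id for doc_id, c in counts.items() if c > 0]
    return sorted(hits, key=lambda d: (-counts[d], d))


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    docs = {
        "a.md": "retry_policy is set in config.yaml; retry_policy defaults to 3",
        "b.md": "How the agent recovers from transient failures",
        "c.md": "Unrelated notes on logging",
    }
    keyword = grep_rank(docs, r"retry_policy")  # exact-match signal
    semantic = ["b.md", "a.md", "c.md"]  # hypothetical embedding-search ranking
    print(reciprocal_rank_fusion([keyword, semantic]))
    # ['a.md', 'b.md', 'c.md']
```

The document that both signals agree on rises to the top, while the purely semantic match still outranks the unrelated one. RRF uses only ranks, so it needs no calibration between grep match counts and embedding similarity scores.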

๐Ÿ–ผ๏ธ Media
๐Ÿ”_akhaliq retweeted
Reka
@RekaAILabs
📅 Jun 30, 2026 · 4d ago
🆔 33038475
⭐ 0.34

🎮🕹️🖥️ CS2-10k is now available on @huggingface 🚀 600,000+ egocentric gameplay videos. 10,000+ hours. Every frame paired with the exact keyboard, mouse, and 3D position data that produced it. If you're working on world models, action-conditioned video generation, or egocentric navigation, this is ready to download and use today.

โค๏ธ48
likes
๐Ÿ”17
retweets
๐Ÿ”PyTorch retweeted
vLLM
@vllm_project
📅 Jun 18, 2026 · 16d ago
🆔 49885492
⭐ 0.34

Huge milestone from the @anyscalecompute + @googlecloud GKE teams 🎊 Ray Serve LLM provides up to 4.4x higher throughput on prefill-heavy workloads and 24x on decode-heavy workloads, compared with previous versions.

Three optimizations made this possible on the Ray Serve LLM + vLLM stack:
⭐️ Direct streaming with a control-plane-only endpoint picker
⭐️ A new vLLM Ray V2 executor backend
⭐️ HAProxy ingress for routing at the speed of C

Ray's primitives for fault tolerance, observability, and portability across K8s and VMs are a great foundation as inference deployments get more complex. Congrats to the team! Try the new Ray V2 executor today in vLLM with --distributed-executor-backend ray.

โค๏ธ109
likes
๐Ÿ”25
retweets