Your curated collection of saved posts and media

Showing 32 posts · last 14 days · by score
@arankomatsuzaki · Apr 10, 2026

proj: https://t.co/fCl3QBpnoE
abs: https://t.co/i50YCzUKax
repo: https://t.co/ml7dOTqYy1

[1 image]
@gerardsans · Apr 10, 2026 · ⭐ 0.40

@plainionist The industry has misled the public with terms like "In-Context Learning", "Skills" or "Reasoning". It's trivial to prove no real learning is happening: the model's weights are never changed. Remove the context, and the alleged "learning" disappears instantly. Puff.

@PrismML · Apr 16, 2026

Today we're announcing Ternary Bonsai: top intelligence at 1.58 bits. Using ternary weights {-1, 0, +1}, we built a family of models that are 9x smaller than their 16-bit counterparts while outperforming most models in their respective parameter classes on standard benchmarks. We're open-sourcing the models under the Apache 2.0 license in three sizes: 8B (1.75 GB), 4B (0.86 GB), and 1.7B (0.37 GB).

[1 image]
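The 1.58-bit figure is log2(3) bits per weight for the three states {-1, 0, +1}. As a rough illustration, here is an absmean-style ternary quantization sketch (in the spirit of published 1.58-bit schemes; the post doesn't specify Ternary Bonsai's actual recipe, so function names and the scheme itself are assumptions):

```python
def ternary_quantize(weights, eps=1e-8):
    """Absmean-style ternary quantization: scale by the mean absolute
    weight, then round each scaled weight into {-1, 0, +1}."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    return [max(-1, min(1, round(w / scale))) for w in weights], scale

def dequantize(q, scale):
    # Reconstruction is q * scale; three states cost log2(3) ~ 1.58
    # bits per weight, hence the "1.58-bit" label.
    return [v * scale for v in q]

q, s = ternary_quantize([0.9, -0.05, -1.2, 0.4])  # q = [1, 0, -1, 1]
```

The ~9x size reduction follows from packing ~1.58 bits per weight instead of 16, plus a per-group scale.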
@arankomatsuzaki · Apr 10, 2026

Squeeze Evolve: A Unified Framework for Verifier-Free Evolution
Across AIME 2025, GPQA-Diamond, ARC-AGI-V2, MMMU-Pro, etc:
- Up to ~3x API cost reduction
- Up to ~10x increase in fixed-budget serving throughput
https://t.co/MoTgGNAYns

[1 image]
@Modular · Apr 16, 2026

Most serving stacks run FLUX.2 as four separate stages with Python overhead between each one. We collapsed all four into a single fused execution graph using MLIR-based compilation. On @AMD MI355X, that means a 3.8x speedup over torch.compile, 1024x1024 images in under 3.5 seconds, and a deployment container under 700MB. We ran the same pipeline on Blackwell, too. AMD delivers equivalent generation quality at a 5.5x lower cost. @clattner_llvm is presenting the full breakdown at AMD AI DevDay. Register: https://t.co/Pa1e36BTZn

[1 image]
@hayden_prairie · Apr 15, 2026

We've been thinking a lot about scaling laws, wondering if there is a more effective way to scale FLOPs without increasing parameters. Turns out the answer is YES – by looping blocks of layers during training. We find that predictable scaling laws exist for layer looping, allowing us to use looping to achieve the quality of a Transformer twice the size. Our scaling laws suggest that for a fixed parameter budget, data and looping should be increased in tandem! 🧵👇

[1 image]
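The core trick can be sketched in a few lines: reuse one shared block several times, so compute scales with the loop count while the parameter count stays fixed (toy arithmetic block and illustrative names, not the authors' code):

```python
def make_block(scale):
    # A stand-in for one transformer block: a single shared parameter.
    def block(x):
        return [scale * v + 1.0 for v in x]
    return block

def looped_forward(x, block, loops):
    # Apply the same block `loops` times: FLOPs grow linearly with
    # `loops` while the parameter count stays constant.
    for _ in range(loops):
        x = block(x)
    return x

shared = make_block(0.5)                   # one parameter, any depth
depth2 = looped_forward([1.0], shared, 2)  # compute of a 2-layer model
depth4 = looped_forward([1.0], shared, 4)  # compute of a 4-layer model
```

This is why looping gives a FLOP axis independent of parameters: only `loops` changes between `depth2` and `depth4`.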
@_akhaliq · Apr 10, 2026

DMax: Aggressive Parallel Decoding for dLLMs
paper: https://t.co/y421NkegRD https://t.co/Y7Ut9Gxly8

[1 image]
πŸ”_akhaliq retweeted
Red Hat AI (@RedHat_AI) · Apr 17, 2026 · ⭐ 0.36

Qwen3.6-35B-A3B just dropped. Red Hat AI has an NVFP4 quantized checkpoint ready. 35B params, 3B active, quantized with LLM Compressor. Preliminary GSM8K Platinum: 100.69% recovery (slightly above baseline). Early release. Let us know what you think! https://t.co/i5Fc4P7NVN

❤️ 5 · 🔁 3
@rasbt · Apr 11, 2026

@DanielWulikk Have to think a bit about how to best visualize it, but if you are interested, I have a working from-scratch code implementation of Gemma 4 E2B in the meantime to see how per-layer embeddings are implemented: https://t.co/jyiq1vyJnH https://t.co/fVrSBWHNHl

[1 image]
@Modular · Apr 14, 2026 · ⭐ 0.44

We sat down with Kyle Caverly, an AI Performance Engineer on the MAX serve team, to walk through what actually happens inside an inference server from prompt to response. All the code discussed is open source. https://t.co/iwSQ4QA5F1

@iScienceLuvr · Apr 10, 2026

abs: https://t.co/tSLRNXAgY3
code: https://t.co/ldWVGmuC4O
blog post: https://t.co/yI0NmdaKHm

[2 images]
πŸ”tri_dao retweeted
Dan Fu (@realDanFu) · Apr 15, 2026 · ⭐ 0.32

📢 Super excited to announce Parcae! We've been thinking about scaling laws and the "right" way to get more FLOPs. Turns out layer looping - with the right parameterization - gives you a new axis to scale!

Parcae matches Transformers 2x their size (w/ the same data), and outperforms prior formulations of looped models. But - you need the right parameterization to get these gains against strong Transformer baselines. Looped models are famously unstable to train, with tons of loss spikes and hyperparameter sensitivity.

The main technical challenge with looped models is residual explosion - if you're passing the activations through the same layers over and over, some otherwise benign parameterizations cause huge instability.

Our key idea: we can think of the residual stream of a model as a time-varying dynamical system - the same fundamentals behind SSMs like Mamba and S4. Then a few modest modifications to classic Transformers (stable diagonalization of injection params, LN before embeddings) can stabilize the looped models. The resulting models are more stable to train, but also reach higher quality.

It's strong enough to start to derive new scaling laws. Classically - we know you need to scale parameters with data to be FLOP-optimal. With Parcae, we find a third axis - given fixed parameters, you additionally want to scale FLOPs by looping as you scale data.

Super excited to see how these ideas hold, and what we can do with looped models! Check out @hayden_prairie's great explainer thread below, and see links for our paper, blog, and models. Joint w/ @zacknovack and @BergKirkpatrick, and a fun collab between @togethercompute and my lab at @ucsd_cse. Enjoy!

❤️ 100 · 🔁 21
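The residual-explosion problem described above can be seen with scalar arithmetic: adding a block's output back into the stream on every loop compounds multiplicatively, while a bounded injection gate (a crude stand-in for the stable diagonalization the thread mentions, not the paper's actual parameterization) keeps the stream controlled:

```python
def naive_loop(residual, gain, loops):
    # Unstabilized looping: each pass adds gain * residual back into
    # the stream, so its magnitude grows like (1 + gain) ** loops.
    for _ in range(loops):
        residual = residual + gain * residual
    return residual

def gated_loop(residual, gain, alpha, loops):
    # Gated injection (illustrative): interpolate instead of
    # accumulate, so the per-loop factor (1 - alpha) + alpha * gain
    # has magnitude <= 1 and the stream stays bounded.
    for _ in range(loops):
        residual = (1 - alpha) * residual + alpha * gain * residual
    return residual

exploded = naive_loop(1.0, gain=0.5, loops=32)            # blows up
bounded = gated_loop(1.0, gain=0.5, alpha=0.5, loops=32)  # stays bounded
```

Viewing each loop as one step of a linear dynamical system makes the stability condition explicit: keep the per-step factor inside the unit circle, the same intuition behind SSM parameterizations.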
@gerardsans · Apr 10, 2026 · ⭐ 0.42

@trq212 This framing degrades technical literacy. Your prompt isn't "communication", it's tokenized, embedded as vectors, processed through frozen weights. No "agent" receives it. No bandwidth grows. Anthropic's own leaked system prompt: 16,739 words of context steering. That's engineering, not dialogue. Research shows anthropomorphic AI discourse creates self-fulfilling alignment degradation. You're not "talking to agents." You're structuring conditional probability queries. High-leverage? Yes. Interpersonal? Never. If the goal is public understanding, not marketing, stop treating inference as a "team member". That's technically inaccurate. Which makes it dishonest the moment someone with enough expertise verifies it.

@johnowhitaker · Apr 11, 2026 · ⭐ 0.38

Also, how cool that this is so easy now? This was a few careful asks to Codex, which worked for ~128k tokens/1h to do everything - sourcing the data, embedding with clip (via @replicate), making an exploratory search tool for refining + filtering, and whipping up the final app.

@victormustar · Apr 17, 2026

Sharing my current setup to run Qwen3.6 locally in a good agentic setup (Pi + llama.cpp). Should give you a good overview of how good local agents are today:

# Start llama.cpp server:
llama-server \
  -hf unsloth/Qwen3.6-35B-A3B-GGUF:Q4_K_XL \
  --jinja \
  --chat-template-kwargs '{"preserve_thinking":true}' \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0

# Configure Pi:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://127.0.0.1:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "unsloth/Qwen3.6-35B-A3B-GGUF:Q4_K_XL" }
      ]
    }
  }
}

Quoting @Alibaba_Qwen · Thu Apr 16 13:23

⚡ Meet Qwen3.6-35B-A3B: Now Open-Source! 🚀🚀 A sparse MoE model, 35B total params, 3B active. Apache 2.0 license. 🔥 Agentic coding on par with models 10x its active size 📷 Strong multimodal perception and reasoning ability 🧠 Multimodal thinking + non-thinking modes Efficient. Pow…

[1 image]
@aryagm01 · Apr 12, 2026

dflash-mlx: DFlash speculative decoding, ported to Apple Silicon. Qwen3-4B at 186 tok/s on a MacBook. 4.6× faster than plain MLX-LM. Exact greedy decoding: output matches plain target decoding. https://t.co/VxfyworgAe

[1 image]
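The "output matches plain target decoding" claim is the standard guarantee of greedy speculative decoding: the target model verifies every drafted token, so the result is identical to target-only decoding, just cheaper when the draft is usually right. A toy sketch with next-token functions standing in for models (DFlash's actual drafting scheme is not shown; all names are illustrative):

```python
def greedy_decode(model, prefix, n):
    # Reference: plain greedy decoding with the target model alone.
    out = list(prefix)
    for _ in range(n):
        out.append(model(out))
    return out

def speculative_decode(target, draft, prefix, n, block=4):
    """Greedy speculative decoding: the draft proposes `block` tokens,
    the target verifies them, and every emitted token is the target's
    own choice, so the output equals target-only greedy decoding."""
    out = list(prefix)
    while len(out) - len(prefix) < n:
        # Draft a block of candidate tokens cheaply.
        ctx, proposal = list(out), []
        for _ in range(block):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # Verify: accept matches, stop at the first mismatch.
        for tok in proposal:
            if len(out) - len(prefix) >= n:
                break
            expected = target(out)
            out.append(expected)   # always emit the target's token
            if expected != tok:    # mismatch: discard remaining drafts
                break
    return out

# Toy next-token "models" operating on integer sequences.
target = lambda seq: (seq[-1] * 2) % 7
draft = lambda seq: 0 if seq[-1] == 4 else (seq[-1] * 2) % 7  # imperfect
```

In a real system the verification pass scores all drafted positions in one batched forward pass, which is where the speedup comes from.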
@e_volkmann · Apr 08, 2026

Introducing gyaradax 🐉: A JAX solver for local flux-tube gyrokinetics with custom CUDA kernels for acceleration. This entire code was vibecoded by @ggalletti_ and me in a month. Validated against GKW (CPU-only Fortran code) with 10x speedups. Details and code in the replies. https://t.co/22PrHjItR5

[1 image]
πŸ”Scobleizer retweeted
Mihir Prabhudesai (@mihirp98) · Apr 16, 2026 · ⭐ 0.36

What if AI learned physics the way Newton did – by experiencing it? We built Sim2Reason: train LLMs inside virtual worlds governed by real physics laws, zero human annotation. Result: +5–10% improvement on International Physics Olympiad, zero-shot. 🧵 https://t.co/euXSyXmZVY

❤️ 190 · 🔁 29
@SergioPaniego · Apr 15, 2026

Earlier this month, Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy and… it's already supported in TRL, built by @krasul. you can really feel the pace of development in the team 🐎

paper by @onloglogn, @richard_baihe, @UnderGroundJeg, Navdeep Jaitly, @trebolloc, @YizheZhangNLP at Apple 🍎

how it works: the model generates completions at a training-time temperature (T_train) with top_k/top_p truncation, then fine-tunes on them with plain cross-entropy. no labels or verifier needed

you can try it right away with this ready-to-run example (Qwen3-4B on rStar-Coder): https://t.co/zizfISD6bq
or benchmark a checkpoint with the eval script: https://t.co/mKlafTyKSe

one neat insight from the paper: T_train and T_eval compose into an effective T_eff = T_train × T_eval, so a broad band of configs works well. even very noisy samples still help

want to dig deeper?
paper: https://t.co/aj1ZAcr8Mw
trainer docs: https://t.co/TNVz93kZi9

[5 images]
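The loop described above (sample at T_train with truncation, then take a cross-entropy step on the sample) can be sketched on a toy categorical model; this is illustrative pseudologic with assumed names, not the TRL trainer's API:

```python
import math, random

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(logits, t_train, top_k=2):
    # Draw one token at the training temperature, restricted to the
    # top_k logits (the truncation step mentioned in the post).
    top = sorted(range(len(logits)), key=lambda i: -logits[i])[:top_k]
    probs = softmax([logits[i] for i in top], t_train)
    r, acc = random.random(), 0.0
    for i, p in zip(top, probs):
        acc += p
        if r <= acc:
            return i
    return top[-1]

def self_distill_step(logits, t_train, lr=0.5):
    # Sample from the model, then take one cross-entropy gradient step
    # on the sampled token: d(CE)/d(logit_i) = p_i - 1[i == sampled].
    tok = sample(logits, t_train)
    probs = softmax(logits)
    return [l - lr * (p - (1.0 if i == tok else 0.0))
            for i, (l, p) in enumerate(zip(logits, probs))], tok

random.seed(0)
logits = [2.0, 1.5, 0.0]
for _ in range(20):
    logits, _ = self_distill_step(logits, t_train=0.6)
# Truncated self-sampling concentrates mass on the model's own top tokens.
```

The T_eff = T_train × T_eval identity in the post is about composing the sampling temperature used for data generation with the one used at evaluation; it is not modeled in this toy.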
@saprmarks · Apr 16, 2026

Cool to see that Meta conducted and published a pre-deployment investigation of Muse Spark behaviors like reward hacking, honesty, and evaluation awareness! https://t.co/i1Yy7HsEup

Quoting @summeryue0 · Tue Apr 14 22:55

🚀 Muse Spark Safety & Preparedness Report for Meta AI is out. We start with our pre-deployment assessment under Meta's Advanced AI Scaling Framework, covering chemical and biological, cybersecurity, and loss of control risks. Our assessment flagged potentially elevated chem/bio…

[1 image]
@jerryjliu0 · Apr 16, 2026

We comprehensively benchmarked Opus 4.7 on document understanding. We evaluated it through ParseBench, our comprehensive OCR benchmark for enterprise documents, where we evaluate tables, text, charts, and visual grounding. The results 🧑‍🔬:
- Opus 4.7 is a general improvement over Opus 4.6. It has gotten much better at charts compared to the previous iteration
- Opus 4.7 is quite good at tables, though not quite as good as Gemini 3 flash
- Opus 4.7 wins on content faithfulness across all techniques (including ours)
- Using Opus 4.7 as an OCR solution is expensive at ~7c per page!! For comparison, our agentic mode is 1.25c and cost-effective is ~0.4c by default.
Take a look at these results and more on ParseBench! https://t.co/tYiSOMbd6p

Quoting @llama_index · Thu Apr 16 21:11

Anthropic says Opus 4.7 hits 80.6% on Document Reasoning, up from 57.1%. But "reasoning about documents" ≠ "parsing documents for agents." We ran it on ParseBench.
→ Charts: 13.5% → 55.8% (+42.3), huge
→ Formatting: 64.2% → 69.4% (+5.2)
→ Content: 89.7% → 90.3% (+0.6)
→ T…

[1 image]
@thsottiaux · Apr 16, 2026

Codex just got a lot more powerful. Computer use, in-app browser, image generation and editing, 90+ new plugins to connect to everything, multi-terminal, SSH into devboxes, thread automations, rich document editing. Learns from experience and proactively suggests work. And a ton more.

[1 image]
@jonoringer · Apr 17, 2026 · ⭐ 0.36

hermes @NousResearch agent with qwen3.5:35b-a3b on a 4090 is VERY good.. local models very impressive..

@allen_explains · Apr 16, 2026 · ⭐ 0.44

This 2-hour Stanford lecture breaks down how models like ChatGPT and Claude are actually built, clearer than what many people in top AI roles ever get exposed to. Save this and set aside two hours today. It might end up being the most valuable thing you learn all week. https://t.co/5u97uZCWxd

πŸ”huggingface retweeted
Xenova (@xenovacom) · Apr 16, 2026 · ⭐ 0.36

Ternary Bonsai: state-of-the-art intelligence at 1.58 bits. The models are so small they can even run locally in your browser on WebGPU! ⚡️ Here's the 8B version (just ~2GB in size) running at 60 tokens per second on my M4 Max. Try the demo out yourself! 👇

❤️ 38 · 🔁 7
πŸ”random_walker retweeted
Zi (@dongyangzi) · Apr 16, 2026 · ⭐ 0.34

>We gave an AI agent an Apple Developer account, a Mac VM, and one task: build and publish an iOS app. It succeeded, at a cost of about $1,000. Great research on what it takes for an agent to do real world work. Some interesting areas of improvement:

❤️ 1 · 🔁 1
@ivanleomk · Apr 15, 2026 · ⭐ 0.30

Go forth and hack friends - Apache 2.0 license :)

Quoting @osanseviero · Wed Apr 15 20:57

Introducing TIPS v2 👀 Foundational text-image encoder 📸 Can be used as the base for different multimodal applications 🤗 Apache 2.0 🧑‍🍳 New pre-training recipes https://t.co/A6H93YJhNx

@togethercompute · Apr 12, 2026

MiniMax M2.7 is now on Together AI. Trained by letting it run its own RL loop, resulting in the highest open-source score on MLE Bench Lite. https://t.co/3E5mh9EDdz

[1 image]
@akshat_b · Apr 15, 2026

To show off what you can do with @OpenAI Agent SDK + @modal, we built an ML research agent (inspired by @karpathy). It can:
- Spin up GPU sandboxes of any shape
- Run a pool of subagents
- Persist memory
- Snapshot state for fork/resume
Here it is playing Parameter Golf: https://t.co/r7QhvNmdEq

Quoting @modal · Wed Apr 15 17:35

Agents need computers. And they need a lot of them. Modal is an official sandbox provider for the @OpenAI Agents SDK. https://t.co/Lu4cesspYq

[1 image]
@mercor_ai · Apr 17, 2026

We ran @AnthropicAI Claude Opus 4.7 (High) on APEX-SWE, our benchmark for real-world software engineering work. It scores 41.3% pass@1, placing 2nd on the leaderboard. It is only 0.2% away from GPT 5.3 Codex (High). https://t.co/4U6pKtrvI0

[1 image]
@random_walker · Apr 16, 2026 · ⭐ 0.36

More details in Sayash's thread here: https://t.co/xcNemmsEs1

Quoting @sayashk · Thu Apr 16 17:49

Benchmarks are saturating more quickly than ever. How should frontier AI evaluations evolve? In a new paper, we argue that the AI community is already converging on an answer: open-world evaluations. They are long, messy, real-world tasks that would be impractical for benchmarks.

@htihle · Apr 10, 2026

Gemma 4 31b scores 52.3% and is the strongest open model on WeirdML, ahead of GLM 5 and gpt-oss-120b. This score is comparable to o3 and gemini 2.5 pro, and well ahead of qwen 3.5 27b at 39.5%. Gemma 4 is also significantly cheaper than other models with the same score. I ran this locally through ollama, with a 4-bit quant (q4_K_M). Full precision might score even better. The costs assume $0.14/$0.40/M.

Quoting @htihle · Fri Jun 27 14:21

WeirdML v2 is now out! The update includes a bunch of new tasks (now 19 tasks total, up from 6), and results from all the latest models. We now also track api costs and other metadata which give more insight into the different models. The new results are shown in these two figur…

[2 images]