Your curated collection of saved posts and media

Showing 10 posts Β· last 14 days Β· by score
βž• Add New Post
RedHat_AI (@RedHat_AI) · 📅 Apr 10, 2026 · 13d ago · 🆔97110649

Speculative decoding for Gemma 4 31B (EAGLE-3): a 2B draft model predicts tokens ahead; the 31B verifier validates them. Same output, faster inference. Early release. vLLM main branch support is in progress (PR #39450). Reasoning support coming soon. https://t.co/PoK8zbA7li

🖼️ Media 1
_akhaliq (@_akhaliq) · 📅 Apr 10, 2026 · 13d ago · 🆔64670416

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability. Paper: https://t.co/AFqLfOfK3R https://t.co/j7gHnlDofv

🖼️ Media 1
dair_ai (@dair_ai) · 📅 Apr 18, 2026 · 5d ago · 🆔60801113

NEW paper from Apple. Interesting idea: "Attention to Mamba". The paper introduces a two-stage recipe for cross-architecture distillation from Transformers into Mamba.

Naive distillation collapses teacher performance. Their trick: first distill the transformer into a linearized-attention student using a kernel adaptation, then transfer that student into a pure Mamba with no attention blocks. On a 1B model trained on 10B tokens, the Mamba student hits 14.11 perplexity against a 13.86 Pythia-1B teacher, nearly matching quality at linear-time inference cost.

If you can reliably convert trained transformers into state-space models without retraining from scratch, the entire open-weights ecosystem becomes cheaper to serve at long context. This is the kind of quiet infrastructure work that decides which architectures actually get deployed in agent stacks.

Paper: https://t.co/h7k7OrG8Qj
Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

🖼️ Media 1
simonw (@simonw) · 📅 Apr 19, 2026 · 5d ago · 🆔48022690 · ⭐0.40

Since Anthropic publishes its system prompts, we can generate a diff between Claude Opus 4.6 and 4.7. Here are my notes on what's changed: https://t.co/IQHuvLGmwO
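This kind of comparison takes a few lines with the standard library's `difflib`. A sketch with made-up prompt snippets standing in for the two published versions:

```python
import difflib

old_prompt = """You are Claude.
Be helpful and harmless.
Refuse unsafe requests."""

new_prompt = """You are Claude.
Be helpful, honest, and harmless.
Refuse unsafe requests."""

# unified_diff yields diff lines lazily; keepends preserves line breaks
# so the output reads like the familiar `diff -u` format.
diff = difflib.unified_diff(
    old_prompt.splitlines(keepends=True),
    new_prompt.splitlines(keepends=True),
    fromfile="opus-4.6.txt",
    tofile="opus-4.7.txt",
)
print("".join(diff))
```

Changed lines show up prefixed with `-` and `+`, which makes prompt revisions easy to scan release over release.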

asparagui (@asparagui) · 📅 Apr 15, 2026 · 8d ago · 🆔44099639

@jeremyphoward Check this out! I used Lean4 to emit MLIR by way of StableHLO/IREE to train image recognition networks, with proofs for the backprop operations! https://t.co/HqYG6KflSO

🖼️ Media 1
πŸ”jeremyphoward retweeted
H
Hao AI Lab
@haoailab
πŸ“…
Apr 09, 2026
14d ago
πŸ†”08351116
⭐0.38

(1/5) FP4 hardware is here, but 4-bit attention still kills model quality, blocking true end-to-end FP4 serving. To fix that, we propose Attn-QAT, the first systematic study of quantization-aware training for attention. The result: FP4 attention quality is comparable to BF16 attention, with 1.1x–1.5x higher throughput than SageAttention3 on an RTX 5090 and a 1.39x speedup over FlashAttention-4 on a B200.
Blog: https://t.co/NxVSXKWEgI
Code: https://t.co/6irFgQ7GeM
Checkpoints: https://t.co/GsrzbJlRY8

❀️91
likes
πŸ”19
retweets
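The core QAT primitive is "fake" quantization: values are snapped to a low-bit grid in the forward pass while training continues in high precision. A minimal sketch using a symmetric signed 4-bit integer grid (Attn-QAT's actual FP4 format, scaling scheme, and straight-through backward pass are not reproduced here):

```python
def fake_quant_int4(x, scale):
    # Quantize: snap x to the nearest representable 4-bit level.
    q = round(x / scale)
    # Clamp to the signed 4-bit range [-8, 7].
    q = max(-8, min(7, q))
    # Dequantize back to the original scale, now with quantization error.
    return q * scale

vals = [0.03, -0.41, 0.77, 1.2]
scale = 0.1
print([fake_quant_int4(v, scale) for v in vals])
```

Training against these quantized forward activations is what lets the model adapt its weights to the 4-bit grid, instead of taking the accuracy hit of post-training quantization.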
mishig25 (@mishig25) · 📅 Apr 10, 2026 · 13d ago · 🆔57865692

Ran autoresearch on HF to see whether anything can beat the MuonAdamW baseline. Biggest takeaway: NS orthogonalization is a very strong attractor that absorbs most gradient modifications you throw at it. See all the artifacts at https://t.co/S5DY7MezUp https://t.co/XyIEMeZ4Ft

🖼️ Media 1 · Media 2
πŸ”huggingface retweeted
O
Eric ⚑️ Building...
@outsource_
πŸ“…
Apr 09, 2026
14d ago
πŸ†”28537121
⭐0.32

πŸš€ NEW GEMMA 4 31B TURBO DROPPED Runs on a SINGLE RTX 5090: ⚑️18.5 GB VRAM only (68% smaller) 🧠51 tok/s single decode πŸ’»1,244 tok/s batched πŸ€–15,359 tok/s prefill ← yes, fifteen thousand 🚨2.5Γ— faster than base model with basically zero quality loss. It hits Sonnet-4.5 level on hard classification tasks… at 1/600th the cost. Local models are shipping faster than we can test πŸ‘‡πŸ» πŸ”₯ HF: https://t.co/XUvVZBj9AX

❀️2,519
likes
πŸ”203
retweets
johnowhitaker (@johnowhitaker) · 📅 Apr 11, 2026 · 13d ago · 🆔10052378

@replicate Code if you'd like to replicate this without starting from scratch: https://t.co/AzqBsiOHk1 Please tag me so I can see your pretty results if you try this on any different taxa :)

🖼️ Media 1
πŸ”Sanemavcil retweeted
O
OpenClaw🦞
@openclaw
πŸ“…
Apr 11, 2026
12d ago
πŸ†”58742012
⭐0.34

OpenClaw 2026.4.10 🦞
🧠 Active Memory plugin
🎙️ local MLX Talk mode
🤖 Codex app-server harness plugin
🧾 Teams pins/reactions/read actions
🛡️ SSRF hardening + launchd fixes
stability, but with attitude🦞 https://t.co/PW7WDumTf1

❀️1,943
likes
πŸ”203
retweets