Your curated collection of saved posts and media

Showing 10 posts · last 14 days · by score
πŸ”jxnlco retweeted
Akshat Bubna (@akshat_b)
📅 Apr 15, 2026 · 8d ago
🆔 11880169 · ⭐ 0.38

To show off what you can do with @OpenAI Agent SDK + @modal, we built an ML research agent (inspired by @karpathy). It can:
- Spin up GPU sandboxes of any shape
- Run a pool of subagents
- Persist memory
- Snapshot state for fork/resume
Here it is playing Parameter Golf: https://t.co/r7QhvNmdEq

❤️ 14 likes · 🔁 2 retweets
SergioPaniego (@SergioPaniego)
📅 Apr 15, 2026 · 8d ago
🆔 87167544

Earlier this month, Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy. And it's already supported in TRL, built by @krasul. You can really feel the pace of development in the team 🐎

Paper by @onloglogn, @richard_baihe, @UnderGroundJeg, Navdeep Jaitly, @trebolloc, @YizheZhangNLP at Apple 🍎

How it works: the model generates completions at a training-time temperature (T_train) with top_k/top_p truncation, then fine-tunes on them with plain cross-entropy. No labels or verifier needed.

You can try it right away with this ready-to-run example (Qwen3-4B on rStar-Coder): https://t.co/zizfISD6bq
Or benchmark a checkpoint with the eval script: https://t.co/mKlafTyKSe

One neat insight from the paper: T_train and T_eval compose into an effective T_eff = T_train × T_eval, so a broad band of configs works well. Even very noisy samples still help.

Want to dig deeper?
Paper: https://t.co/aj1ZAcr8Mw
Trainer docs: https://t.co/TNVz93kZi9
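The temperature-composition insight can be checked numerically: tempering logits twice is the same as tempering once with the product of the temperatures. This is a toy check under the idealized assumption that the distilled model exactly matches the T_train-tempered distribution (the paper only claims this approximately); it is not the paper's or TRL's code.

```python
import math

def softmax(xs, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    m = max(x / T for x in xs)
    es = [math.exp(x / T - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

logits = [2.0, 1.0, -0.5, 0.3]  # made-up logits for illustration
T_train, T_eval = 1.4, 0.7

# Idealized distilled model: its logits are the originals scaled by
# 1/T_train; sampling from it at T_eval tempers them a second time.
two_step = softmax([x / T_train for x in logits], T=T_eval)

# Equivalent single sampling step at T_eff = T_train * T_eval.
one_step = softmax(logits, T=T_train * T_eval)

assert all(abs(a - b) < 1e-12 for a, b in zip(two_step, one_step))
```

Because the two distributions coincide, many (T_train, T_eval) pairs land on the same effective temperature, which is consistent with the paper's observation that a broad band of configs works well.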

🖼️ Media (5 attachments)
johnowhitaker (@johnowhitaker)
📅 Apr 11, 2026 · 13d ago
🆔 25757486

Mini-project this pm inspired by all the spring blooms: embedding ~5k tulip images from @inaturalist. TIL: tulips in the wild are mostly red and yellow :) And that this is a lovely way to explore a group! https://t.co/nW6tr1T1ry

🖼️ Media
simonw (@simonw)
📅 Apr 19, 2026 · 5d ago
🆔 48022690 · ⭐ 0.40

Since Anthropic publishes their system prompts, we can generate a diff between Claude Opus 4.6 and 4.7. Here are my notes on what's changed: https://t.co/IQHuvLGmwO
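Generating such a diff takes only a few lines; a minimal sketch with Python's difflib, using made-up prompt snippets in place of the real published files:

```python
import difflib

# Hypothetical stand-ins for the two published system prompts,
# split into lines. In practice you would read the real files.
old = ["You are Claude.", "Be concise."]
new = ["You are Claude.", "Be concise and direct.", "Avoid flattery."]

# unified_diff yields a unified-format diff, line by line.
diff = list(difflib.unified_diff(
    old, new, fromfile="opus-4.6", tofile="opus-4.7", lineterm=""))
print("\n".join(diff))
```

Lines prefixed with `-` were removed between versions and `+` lines were added, which is exactly the kind of output you'd annotate for change notes.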

saprmarks (@saprmarks)
📅 Apr 16, 2026 · 7d ago
🆔 33220879

Cool to see that Meta conducted and published a pre-deployment investigation of Muse Spark behaviors like reward hacking, honesty, and evaluation awareness! https://t.co/i1Yy7HsEup

@summeryue0 · Tue Apr 14 22:55

🚀 Muse Spark Safety & Preparedness Report for Meta AI is out. We start with our pre-deployment assessment under Meta's Advanced AI Scaling Framework, covering chemical and biological, cybersecurity, and loss of control risks. Our assessment flagged potentially elevated chem/bio

🖼️ Media (1 attachment)
simonw (@simonw)
📅 Apr 16, 2026 · 7d ago
🆔 85306701

Shocking result on my pelican benchmark this morning: I got a better pelican from a 21GB local Qwen3.6-35B-A3B running on my laptop than I did from the new Opus 4.7! Qwen on the left, Opus on the right: https://t.co/kDlbnJv6YI

🖼️ Media (2 attachments)
osanseviero (@osanseviero)
📅 Apr 15, 2026 · 8d ago
🆔 47164735

Introducing TIPS v2
👀 Foundational text-image encoder
📸 Can be used as the base for different multimodal applications
🤗 Apache 2.0
🧑‍🍳 New pre-training recipes
https://t.co/A6H93YJhNx

🖼️ Media (2 attachments)
winglian (@winglian)
📅 Apr 17, 2026 · 6d ago
🆔 16104719 · ⭐ 0.38

@reza_byt Doesn't a tokenwise looped transformer have issues with pretraining, since each token has a different depth and the model also has to learn the recursion depth?

RedHat_AI (@RedHat_AI)
📅 Apr 10, 2026 · 13d ago
🆔 97110649

Speculative decoding for Gemma 4 31B (EAGLE-3)

A 2B draft model predicts tokens ahead; the 31B verifier validates them. Same output, faster inference.

Early release. vLLM main branch support is in progress (PR #39450). Reasoning support coming soon.
https://t.co/PoK8zbA7li
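The draft-then-verify loop can be sketched with toy next-token functions. Both "models" and the greedy acceptance rule here are illustrative stand-ins, not the EAGLE-3 or vLLM implementation (which verifies all draft tokens in one batched target pass and uses probabilistic acceptance for sampling):

```python
def speculative_decode(target, draft, prompt, k=4, max_new=8):
    """Greedy speculative decoding sketch: the draft proposes k tokens,
    the target keeps the longest prefix it agrees with."""
    seq = list(prompt)
    limit = len(prompt) + max_new
    while len(seq) < limit:
        # Cheap draft model proposes k tokens autoregressively.
        ctx = list(seq)
        proposal = []
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # Expensive target verifies: accept while it agrees, and emit
        # its own token at the first disagreement.
        for t in proposal:
            if len(seq) >= limit:
                break
            correct = target(seq)
            seq.append(correct)
            if t != correct:
                break  # rest of the draft is discarded
    return seq

# Toy "models": the target counts up by 1; the draft agrees except
# right after a multiple of 5, where it skips ahead by 2.
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: ctx[-1] + 1 if ctx[-1] % 5 else ctx[-1] + 2

out = speculative_decode(target, draft, [0], k=4, max_new=6)
print(out)  # same tokens as running the target alone: [0, 1, 2, 3, 4, 5, 6]
```

Because every emitted token comes from the target, the output matches target-only greedy decoding exactly; the speedup comes from the target validating several draft tokens per step instead of generating one at a time.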

🖼️ Media (1 attachment)
llama_index (@llama_index)
📅 Apr 16, 2026 · 7d ago
🆔 52647859

Anthropic says Opus 4.7 hits 80.6% on Document Reasoning, up from 57.1%. But "reasoning about documents" ≠ "parsing documents for agents."

We ran it on ParseBench:
→ Charts: 13.5% → 55.8% (+42.3), huge
→ Formatting: 64.2% → 69.4% (+5.2)
→ Content: 89.7% → 90.3% (+0.6)
→ Tables: 86.5% → 87.2% (+0.7)
→ Layout: 16.5% → 14.0% (-2.5), regressed

Real chart gains, but at ~1.5¢/page. Enterprise scale? Not yet.

LlamaParse Agentic: 84.9% overall, ~1.2¢/page.

The frontier for general document understanding is long. No single model solves it.
→ https://t.co/h7SpuTWYVn
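The per-category deltas are simple subtractions; a quick sketch to reproduce them from the scores quoted in the post (the category names and pairing of old/new scores follow the post, nothing else is assumed):

```python
# ParseBench scores from the post: (previous score, Opus 4.7 score), in percent.
scores = {
    "Charts":     (13.5, 55.8),
    "Formatting": (64.2, 69.4),
    "Content":    (89.7, 90.3),
    "Tables":     (86.5, 87.2),
    "Layout":     (16.5, 14.0),
}
deltas = {k: round(new - old, 1) for k, (old, new) in scores.items()}
print(deltas)
# {'Charts': 42.3, 'Formatting': 5.2, 'Content': 0.6, 'Tables': 0.7, 'Layout': -2.5}
```

The computed deltas match the post's annotations, including the Layout regression.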

Media 1
πŸ–ΌοΈ Media