Your curated collection of saved posts and media

Showing 9 posts · last 14 days · by score
nichochar
@nichochar
📅
Apr 13, 2026
11d ago
🆔68130344
0.34

If you want to build a self-improving harness, the first step is instrumentation. There are now tools that handle this as "drop-in" plugins for Claude Code, very cool!

@wandb • Fri Apr 10 21:11

Building with Claude Code? You need to see what's happening each turn. The new @weave_wb plugin traces every session automatically. Tool calls, subagents, inputs, outputs. All structured so you can debug faster. No code changes. Just install and go. https://t.co/i7IoktC7RC

MiniMax_AI
@MiniMax_AI
📅
Apr 12, 2026
12d ago
🆔97659000

We're delighted to announce that MiniMax M2.7 is now officially open source, with SOTA performance on SWE-Pro (56.22%) and Terminal Bench 2 (57.0%). You can find it on Hugging Face now. Enjoy!🤗 Hugging Face: https://t.co/ApWrahIl3o Blog: https://t.co/gAxeFsNdW4 MiniMax API: https://t.co/1dgbMx0Q7K

marimo_io
@marimo_io
📅
Apr 14, 2026
10d ago
🆔35253654
0.38

Stop babysitting your agent. marimo-pair gives coding agents a live view of your notebook. Variables, errors, UI sliders — if you can interact with it, so can the agent. https://t.co/ruVka0EanC

k_taka
@k_taka
📅
Apr 14, 2026
10d ago
🆔06756937
0.32

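I had the chance to interview @seratch_ja about Codex last week, so I used that conversation as the basis for a summary of where Codex stands today. It covers everything from the basics to a first look at harness engineering. In a column in the second half, I also introduce "Codex Use Cases," which seems to have gained more examples recently.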

@gihyojp • Tue Apr 14 00:02

Now published: "Codex reaches 3 million weekly active users: we ask OpenAI Japan's Sera how 'development style' is changing" by @k_taka https://t.co/dbOThSVKl0

ArthurMacwaters
@ArthurMacwaters
📅
Apr 18, 2026
5d ago
🆔47399449

> grok4.20-beta1 is a much smaller model than opus but is #1 ranked in medicine and healthcare
> 4.3 and 4.4 will be much larger models, and likely will have a significant boost in performance on complex medical cases
> this is massively important in providing accurate diagnostic guidance and advice to both providers and patients

@elonmusk • Sat Apr 18 19:49

@techdevnotes Supplemental training has been added to 4.3. Grok 4.4 will be twice the size (1T) with training data through early April. Probably ready for release in early May. Grok 4.5 will be 1.5T and hopefully out by late May.

zhijianliu_
@zhijianliu_
📅
Apr 15, 2026
9d ago
🆔80751751

🔥 DFlash x MLX is happening! Shoutout to @aryagm01 for the early work on this. We're building on the momentum. Native MLX support, more models (Qwen3.5), up to 4x faster. Lossless! 👉 https://t.co/wKcRoiaWZ3

__Rhodium__
@__Rhodium__
📅
Apr 14, 2026
10d ago
🆔34386998

Won best edge AI at the @ycombinator and @innate_bot hackathon! We built a local VLM multi-rover orchestrator for Mars exploration. On-device navigation and automated fault detection & recovery across odometry, stereo vision, and lidar. Thanks for hosting, @ax_pey! https://t.co/GNkSNAMxRN

XiaoxuanMa_
@XiaoxuanMa_
📅
Apr 14, 2026
10d ago
🆔10435854

What if virtual humans could see, think, and act in 3D worlds like us?! We present Visually-Grounded Humanoid Agents 🎉 Our agents 👀perceive via RGB-D vision, 🧠plan with context-aware reasoning, and 🏃act with full-body motion in 3D scenes. Check 🔗 https://t.co/kd0zCu7W2h https://t.co/VQZnjE2Pnx

HelloSurgeAI
@HelloSurgeAI
📅
Apr 14, 2026
10d ago
🆔18177208
0.46

📄 Introducing GDP.pdf: an expert multimodal reasoning benchmark for the documents that run the world. 📄

We've spent years measuring AI against the extraordinary: proving theorems, solving AGI. But the global economy doesn't run on the extraordinary. It runs on paperwork. More precisely: unsexy, poorly scanned, densely formatted PDFs. Contracts, invoices, medical records, blueprints – the documents that actually run the world.

GDP.pdf tests frontier models on their ability to handle real-world documents across ten professional industries:
🏗️ Construction: Can a model measure load-bearing walls on a blueprint?
⚖️ Law: Can it parse liability caps in a commercial lease?
💵 Finance: Can it calculate margin profiles in a buy-side memo?

The reality: every frontier model scored under 15%.

GDP.pdf asks a critical question: If a $100B model can't accurately reason about a drug interaction table in a PDF, is it actually ready for the enterprise? Right now, the answer is no.

Check out the blog post and leaderboard below. 👇
Blog: https://t.co/0Wj97DBYTC
Leaderboard: https://t.co/9CMY6JVPtj
